70 Years of AI History in 10 Minutes

A summary of the updated zeroth chapter of Mostly Harmless AI v2.

May 18, 2026

Raphael, “The School of Athens” (1509–1511), Apostolic Palace — Plato points up to the eternal forms (the rule-followers); Aristotle’s palm presses down to the empirical world (the pattern-finders).

Every post on the blog this month is on the theme of agent reliability, anchored on the second edition of Mostly Harmless AI — 50% off during early access — where the history below appears in full as the ~8,000-word opening chapter, with 70+ references and all the scenes this post had to cut. You can also read the whole book online for free. More at the end.

Seventy years ago, two men sat in two different rooms and disagreed about what a thinking machine should look like. Neither of them has been proven right. Both have been proven half-right, several times, in alternation, for the whole of my lifetime and most of yours.

I think the entire history of AI is that one argument, still going.

The first camp wanted to build minds out of rules. Feed the machine enough knowledge in a logical enough form, and reasoning falls out of the logic. It called itself a lot of things over the decades — symbolic AI, knowledge-based systems, good old-fashioned AI — but its home is rationalism. The second camp wanted to build minds out of examples. Feed the machine enough data, in any messy form whatsoever, and behavior falls out of the statistics. It also kept renaming itself — connectionism, machine learning, deep learning — but its home is empiricism. Same goal, a machine that does what intelligent people do. Seventy years of disagreement about how.

Here’s the ending, spoiled early, because this isn’t a thriller. The argument did not produce a winner. It produced a marriage. The chatbots, the image generators, the agents writing code while you sleep — none of them is one side beating the other. They’re both sides, finally forced to share a workshop. Let me walk you through how we got there. Fast.

Both seeds, one summer

They were planted within five years of each other. In 1943, McCulloch and Pitts wrote down a neuron as a weighted sum with a threshold — twelve pages, the seed of the empiricist branch. In 1950, Turing refused to define thinking and proposed a behavioral test instead, a question both camps could chase. In 1956, ten people spent a summer at Dartmouth, coined the phrase artificial intelligence, and planned to crack language and reasoning in a few months. (We are still working on it.) In 1957, Rosenblatt built the Perceptron, the first machine that learned from examples, and the New York Times announced it would soon walk, talk, and be conscious of its own existence.

Two foundational myths, in the ground, in the same decade. The rest is which one got watered.

The rationalists win the first round

And they win it convincingly. In the 1950s and 60s compute is tiny and data, in the sense of millions of labeled examples, does not exist.

What you can do is write a program that does something specific and inspect every step of it. So the symbolic camp gets the better results and the better tools. Newell and Simon’s theorem-prover. McCarthy’s LISP. Weizenbaum’s ELIZA — four pages of pattern-matching that understood nothing, and that people confided in anyway. (Hold onto ELIZA. The field will relearn that exact lesson about six more times.) Winograd’s SHRDLU, fluent and thoughtful inside a closed world of colored blocks.

The catch was always the world. SHRDLU’s blocks could all be known, listed, reversed. The real world has rain, and grandparents, and the smell of coffee, and you cannot list it. In the closed world of symbols, symbols were enough. The next decade was about discovering, painfully, that the world is not closed.

The cost of winning too hard

In 1969, Minsky and Papert published Perceptrons and proved a single-layer network can’t compute XOR.

The proof was correct. It was also narrow — they admitted multi-layer networks could do it, nobody just knew how to train them yet. But the field was hungry for a verdict, and it read the book as one. Funding for neural networks collapsed. Rosenblatt died two years later in a boating accident, on his 43rd birthday. The algorithm that would resurrect his branch didn’t arrive at scale until 1986. Seventeen years of silence.
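The XOR result is easy to see in a few lines. The sketch below is an illustration, not Minsky and Papert's proof: it models a unit as a weighted sum with a threshold, searches a small grid of weights for a single unit that computes XOR, and then wires three such units into two layers by hand. The function names, the grid, and the hand-picked weights are all assumptions made for this example. The real argument is short algebra: a single unit would need b ≤ 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b ≤ 0, and those four conditions cannot all hold at once.

```python
from itertools import product

def threshold_unit(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Truth table for XOR.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def single_unit_solves_xor(grid):
    """Try every (w1, w2, bias) combination on the grid; report whether any matches XOR."""
    for w1, w2, b in product(grid, repeat=3):
        if all(threshold_unit((w1, w2), b, x) == y for x, y in XOR.items()):
            return True
    return False

def two_layer_xor(x):
    """Hand-wired network: an OR unit and a NAND unit feed an AND unit."""
    h_or = threshold_unit((1, 1), -0.5, x)
    h_nand = threshold_unit((-1, -1), 1.5, x)
    return threshold_unit((1, 1), -1.5, (h_or, h_nand))

grid = [i / 2 for i in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
found_single = single_unit_solves_xor(grid)
two_layer_outputs = {x: two_layer_xor(x) for x in XOR}

print("single unit found:", found_single)
print("two layers match XOR:", two_layer_outputs == XOR)
```

Note that the two-layer weights here were chosen by hand. What was missing in 1969 was a way to learn them from examples.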

Modern AI runs on the work of people who weren’t born when Minsky and Papert published. The reason their work came so late is that the field they’d return to had been kept near-dead for two decades. The symbolic camp’s victory was real. The field paid for it. It will pay that bill again.

The rationalist trap

Through the 1970s and 80s the symbolic branch found something that made money: expert systems.

MYCIN matched infectious-disease specialists. XCON saved DEC tens of millions a year. The thesis was clean and seductive — intelligence is rules plus facts; hire the expert, extract the rules, ship the system. And these systems were legible. You could read every rule, audit the reasoning, fix the wrong line. (You cannot do this with your favorite large language model. We’ll come back to that another day.)

Two problems killed it. Common sense turns out to be unrepresentable in rules — birds fly, except penguins, except baby penguins, except dead ones — and the rules contradict each other faster than you can write them. And then there’s Cyc: in 1984 Doug Lenat set out to hand-encode all of common-sense knowledge and estimated ten years. He died in 2023 with the work unfinished, and the project is still at it forty-two years later. It is the most thoroughly humbling monument in the history of cognitive science.

By the late 80s the money dried up and the Second AI Winter set in. The field was tired of the rationalists.

The empirical rebellion

We’re at 1986: backpropagation is published in Nature, and multi-layer networks are finally trainable.

Then the empiricist branch spends fifteen years not scoring one big win but a thousand small ones. Support vector machines. Random forests. Boosting. Statistical methods quietly eating one application after another, including the symbolic camp’s home turf — language, where IBM’s speech team found that every time they fired a linguist, the system improved.

Why now? Three things are moving together, slowly. Compute grew. The internet started producing data in volumes nobody had imagined. And the methods were simple enough to scale with both.

In 2019 Richard Sutton would name this The Bitter Lesson: across seventy years, the general method that scales with compute beats the clever hand-engineered one, every time. It’s bitter because it tells researchers their hard-won taste gets steamrolled by someone with more GPUs. It is mostly right.

The thing that complicates it is the thing symbolic AI was good at all along — but I’m getting ahead of myself.

The earthquake

Now jump to September 2012.

AlexNet — eight layers, two gaming GPUs, a couple of training tricks — drops the ImageNet error rate ten points below the nearest hand-engineered system. A ten-point gap isn’t an improvement. It’s a different category of result. Within six months every computer-vision lab on Earth has pivoted. AlexNet is, by a wide margin, the single most consequential paper in modern AI.

Then it cascades, almost too fast to track. Sequence-to-sequence translation. GANs. Atari from raw pixels. In 2016 AlphaGo beats Lee Sedol at a game with more board positions than there are atoms in the universe — and almost nobody notices that inside it is a deep network (empiricist) wrapped around a tree search (symbolic). The marriage is already there, in 2016, hiding in plain sight. In 2017, “Attention Is All You Need” introduces the Transformer, and every model in your chat window today descends from that one paper.

The crown jewel nobody talks about

The most consequential AI system of the modern era is not a chatbot. It doesn’t write poems. It’s in the bloodstream of structural biology.

Predicting a protein’s 3D shape from its amino-acid sequence is a fifty-year-old problem. The rationalist approach — simulate the physics — was beautiful and almost completely intractable. For twenty years the field’s hardest benchmark plateaued at a score around 40. In 2020, DeepMind’s AlphaFold 2 scored a median above 92 across all targets, and around 87 on that hardest tier. The grand challenge was, for practical purposes, solved. Hassabis and Jumper got the 2024 Nobel in Chemistry for it — the only AI work so far to produce a Nobel-level scientific breakthrough.

Read the citation. It isn’t about AI as a technology. It’s about a problem that got finished while the people whose careers were defined by it slept. The chatbots get the headlines. The image generators get the lawsuits. The protein folder got the world. Remember that the next time someone wants to tell you AI is ChatGPT.

The synthesis

Now the marriage. The word agent did not come from machine learning. It came from classical, symbolic AI in the 1970s and 80s: a system that perceives its environment, deliberates, picks an action, acts, observes, loops. The architecture was right. The brain was missing. Pure symbolic computation could never model a world with grandparents and coffee in it, so the agent shell sat there for decades, structurally correct and operationally empty. Cyc, again, is the long sad proof.

The empiricists borrowed the same word in the 2000s — in reinforcement learning, an agent is a learned policy. DQN was an agent. AlphaGo was an agent. A new brain, slotted into the old shell. Spectacular, and narrow. An AlphaGo cannot make you a sandwich.

In 2024 the cognition slot gets filled a third time, by a general-purpose reasoning language model. The shell is still the decades-old symbolic frame: perceive, deliberate, pick an action with a name and a meaning — read_file, run_tests, send_email — act, observe, loop. The brain is now an LLM. From the empiricist side the system inherits flexibility: it has read enough of the world that you don’t have to tell it what a file is, or what an angry customer sounds like. From the symbolic side it inherits structure: the actions have names, the consequences are bounded, the trajectory is auditable. The model can hallucinate; the system can’t run rm -rf unless somebody wired that action in and granted it.
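Here is roughly what that shell looks like in code. This is a minimal sketch under loud assumptions, not any particular framework's API: scripted_model is a hard-coded stand-in for an LLM, the two tools are fakes that return strings, and names like ALLOWED_ACTIONS and run_agent are invented for this example. The point is the shape: the model proposes a named action, the shell runs it only if it was registered, and every step lands in a trajectory you can audit.

```python
from typing import Callable

# Hypothetical tools. A real harness would touch the file system or a test runner.
def read_file(path: str) -> str:
    return f"<contents of {path}>"

def run_tests(target: str) -> str:
    return f"tests passed for {target}"

# The symbolic shell: only actions registered here can ever run.
ALLOWED_ACTIONS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}

def scripted_model(observations: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: proposes (action_name, argument) based on how much it has seen."""
    plan = [("read_file", "app.py"), ("rm_rf", "/"), ("run_tests", "app.py"), ("stop", "")]
    return plan[min(len(observations), len(plan) - 1)]

def run_agent(model, max_steps: int = 10) -> list[tuple[str, str, str]]:
    """Perceive, deliberate, act, observe, loop. Returns the auditable trajectory."""
    observations: list[str] = []
    trajectory: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        name, arg = model(observations)      # deliberate: the model picks a named action
        if name == "stop":
            break
        tool = ALLOWED_ACTIONS.get(name)
        if tool is None:
            # The model asked for something nobody wired in, so the shell refuses.
            result = f"refused: '{name}' is not a registered action"
        else:
            result = tool(arg)               # act
        observations.append(result)          # observe
        trajectory.append((name, arg, result))
    return trajectory

trajectory = run_agent(scripted_model)
for step in trajectory:
    print(step)
```

In this run the scripted model asks for rm_rf on its second turn. The shell has no such action, so the request is logged and refused, and the loop carries on to run_tests.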

The 1970s symbolic agent could never reason. The 2010s RL agent could never generalize. The 2026 agent does both — badly, often clumsily, but for the first time at the same time. You can watch this happen most clearly in software development right now: a language model at the core, a harness of tools around it, a test suite as the verifier, a human reviewing the diff. All four layers, on a laptop, today. Software is the canary. The same pattern is already moving toward research, then education, then everything whose feedback loops are fast enough.

So here is the closing claim, the techno-pragmatist version of the whole story. The seventy-year argument did not produce a winner. It produced three layers: a learning substrate that absorbed the written record of humanity, a symbolic shell that makes it accountable, and a human frame that decides what the whole thing is for. The first two are engineering. The third is the only one that was ever really about us.

The synthesis exists. What we do with it is still up to us.

Until next time, stay curious.

This post is the speedrun — the book’s ~8,000-word opening chapter compressed down to its spine. The full version in the second edition of Mostly Harmless AI has every scene with its characters, the eras this post skipped, 70+ references, and the agentic stack the rest of the book then takes apart mechanism by mechanism. It’s the book I wish someone had handed me when I was trying to make sense of the noise — and it’s 50% off while it’s in early access. You can also read the whole thing online for free in a custom reader I built and am rather proud of: dark mode, font controls, progress tracking, offline support, the works.


And if you want the whole catalog of everything I’ve written, plus everything I’m going to write, that’s the Compendium. One purchase, in perpetuity.
