A Brief History of Artificial Intelligence
Or How We Taught Machines to Think (And What's Next)
Note: This article is based on the Prologue and Introduction chapters of my in-progress book Mostly Harmless AI, which deals with how to harness the power of Artificial Intelligence for good. You can get the early draft at a 50% discount in the link below.
For centuries, we humans have been captivated by the wild idea of a thinking machine. This isn't some modern tech obsession, even if it seems nowadays no one talks about anything else. No, the ancient dream of thinking machines goes way back, whispered in myths about automatons and golems in religions galore, and later, famously brought to life (or at least, cleverly faked) by feats like the Mechanical Turk.
One of the most fascinating aspects of the history of Artificial Intelligence has been this often dramatic back-and-forth between two core, seemingly antagonistic approaches to building intelligent machines. On one hand we have logic and rules (what’s often called symbolic AI), and on the other, data and patterns (or statistical AI). In a deep way, this mirrors the age-old philosophical tug-of-war between rationalism (figuring things out through pure reason) and empiricism (learning from experience).
In this article, I want to explore the history of AI through this lens of rationalism (or symbolic, rule-based AI) versus empiricism (data-driven, statistical AI). Come with me on this deep dive to learn how these seemingly opposite philosophies have shaped AI’s past, defined its present, and are now finally starting to team up for its future.
Prologue
The AI dream didn't kick off with microchips or code. Nope, it began way before, with grand philosophical ambitions and some seriously imaginative leaps, all thanks to incredibly brilliant and diverse minds.
Our journey begins in the 17th century, at the hands of Gottfried Wilhelm Leibniz. By this point in his life, Leibniz was already a superstar: a philosopher, mathematician, logician, and diplomat who invented (or discovered) calculus independently of Newton. He even invented much of the notation we still use today, including the integral symbol.
Leibniz was a true polymath, totally immersed in the Enlightenment's big project of organizing all knowledge and reason. His drive wasn't just academic; he genuinely believed that logic could solve every human argument. He imagined a world where disagreements weren't settled by yelling or endless debates, but by calm, undeniable calculations. Inspired by how algebra and calculus, by means of clever notation, could make even the trickiest problems appear simple, Leibniz fueled his grand dream of universal computation at a time when even the simplest calculating machines were considered a marvel.
What if, he mused, we could formalize all human reasoning in a similar way? He dreamed up a characteristica universalis—a universal language for thought—and a calculus ratiocinator—a mechanical way to reason with it. In this, Leibniz was a rationalist: he believed that all human thought was, at bottom, a grand logical machine. Unknowingly, he was laying the intellectual groundwork for formal logic, which paved the way for symbolic AI centuries later.
Fast forward to 19th-century England, and we find Lady Ada Lovelace. The daughter of the famously rebellious poet Lord Byron, Ada was a formidable intellect, tutored in math and science by some of the most prominent thinkers of her time. By the time she met Charles Babbage, the great inventor, he was working on his Analytical Engine, an abstract machine that could, in principle, do anything a modern computer can do. Ada was already known for her sharp mind and amazing mathematical insights, but she also had a poetic and imaginative side that shaped her view of technology.
While Babbage saw his Analytical Engine mostly as the ultimate number-cruncher, Ada's mind took flight beyond mere arithmetic. She famously wrote that the Engine "might compose elaborate and scientific pieces of music of any degree of complexity or extent." She was more than a century ahead of Generative AI, dreaming of the day machines would usher in a new era of synthetic creativity.
As a side note, Charles Babbage never finished constructing an actual, physical embodiment of his Analytical Engine. He kept imagining improvement after improvement, never quite settling on something he could actually construct and use. It would have been the first true computer, but it forever remained an unfinished project. This serves as a cautionary tale against the all too common syndrome—aptly called the Babbage Syndrome—of intellectualizing ad infinitum without ever testing your ideas in the real world.
A few decades later, mid-20th century, as the dust settled from war and the digital age dawned, came the man who's probably the most important figure in the history of Computer Science at large, the great Alan Turing. By the time his groundbreaking work on machine intelligence came out, Turing was already widely considered among the greatest logicians and mathematicians of his time.
He's basically the Father of Computer Science, having come up with the abstract model of computation we know as the Turing Machine—the theoretical blueprint for every modern computer—and proving not only its potential but also its intrinsic limitations. His wartime experience, where he played a key role in breaking the Enigma code with the help of electromechanical codebreaking machines, also gave him a very practical grasp of the power of computing. That work was kept secret for decades after his death.
Turing was a man of quiet brilliance. He wasn't just curious about what a real machine could do; he was fundamentally wrestling with the very definition of thinking itself. In his famous 1950 paper, "Computing Machinery and Intelligence," he dared to ask whether machines could really think. He proposed a brilliant, practical yet deeply philosophical way to assess it: what he called The Imitation Game, but what the world came to know as the Turing Test.
If a machine could chat with a human, he suggested, in such a way that the human couldn't tell if they were talking to a machine or another human, then, for all intents and purposes, the machine could be considered to be thinking. This wasn't just a practical experiment, though; it was a functional definition of thinking that sparked the computational theory of mind. The implications of his hypothesis are at the core of the most profound discussions in the field of Philosophy of Mind, even today.
But crucially, in that same paper, now some 75 years old, Turing looked beyond the test and tossed out several ideas for how such an artificial intelligence might actually be achieved. These included the concept of a learning machine, raised like a human child, soaking up knowledge from experience instead of being preprogrammed to know everything beforehand; he even hinted at using bio-inspired algorithms to mimic how evolution works.
These ideas foreshadowed major pillars of modern AI systems, like neural networks and metaheuristic search algorithms, showing his amazing foresight and his deep understanding of both rationalist and empiricist paths to intelligence. Tragically, he wouldn't live to see his dream materialize into the massive body of knowledge and practice that is the field of Artificial Intelligence.
The Foundational Era (1950s - Late 1960s)
The history of Artificial Intelligence as a scientific field formally begins in the summer of 1956. A small group of brilliant minds, including John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, got together at Dartmouth College. It was right there, at the Dartmouth Summer Research Project on Artificial Intelligence, that the term "Artificial Intelligence" was officially coined. This workshop wasn't just a meeting; it was a declaration, setting up AI as a legitimate academic discipline with a huge goal: to build machines that could think like humans do.
In these early days, the predominant paradigm was symbolic AI. Researchers believed that machines could become intelligent by putting human knowledge and reasoning into explicit rules.
One of the first, and most impressive, demonstrations of this ethos was The Logic Theorist (Newell & Simon, 1956). This program could prove math theorems, not by brute force, but by using symbolic logic, kind of mimicking how humans solve problems. It was a clear sign that machines could actually do some form of abstract reasoning if instructed correctly.
Game-playing also became a hot area for symbolic AI research. The popularization of the Minimax algorithm in board games like chess and checkers let early computers play by systematically searching ahead through possible moves (in practice, only a few moves deep, guided by hand-crafted evaluation functions).
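To make the idea concrete, here's a minimal Python sketch of minimax for a generic two-player game. The game interface (legal_moves, apply, is_over, score) is hypothetical and only meant to illustrate the recursion; real programs of the era cut the search off at a fixed depth and scored positions heuristically.

```python
# A minimal sketch of the minimax idea for a two-player, zero-sum game.
# The `game` object (with legal_moves, apply, is_over, score) is a
# hypothetical interface used purely for illustration.

def minimax(game, maximizing: bool) -> float:
    """Best achievable score from this position, assuming both players play perfectly."""
    if game.is_over():
        return game.score()  # e.g. +1 win, 0 draw, -1 loss

    scores = (minimax(game.apply(move), not maximizing)
              for move in game.legal_moves())

    # The maximizing player picks the highest-valued move,
    # the minimizing player the lowest-valued one.
    return max(scores) if maximizing else min(scores)
```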
These were the days of automatic reasoners and early forms of knowledge representation, with projects like the General Problem Solver aiming to tackle any problem that could be stated formally. The idea was simple, yet powerful: if we could just write down all the rules, the machine would be smart enough to solve any problem.
It was also during this time that a seemingly simple program really captured people's imaginations: ELIZA, the first chatbot. Developed by Joseph Weizenbaum in the mid-1960s, ELIZA was a barebones linguistic interface designed to pretend to be a Rogerian psychotherapist. It worked by using simple pattern matching and rule-based responses, often just turning what you said into a question ("You say you are sad. What makes you say you are sad?").
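To give a flavor of how little machinery this required, here is a toy ELIZA-style responder in Python. The patterns and templates below are invented for illustration and are not taken from Weizenbaum's original script, but the core trick is the same: match, reflect, deflect.

```python
import re

# A toy ELIZA-style exchange: match a pattern, reflect it back as a question.
# The real ELIZA used a richer script of ranked rules, but the trick was this simple.

RULES = [
    (r"i am (.*)", "You say you are {0}. What makes you say you are {0}?"),
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"(.*) mother(.*)", "Tell me more about your family."),
]

def respond(text: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, text.lower().strip())
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # default deflection when nothing matches

print(respond("I am sad"))
# -> "You say you are sad. What makes you say you are sad?"
```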
Even though it was incredibly simple, many users found themselves opening up to ELIZA, believing it actually understood and empathized with them. This phenomenon became known as the ELIZA effect, a powerful reminder of how easily we humans tend to see human qualities in technology. ELIZA, despite being a purely symbolic, rule-driven system, sparked a persistent dream in the AI world: the quest for truly conversational AI, for machines that could talk with us naturally. This dream, born from simple rules, would keep pushing AI's boundaries for decades.
But even with symbolic AI leading the way, a different kind of idea was quietly taking root: connectionism, an early form of statistical AI. This approach drew inspiration from biology, specifically from how simple neurons, connected in just the right way, can give rise to intelligence. The Perceptron, introduced by Frank Rosenblatt in the late 1950s, was an early artificial neural network built to learn patterns directly from data.
The initial excitement was huge; these "learning machines" seemed to offer a path to intelligence without needing every single rule programmed explicitly. Imagine, a machine that could learn just by seeing examples, like a human brain! This perspective leans heavily into the empiricist tradition, where knowledge is gained through sensory experience and data.
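Here's a minimal sketch of Rosenblatt's learning rule in Python, learning the logical AND function purely from examples. The dataset, learning rate, and number of passes are arbitrary illustrative choices; the point is that no rule for AND is ever written down.

```python
import numpy as np

# Rosenblatt's perceptron learning rule: nudge the weights whenever the
# prediction is wrong. Here it learns the AND function from examples alone.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])             # logical AND

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                    # a few passes over the data
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)
        error = target - pred
        w += lr * error * xi           # move the decision boundary
        b += lr * error

print([int(w @ xi + b > 0) for xi in X])   # -> [0, 0, 0, 1]
```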
However, the honeymoon didn't last. Both approaches ran into big problems and ultimately failed to scale beyond toy problems.
Early symbolic systems, while impressive in their specific areas, turned out to be quite brittle. They struggled with common-sense knowledge and couldn't easily adapt to new situations outside their carefully programmed rules. Trying to teach a machine absolutely everything it needed to know, one fact at a time, was an insurmountable challenge.
Meanwhile, perceptrons hit their own walls. Marvin Minsky and Seymour Papert's 1969 book, Perceptrons, famously showed that a single-layer perceptron cannot represent even simple nonlinear relationships in its training data, such as the XOR function. Deeper networks could, in principle, but at the time nobody knew how to train them effectively.
This period became known as the First AI Winter: a big drop in funding and public interest as those initial grand promises didn't pan out. And this early struggle between explicit rule-based systems and pattern-based approaches set the stage for the dynamic tension that would define AI's whole history.
The Knowledge Era (1970s - Mid 1990s)
Coming out of the first winter, AI didn't vanish; it just regrouped, with symbolic methods making a strong comeback. The 1970s and 80s saw the development and initial commercial success of expert systems. These were AI programs designed to mimic how a human expert makes decisions in a very specific, narrow field.
Examples of these are systems like MYCIN, which helped diagnose blood infections, or XCON, which configured computer systems.
The main focus here was on capturing human expertise in a specific area and representing that knowledge as handcrafted rules and facts. Imagine writing down a comprehensive set of rules for medical diagnosis, for example: all the possible questions and follow-ups, and all the consequences of the possible answers. These systems used sophisticated inference engines (basically, automatic reasoners) to apply those rules and draw conclusions.
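As a toy illustration, here are a few lines of Python doing forward chaining over hand-written rules, which is the basic loop behind many inference engines. The "medical" rules themselves are invented for this sketch and are not from MYCIN.

```python
# A toy forward-chaining inference engine in the spirit of expert systems:
# hand-written rules, applied repeatedly until no new facts can be derived.
# The rules below are invented purely for illustration.

rules = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis"}, "order_lumbar_puncture"),
    ({"fever", "cough"}, "suspect_flu"),
]

facts = {"fever", "stiff_neck"}

changed = True
while changed:                     # keep firing rules until a fixed point
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)
# {'fever', 'stiff_neck', 'suspect_meningitis', 'order_lumbar_puncture'}
```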
This era was the pinnacle of the rationalist approach to AI, aiming to formalize and apply human expertise through clear logical structures.
While expert systems grabbed the headlines, research into neural networks quietly kept going. A major algorithmic breakthrough during this time was the popularization of backpropagation (thanks to Rumelhart, Hinton, and Williams in the mid-1980s). This algorithm finally gave researchers an efficient way to train multi-layered neural networks, letting them learn much more complex patterns and breaking free of the perceptron's primary limitation.
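Here's a rough sketch of the idea in Python: a tiny two-layer network trained with backpropagation on XOR, the very function a lone perceptron cannot represent. The architecture, learning rate, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

# A minimal backpropagation sketch: a two-layer network learning XOR.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: propagate the error through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = d_out @ W2.T * h * (1 - h)

    # gradient descent step
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # should be close to [0, 1, 1, 0] for most seeds
```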
Just as expert systems hit their peak, their own limitations became painfully obvious. They were super expensive to build and maintain, needing human experts to painstakingly put in all their knowledge. They were also incredibly brittle, like all purely symbolic approaches; even a tiny change outside their programmed domain could break them entirely.
And so, the Second AI Winter arrived. This time, it was clear that while symbolic AI had done impressive things, it just couldn't scale to the complexity of the real world. At the same time, statistical AI wasn't really working yet, as the available data and computational infrastructure were insufficient. But this was about to change.
The Internet Era (Late 1990s - Early 2010s)
The internet changed absolutely everything, and AI wasn't the exception. Suddenly, data was everywhere, and statistical approaches were perfectly positioned to take advantage.
The huge growth of the internet in the late 1990s and early 2000s, combined with more and more computing power, led to an unprecedented explosion of digital data. Every click, every search, every photo uploaded contributed to a massive ocean of information. This Big Data was the fuel that statistical AI had been waiting for.
With tons of data available, statistical machine learning algorithms devised decades earlier suddenly started to work. Techniques like support vector machines (SVMs), decision trees, and ensemble methods became extremely popular. They weren't just theoretical curiosities anymore; they were powering real-world applications. Search engines used them to rank billions of web pages, email providers deployed them to filter millions of spam messages, and e-commerce sites implemented them to recommend thousands of products to millions of users.
These were all problems perfectly suited for statistical machine learning, which could find subtle patterns in huge datasets without needing explicit rules for every single situation. It just needed massive data and computational resources to work, and now we had both in excess.
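As a flavor of the era's workhorse recipe, here's a minimal sketch using scikit-learn: turn text into bag-of-words features, then let a linear SVM find the separating pattern. The tiny "spam" dataset is invented for illustration; real systems trained on millions of messages.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy spam filter: vectorize text, train a linear SVM, classify new messages.
# The four example messages below are made up purely for illustration.

texts = ["win a free prize now", "cheap pills online",
         "meeting moved to 3pm", "lunch tomorrow?"]
labels = [1, 1, 0, 0]                        # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(texts)   # bag-of-words counts

classifier = LinearSVC()
classifier.fit(features, labels)

test = vectorizer.transform(["free pills, win now"])
print(classifier.predict(test))              # most likely [1]
```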
But symbolic AI didn't disappear. While statistical methods took center stage, symbolic approaches found new roles. The popularization of domain-specific ontologies (formal ways to define concepts and relationships) gave rise to the ideal of the Semantic Web (an interconnected network of different data sources), which provided ways to structure and link information in ever-growing knowledge bases through a completely distributed and emergent process.
While the promise of a fully interconnected Semantic Web hasn't exactly panned out yet (and might never), the underlying notion of organizing the world's knowledge into networks of concepts stuck. These symbolic tools often worked alongside statistical methods, providing structured data that machine learning algorithms could use, or making the results of statistical methods easier to interpret. The Google Knowledge Graph is a prime example of this interplay between statistical and symbolic methods, and it has helped Google dominate the search industry for more than a decade and counting.
It was also during this period that the widespread use of AI in recommendation systems (like YouTube and Twitter) using purely algorithmic feeds started to show some of the earliest downsides of AI. While these systems were designed to personalize experiences and filter information, they also began creating filter bubbles and echo chambers.
The algorithms, often reflecting biases already in their training data, could also subtly strengthen existing prejudices or even be used to spread misinformation really fast. This early peek into AI's societal impact highlighted that even seemingly harmless applications could have big, sometimes negative, effects on how humans think and interact, setting the stage for the more complicated ethical discussions we have today.
The Deep Learning Era (Mid 2010s - Early 2020s)
If the Internet Era was just the warm-up, the mid-2010s brought the main event: Deep Learning. This wasn't just a step forward; it was a giant leap. The first big breakthroughs came from training much deeper neural networks than anyone thought possible before, often using clever tricks like layer-wise unsupervised pre-training.
The turning point arrived in 2012 with the ImageNet Large Scale Visual Recognition Challenge. A team led by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered a deep convolutional neural network called AlexNet. Its performance was jaw-dropping, totally blowing away all previous attempts.
AlexNet showed off the immense power of Convolutional Neural Networks (CNNs) for recognizing images. It proved that deep neural networks, given enough data and computing power, could learn incredibly complex features. This was a huge win for the empiricist approach, demonstrating that some form of intelligence could eventually pop up from massive data and complex, learned patterns, rather than needing explicit programming.
From that point on, deep learning just exploded. Quick advancements led to super sophisticated deep architectures like ResNets and Inception, plus crucial innovations like attention mechanisms, which let models focus on the important parts of the input. Deep learning quickly spread beyond just computer vision. It totally changed natural language processing, speech recognition, and even reinforcement learning.
Remember when AlphaGo, a Google DeepMind AI, beat the world's best Go players? That wasn't just a symbolic search algorithm; it combined Monte Carlo Tree Search (MCTS) with deep learning. It was a powerful statistical approach that could efficiently explore the huge and complex game spaces of Go, where traditional symbolic search (like Minimax) was simply intractable. It was a clear demonstration of statistical AI's ability to tackle problems previously considered beyond reach.
This period of fast progress also brought a critical realization, often called The Bitter Lesson. Coined by Rich Sutton, the Bitter Lesson basically says that over the long haul, general methods that really lean into computation (like just making neural networks bigger and feeding them more data and processing power) tend to be more effective and robust than trying to build in human knowledge or super detailed, hand-crafted features.
While this insight powerfully highlighted the benefits of huge, data-driven learning (the empiricist path), it's important to get that it's not a total dismissal of human understanding. Instead, it suggests that how human knowledge gets integrated matters—less about rigid, fixed rules, and more about creating architectures and environments where learning algorithms can discover patterns and rules for themselves, often at scales beyond what any human could intuitively grasp. This solidified the move towards data-driven, statistical approaches, showing that raw computing power and general learning algorithms were often the real keys to unlocking more advanced AI capabilities.
Then came 2017, and with it, the Transformer architecture. This was a game-changer for Natural Language Processing (NLP), mainly because it relied entirely on attention mechanisms, which let models process whole chunks of text far more efficiently and capture long-range connections. This paved the way for the rise of Large Language Models (LLMs) like BERT and the early GPT versions, which started showing an uncanny ability to understand and generate human-like text.
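At its core, that attention mechanism is surprisingly compact. Here's a minimal NumPy sketch of scaled dot-product attention (a single head, with no learned projections or masking), just to show the central operation; the toy inputs are random numbers for illustration.

```python
import numpy as np

# Scaled dot-product attention, the core operation of the Transformer:
# every token scores its relevance to every other token, then takes a
# weighted average of their values.

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                # weighted mix of the values

# 3 tokens, 4-dimensional embeddings (toy numbers)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(attention(x, x, x).shape)   # (3, 4): one context-aware vector per token
```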
None of this would have been possible without the massive development of hardware. The exponential growth in computing power and the development of specialized hardware like Graphics Processing Units (GPUs) and, later, Tensor Processing Units (TPUs) completely changed the landscape that had doomed statistical AI back in the '80s. These advancements allowed for the huge parallel processing needed to train deep neural networks on truly enormous datasets, unlocking their true potential.
Finally, the explosion of open-source frameworks like TensorFlow and PyTorch, along with platforms like Hugging Face that shared pre-trained models, dramatically sped up deep learning research and adoption. This has fostered a truly collaborative global community, letting innovations spread fast and build on each other. It was a collective effort that truly launched AI into its current era.
The Generative Era (Early 2020s - Present)
And that brings us to today. The early 2020s have ushered in an era that has captivated the public like nothing else. In 2022, ChatGPT burst onto the scene, quickly followed by other groundbreaking generative models like DALL-E and Midjourney. These models can create novel and mostly coherent content across all sorts of formats: text, images, code, audio, and even video. Suddenly, AI isn't just analyzing or predicting; it is creating, as Lady Ada envisioned—and the implications of it are still unfolding. This shift has put large-scale generative models squarely in the spotlight, forcing us to rethink what machines are truly capable of.
This era also brings us full circle to that early dream sparked by ELIZA. Remember that simple, rule-driven chatbot from the 1960s? While ELIZA relied on hand-coded patterns and clever tricks to fake conversation, ChatGPT works on a totally different scale and principle. It's a purely statistical marvel, having learned the ins and outs of human language from massive amounts of data, rather than explicit rules. ChatGPT, and its generative cousins, represent a stunning realization of that long-held dream of conversational AI, really pushing the boundaries of what we thought was possible. It's a testament to how far we've come, from basic, rule-based chatbots to incredibly fluent, statistically driven ones.
But here's where the historical pendulum swings again, with a cool twist. While these statistical deep learning models achieve incredible scale and performance, they also show some inherent limitations. They can hallucinate facts, struggle with real common-sense reasoning, and often lack explainability. You might ask them why they made a certain decision, and they can't always tell you in a way that makes sense.
These limitations have sparked a renewed interest in neuro-symbolic AI. This isn't about picking sides anymore; it's all about integration. This emerging field aims to combine the best of both worlds: the pattern-recognition power of statistical models with the logical reasoning and structured knowledge of symbolic AI. Imagine using ontologies to "ground" a large language model's outputs, making sure its generated text sticks to factual consistency, or adding logical rules to make AI systems more robust, reliable, and easier to understand.
The historical struggle between symbolic and statistical AI is evolving into a quest for effective synthesis, aiming to combine the strengths of both paradigms to create something truly greater than the sum of its parts.
Conclusion
We've journeyed through decades of ambition, breakthroughs, and tough realizations. What we've seen is a constant back-and-forth, a dynamic dance between two powerful ideas: the precise, rule-based, inflexible logic of symbolic AI and the adaptable, pattern-based, unreliable power of statistical AI. This dance, as we've explored, often mirrors the philosophical tension between rationalism and empiricism.
Today, AI stands at a fascinating crossroads. While purely statistical systems have achieved incredible feats, especially in areas like conversational AI—where the dream that began with ELIZA now thrives in ChatGPT—their inherent limitations are becoming clearer.
This brings us to a crucial realization: the future of AI likely isn't about one approach winning out over the other, but about intelligently combining them. Hybrid approaches, particularly neuro-symbolic AI, hold immense potential.
However, as we push the boundaries of what AI can do, we also have to face the serious challenges and ethical questions that come with it. The sheer power of these systems brings risks, from spreading misinformation and amplifying societal biases (which are often baked into their training data) to more complex issues around accountability and even the long-term, existential implications of creating truly autonomous and superintelligent entities.
By integrating symbolic reasoning and structured knowledge with the power of deep learning, we can build AI systems that are not only smart but also robust, explainable, and truly capable of common-sense reasoning, all while carefully navigating these potential problems.
The history of Artificial Intelligence is far from finished. AI is a living, breathing, civilization-wide project that encompasses all human endeavors, with the potential to transform society for the better—or, some believe, to become our ultimate doom. Everyone has a place here: technologists, yes, but also humanists, economists, historians, artists, politicians… The next few years, if anything, promise to be extremely exciting, and you can be a part of it.