Foundations of Artificial Intelligence
Chapter 1 of Mostly Harmless Ideas
The following article is a first draft of Chapter 1 of my upcoming book Mostly Harmless Ideas. The book is a deep dive into the good and the bad of AI, especially Generative AI and Language Models, and it's packed with advice for all kinds of knowledge workers and creative professionals. The first part of the book covers the foundations of Artificial Intelligence, Machine Learning, Generative AI, and Language Models in accessible and intuitive terms.
You can get early access to Mostly Harmless AI at a 50% discount during this alpha stage, which gives you lifetime access to all future digital editions, plus printed copies (when they are ready) at cost.
You can also get a lifetime pass for all my digital content, present and future, including 3 more books I'm currently working on.
What is Artificial Intelligence, Really?
Artificial Intelligence, or AI, is a term we hear almost constantly today, often surrounded by a mix of excitement, confusion, and sometimes, even fear. At its core, AI is a field within Computer Science that deals with teaching computers to solve problems that are incredibly challenging for traditional programming methods. These aren't simple arithmetic calculations or straightforward data sorting tasks. Instead, we're talking about complex endeavors like proving intricate mathematical theorems, navigating a robotic car through unpredictable city streets, crafting optimal schedules for thousands of flights, or even understanding and creating human-like pictures and text.
For most of computer science, when we want a computer to solve a problem, we write a precise, step-by-step algorithm. Think of it like giving a chef a detailed recipe: "Take 2 cups of flour, add 1 egg, mix for 3 minutes..." However, for the hard problems AI tackles, we often don't have such a clear recipe. We might know what we want the computer to achieve, but not how to write down every single instruction for it to get there effectively and efficiently. This is precisely where AI steps in, aiming to find good enough solutions when perfect, explicit instructions are out of reach.
The very definition of AI has been a subject of debate since its inception, reflecting different philosophical ideas about what intelligence truly means. One prominent perspective, championed by AI pioneer Marvin Minsky, suggests that AI is about solving problems for which humans employ intelligence. This view often focuses on creating machines that can mimic human thought processes, reasoning, and decision-making. Essentially, it asks: Can a machine think like us?
Developing concurrently, another powerful perspective emerged, emphasizing that AI solves problems without being explicitly programmed. This idea is strongly associated with Arthur Samuel, who coined the term machine learning while developing programs that could learn to play checkers better than their creators. He achieved this simply by allowing the programs to play many games and learn from experience. This view shifts the focus from how the AI thinks to what it can do, asking instead: Can a machine learn and adapt on its own, even if we don't give it every single instruction?
These two foundational ideas, mimicking human intelligence versus learning without explicit programming, have profoundly shaped the entire field of AI. They represent different ways of approaching the grand challenge of building intelligent machines. Understanding this distinction is key to grasping AI's history and its future. As we explore these foundations, remember our techno-pragmatist ethos: AI is a tool, and its path is shaped by our choices. Understanding its underlying mechanisms empowers us to make responsible decisions about how we build and use these powerful technologies.
The Pillars of Good Old-Fashioned AI (GOFAI)
In this chapter, we will delve into the foundational ideas that laid the groundwork for Artificial Intelligence, often referred to as "Good Old-Fashioned AI," or GOFAI. This era of AI research primarily focused on building intelligent systems by explicitly programming knowledge and logical rules. Our exploration will center on two main pillars of GOFAI.
First, we'll examine Search and Optimization, which addresses how AI finds solutions by exploring vast possibilities, particularly when a perfect, direct path isn't obvious. Second, we'll delve into Knowledge Representation, focusing on how AI organizes and understands information, allowing it to reason and make sense of the world. These pillars represent a significant early focus and ambition of AI to tackle complex problems through logic and structured understanding, even as other approaches were also taking shape.
The Age-Old Debate: Symbolic AI vs. Statistical AI
For centuries, the idea of thinking machines has captivated human imagination. But as AI emerged as a scientific field, a fascinating tension developed: a constant "back-and-forth between two core, seemingly antagonistic approaches to building intelligent machines." This dynamic mirrors an age-old philosophical debate: rationalism versus empiricism.
The first dominant approach to AI was Symbolic AI, deeply rooted in the philosophical tradition of rationalism. Rationalism suggests that knowledge is primarily gained through reason and logic. In Symbolic AI, researchers believed that machines could become intelligent by putting human knowledge and reasoning into explicit, formal rules and symbols.
Imagine, for instance, wanting to teach a computer to play chess. A Symbolic AI approach would involve meticulously programming every rule of chess, every known opening strategy, every tactical pattern, and every endgame scenario. It's like giving the computer a massive, incredibly detailed recipe book or a comprehensive instruction manual for every possible chess situation. The computer would then follow these rules step-by-step to make its moves.
Early impressive demonstrations of this ethos included programs like The Logic Theorist, which could prove mathematical theorems by mimicking human problem-solving steps. Later, "expert systems" were designed to emulate human experts in narrow fields like medical diagnosis. The core idea was simple yet powerful: if we could just write down all the rules, the machine would be intelligent enough to solve our problems.
Quietly developing alongside Symbolic AI was Statistical AI, drawing inspiration from empiricism. Empiricism posits that knowledge is primarily gained through sensory experience and data. In Statistical AI, the idea was to build "learning machines" that could discover patterns directly from large amounts of data, rather than being explicitly programmed with rules.
Think of it like a child learning to recognize a dog. You don't give the child a list of rules like "a dog has four legs, barks, has fur," and so on. Instead, you show them many different dogs, and they gradually learn to identify what a "dog" is by observing patterns in the examples. Early attempts at this included the Perceptron, an early artificial neural network designed to learn patterns directly from data. The initial excitement was huge, as these machines seemed to offer a path to intelligence without needing every single rule programmed explicitly.
The Winters of AI
Despite the initial optimism, both Symbolic and Statistical AI approaches eventually hit significant roadblocks. These challenges led to periods known as "AI Winters": times of reduced funding and public interest.
Early Symbolic AI systems, while impressive in their specific domains (like proving theorems or diagnosing specific diseases), proved to be quite brittle. They struggled immensely with common-sense knowledge, which is vast and often unstated. Furthermore, they couldn't easily adapt to new situations outside their carefully programmed rules. Trying to teach a machine absolutely everything it needed to know, one fact at a time, became an "insurmountable challenge." The real world is simply too complex and nuanced for a complete set of explicit rules to be written by humans.
Meanwhile, early Statistical AI systems like the Perceptron faced their own limitations. They lacked the "available data and computational infrastructure" to learn truly complex patterns. Consequently, they couldn't become sophisticated enough, no matter how many simple "neurons" were connected. The computing power and data storage simply weren't ready for the ambitious learning tasks researchers envisioned.
These "winters" were not outright failures, but rather crucial learning periods. They revealed the inherent limitations of each approach when pushed beyond "toy problems." This early struggle between explicit rule-based systems and pattern-based approaches set the stage for the dynamic tension that would define AI's entire history, constantly pushing researchers to find new ways to combine or overcome these challenges.
Search and Optimization
At the heart of many AI problems, especially in the early days, was the challenge of finding the best solution among a vast number of possibilities. This is the realm of search and optimization.
When Perfect is Impossible: The "Hard" Problems
Imagine you're a traveling salesperson, and you need to visit a hundred different cities, visiting each exactly once, and then return home. Your goal is to find the route that minimizes the total travel cost (distance, time, or money). This is a classic example of a "hard problem" in computer science, known as the Traveling Salesman Problem (TSP). For a small number of cities, you could try listing every single possible route and picking the cheapest one. This is called a "brute force" search.
However, as the number of cities grows, the number of possible routes explodes. For just 20 cities, there are already about 2.4 quintillion (2.4 x 10^18) possible orderings, and for our salesperson's hundred cities the count dwarfs the number of atoms in the observable universe. No computer could ever check them all. These are what we call intractable problems, or NP-hard problems: problems for which no efficient, exact algorithm is known.
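To make that explosion concrete, here is a minimal brute-force sketch in Python; the five cities and their pairwise distances are invented for illustration. It finds the optimal tour instantly for five cities, but the same enumeration for twenty would already require roughly 10^17 tours.

```python
# A minimal brute-force TSP sketch over an invented toy distance table.
from itertools import permutations
from math import factorial

cities = ["A", "B", "C", "D", "E"]
dist = {  # symmetric toy distances between city pairs
    ("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10, ("A", "E"): 7,
    ("B", "C"): 6, ("B", "D"): 4, ("B", "E"): 3,
    ("C", "D"): 8, ("C", "E"): 5, ("D", "E"): 1,
}

def d(a, b):
    return dist.get((a, b)) or dist.get((b, a))

def tour_cost(tour):
    # Total cost of visiting the cities in order, then returning home.
    return sum(d(a, b) for a, b in zip(tour, tour[1:] + tour[:1]))

# Fix the starting city and enumerate every ordering of the rest.
best = min((("A",) + rest for rest in permutations(cities[1:])), key=tour_cost)
print(best, tour_cost(best))

# The same enumeration for 20 cities would need 19! tours:
print(f"{factorial(19):,} tours for 20 cities")
```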
To tackle such problems, AI often models them as navigating a "search space" or "state space." This conceptual space represents all possible configurations or situations relevant to the problem. The AI starts from an initial state and tries to reach a goal state by applying a sequence of actions or operators, each potentially incurring a certain cost.
Since finding the absolute perfect solution is often impossible or impractical within this vast space, AI shifts its goal. Instead of perfection, it seeks approximate solutions. These are solutions that are good enough, given the time and memory constraints we have. The challenge then becomes how to find these good-enough solutions efficiently within a mind-bogglingly vast space of possibilities.
Smart Shortcuts: Heuristics and Metaheuristics
To navigate these immense search spaces, AI uses clever strategies known as heuristics and metaheuristics. A heuristic is a problem-specific "rule of thumb" strategy that uses some known properties of a problem to improve search performance. It's not guaranteed to find the absolute best solution, but it often finds a very good one much faster than a brute-force approach.
Consider your GPS navigation app. When you ask for directions, it doesn't calculate every single possible route from your current location to your destination. Instead, it uses a heuristic, often based on an algorithm called A* (A-star). If your destination is northeast of your position, the A* algorithm will prioritize roads going north or east, assuming they are more likely to get you there faster than roads going west or south. Of course, this isn't always perfect: there might be a faster detour to the west, or a counter-intuitive highway connection. Nevertheless, by intelligently using this directional knowledge, the algorithm can find a very efficient route without exploring every dead end. It's a smart shortcut that balances speed with a high probability of finding a good solution.
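Here is a minimal sketch of A* on a toy grid map, assuming 4-way movement and unit step costs; real navigation apps run on road networks weighted by travel time, but the idea is the same. The Manhattan-distance heuristic is what pulls the search toward the destination instead of expanding blindly in every direction.

```python
# A minimal A* sketch on a grid, using Manhattan distance as the heuristic.
import heapq

def astar(grid, start, goal):
    def h(cell):  # heuristic: Manhattan distance from cell to goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Each frontier entry is (estimated total cost, cost so far, cell, path).
    frontier = [(h(start), 0, start, [start])]
    visited = set()
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in visited:
            continue
        visited.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                heapq.heappush(frontier, (cost + 1 + h((nr, nc)), cost + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # no route exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],  # 1 = blocked
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 3)))
```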
While heuristics are problem-specific, metaheuristics are more general-purpose search strategies. They leverage knowledge about the search paradigm itself and can be applied even when very little is known about the specific problem's structure. They're often used when "nothing else works." A prime example of a metaheuristic approach is evolutionary algorithms. These computational strategies are "inspired by certain aspects of the biological process of evolution."
Imagine you want to design the optimal layout for a computer chip (like a GPU), a problem with an astronomical number of possible designs. An evolutionary algorithm would start with a "population" of random chip designs. Then, through cycles of "breeding" (combining elements from two good designs to create a new one) and "selection" (keeping only the best-performing designs), the algorithm iteratively "evolves" better and better designs. Just like biological evolution, it seems to "magically discover quasi-optimal design elements just by sheer luck and relentless repetition," without needing explicit instructions for every design choice. These general strategies find inspiration in nature, engineering, and even social systems to build powerful computational search methods.
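The loop itself is short enough to sketch. In this toy version the "chip design" is just a list of numbers and the fitness function is an invented stand-in (real design objectives are vastly more complex), but the breed-mutate-select cycle is the real thing.

```python
# A minimal evolutionary algorithm: selection, crossover ("breeding"), mutation.
import random

def fitness(design):
    # Invented stand-in for "how good is this design?" (optimum: all values 0.5)
    return -sum((x - 0.5) ** 2 for x in design)

def crossover(a, b):
    cut = random.randrange(1, len(a))  # splice two parent designs together
    return a[:cut] + b[cut:]

def mutate(design, rate=0.1):
    return [x + random.gauss(0, 0.05) if random.random() < rate else x
            for x in design]

population = [[random.random() for _ in range(8)] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]  # selection: keep the fittest designs
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(20)]
    population = survivors + children

print(round(fitness(max(population, key=fitness)), 4))  # approaches 0
```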
Specialized Search: Beyond Simple Paths
Beyond general search and optimization, AI has developed specialized techniques for specific types of complex problems.
Adversarial Search: Thinking Ahead of the Other
Many real-world problems, especially in competitive scenarios, involve an opponent whose actions must be anticipated. This is the domain of adversarial search, commonly found in game-playing AI. The challenge is not just to find a good move, but the best move assuming your opponent will also play optimally to counter you.
One of the oldest and most fundamental techniques is Minimax. Imagine a simple game like Tic-Tac-Toe. Minimax works by having the AI "look ahead" through all possible future moves, assuming that you (the opponent) will always choose the move that is best for you and worst for the AI. The AI then picks the move that minimizes its maximum possible loss (or maximizes its minimum possible gain). Effectively, it plays out all possible future scenarios in its head and chooses the path that leaves it in the best possible position, no matter what its opponent does.
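Here is a compact Minimax sketch for Tic-Tac-Toe, whose game tree is small enough to search exhaustively. The board is a list of 9 cells ('X', 'O', or None); 'X' maximizes the score and 'O' minimizes it, exactly as described above.

```python
# A minimal Minimax sketch for Tic-Tac-Toe.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    win = winner(board)
    if win:
        return 1 if win == 'X' else -1
    if None not in board:
        return 0  # draw
    other = 'O' if player == 'X' else 'X'
    scores = []
    for i, cell in enumerate(board):
        if cell is None:
            board[i] = player          # try the move...
            scores.append(minimax(board, other))
            board[i] = None            # ...then undo it
    # X picks the maximum score; O picks the minimum.
    return max(scores) if player == 'X' else min(scores)

def best_move(board, player):
    other = 'O' if player == 'X' else 'X'
    def score(i):
        board[i] = player
        s = minimax(board, other)
        board[i] = None
        return s
    moves = [i for i, cell in enumerate(board) if cell is None]
    return max(moves, key=score) if player == 'X' else min(moves, key=score)

# O threatens the middle row, so X's only non-losing move is cell 5.
print(best_move(['X', 'O', 'X', 'O', 'O', None, None, None, None], 'X'))  # 5
```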
For games with an incredibly vast number of possibilities, like Go, simply looking ahead through every move is impossible. This is where Monte Carlo Tree Search (MCTS) comes in. Instead of exhaustively analyzing every branch, MCTS "plays out" many random simulations of the game from a given point. It explores the most promising moves more deeply, learning which paths lead to success through repeated "trial and error" simulations. This allows AI to tackle games that were once considered beyond computational reach, like when Google DeepMind's AlphaGo beat the world's best Go players.
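A full MCTS implementation adds a search tree and a clever exploration rule (usually called UCB), but the core trial-and-error idea fits in a short sketch. This "flat" Monte Carlo version rates each legal Tic-Tac-Toe move by the outcome of many random playouts, reusing the winner helper from the Minimax example:

```python
# A minimal "flat" Monte Carlo sketch: rate moves by random playouts.
# Assumes winner() from the Minimax sketch above is already defined.
import random

def random_playout(board, player):
    # Play uniformly random moves until the game ends; return the winner.
    board = board[:]
    while winner(board) is None and None in board:
        i = random.choice([i for i, c in enumerate(board) if c is None])
        board[i] = player
        player = 'O' if player == 'X' else 'X'
    return winner(board)

def monte_carlo_move(board, player, simulations=500):
    opponent = 'O' if player == 'X' else 'X'
    def win_rate(move):
        b = board[:]
        b[move] = player
        return sum(random_playout(b, opponent) == player
                   for _ in range(simulations)) / simulations
    moves = [i for i, c in enumerate(board) if c is None]
    return max(moves, key=win_rate)

print(monte_carlo_move([None] * 9, 'X'))  # usually the center, cell 4
```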
Structured Search: Satisfying All Conditions
Sometimes, the goal isn't to find the "best" path, but simply any solution that meets a specific set of requirements. These are constraint satisfaction problems. Here, the AI needs to find values for a set of variables such that all given conditions, or "constraints," are simultaneously met.
Think about solving a Sudoku puzzle. You need to fill in numbers from 1 to 9 in each cell, but with strict rules: each row, column, and 3x3 box must contain all digits from 1 to 9 without repetition. The AI's task is to find a set of numbers for all empty cells that satisfies all these constraints.
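A classic way to solve such puzzles is backtracking search: place a digit, check that no constraint breaks, and undo the move ("backtrack") whenever a dead end is reached. Here is a minimal sketch, where the puzzle is a 9x9 list of lists with 0 marking empty cells:

```python
# A minimal backtracking solver for Sudoku as a constraint satisfaction problem.
def valid(grid, r, c, digit):
    if digit in grid[r]:                               # row constraint
        return False
    if digit in (grid[i][c] for i in range(9)):        # column constraint
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)                # 3x3 box constraint
    return all(grid[br + i][bc + j] != digit
               for i in range(3) for j in range(3))

def solve(grid):
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for digit in range(1, 10):
                    if valid(grid, r, c, digit):
                        grid[r][c] = digit
                        if solve(grid):
                            return True
                        grid[r][c] = 0                 # dead end: backtrack
                return False                           # no digit fits here
    return True  # no empty cells left: every constraint is satisfied

# solve(puzzle) fills the puzzle in place and returns True if a solution exists.
```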
Another common example is creating a university class schedule. You have classes, rooms, professors, and students, and a multitude of constraints: Professor A can't teach two classes at the same time; Room B can only hold 50 students; Class C requires a lab; no two classes can be in the same room at the same time. The AI's job is to assign times and rooms to all classes such that every single constraint is satisfied. The "structure" of these problems, defined by the variables and their interdependencies, allows AI to use specialized search techniques to efficiently find a valid solution.
Knowledge Representation & Reasoning
Beyond just searching for solutions, a truly intelligent system needs to "know" things about the world. This brings us to the second pillar of GOFAI: Knowledge Representation. This field explores how AI can efficiently represent, store, and use domain knowledge in a way that computers can understand and process.
The fundamental goal of knowledge representation is to organize concepts and facts, as well as the relationships between them. This organization allows AI to reason about these facts and discover new relations. Ultimately, it's about giving AI a structured way to "understand" and make sense of information, much like how humans build a mental model of the world around them. Without a clear way to represent what it "knows," an AI would be unable to make logical inferences or apply its knowledge to new situations.
From Raw Observations to Understanding
To truly grasp how AI "knows" things, it's helpful to understand the progression from raw observations to actionable understanding. At the most basic level, we encounter Data, which consists of raw, unprocessed facts or observations. This could be a list of numbers, individual words, or pixels in an image. In isolation, data has no inherent meaning; for example, the number "30" by itself is just a number.
When we introduce context or metadata, data transforms into Information. For instance, if we know "30" is a temperature reading taken in Celsius at noon on July 1st in Havana, it becomes information. This contextualization helps us relate different observations and gives them initial meaning.
Finally, when information is enriched with semantics and rules, enabling inference, reasoning, and the discovery of new relations, it becomes Knowledge. For example, if the AI knows that "temperatures of 30 degrees Celsius or more in July in Havana indicate a heatwave," it possesses knowledge. This knowledge allows it to draw inferences (it's a heatwave!), discover new relations (heatwaves can lead to increased energy consumption), and even take actions (warn residents about high temperatures). It's this ability to add meaning and logical connections that truly transforms information into actionable knowledge.
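This ladder from data to knowledge can be made concrete in a few lines, using the (invented) heatwave rule above:

```python
# Data -> information -> knowledge, as a toy example.
reading = 30                                    # data: a bare number

information = {"value": reading, "unit": "C",   # information: data plus context
               "city": "Havana", "month": "July"}

def infer(info):                                # knowledge: rules enabling inference
    if info["city"] == "Havana" and info["month"] == "July" and info["value"] >= 30:
        return "heatwave: warn residents"
    return "no action needed"

print(infer(information))  # heatwave: warn residents
```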
Ways to Represent Knowledge
Just as humans use different ways to store and recall information, from precise definitions to vague intuitions, AI employs various methods for knowledge representation, each with its own strengths and weaknesses.
One key distinction lies between explicit and implicit representations. Explicit knowledge is clearly defined and directly encoded, often in rules or symbols. It's much like a precisely written dictionary or a rulebook where every term and every rule is spelled out. This approach is central to Symbolic AI. For instance, Ontologies are explicit representations that define concepts within a domain and their strict relationships. Think of a meticulously designed family tree formally defining "parent," "child," "sibling," and "ancestor," along with rules such as "if A is a parent of B, and B is a parent of C, then A is a grandparent of C."
Conversely, implicit knowledge is learned from patterns in data, rather than being directly programmed. It's more akin to human intuition or a "gut feeling" developed from vast experience, and is fundamental to Statistical AI. Embeddings, for example, are numerical representations where concepts like words, images, or even entire documents are transformed into points in a multi-dimensional space. Systems like Word2Vec learn these embeddings by analyzing how words are used together, so words with similar meanings or contexts (e.g., "king" and "queen") end up being numerically "close" to each other in this space, even though no human explicitly programmed that relationship.
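What "numerically close" means can be made concrete with cosine similarity, the standard way to compare embeddings. The 3-dimensional vectors below are invented for illustration; real systems like Word2Vec learn vectors with hundreds of dimensions from billions of words.

```python
# Toy word embeddings compared with cosine similarity.
import math

embedding = {
    "king":  [0.9, 0.8, 0.1],   # invented 3-d vectors for illustration
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine_similarity(embedding["king"], embedding["queen"]))  # ~0.99: close
print(cosine_similarity(embedding["king"], embedding["apple"]))  # ~0.30: far
```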
Another way to categorize knowledge representations is by their formality. Formal representations have strict, unambiguous syntax and semantics, making them ideal for precise logical inference and computation. Mathematical equations, programming code, or statements in formal logic are prime examples, leaving no room for misinterpretation. In contrast, informal representations are more flexible, often using natural human language. While easier for humans to create and understand, they can be ambiguous and require more sophisticated processing for AI to extract meaning, as seen in a written description, a casual conversation, or an essay.
Finally, we distinguish between structured and unstructured representations. Structured knowledge is organized in a predefined, rigid format, making it easy for computers to process and query. Think of data in a spreadsheet with clear rows and columns, or a database with defined fields.
Knowledge graphs, for instance, are structured representations that organize facts as a network of interconnected entities (nodes) and their relationships (edges). A knowledge graph might have a node for "Paris," a node for "France," and an edge labeled "isCapitalOf" connecting them, allowing AI to easily query and infer facts.
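At its simplest, a knowledge graph can be sketched as a set of (subject, relation, object) triples plus a query helper; real systems add schemas, indexes, and query languages such as SPARQL.

```python
# A minimal knowledge graph as (subject, relation, object) triples.
facts = {
    ("Paris", "isCapitalOf", "France"),
    ("France", "isLocatedIn", "Europe"),   # an extra invented fact
}

def query(subject=None, relation=None, obj=None):
    # None acts as a wildcard, so query(relation="isCapitalOf") finds capitals.
    return [(s, r, o) for (s, r, o) in facts
            if subject in (None, s) and relation in (None, r) and obj in (None, o)]

print(query(relation="isCapitalOf"))  # [('Paris', 'isCapitalOf', 'France')]
```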
Conversely, unstructured knowledge exists in free-form text, images, audio, or video, without a predefined schema. Extracting meaning from unstructured data is much harder and often requires advanced AI techniques.
Vector databases, for example, are often used to store and efficiently search implicit representations (embeddings) derived from unstructured data. You could take millions of research papers (unstructured text), convert each into an embedding (implicit representation), and store them in a vector database. Then, when a user asks a question, the database can find the most "similar" papers based on their embeddings, even though the papers themselves are unstructured.
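At its core, that search is a nearest-neighbor lookup over embeddings, as the toy sketch below shows. The paper titles and vectors are invented, and real vector databases use approximate indexes to search billions of embeddings quickly.

```python
# A toy vector search: return the stored items closest to a query embedding.
import math

papers = {  # paper title -> invented 3-d embedding
    "Quantum computing basics": [0.9, 0.1, 0.0],
    "Quantum error correction": [0.8, 0.2, 0.1],
    "A history of gardening":   [0.0, 0.1, 0.9],
}

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

def search(query_vector, k=2):
    return sorted(papers, key=lambda title: cosine(query_vector, papers[title]),
                  reverse=True)[:k]

print(search([0.85, 0.15, 0.05]))  # the two quantum papers rank first
```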
Drawing Inference from Knowledge
Knowledge representation isn't merely about storing information; its ultimate purpose is to enable AI to draw inferences and make decisions. This process of deriving new conclusions from existing knowledge is known as reasoning, and it can take both formal and informal forms.
Formal reasoning, deeply rooted in logic, is about deriving new conclusions from existing knowledge using strict, unambiguous rules. This is the hallmark of Symbolic AI. It's a process of deduction, where if the initial premises are true and the rules are applied correctly, the conclusion is guaranteed to be true. For example, if a knowledge base contains the rules "All birds can fly" and "A sparrow is a bird," a formal reasoner can deduce, with absolute certainty, "A sparrow can fly." Such rule-based systems are precise and auditable, but they are limited by the completeness and accuracy of the explicitly programmed rules.
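Formal deduction can be sketched as forward chaining: repeatedly apply rules to known facts until nothing new can be derived. The snippet below encodes the bird example; note that the conclusion is guaranteed only relative to the premises, and "all birds can fly" is itself false for penguins, which is exactly the brittleness discussed earlier.

```python
# A minimal forward-chaining sketch with one rule: if ?x is_a bird, ?x can fly.
facts = {("sparrow", "is_a", "bird")}
rules = [
    (("?x", "is_a", "bird"), ("?x", "can", "fly")),
]

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (ps, pr, po), (cs, cr, co) in rules:
            for (fs, fr, fo) in list(derived):
                # The premise subject is the variable ?x, so match relation+object.
                if (pr, po) == (fr, fo):
                    conclusion = (fs if cs == "?x" else cs, cr, co)
                    if conclusion not in derived:
                        derived.add(conclusion)
                        changed = True
    return derived

print(forward_chain(facts, rules))  # includes ('sparrow', 'can', 'fly')
```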
In contrast, informal reasoning is about drawing conclusions based on patterns, similarities, or analogies, often without strict logical guarantees. This type of reasoning is more akin to human intuition or common sense. It's less about strict deduction and more about finding connections and probabilities. For example, if an AI has learned implicit representations (embeddings) of various animals, and it sees a new animal that is "numerically close" to many dogs, it might infer it's a dog, even without explicit rules for every single feature.
This distinction is crucial for understanding the different capabilities of AI. While formal reasoning provides certainty within defined boundaries, informal reasoning allows AI to operate in ambiguous, unstructured environments. The latter, particularly reasoning by analogy in embeddings and language models, will be explored in more detail in later chapters, showcasing how AI can make sense of the world even when explicit rules are unavailable.
What is the Best Representation for Knowledge?
The choice of how to represent knowledge is a critical decision in AI design. Different representation types are chosen based on the specific problem, the type of data available, and the AI paradigm being used (Symbolic vs. Statistical). For instance, a Symbolic AI system designed for medical diagnosis might rely heavily on formal, explicit ontologies of diseases and symptoms. Conversely, a Statistical AI system for image recognition might primarily use implicit, unstructured representations of pixels that it learns from millions of example images.
This challenge highlights a theoretical result known as the Ugly Duckling Theorem. This theorem, in essence, states that without a specific purpose or "bias," all objects are equally similar or dissimilar to one another. This implies that there is no single, universally "best" way to represent knowledge or measure similarity without a context or goal in mind. For example, an "ugly duckling" is only ugly relative to a flock of swans; it might be beautiful among other ducklings.
Therefore, the human responsibility in choosing the right representation is paramount. This choice directly impacts what an AI can "know," how it can "reason," and ultimately, the reliability and fairness of its inferences. Aligning the representation with the problem's nature is a key part of building human-centered tools that truly understand and assist us.
Conclusion: The Need for Learning
Good Old-Fashioned AI (GOFAI), with its focus on search, optimization, and explicit knowledge representation, laid the essential groundwork for the field of Artificial Intelligence. Its strengths lie in domains where problems are well-defined, rules are clear, and knowledge can be precisely encoded. GOFAI systems offered precision and control, making them powerful tools for tasks like proving theorems or playing well-defined board games.
However, the ambitions of GOFAI soon ran into fundamental limitations when faced with the messy complexity of the real world. These systems proved to be brittle: a small change outside their programmed domain could break them entirely. They struggled immensely with common-sense knowledge, which is vast and often unstated. The sheer scale of real-world information made it an "insurmountable challenge" to explicitly program every piece of knowledge and every rule. GOFAI was excellent at solving problems for which it was explicitly programmed, but it couldn't adapt, generalize, or handle unstructured data effectively. This revealed a crucial gap in AI's capabilities.
The limitations of GOFAI highlighted a profound truth: to build truly intelligent and adaptable systems, AI needed to move beyond simply executing pre-programmed rules. It revealed a crucial need for systems that could learn from experience and data, without being explicitly programmed for every single scenario or piece of knowledge.
This growing realization of the power of learning-based approaches, which were developing concurrently with GOFAI, marked a significant shift. It showed that AI could discover its own patterns and adapt to unforeseen situations, offering a path to overcome the brittleness of purely symbolic systems.
Recognizing these limitations and actively seeking new approaches is a hallmark of the ongoing, human-driven effort to build more capable and adaptable AI. It's a testament to our techno-pragmatist ethos: acknowledging challenges, learning from past efforts, and continuously striving to create tools that can better serve humanity's complex needs. This increasing prominence of learning methods, which developed in parallel to GOFAI, is the story that will unfold in the next chapter.
Thank you for reading this far. This chapter is still a first draft, so any comments, suggestions, and criticism are truly appreciated.
PS: Get your copy of Mostly Harmless AI at 50% off.
This is a very comprehensive overview. I liked all the distinctions that you have provided along different dimensions of the field. I enjoy your writing style as well - very soothing and down-to-earth and jargon-free - this feels written for everyone.
I like the Ugly Duckling theorem. That's a great point about data representation being undifferentiated until it has a goal.
Here are some comments. I am a learner so please forgive any misunderstandings and any problems with tone.
This Chapter
1. I would like to see more non-text examples, such as diagrams or pictures or code snippets. In fact, providing an example English sentence for each concept would be great.
2. Would you consider a paragraph about the philosophical roots of understanding knowledge, such as Wittgenstein and Kant?
3. It might be great to have a case study of one very influential old fashioned AI implementation.
4. I think there could be more to say about Minsky.
5. I am fascinated by the SABRE airline reservation system. Airline routing seems like one of the first uses of search optimization that many of us saw.
6. It might be interesting to have a table, showing each concept, what year it originated, by whom and what are some implementation examples that we would have heard of.
6a. Or what constituted AI in each decade - a table by decade of the concepts that ruled.
7. I think there could be more real world examples. We're standing on the shoulders of so many influential systems.
8. You could mention game theory and the work of John Nash.
9. I think you could edit the conclusion. Do you need paragraphs 3 and 4? The conclusion doesn't feel concise.
10. I think some comments about AI in popular culture would get everyone to relate to the topic.
This is a great distillation of the important concepts. I like your focus on distilled principles.
Thank you for providing this content!
All of these are really good suggestions! I'll do my best to incorporate them. Thanks!