Foundations of Artificial Intelligence
Chapter 1 of Mostly Harmless Ideas
The following article is a first draft of Chapter 1 of my upcoming book Mostly Harmless Ideas. The book is a deep dive into the good and the bad of AI, especially Generative AI and Language Models, and it's packed with advice for all kinds of knowledge workers and creative professionals. The first part of the book covers the foundations of Artificial Intelligence, Machine Learning, Generative AI, and Language Models in accessible and intuitive terms.
You can get early access to Mostly Harmless AI at a 50% discount during this alpha stage, which gives you lifetime access to all future digital editions, plus printed copies (when they are ready) at cost.
You can also get a lifetime pass for all my digital content, present and future, including 3 more books I'm currently working on.
What is Artificial Intelligence, Really?
Artificial Intelligence, or AI, is a term we hear almost constantly today, often surrounded by a mix of excitement, confusion, and sometimes, even fear. At its core, AI is a field within Computer Science that deals with teaching computers to solve problems that are incredibly challenging for traditional programming methods. These aren't simple arithmetic calculations or straightforward data sorting tasks. Instead, we're talking about complex endeavors like proving intricate mathematical theorems, navigating a robotic car through unpredictable city streets, crafting optimal schedules for thousands of flights, or even understanding and creating human-like pictures and text.
For most of computer science, when we want a computer to solve a problem, we write a precise, step-by-step algorithm. Think of it like giving a chef a detailed recipe: "Take 2 cups of flour, add 1 egg, mix for 3 minutes..." However, for the hard problems AI tackles, we often don't have such a clear recipe. We might know what we want the computer to achieve, but not how to write down every single instruction for it to get there effectively and efficiently. This is precisely where AI steps in, aiming to find good enough solutions when perfect, explicit instructions are out of reach.
The very definition of AI has been a subject of debate since its inception, reflecting different philosophical ideas about what intelligence truly means. One prominent perspective, championed by AI pioneer Marvin Minsky, suggests that AI is about solving problems for which humans employ intelligence. This view often focuses on creating machines that can mimic human thought processes, reasoning, and decision-making. Essentially, it asks: Can a machine think like us?
Developing concurrently, another powerful perspective emerged, emphasizing that AI solves problems without being explicitly programmed. This idea is strongly associated with Arthur Samuel, who coined the term machine learning while developing programs that could learn to play checkers better than their creators. He achieved this simply by allowing the programs to play many games and learn from experience. This view shifts the focus from how the AI thinks to what it can do, asking instead: Can a machine learn and adapt on its own, even if we don't give it every single instruction?
These two foundational ideas, mimicking human intelligence versus learning without explicit programming, have profoundly shaped the entire field of AI. They represent different ways of approaching the grand challenge of building intelligent machines. Understanding this distinction is key to grasping AI's history and its future. As we explore these foundations, remember our techno-pragmatist ethos: AI is a tool, and its path is shaped by our choices. Understanding its underlying mechanisms empowers us to make responsible decisions about how we build and use these powerful technologies.
The Pillars of Good Old-Fashioned AI (GOFAI)
In this chapter, we will delve into the foundational ideas that laid the groundwork for Artificial Intelligence, often referred to as "Good Old-Fashioned AI," or GOFAI. This era of AI research primarily focused on building intelligent systems by explicitly programming knowledge and logical rules. Our exploration will center on two main pillars of GOFAI.
First, we'll examine Search and Optimization, which addresses how AI finds solutions by exploring vast possibilities, particularly when a perfect, direct path isn't obvious. Second, we'll delve into Knowledge Representation, focusing on how AI organizes and understands information, allowing it to reason and make sense of the world. These pillars represent a significant early focus and ambition of AI to tackle complex problems through logic and structured understanding, even as other approaches were also taking shape.
The Age-Old Debate: Symbolic AI vs. Statistical AI
For centuries, the idea of thinking machines has captivated human imagination. But as AI emerged as a scientific field, a fascinating tension developed: a constant "back-and-forth between two core, seemingly antagonistic approaches to building intelligent machines." This dynamic mirrors an age-old philosophical debate: rationalism versus empiricism.
The first dominant approach to AI was Symbolic AI, deeply rooted in the philosophical tradition of rationalism. Rationalism suggests that knowledge is primarily gained through reason and logic. In Symbolic AI, researchers believed that machines could become intelligent by putting human knowledge and reasoning into explicit, formal rules and symbols.
Imagine, for instance, wanting to teach a computer to play chess. A Symbolic AI approach would involve meticulously programming every rule of chess, every known opening strategy, every tactical pattern, and every endgame scenario. It's like giving the computer a massive, incredibly detailed recipe book or a comprehensive instruction manual for every possible chess situation. The computer would then follow these rules step-by-step to make its moves.
Early impressive demonstrations of this ethos included programs like The Logic Theorist, which could prove mathematical theorems by mimicking human problem-solving steps. Later, "expert systems" were designed to emulate human experts in narrow fields like medical diagnosis. The core idea was simple yet powerful: if we could just write down all the rules, the machine would be intelligent enough to solve our problems.
Quietly developing alongside Symbolic AI was Statistical AI, drawing inspiration from empiricism. Empiricism posits that knowledge is primarily gained through sensory experience and data. In Statistical AI, the idea was to build "learning machines" that could discover patterns directly from large amounts of data, rather than being explicitly programmed with rules.
Think of it like a child learning to recognize a dog. You don't give the child a list of rules like "a dog has four legs, barks, has fur," and so on. Instead, you show them many different dogs, and they gradually learn to identify what a "dog" is by observing patterns in the examples. Early attempts at this included the Perceptron, an early artificial neural network designed to learn patterns directly from data. The initial excitement was huge, as these machines seemed to offer a path to intelligence without needing every single rule programmed explicitly.
The Winters of AI
Despite the initial optimism, both Symbolic and Statistical AI approaches eventually hit significant roadblocks. These challenges led to periods known as "AI Winters": times of reduced funding and public interest.
Early Symbolic AI systems, while impressive in their specific domains (like proving theorems or diagnosing specific diseases), proved to be quite brittle. They struggled immensely with common-sense knowledge, which is vast and often unstated. Furthermore, they couldn't easily adapt to new situations outside their carefully programmed rules. Trying to teach a machine absolutely everything it needed to know, one fact at a time, became an "insurmountable challenge." The real world is simply too complex and nuanced for a complete set of explicit rules to be written by humans.
Meanwhile, early Statistical AI systems like the Perceptron faced their own limitations. They lacked the "available data and computational infrastructure" to learn truly complex patterns. Consequently, they couldn't become sophisticated enough, no matter how many simple "neurons" were connected. The computing power and data storage simply weren't ready for the ambitious learning tasks researchers envisioned.
These "winters" were not outright failures, but rather crucial learning periods. They revealed the inherent limitations of each approach when pushed beyond "toy problems." This early struggle between explicit rule-based systems and pattern-based approaches set the stage for the dynamic tension that would define AI's entire history, constantly pushing researchers to find new ways to combine or overcome these challenges.
Search and Optimization
At the heart of many AI problems, especially in the early days, was the challenge of finding the best solution among a vast number of possibilities. This is the realm of search and optimization.
When Perfect is Impossible: The "Hard" Problems
Imagine you're a traveling salesperson, and you need to visit a hundred different cities, visiting each exactly once, and then return home. Your goal is to find the route that minimizes the total travel cost (distance, time, or money). This is a classic example of a "hard problem" in computer science, known as the Traveling Salesman Problem (TSP). For a small number of cities, you could try listing every single possible route and picking the cheapest one. This is called a "brute force" search.
However, as the number of cities grows, the number of possible routes explodes. For just 20 cities, there are already about 2.4 quintillion (2.4 x 10^18) possible orderings, and for our salesperson's hundred cities the count dwarfs the number of atoms in the observable universe. No computer could ever check them all. These are what we call intractable problems, or NP-hard problems: problems for which no efficient, exact algorithm is known.
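To make that explosion concrete, here is a minimal brute-force sketch in Python; the five cities and their pairwise distances are invented for illustration. It finds the optimal tour instantly for five cities, but the same enumeration for twenty would already require roughly 10^17 tours.

```python
# A minimal brute-force TSP sketch over an invented toy distance table.
from itertools import permutations
from math import factorial

cities = ["A", "B", "C", "D", "E"]
dist = {  # symmetric toy distances between city pairs
    ("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10, ("A", "E"): 7,
    ("B", "C"): 6, ("B", "D"): 4, ("B", "E"): 3,
    ("C", "D"): 8, ("C", "E"): 5, ("D", "E"): 1,
}

def d(a, b):
    return dist.get((a, b)) or dist.get((b, a))

def tour_cost(tour):
    # Total cost of visiting the cities in order, then returning home.
    return sum(d(a, b) for a, b in zip(tour, tour[1:] + tour[:1]))

# Fix the starting city and enumerate every ordering of the rest.
best = min((("A",) + rest for rest in permutations(cities[1:])), key=tour_cost)
print(best, tour_cost(best))

# The same enumeration for 20 cities would need 19! tours:
print(f"{factorial(19):,} tours for 20 cities")
```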
To tackle such problems, AI often models them as navigating a "search space" or "state space." This conceptual space represents all possible configurations or situations relevant to the problem. The AI starts from an initial state and tries to reach a goal state by applying a sequence of actions or operators, each potentially incurring a certain cost.
Since finding the absolute perfect solution is often impossible or impractical within this vast space, AI shifts its goal. Instead of perfection, it seeks approximate solutions. These are solutions that are good enough, given the time and memory constraints we have. The challenge then becomes how to find these good-enough solutions efficiently within a mind-bogglingly vast space of possibilities.
Smart Shortcuts: Heuristics and Metaheuristics
To navigate these immense search spaces, AI uses clever strategies known as heuristics and metaheuristics. A heuristic is a problem-specific "rule of thumb" strategy that uses some known properties of a problem to improve search performance. It's not guaranteed to find the absolute best solution, but it often finds a very good one much faster than a brute-force approach.
Consider your GPS navigation app. When you ask for directions, it doesn't calculate every single possible route from your current location to your destination. Instead, it uses a heuristic, often based on an algorithm called A* (A-star). If your destination is northeast of your position, the A* algorithm will prioritize roads going north or east, assuming they are more likely to get you there faster than roads going west or south. Of course, this isn't always perfect: there might be a faster detour to the west, or a counter-intuitive highway connection. Nevertheless, by intelligently using this directional knowledge, the algorithm can find a very efficient route without exploring every dead end. It's a smart shortcut that balances speed with a high probability of finding a good solution.
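Here is a minimal sketch of A* on a toy grid map, assuming 4-way movement and unit step costs; real navigation apps run on road networks weighted by travel time, but the idea is the same. The Manhattan-distance heuristic is what pulls the search toward the destination instead of expanding blindly in every direction.

```python
# A minimal A* sketch on a grid, using Manhattan distance as the heuristic.
import heapq

def astar(grid, start, goal):
    def h(cell):  # heuristic: Manhattan distance from cell to goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Each frontier entry is (estimated total cost, cost so far, cell, path).
    frontier = [(h(start), 0, start, [start])]
    visited = set()
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in visited:
            continue
        visited.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                heapq.heappush(frontier, (cost + 1 + h((nr, nc)), cost + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # no route exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],  # 1 = blocked
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 3)))
```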
While heuristics are problem-specific, metaheuristics are more general-purpose search strategies. They leverage knowledge about the search paradigm itself and can be applied even when very little is known about the specific problem's structure. They're often used when "nothing else works." A prime example of a metaheuristic approach is evolutionary algorithms. These computational strategies are "inspired by certain aspects of the biological process of evolution."
Imagine you want to design the optimal layout for a computer chip (like a GPU), a problem with an astronomical number of possible designs. An evolutionary algorithm would start with a "population" of random chip designs. Then, through cycles of "breeding" (combining elements from two good designs to create a new one) and "selection" (keeping only the best-performing designs), the algorithm iteratively "evolves" better and better designs. Just like biological evolution, it seems to "magically discover quasi-optimal design elements just by sheer luck and relentless repetition," without needing explicit instructions for every design choice. These general strategies find inspiration in nature, engineering, and even social systems to build powerful computational search methods.
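The loop itself is short enough to sketch. In this toy version the "chip design" is just a list of numbers and the fitness function is an invented stand-in (real design objectives are vastly more complex), but the breed-mutate-select cycle is the real thing.

```python
# A minimal evolutionary algorithm: selection, crossover ("breeding"), mutation.
import random

def fitness(design):
    # Invented stand-in for "how good is this design?" (optimum: all values 0.5)
    return -sum((x - 0.5) ** 2 for x in design)

def crossover(a, b):
    cut = random.randrange(1, len(a))  # splice two parent designs together
    return a[:cut] + b[cut:]

def mutate(design, rate=0.1):
    return [x + random.gauss(0, 0.05) if random.random() < rate else x
            for x in design]

population = [[random.random() for _ in range(8)] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]  # selection: keep the fittest designs
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(20)]
    population = survivors + children

print(round(fitness(max(population, key=fitness)), 4))  # approaches 0
```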
Specialized Search: Beyond Simple Paths
Beyond general search and optimization, AI has developed specialized techniques for specific types of complex problems.
Adversarial Search: Thinking Ahead of the Other
Many real-world problems, especially in competitive scenarios, involve an opponent whose actions must be anticipated. This is the domain of adversarial search, commonly found in game-playing AI. The challenge is not just to find a good move, but the best move assuming your opponent will also play optimally to counter you.
One of the oldest and most fundamental techniques is Minimax. Imagine a simple game like Tic-Tac-Toe. Minimax works by having the AI "look ahead" through all possible future moves, assuming that you (the opponent) will always choose the move that is best for you and worst for the AI. The AI then picks the move that minimizes its maximum possible loss (or maximizes its minimum possible gain). Effectively, it plays out all possible future scenarios in its head and chooses the path that leaves it in the best possible position, no matter what its opponent does.
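Here is a compact Minimax sketch for Tic-Tac-Toe, whose game tree is small enough to search exhaustively. The board is a list of 9 cells ('X', 'O', or None); 'X' maximizes the score and 'O' minimizes it, exactly as described above.

```python
# A minimal Minimax sketch for Tic-Tac-Toe.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    win = winner(board)
    if win:
        return 1 if win == 'X' else -1
    if None not in board:
        return 0  # draw
    other = 'O' if player == 'X' else 'X'
    scores = []
    for i, cell in enumerate(board):
        if cell is None:
            board[i] = player          # try the move...
            scores.append(minimax(board, other))
            board[i] = None            # ...then undo it
    # X picks the maximum score; O picks the minimum.
    return max(scores) if player == 'X' else min(scores)

def best_move(board, player):
    other = 'O' if player == 'X' else 'X'
    def score(i):
        board[i] = player
        s = minimax(board, other)
        board[i] = None
        return s
    moves = [i for i, cell in enumerate(board) if cell is None]
    return max(moves, key=score) if player == 'X' else min(moves, key=score)

# O threatens the middle row, so X's only non-losing move is cell 5.
print(best_move(['X', 'O', 'X', 'O', 'O', None, None, None, None], 'X'))  # 5
```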
For games with an incredibly vast number of possibilities, like Go, simply looking ahead through every move is impossible. This is where Monte Carlo Tree Search (MCTS) comes in. Instead of exhaustively analyzing every branch, MCTS "plays out" many random simulations of the game from a given point. It explores the most promising moves more deeply, learning which paths lead to success through repeated "trial and error" simulations. This allows AI to tackle games that were once considered beyond computational reach, like when Google DeepMind's AlphaGo beat the world's best Go players.
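A full MCTS implementation adds a search tree and a clever exploration rule (usually called UCB), but the core trial-and-error idea fits in a short sketch. This "flat" Monte Carlo version rates each legal Tic-Tac-Toe move by the outcome of many random playouts, reusing the winner helper from the Minimax example:

```python
# A minimal "flat" Monte Carlo sketch: rate moves by random playouts.
# Assumes winner() from the Minimax sketch above is already defined.
import random

def random_playout(board, player):
    # Play uniformly random moves until the game ends; return the winner.
    board = board[:]
    while winner(board) is None and None in board:
        i = random.choice([i for i, c in enumerate(board) if c is None])
        board[i] = player
        player = 'O' if player == 'X' else 'X'
    return winner(board)

def monte_carlo_move(board, player, simulations=500):
    opponent = 'O' if player == 'X' else 'X'
    def win_rate(move):
        b = board[:]
        b[move] = player
        return sum(random_playout(b, opponent) == player
                   for _ in range(simulations)) / simulations
    moves = [i for i, c in enumerate(board) if c is None]
    return max(moves, key=win_rate)

print(monte_carlo_move([None] * 9, 'X'))  # usually the center, cell 4
```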
Structured Search: Satisfying All Conditions
Sometimes, the goal isn't to find the "best" path, but simply any solution that meets a specific set of requirements. These are constraint satisfaction problems. Here, the AI needs to find values for a set of variables such that all given conditions, or "constraints," are simultaneously met.
Think about solving a Sudoku puzzle. You need to fill in numbers from 1 to 9 in each cell, but with strict rules: each row, column, and 3x3 box must contain all digits from 1 to 9 without repetition. The AI's task is to find a set of numbers for all empty cells that satisfies all these constraints.
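A classic way to solve such puzzles is backtracking search: place a digit, check that no constraint breaks, and undo the move ("backtrack") whenever a dead end is reached. Here is a minimal sketch, where the puzzle is a 9x9 list of lists with 0 marking empty cells:

```python
# A minimal backtracking solver for Sudoku as a constraint satisfaction problem.
def valid(grid, r, c, digit):
    if digit in grid[r]:                               # row constraint
        return False
    if digit in (grid[i][c] for i in range(9)):        # column constraint
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)                # 3x3 box constraint
    return all(grid[br + i][bc + j] != digit
               for i in range(3) for j in range(3))

def solve(grid):
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for digit in range(1, 10):
                    if valid(grid, r, c, digit):
                        grid[r][c] = digit
                        if solve(grid):
                            return True
                        grid[r][c] = 0                 # dead end: backtrack
                return False                           # no digit fits here
    return True  # no empty cells left: every constraint is satisfied

# solve(puzzle) fills the puzzle in place and returns True if a solution exists.
```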
Another common example is creating a university class schedule. You have classes, rooms, professors, and students, and a multitude of constraints: Professor A can't teach two classes at the same time; Room B can only hold 50 students; Class C requires a lab; no two classes can be in the same room at the same time. The AI's job is to assign times and rooms to all classes such that every single constraint is satisfied. The "structure" of these problems, defined by the variables and their interdependencies, allows AI to use specialized search techniques to efficiently find a valid solution.
Knowledge Representation & Reasoning
Beyond just searching for solutions, a truly intelligent system needs to "know" things about the world. This brings us to the second pillar of GOFAI: Knowledge Representation. This field explores how AI can efficiently represent, store, and use domain knowledge in a way that computers can understand and process.
The fundamental goal of knowledge representation is to organize concepts and facts, as well as the relationships between them. This organization allows AI to reason about these facts and discover new relations. Ultimately, it's about giving AI a structured way to "understand" and make sense of information, much like how humans build a mental model of the world around them. Without a clear way to represent what it "knows," an AI would be unable to make logical inferences or apply its knowledge to new situations.
From Raw Observations to Understanding
To truly grasp how AI "knows" things, it's helpful to understand the progression from raw observations to actionable understanding. At the most basic level, we encounter Data, which consists of raw, unprocessed facts or observations. This could be a list of numbers, individual words, or pixels in an image. In isolation, data has no inherent meaning; for example, the number "30" by itself is just a number.
When we introduce context or metadata, data transforms into Information. For instance, if we know "30" is a temperature reading taken in Celsius at noon on July 1st in Havana, it becomes information. This contextualization helps us relate different observations and gives them initial meaning.
Finally, when information is enriched with semantics and rules, enabling inference, reasoning, and the discovery of new relations, it becomes Knowledge. For example, if the AI knows that "temperatures of 30 degrees Celsius or more in July in Havana indicate a heatwave," it possesses knowledge. This knowledge allows it to draw inferences (it's a heatwave!), discover new relations (heatwaves can lead to increased energy consumption), and even take actions (warn residents about high temperatures). It's this ability to add meaning and logical connections that truly transforms information into actionable knowledge.
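This ladder from data to knowledge can be made concrete in a few lines, using the (invented) heatwave rule above:

```python
# Data -> information -> knowledge, as a toy example.
reading = 30                                    # data: a bare number

information = {"value": reading, "unit": "C",   # information: data plus context
               "city": "Havana", "month": "July"}

def infer(info):                                # knowledge: rules enabling inference
    if info["city"] == "Havana" and info["month"] == "July" and info["value"] >= 30:
        return "heatwave: warn residents"
    return "no action needed"

print(infer(information))  # heatwave: warn residents
```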
Ways to Represent Knowledge
Just as humans use different ways to store and recall information, from precise definitions to vague intuitions, AI employs various methods for knowledge representation, each with its own strengths and weaknesses.
One key distinction lies between explicit and implicit representations. Explicit knowledge is clearly defined and directly encoded, often in rules or symbols. It's much like a precisely written dictionary or a rulebook where every term and every rule is spelled out. This approach is central to Symbolic AI. For instance, Ontologies are explicit representations that define concepts within a domain and their strict relationships. Think of a meticulously designed family tree formally defining "parent," "child," "sibling," and "ancestor," along with rules such as "if A is a parent of B, and B is a parent of C, then A is a grandparent of C."
Conversely, implicit knowledge is learned from patterns in data, rather than being directly programmed. It's more akin to human intuition or a "gut feeling" developed from vast experience, and is fundamental to Statistical AI. Embeddings, for example, are numerical representations where concepts like words, images, or even entire documents are transformed into points in a multi-dimensional space. Systems like Word2Vec learn these embeddings by analyzing how words are used together, so words with similar meanings or contexts (e.g., "king" and "queen") end up being numerically "close" to each other in this space, even though no human explicitly programmed that relationship.
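What "numerically close" means can be made concrete with cosine similarity, the standard way to compare embeddings. The 3-dimensional vectors below are invented for illustration; real systems like Word2Vec learn vectors with hundreds of dimensions from billions of words.

```python
# Toy word embeddings compared with cosine similarity.
import math

embedding = {
    "king":  [0.9, 0.8, 0.1],   # invented 3-d vectors for illustration
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine_similarity(embedding["king"], embedding["queen"]))  # ~0.99: close
print(cosine_similarity(embedding["king"], embedding["apple"]))  # ~0.30: far
```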
Another way to categorize knowledge representations is by their formality. Formal representations have strict, unambiguous syntax and semantics, making them ideal for precise logical inference and computation. Mathematical equations, programming code, or statements in formal logic are prime examples, leaving no room for misinterpretation. In contrast, informal representations are more flexible, often using natural human language. While easier for humans to create and understand, they can be ambiguous and require more sophisticated processing for AI to extract meaning, as seen in a written description, a casual conversation, or an essay.
Finally, we distinguish between structured and unstructured representations. Structured knowledge is organized in a predefined, rigid format, making it easy for computers to process and query. Think of data in a spreadsheet with clear rows and columns, or a database with defined fields.
Knowledge graphs, for instance, are structured representations that organize facts as a network of interconnected entities (nodes) and their relationships (edges). A knowledge graph might have a node for "Paris," a node for "France," and an edge labeled "isCapitalOf" connecting them, allowing AI to easily query and infer facts.
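At its simplest, a knowledge graph can be sketched as a set of (subject, relation, object) triples plus a query helper; real systems add schemas, indexes, and query languages such as SPARQL.

```python
# A minimal knowledge graph as (subject, relation, object) triples.
facts = {
    ("Paris", "isCapitalOf", "France"),
    ("France", "isLocatedIn", "Europe"),   # an extra invented fact
}

def query(subject=None, relation=None, obj=None):
    # None acts as a wildcard, so query(relation="isCapitalOf") finds capitals.
    return [(s, r, o) for (s, r, o) in facts
            if subject in (None, s) and relation in (None, r) and obj in (None, o)]

print(query(relation="isCapitalOf"))  # [('Paris', 'isCapitalOf', 'France')]
```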
Conversely, unstructured knowledge exists in free-form text, images, audio, or video, without a predefined schema. Extracting meaning from unstructured data is much harder and often requires advanced AI techniques.
Vector databases, for example, are often used to store and efficiently search implicit representations (embeddings) derived from unstructured data. You could take millions of research papers (unstructured text), convert each into an embedding (implicit representation), and store them in a vector database. Then, when a user asks a question, the database can find the most "similar" papers based on their embeddings, even though the papers themselves are unstructured.
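At its core, that search is a nearest-neighbor lookup over embeddings, as the toy sketch below shows. The paper titles and vectors are invented, and real vector databases use approximate indexes to search billions of embeddings quickly.

```python
# A toy vector search: return the stored items closest to a query embedding.
import math

papers = {  # paper title -> invented 3-d embedding
    "Quantum computing basics": [0.9, 0.1, 0.0],
    "Quantum error correction": [0.8, 0.2, 0.1],
    "A history of gardening":   [0.0, 0.1, 0.9],
}

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

def search(query_vector, k=2):
    return sorted(papers, key=lambda title: cosine(query_vector, papers[title]),
                  reverse=True)[:k]

print(search([0.85, 0.15, 0.05]))  # the two quantum papers rank first
```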
Drawing Inference from Knowledge
Knowledge representation isn't merely about storing information; its ultimate purpose is to enable AI to draw inferences and make decisions. This process of deriving new conclusions from existing knowledge is known as reasoning, and it can take both formal and informal forms.
Formal reasoning, deeply rooted in logic, is about deriving new conclusions from existing knowledge using strict, unambiguous rules. This is the hallmark of Symbolic AI. It's a process of deduction, where if the initial premises are true and the rules are applied correctly, the conclusion is guaranteed to be true. For example, if a knowledge base contains the rules "All birds can fly" and "A sparrow is a bird," a formal reasoner can deduce, with absolute certainty, "A sparrow can fly." Such rule-based systems are precise and auditable, but they are limited by the completeness and accuracy of the explicitly programmed rules.
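Formal deduction can be sketched as forward chaining: repeatedly apply rules to known facts until nothing new can be derived. The snippet below encodes the bird example; note that the conclusion is guaranteed only relative to the premises, and "all birds can fly" is itself false for penguins, which is exactly the brittleness discussed earlier.

```python
# A minimal forward-chaining sketch with one rule: if ?x is_a bird, ?x can fly.
facts = {("sparrow", "is_a", "bird")}
rules = [
    (("?x", "is_a", "bird"), ("?x", "can", "fly")),
]

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (ps, pr, po), (cs, cr, co) in rules:
            for (fs, fr, fo) in list(derived):
                # The premise subject is the variable ?x, so match relation+object.
                if (pr, po) == (fr, fo):
                    conclusion = (fs if cs == "?x" else cs, cr, co)
                    if conclusion not in derived:
                        derived.add(conclusion)
                        changed = True
    return derived

print(forward_chain(facts, rules))  # includes ('sparrow', 'can', 'fly')
```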
In contrast, informal reasoning is about drawing conclusions based on patterns, similarities, or analogies, often without strict logical guarantees. This type of reasoning is more akin to human intuition or common sense. It's less about strict deduction and more about finding connections and probabilities. For example, if an AI has learned implicit representations (embeddings) of various animals, and it sees a new animal that is "numerically close" to many dogs, it might infer it's a dog, even without explicit rules for every single feature.
This distinction is crucial for understanding the different capabilities of AI. While formal reasoning provides certainty within defined boundaries, informal reasoning allows AI to operate in ambiguous, unstructured environments. The latter, particularly reasoning by analogy in embeddings and language models, will be explored in more detail in later chapters, showcasing how AI can make sense of the world even when explicit rules are unavailable.
What is the Best Representation for Knowledge?
The choice of how to represent knowledge is a critical decision in AI design. Different representation types are chosen based on the specific problem, the type of data available, and the AI paradigm being used (Symbolic vs. Statistical). For instance, a Symbolic AI system designed for medical diagnosis might rely heavily on formal, explicit ontologies of diseases and symptoms. Conversely, a Statistical AI system for image recognition might primarily use implicit, unstructured representations of pixels that it learns from millions of example images.
This challenge highlights a theoretical result known as the Ugly Duckling Theorem. This theorem, in essence, states that without a specific purpose or "bias," all objects are equally similar or dissimilar to one another. This implies that there is no single, universally "best" way to represent knowledge or measure similarity without a context or goal in mind. For example, an "ugly duckling" is only ugly relative to a flock of swans; it might be beautiful among other ducklings.
Therefore, the human responsibility in choosing the right representation is paramount. This choice directly impacts what an AI can "know," how it can "reason," and ultimately, the reliability and fairness of its inferences. Aligning the representation with the problem's nature is a key part of building human-centered tools that truly understand and assist us.
Conclusion: The Need for Learning
Good Old-Fashioned AI (GOFAI), with its focus on search, optimization, and explicit knowledge representation, laid the essential groundwork for the field of Artificial Intelligence. Its strengths lie in domains where problems are well-defined, rules are clear, and knowledge can be precisely encoded. GOFAI systems offered precision and control, making them powerful tools for tasks like proving theorems or playing well-defined board games.
However, the ambitions of GOFAI soon ran into fundamental limitations when faced with the messy complexity of the real world. These systems proved to be brittle: a small change outside their programmed domain could break them entirely. They struggled immensely with common-sense knowledge, which is vast and often unstated. The sheer scale of real-world information made it an "insurmountable challenge" to explicitly program every piece of knowledge and every rule. GOFAI was excellent at solving problems for which it was explicitly programmed, but it couldn't adapt, generalize, or handle unstructured data effectively. This revealed a crucial gap in AI's capabilities.
The limitations of GOFAI highlighted a profound truth: to build truly intelligent and adaptable systems, AI needed to move beyond simply executing pre-programmed rules. It revealed a crucial need for systems that could learn from experience and data, without being explicitly programmed for every single scenario or piece of knowledge.
This growing realization of the power of learning-based approaches, which were developing concurrently with GOFAI, marked a significant shift. It showed that AI could discover its own patterns and adapt to unforeseen situations, offering a path to overcome the brittleness of purely symbolic systems.
Recognizing these limitations and actively seeking new approaches is a hallmark of the ongoing, human-driven effort to build more capable and adaptable AI. It's a testament to our techno-pragmatist ethos: acknowledging challenges, learning from past efforts, and continuously striving to create tools that can better serve humanity's complex needs. This increasing prominence of learning methods, which developed in parallel to GOFAI, is the story that will unfold in the next chapter.
Thank you for reading this far. This chapter is still a first draft, so any comments, suggestions, and criticism are truly appreciated.
PS: Get your copy of Mostly Harmless AI at 50% off.
This is a very comprehensive overview. I liked all the distinctions that you have provided along different dimensions of the field. I enjoy your writing style as well - very soothing and down-to-earth and jargon-free - this feels written for everyone.
I like the Ugly Duckling theorem. That's a great point about data representation being undifferentiated until it has a goal.
Here are some comments. I am a learner so please forgive any misunderstandings and any problems with tone.
This Chapter
1. I would like to see more non-text examples, such as diagrams or pictures or code snippets. In fact, providing an example English sentence for each concept would be great.
2. Would you consider a paragraph about the philosophical roots of understanding knowledge, such as Wittgenstein and Kant?
3. It might be great to have a case study of one very influential old fashioned AI implementation.
4. I think there could be more to say about Minsky.
5. I am fascinated by the SABRE airline reservation system. Airline routing seems like one of the first uses of search optimization that many of us saw.
6. It might be interesting to have a table, showing each concept, what year it originated, by whom and what are some implementation examples that we would have heard of.
6a. Or what constituted AI in each decade - a table by decade of the concepts that ruled.
7. I think there could be more real world examples. We're standing on the shoulders of so many influential systems.
8. You could mention game theory and the work of John Nash.
9. I think you could edit the conclusion. Do you need paragraphs 3 and 4? The conclusion doesn't feel concise.
10. I think some comments about AI in popular culture would get everyone to relate to the topic.
This is a great distillation of the important concepts. I like your focus on distilled principles.
Thank you for providing this content!
All of these are really good suggestions! I'll do my best to incorporate them. Thanks!