The Computist Journal: 🤖 Mostly Harmless AI

AI Coding Agents, Deconstructed

Alejandro Piad Morffis — Thu, 02 Apr 2026 13:40:59 GMT

I’m telling you, this is the future. AI agents will do aaaallll the work. Photo by Farzad Felfelian on Unsplash

You’ve been using AI coding agents for months. You’ve crafted elaborate system prompts. You’ve added a dozen skills. You’ve learned the dance of context window management. And somewhere around the third hour of work, something breaks. The agent starts forgetting things. Making wrong assumptions. Doing something close—but not quite—what you asked.

This isn’t a failure of the model. This is a failure of the system.

To be sure, better models make things easier. And models are getting better by the day. But no matter how good a model is, bad systems lead to bad outputs. Even the smartest people produce junk when fed with incorrect assumptions or given incomplete instructions.

In contrast, a good system with clear boundaries and explicit rules, that leaves the exact amount of flexibility necessary, makes creativity and productivity thrive.

You see this day and night in teams (of real humans) in every industry. It’s not often the smartest person in the room that solves the hard problem. It’s when you combine the right kinds of intelligence with the right kind of system that things click.

In this article, I want to make the case for a structured way to think about Large Language Model (LLM)-based agentic systems (mostly for coding, but also for knowledge work in general) that fixes some of the greatest pains I (and I sure most of you) have been facing when trying to scale AI-assisted workflows to professional levels.

It’s a system that puts the right constraints in the right places and leaves just enough space for creative exploration (or however you want to call what LLMs do when they hallucinate in your favor). It’s also a system that makes it clear you are in charge.

Everything an AI agent does happens inside a context window. System prompt, user input, tool results, skill injections—they all live there. The agent’s only mechanism for action is the ReAct (Reasoning + Acting) loop: think, call tools, observe results, repeat. Each cycle grows the context. Each skill activation injects more.

This creates a fundamental tension: context is power, but context is finite. Too little and the agent can’t connect the dots. Too much and the important stuff drowns. The gap between those two failure modes is narrow—and most agent frameworks ignore it entirely.

I’ll walk through why current systems fail, introduce a four-element framework for thinking about agentic architectures, show you how these principles apply across three domains, then present a vision for better AI harness engineering.

Part I - The Symptoms

To understand the problems we first need to understand how a standard agentic loop works. The typical architecture is what’s called a ReAct loop. The LLM runs in a loop that determines the next action given context, which can be read some files, ask the user, invoke a tool, inject a skill, etc. When the agent decides no more actions are necessary, the loop ends and the user is given control back to continue the prompt.

That’s it. All the seemingly supersmart behaviours of Claude Code, Gemini CLI, and Codex are, under the hood, some form of the basic ReAct loop. There are of course nuances. For example, most systems decide that if the agent calls the same tool with the same args three times, it must be stuck in a loop and stop the turn. There are perhaps hard limits on how many tool calls the agent can do in each turn.

Context is the bottleneck. Not the model. Not the prompt. Context.

The agent doesn’t have memory. It doesn’t have state. It has context. Everything it knows about your project, your preferences, your conventions, all of it lives in the context window. When you add a skill, you’re injecting more context. When you run a tool, the result goes into context. When you switch modes, you’re switching which system prompt is active, all still in context.

This means context engineering is AI agent engineering. The agent’s behavior isn’t determined by the model alone, or even primarily, but by what context you give it, and how you structure that context over time.

Most tools treat context as a solved problem. They stuff everything in and hope the model figures it out. In-context learning seems almost magical, but it has limits—and those limits become visible fast.

When context is thin, the agent simply doesn’t know enough about your project to make informed decisions. It relies on baked-in assumptions from training and falls back to consensus instead of following your style: it uses the common tools and practices it learned from pretraining. This often means it uses slightly old and outdated tools and practices.

So you do the sensible thing, and inject project-specific information into the context. But then if context grows too large, even if it doesn’t technically exceed the model’s capacity, things start to get lost in the middle. Moreover, failed tool calls, wrong assumptions the model had to correct, etc., start creeping up in context, not only taking up valuable space but also, and more importantly, distracting the model and biasing it towards mediocre decisions.

Then there is context compaction: when the context fills in to about 85%, most systems will invoke a special prompt to instruct the agent to summarize the current state. These prompts vary in detail, but often involve asking the agent what it is immediately doing, where is it stuck, what has failed, etc. Clever, but a hack nonetheless. This hard context reset means the agent will forget important nuances in the current conversation and will repeat past mistakes. It’s frustrating.

Let’s look at how these problems surface in specific symptoms that all LLM-based agents display at some point.

Symptom One: Unstated Assumptions

The first failure mode isn’t dramatic. It’s quiet. You ask the agent to write a test, and it writes a unittest.TestCase instead of a pytest function. You ask it to add a dependency, and it edits requirements.txt instead of running uv add. You ask it to deploy, and it pushes directly to main.

These aren’t model failures. They’re assumption mismatches. The agent doesn’t know how your team does things. There’s no guardrail for “in this project, we always use pytest, we always use uv, we never commit directly to main.” The agent improvises from general knowledge, and general knowledge is often wrong.

Skills are supposed to fix this. Add a skill document that says “use pytest” and the agent should know. But skills introduce a new problem.

You add a skill for code review. Then one for documentation. Then one for PR descriptions. Then three more for your company’s specific stack. Each skill seems small. A few hundred tokens each. But they pile up—always-on knowledge the agent carries but can’t prioritize.

The result is context bloat. The agent can’t tell what’s relevant in any given moment. So it blends everything together, and hallucinations increase. More skills made it worse—not better.

Symptom Two: Permission Leakage

Every agent framework implements the same plan then build pattern. The idea is sound: think first, plan second, execute third. In practice, the boundaries leak.

Plan mode is supposed to be read-only. Design the change, review the approach, lock in the scope. Build mode is supposed to execute. Write the code, run the tests, commit the result.

But “plan mode” in most tools is just a prompt. There’s no enforcement. The agent can write code in plan mode if it wants to. It can ignore the plan in build mode. It can skip straight to implementation if the prompt implies urgency. The modes are suggestions, not constraints.

This matters because a plan only works if it’s actually followed. If the agent can deviate mid-execution—if “plan mode” and “build mode” are just prompts with different names—the plan becomes advisory. And advisory plans get ignored.

The second problem is structural: there’s no artifact that passes from plan to build. The plan lives in the context. By the time build mode starts, the plan is mixed in with everything else the agent said. Which file was the plan? Which changes were approved? The agent has to re-read the conversation to remember. Context saturation accelerates.

Symptom Three: Context Saturation

After extended work, you see the same pattern: the agent makes 95% of the progress, then fails on the last 5%. It nails the architecture. The logic is sound. The core implementation works. Then it stumbles on a detail—because context has saturated. It forgot which environment it was in, which conventions still apply, which constraints matter.

But the deeper problem is internal noise. The agent keeps everything in context: all internal reasoning, all tool calls, all results. This is fine for minute-to-minute action. But after four failed attempts to solve something, the old tool calls are just noise. These were attempts that went nowhere, just add cost and accelerate saturation.

The supposed solution for this is context compaction. But this creates a lossy summary problem. The agent is supposed to leave a trail for its future self. After context compaction, it should be able to pick up where it left off. But if agents struggle with long contexts, how are they supposed to build a good trail? The compaction report is only as good as the agent’s ability to summarize. And summarization is lossy and injects back lots of unstated assumptions from pretraining.

The frustrating part: this wasn’t a hard problem. The agent had all the knowledge it needed. But context filled with noise, and the important bits got pushed out. More tokens in, less signal out.

The solution isn’t just better prompts or larger context windows. Yes, these help. But the symptoms are systemic, so the solution must be a system overhaul.

Let me show you how that system looks like.

Part II - The System

Now that we understand the problem, let’s look at how every agent system actually works. Every AI agent system addresses four concerns. When you conflate them, the system breaks. When you separate them, the system scales.

This taxonomy isn’t original to me. It’s a synthesis of how modern AI agentic systems work under the hood. Most explicitly, it’s implemented in the OpenCode CLI (opencode.ai), but all other tools follow a similar pattern, even if they use different names.

Here’s the breakdown. Every agent system you’ll encounter (explicitly or implicitly) is managing these four things:

Mode — the who. A mode is the persona the AI adopts. It defines the thinking style, the permissions, the available tools. When you interact with a “code assistant,” you’re in a coding mode. When you switch to “creative writer,” you’re in a creative mode.

Modes are explicit. They’re top-level system prompts that define behavior and permissions. You tell the agent: “This is how you should think and behave. These are the tools you can use. These are the parts of the filesystem you can write to.”

Skill — the knowledge. A skill is knowledge the agent can recall when necessary. It doesn’t get invoked explicitly, it gets applied implicitly when necessary. When you give an agent knowledge about SQL optimization, that skill is available whenever relevant. The agent doesn’t need to be told to use it. The ReAct cycle injects it when it deems suitable.

Unlike modes, skills can layer. An agent might have a SQL skill, a documentation skill, and a debugging skill, all active simultaneously, all contributing when relevant. Skills are implicit because the agent should just apply them naturally. They can also contradict or complement each other. In-context learning should be capable of using them in a combined manner.

Command — the workflow. A command is a script. It tells the agent: do this, in this order, using these tools. “Refactor this function” is a command. “Run these tests and report results” is a command.

Commands are explicit: you invoke them. Under the hood, commands are just prompts. The difference is who injects them: the user. When you run /build, you’re injecting a workflow prompt into the agent’s context. That’s it. The command tells the agent: do this sequence of things. The complexity lives in the orchestration of the ReAct cycle, not the command itself.

Commands are intentionally simple. They don’t contain knowledge. That’s intentional separation of concerns. The command itself shouldn’t know how to build; it knows when to spawn subagents and which mode to use. This keeps commands thin and changeable without rewriting underlying knowledge.

Subagent — the delegation. A subagent is a spawned agent for background or parallel tasks. It handles isolated work, returns summarized results, then disappears. It is instantiated with a system prompt and specific instructions (synthesized by the primary agent that called it), and runs for one full ReAct turn.

Subagents are ephemeral. Their internal reasoning stays private. The main agent only sees the synthesis. You spawn a subagent when you need parallel processing, isolation, or both. They are the way to fork, solve a specific subtask, and return a result, but keep context clean. Kind of like subroutines.

Why This Separation Matters

Understanding this distinction unlocks everything else. Once you see skills as implicit knowledge and commands as explicit scripts, the rest of the architecture clicks naturally. Most agent setups conflate these. They embed knowledge in commands. They make skills behave like workflows. They mix persona into everything else. And the massively underuse subagents.

When you separate these concerns–modes for persona, skills for knowledge, commands for orchestration, subagents for delegation–you get something that looks like good systems engineering. You can swap skills without touching commands. You can change modes without rewriting workflows. You can spawn subagents without the main agent knowing or caring how they work internally. The result is a system that works and adapts and scales like good software should do.

The system scales because the pieces are independent. Change one without breaking the others. Each component has a single job, and the boundaries between them are meaningful. When context shifts, when requirements evolve, when a new skill needs adding, the system adapts incrementally rather than collapsing under the weight of accumulated complexity.

Part III: The Practice

If so far this seems like abstract theory for you, in this section we will ground these concepts in actual practice. Let me show you how I’m using these ideas today to improve my AI-assisted coding practice. I’m using opencode.ai but I believe the following is easily adaptable to any agentic toolkit out there.

My Three Modes

Every agentic system needs boundaries, not social contracts, but enforced constraints. In my setup, those constraints come from three modes: analyze, design, and create.

Each of these modes defines a thinking style—a persona—and a set of constraints for tool use and filesystem access.

Analyze mode is research and investigation. This mode reads your work and writes summaries to a knowledge base. It cannot touch production files. Not “should not” but cannot. The permissions are built into the mode itself, not enforced through prompts or warnings. The agent is incapable of writing outside of a .playground folder, and is incapable of doing anything that can harm the project or the system (more on how a bit later) but it is still capable of running arbitrary code, download anything from the internet, and play around as it needs.

Design mode is architecture and planning. This mode bridges analysis and implementation. It can read your project and write design documents, architecture diagrams, and implementation plans, but still cannot touch production code. It cannot run shell scripts either, at all. It can look at git status and logs, read folder contents, etc., but it can only write to a space where plans and design documents go.

Create mode is execution. Full read-write access. This is where production work happens. The agent can write code, create files, and modify the project directly. Again, it cannot do anything outside the project scope, though. It won’t accidentally change /etc/host(s)1 even if it tries to.

The key insight: modes define permissions, not just persona. You can’t accidentally prompt your way into code generation during research. The agent literally lacks the capability. The agent doesn’t need to “understand” these constraints, it simply operates within them.

Mode is the who, and it determines what the agent can do, not just how it thinks.

Let me show you how they work in three different domains that make the bread and butter of my daily job: software development, scientific research, and technical writing.

I chose these domains because they illustrate the simplicity and scalability of the system. Software development shows the framework under constraints: deadlines, production code, real stakes. Research shows it under complexity: synthesis, evaluation, structured output. Technical writing shows it under nuance: voice, audience, iterative refinement. Three different pressures, one consistent architecture that works in all three cases.

In each of these domains we have two layers to go through: first is the set of implicit skills that are available to the agents, and second is the set of explicit commands (each tied to a specific mode) that setup concrete workflows. I will show you one example workflow that cross-cuts across the three modes in each case. I will also tell you exactly where delegation occurs.

Domain A: Software Development

Software development is where agentic systems face the harshest constraints. Production code has stakes. Deadlines are real. Mistakes cost money. Let’s see how the framework applies.

Implicit Skills

A software development agent carries knowledge it never needs to be told to use. It knows language idioms and patterns like the idiomatic way to write a list comprehension in Python, or the conventions for error handling in Go. It knows testing conventions: where tests live in the directory structure, how they’re named, what assertions to prefer. It knows architecture conventions: layered structure, dependency injection patterns, how error states propagate. It knows code review standards: what to flag, what to praise, when to ask for clarification.

Example Workflow: Bug Hunting

I use this workflow for finding and fixing bugs. It starts with investigation. The agent spawns dozens of subagents to try and break the system (either guided towards a purpose, or completely unbiased). Then you build a comprehensive plan to solve it. And then you execute that plan. Simple, right?

Phase 1: /trace (analyze mode) runs systematic experiments to detect and narrow down a bug’s cause. The agent examines stack traces, compares behavior across commits, and pinpoints the exact files and functions that need attention. This mode is read-only by design, except for a .playground folder. Research happens here, not in the code itself.

Each experiment is run on a subagent that has the job of verifying one assumption. The main agent receives only experiment results, and constructs an executive report of findings. This means you can run dozens of different experiments autonomously to detect what breaks what.

Phase 2: /plan (design mode) takes the diagnosis and defines the changes needed, along with their architectural impact. The agent reviews the affected modules, considers alternative approaches, and documents the implementation plan before touching anything. This is where the scope gets locked in.

The result of this phase is a structured plan with step by step details on what files must be touched and what must be done in there (semantically, not code). For every phase, it defines success criteria: what must be validated before we can say we got that phase right.

Phase 3: /build (create mode) executes the plan step by step. The agent writes tests first (following Test-Driven Development (TDD) discipline) for the success criteria defined for that phase and watches them fail. Then it launches a coding subagent that has read-only access to tests, so it cannot cheat and change the tests.

The subagent attempts to implement changes that make the test pass. If it succeeds, the main agent commits and moves on. If it doesn’t, the main agent retries a few times. If there is no progress, the main agent resets the work tree (no harm done), and reports on failure. This usually means the plan needs revisions.

Domain B: Research

Research is where agentic systems face the greatest complexity. Sources multiply, methodologies diverge, synthesis requires judgment. Let’s see how the framework applies.

Implicit Skills

A research agent knows the conventions of academic writing without being reminded. It knows citation formats like APA, MLA, Chicago, and IEEE, and when to use each. It knows how to evaluate papers: methodology soundness, sample size adequacy, replicability claims, conflict of interest disclosures. It knows the structure of literature reviews: how to organize by theme, methodology, or chronological development. It knows domain-specific terminology, distinguishing between “accuracy” and “precision” in machine learning, or between “confounding” and “colliding” in causal inference.

Example Workflow: State-of-the-Art Report

Phase 1: /research (analyze mode) spawns subagents to gather sources in parallel. Each subagent reads a batch of papers, synthesizes findings, and returns summaries. The main agent synthesizes those summaries into structured notes. This phase can be run multiple times to collect batches of sources without overwhelming context. At the end, you get hundreds of sources summarized into clean research notes.

Phase 2: /outline (design mode) identifies patterns across the collected literature. The agent groups papers by methodology, extracts recurring findings, and maps the landscape of the field. It generates outline options for the final document, based on typical structures like problem-solution or paradigm-methods, highlighting gaps where the research is thin and consensus areas where findings align.

Phase 3: /draft (create mode) builds the document section by section, following the outline. Each section draws on the structured notes, weaving together sources into coherent narrative.

The agent launches subagents for writing each subsection because typically, agents write more or less the same length in a single write command, so if you ask it to fill in a large outline all at once you’ll only get a mediocre extended outline. By launching independent writers for specific sections of the outline, you get all the attention of a single turn to read source material and write a good 4 or 5 paragraphs for a concrete section.

A cool idea I’ve been meaning to try is have the main agent can spawn several subagents to write the same section, with a high temperature, and then perform some sort of aggregation or evaluation before building the final draft for every section. This burns through 3x tokens but ensembles have been shown over and over to improve AI models outputs. If you try it, let me know.

Domain C: Technical Writing

Technical writing is where agentic systems face the most nuance. Voice matters. Audience varies. Iterative refinement is the norm. Let’s see how the framework applies.

Implicit Skills

A technical writing agent carries knowledge of prose style without being coached. It knows voice and tense conventions—active voice for clarity, past tense for completed processes, second person for direct instruction. It knows structural patterns: how documentation differs from blog posts, how reports differ from tutorials, how reference material differs from guides. It knows audience awareness: what to explain for newcomers, what to omit for experts, when to elaborate and when to abbreviate. It knows cross-referencing and linking norms: when to link, when to inline, how to name anchors for scannability.

Example Workflow: Paper Review

Phase 1: /review (analyze mode) performs detailed review in a specific order: structural issues first, then content, then style. The agent examines the narrative arc—how main points connect, whether the flow makes sense, before worrying about grammar or word choice. This ordering matters; reviewing low-level details when high-level problems exist wastes effort.

Each iteration is performed by spawning several subagents that focus on specific types of problems, like transitions, unverifiable claims, etc. Each subagent returns a structured list of issues, pointing back to exact line numbers and phrasing. Then, the main agent edits the original paper and injects markdown comments in every marked issue, next to the paragraph, or under the header where it best fits.

Phase 2: /revise (design mode) plans changes to specific sections, prioritizing by review type. The agent maps structural fixes to particular paragraphs, content additions to thin sections, style improvements to verbose passages. It produces a concrete plan, section by section, change by change. Then it goes into the manuscript and writes markdown comments as replies to the existing review comments, thus grounding the revision plan in the exact context it must fit.

Phase 3: /rewrite (create mode) follows the plan. The agent revises sections in priority order, applying structural changes first, then content, then style. Again, each step is performed spawning a subagent tasked with just a change (for style changes we actually do it section by section).

The subagent doesn’t edit; it produces a draft revision that the main agent is then tasked to paste into the document where it fits. Crucially, the main agent is instructed to leave the editorial comments but mark them as solved, with a short trail of what was changed. This works wonders for a later human review phase.

Part IV: A Look into the Future

These workflows work, but with some caveats. There’s a gap between “working” and “working well.” Three key pains remain in my implementation.

Long commands are hard to follow when given as a single prompt. The fourth step gets forgotten since it is buried at the beginning of the context.
Permissions as currently implemented are all-or-nothing. You either have shell access (destructive) or you don’t. I want broad permissions (run whatever you want) with provable security (nothing you run can change this file).
Context saturation still happens even with delegation. After a while, the agent will have to compact context, and this usually means you lose important information.

I have three ideas for closing this gap. The first is about how commands work. The second is about security. The third is about context management. They are in different levels of implementation, so let me show you what I’m building toward.

Idea One: Better Commands

Commands in most tools (Claude Code, Gemini CLI, Codex, Copilot) are one-shot interactions: you invoke the command, a single massive prompt is injected. The agent runs until it decides to stop.

To make commands truly useful, we need to be more like scripts. Here’s what that means:

Commands that inject prompt instructions one step at a time, waiting for the agent to do a full turn each time. Instead of dumping a large prompt to run all steps at once, a command like /review could insert surgical mini prompts that say “read the file”, wait for the agent, “analyze structure”, wait for agent, and so on, until “write the report”. This massively reduces the problem of lost-in-middle context saturation. Each turn the agent is focused on one specific step, and you get N times the compute power to solve an N-step workflow.
Commands that extract structured information from the agent response, and can later inject variables back into prompt. This allows to reinject important information into later prompts, keeping important information as a contextual variable, not just a string lost in the middle of the prompt. But it allows for something else.
Conditional branching based on context or user input. Once we have structured parsing and contextual variables, we can inject different prompts based on whether the agent succeeded or failed. If the plan reveals a breaking change, route to architectural review. If it’s a bug fix, route directly to implementation. The command adapts its path based on what it discovers.
Finally, commands that embed and execute external scripts. Instead of asking the agent to run some script, the command can run arbitrary Python, JS, Bash, or whatever, to, for example, transform structured information. The command becomes an orchestrator of other processes.

Basically, what I’m asking for here is a Domain-Specific Language (DSL) for guiding agents in a far more structured manner, but still having the power of arbitrary prompts for flexibility. Mixing code and prompts in this way gives us the tools to find the precise balance between constraints and capabilities.

If this sounds exciting, I’m happy to tell you this is already doable, to some extent. Check out my literate-commands project for an OpenCode-specific implementation of these ideas. It’s still a bit rough around the edges, but it works much better than plain, single-prompt commands.

Idea Two: Sandboxed Security

Most agentic tools have very coarse permission settings. You can allow, deny, or set a specific tool to “ask” mode, which means the agent will pause and emit a notification for the user to give permission.

This works fine for coarse-grained permissions like read-only access, or write but no shell. In OpenCode, you can even define permissions for specific paths, or even specific shell commands (with simple glob patterns, so you can, e.g., allow ls * but reject all other shell commands).

However, even in this case, I find these permissions too restrictive. They are conflating two different dimensions into one–what tools the agent can use, and what side-effects can those tools have.

For example, say I want to give my agent git access but only for reading operations. How do you achieve that? You need to list all safe patterns like git ls-tree *, git status, git log *. But what about git branch? Depending on the arguments, this subcommand can have read-only or write side effects. And then think about pipes, shell substitution, custom bash scripts, or worse, python *.

If you want your agent to be capable, you need to give it access to a wide variety of tools. For example, my bug-hunting workflow depends on the agent being able to execute arbitrary code that it synthesizes on the fly. However, I want guardrails. There is simply no way to whitelist all possible commands. We need separation of permission to run a command and permission to modify the system.

The solution, of course, is some form of filesystem isolation. The most obvious one is wrapping all shell execution in Docker, so commands run in a container with proper constraints. This creates all sorts of other problems, which I can discuss in a future post, but for now, it remains my best (and simplest) solution to robust sandboxing.

And this isn’t just about safety, though. When you know the agent can’t accidentally wipe your home directory or exfiltrate your API keys, you can let it do more. Security enables capability. You can let the agent download arbitrary code from the internet, run arbitrary scripts, break things and observe changes. Everything happens inside a Docker container with precise constraints that enable maximum capability with absolute security.

As of now, I kind of implemented this as a plugin for OpenCode, but it’s still in beta phase and not ready for widespread use. More on this idea in a future article.

Idea Three: Context-Aware Execution

And finally, we need to rethink the whole oversimplistic ReAct loop that simply grows the context linearly. The agentic cycle doesn’t have to be a straight line. Real work branches: you explore options, try things, backtrack when they fail. The context should reflect that.

I’ve been designing a system where the context never saturates. It branches when you’re exploring, spawning parallel contexts for different approaches. It prunes old tool calls that went nowhere. It removes internal reasoning that no longer matters. It maintains a “trail” that actually works: a structured record of decisions, not a lossy summary.

The goal is simple: keep context between 40% and 60% saturation at all times. Not by compacting a 150K tokens context down to 10K—which kills all understanding the agent had achieved—but by never letting it grow unchecked.

Nothing like this exists yet, so I’m building it, but it’s a story for another day.

Conclusion

The main takeaway from this article is not that my system is better. It’s that you can design your own system to adapt perfectly to your workflows if you clearly separate concerns. The main modes are for establishing an overall persona–inquisitive and critical, versus detailed and forward-looking, versus focused and action-biased–while skills incorporate domain knowledge, and commands act as precise workflows.

The workflows I described are real, based on actual commands and prompts I’m using in production code. But I have abstracted them a bit to make them easier to understand in the context of an arbitrary agent, not tied to specific idiosyncrasies of the tool I happen to be using at the moment. If you want to see and try for yourself a concrete implementation of these ideas—still imperfect, but working nonetheless—check out my opencode toolkit repository. It’s still pretty much work in progress, so use it with care.

In future articles I will explore specific problems in more detail and discuss concrete strategies to implement powerful workflows that keep you, the user, in absolute control, while delegating the majority of the grunt work.

And, as a final remark, I’m seriously considering building my own CLI agent. I know, I know. Reinventing the wheel and all that. But my plan is not to compete with any of the professional tools out there. What I always care about is understanding things deeply, and as my computer science career has taught me so far, there is no deeper understanding than the one you gain from actually building stuff.

So stay tuned for that. I will share progress as usual in the form of educational articles, so you’ll get to see under the hood how to build a fully functional CLI agent with tool calling, context compaction, skills, commands (the powerful ones, not the cheap single-prompt injection), subagent delegation, sandboxing, and all the engineering design hurdles that come with it.

Until next time, stay curious.

Fun quirk. Typing /etc/host plus the s makes Substack silently fail on draft save, some sort of ill-defined security rule, I suppose. What the f…

AI Winter is Coming… Or Is It?

Alejandro Piad Morffis — Tue, 21 Oct 2025 14:32:00 GMT

Photo by Mira Kemppainen on Unsplash

You can’t scroll through a tech feed these days without tripping over a prophecy: the AI bubble is about to burst, and a long, cold “AI Winter” is coming. The narrative is as seductive as it is simple. The current frenzy around Generative AI, we’re told, is a speculative mania. When the inflated expectations inevitably collide with reality and the firehose of investment capital slows to a trickle, the whole enterprise will be exposed as a grand fiasco. We’ll discover, the skeptics say, that it was all a cuento.

And let’s be clear: they’re not entirely wrong about the first part. The expectations are inflated. A correction is not just likely; it’s necessary.

But here’s my thesis: the idea that this correction will lead to another AI Winter—a catastrophic freeze comparable to the funding droughts of the 1970s and 80s—is a fundamental misreading of the landscape. I will argue that what we are heading for is not a collapse, but a normalization—what I will call an AI autumn.

The inevitable deflation of the hype won’t reveal a failed technology. Instead, it will reveal a technology that has already, quietly and irrevocably, proven its utility and woven itself into the fabric of our digital lives.

This isn’t a story about a bubble bursting; it’s about a revolutionary technology finally growing up. But let’s be clear: growing up can be a painful process. The normalization I’m describing won’t be a gentle, seamless transition. An industry built on unsustainable economics and AGI-or-bust promises can still face maybe not a brutal winter, but a significant autumn, even if the underlying technology continues to thrive.

Subscribe now

Anatomy of the Hype (Or Why the Skeptics Have a Point)

Before we can talk about the future, we have to be honest about the present. The current AI landscape feels like a bubble because, in many ways, it is one. This isn’t to say the technology is vaporware; far from it. The frenzy is built on a kernel of genuinely astonishing progress. But that kernel has been buried under an avalanche of speculative capital and quasi-religious prophecy.

The promises are, to put it mildly, grandiose. Tech leaders, flush with unprecedented investment, speak of replacing vast swaths of the workforce and ushering in an era of unimaginable productivity. Every incremental improvement is framed as another step on the inexorable march toward Artificial General Intelligence. This narrative is then amplified by a chorus of accelerationists and futurists who speak of the Singularity not as a distant sci-fi concept, but as an imminent event. It’s a powerful and compelling story, and it’s fueling a gold rush.

But back on planet Earth, the story is more complicated. For every breathless demo, there are practical and theoretical roadblocks that the hype conveniently ignores. The most glaring is the hallucination problem. These models, by their very nature, invent things. We’ve managed to reduce the frequency, but we haven’t eliminated the phenomenon, and there are compelling theoretical arguments that we may never be able to. This isn’t just a bug; it’s a feature of the architecture, a fundamental crack in the foundation of trust.

This technical limitation then crashes headfirst into the corporate world’s messy reality. Most companies, lured by the promise of easy productivity gains, are discovering a massive adoption gap. They lack the clean data, the streamlined processes, and the technical expertise to reliably integrate these powerful but flawed tools. It’s no wonder, then, that an astonishing number of corporate AI projects—some estimates say as high as 85%—are quietly failing to deliver a return on investment. Sky-high promises plus messy, difficult reality is the classic recipe for a bubble.

Perhaps the most potent dose of reality, however, is coming from the frontier models themselves. We’re witnessing a classic case of diminishing returns. The leap in capability from GPT-3 to GPT-4 was so profound it felt like a paradigm shift, leading many to draw a straight line on the progress graph and conclude that GPT-5 would be knocking on AGI’s door. That hasn’t happened.

The newest models are better, certainly, but the improvement is incremental, not awe-inspiring. It strongly suggests we’re hitting the ceiling of what the current paradigm can do. Experts like Yann LeCun and François Chollet argue persuasively that to progress further, we need fundamentally new approaches—paradigms that have yet to be invented. This pushes the dream of AGI firmly back into the realm of long-term research, not the foreseeable future.

Compounding this is a simple fact: the economics of frontier AI are fundamentally broken. The cost to train a single model like GPT-4 is north of $100 million. The data center infrastructure required to support the industry’s ambitions will require an estimated $5.2 trillion by 2030.

Unsurprisingly, this has created a severe profitability crisis. In 2024, OpenAI reportedly lost approximately $5 billion on $9 billion in revenue, with inference costs alone accounting for a multi-billion dollar loss. This isn’t a business model; it’s a venture-subsidized science experiment, and it’s hitting a hard physical wall with an energy grid that cannot keep up.

Furthermore, we must recognize that this isn’t just another tech bubble. The investment flowing into AI is qualitatively different from, say, funding for a better SaaS tool or a more efficient database. A significant portion of this capital is a high-stakes, geopolitical bet on the imminent arrival of AGI. The valuations of the frontier labs are not based on their current, money-losing products; they are based on the promise of creating a literal god-in-a-box.

Whether Sam Altman and company believe or not is beyond the point. This dream of AGI is driving market valuations, and when the market finally digests that we are hitting a paradigm ceiling—a point this article has already made—the withdrawal of that ‘AGI-or-bust’ capital won’t be a gentle correction. It will be a sudden, violent repricing that could vaporize billions in paper wealth overnight.

What Will Happen When the Bubble Bursts?

So, given the inflated expectations and technical ceilings, what happens when the hype recedes? I don’t really like to make predictions, and much less about the future. It’s damn hard. But I think we can outline a possible, perhaps even probable near future. I want to draw an analogy here and claim we will see not a true AI winter, but something close to an AI autumn.

An AI autumn is an economic event. It’s a period of massive financial correction, characterized by layoffs, hiring freezes, startup failures, and a freeze in venture capital. It’s painful for the people and companies in the field. An AI winter, on the other hand, is a crisis of relevance of the core technology. It’s when the technology itself proves to be a dead end, progress stalls, and the world moves on.

To be as blunt as I can, I do believe a severe autumn for the AI industry is not just possible; it’s likely. The current economics are unsustainable, as we’ve seen. But the central argument of this article is that this painful industrial correction will not trigger a catastrophic winter, which would be far worse. No, AI is here to stay, and here is why.

First, we can’t ignore the relentless democratization of compute. The idea that cutting-edge AI will forever be the exclusive domain of billion-dollar data centers is a historical fallacy. We are already seeing an explosion of highly capable open-source models that can run on local, consumer-grade hardware. What requires a professional-grade, 10,000 dollars GPU today will run on your laptop in two years, and on your phone two years after that.

This trajectory completely decouples the utility of AI from the subsidized business models of a few large companies. The capability is escaping the lab and becoming part of the background radiation of computing.

Second, even if the progress of frontier models were to stop dead in its tracks today—which it won’t, but it will likely continue to decelerate—we still have a decade’s worth of technological breakthrough that most of the world has not even begun to properly digest. The current adoption gap isn’t a sign of inevitable failure; it’s a sign that the technology has advanced far faster than our institutions can keep up.

A slowdown in R&D investment won’t cause a retreat. Instead, it will trigger a necessary and healthy shift in focus from pure research to practical implementation, integration, and process refinement. This is what maturity looks like. The frantic sprint to invent the future will become the marathon of actually building it.

Most importantly, this shift will not trigger a true AI winter because we are simply far beyond the point where Artificial Intelligence can disillusion us. It is already a proven technology, woven so deeply into our digital infrastructure that a true winter is no longer possible.

Why We Won’t See Another AI Winter

Let’s start with Generative AI itself. Even with all its flaws, its core utility is now undeniable. The previous AI winters occurred when promising lab demos failed to translate into real-world applications. That is not the situation today.

A significant percentage of the global population—some conservative estimates say around 10%— now uses these tools not as novelties, but as integrated parts of their daily work. It’s the assistant that transcribes a meeting and pulls out action items, summarizes a sprawling email thread you don’t have time to read, and helps you rephrase a blunt message into a diplomatic one. Online search is quickly becoming the playground for generative AI, and online search is by far the most profitable business in the Internet Era.

The genie is out of the bottle; people are not going to suddenly stop using a tool that demonstrably saves them time, just because its creators promised it would become a god.

But perhaps the world of software development is an even more potent example. There’s a lot of noise about irresponsible “vibe coding,” where novices generate code they don’t understand, creating an unmaintainable mess. This is a real problem, but it’s a problem of skill, not a failure of the tool.

For experienced developers, these assistants are transformative. The mythical “10x productivity” boost is largely a myth, but a consistent 1.5x to 2x multiplier is very real. I’ve seen it in my own projects. Code assistants act as the new IntelliSense, handling the mind-numbing boilerplate and letting me focus on the architectural challenges. I may now only write 20% of the final characters in the codebase, but I am still the author of 90% of the critical ideas. This is not a crutch; it’s leverage.

And beyond these consumer-facing applications lies an even larger world of traditional machine learning that is indispensable to modern science and industry.

From drug discovery and genomic sequencing in biotech to predictive maintenance and supply chain optimization in manufacturing, decades of successful applications of AI in the industry today delivers billions of dollars in quantifiable value. Their success is measured in efficiency gains and scientific breakthroughs, not hype cycles.

But the more fundamental point is this: the debate over a “Generative AI” bubble distracts from the fact that the broader field of AI has already won its place. We haven’t had a true AI winter since the 1990s because AI stopped being a distinct, speculative field and became the foundational plumbing of the modern world. The search engine that found this article? That’s AI. The recommendation algorithm that determines your social media feed? AI. The logistics network that delivered your last package, the facial recognition that unlocks your phone, the voice transcription that takes your meeting notes—it’s all AI. Not Generative AI (for the most part), but AI nonetheless.

The line between computer science and AI has become so blurred that it’s practically meaningless. To talk about an AI winter today is like talking about an Internet winter in 2005. The technology is simply too embedded to fail.

However, as we’ve argue, there will be some painful correction. That much is, I think, almost undeniable. If that’s indeed the case, here are some optimistic arguments for why it may all be for the better in the end.

The Renaissance of AI Research

When the unsustainable hype collides with this resilient foundation, a fundamental law of economics reasserts itself: there is no free lunch. An AI autumn is the inevitable trade-off for a period of unchecked exuberance. A wave of consolidation will wash away unprofitable startups, and the market’s strategic focus will pivot from “bigger is better” to efficiency.

But this period of commercial cooldown has a powerful, if counter-intuitive, silver lining: a renaissance of real research. History shows us that AI’s greatest winters have been fertile ground for its most important breakthroughs. The hype recedes, and with it, the noise. The crushing pressure for short-term commercial returns is replaced by the intellectual freedom to tackle fundamental, long-term challenges.

Many of the core technologies fueling today’s boom were born in the quiet of previous winters. The backpropagation algorithm, popularized by Geoffrey Hinton in the 1980s, was refined during a period of deep skepticism about neural networks. Most famously, the Long Short-Term Memory (LSTM) architecture, which was a cornerstone of natural language processing for decades, was developed by Hochreiter and Schmidhuber in 1997, the absolute heart of the last AI winter.

The coming autumn will trigger a similar cycle. As the brightest minds are freed from the scaling hype, the real work on the next generation of AI can begin. We are already seeing the intellectual seeds of this shift. AI pioneers are openly discussing the deep limitations of current models. Yann LeCun is championing his Joint Embedding Predictive Architecture (JEPA) as a path toward “world models” that learn abstract representations of reality.

The field of Neuro-Symbolic AI, which fuses neural nets with structured logic, is experiencing a surge in interest. These are not incremental improvements; they are explorations of entirely new paradigms.

Conclusion: No Retreat, Just Normalization

So, where does that leave us? The coming correction is not an apocalypse; it’s a maturation. The frantic, gold-rush energy will dissipate, and in its place, something far more durable will emerge. The deflation of the hype bubble will not send talent fleeing the field or cause us to abandon the tools we’ve built. Instead, it will mark the end of the beginning.

The great irony is that the very thing that guarantees AI’s long-term survival—its commoditization into reliable ‘plumbing’—is what makes the current industry valuations so precarious. Plumbing is a low-margin, utility business, not a world-dominating monopoly. This disconnect between utility and valuation is the financial fault line where the industrial earthquake will hit. The era of breathless, revolutionary promises will give way to the slow, difficult, and necessary work of integration.

This is the natural lifecycle of any transformative technology. It moves from a speculative curiosity to a reliable, if sometimes challenging, part of the professional toolkit. Generative AI will not become the all-knowing oracle we were promised, but it has already secured its place as a uniquely powerful tool for thought, creation, and productivity.

The question was never really if AI would change the world; the underlying technology has been doing that for decades. The real question is how we manage the transition. This industrial autumn will be cushioned, to some extent, by geopolitical reality. The race between the US and China ensures that a certain level of state-sponsored R&D will continue, preventing a total 1980s-style collapse.

But for the people working in the field, the transition will still be jarring. The future of AI isn’t a simple story of success or failure. It’s the messy, often painful process of separating a world-changing technology from the unsustainable industry that’s driving it, and going back to drawing board, back to building new and even cooler stuff.

The Four Fallacies of Modern AI

Alejandro Piad Morffis — Wed, 10 Sep 2025 11:30:43 GMT

Photo by Matt Artz on Unsplash

I've spent the last few years trying to make sense of the noise around Artificial Intelligence, and if there's one feeling that defines the experience, it's whiplash. One week, I'm reading a paper that promises AI will cure disease and unlock unimaginable abundance; the next, I'm seeing headlines about civilizational collapse. This dizzying cycle of AI springs, periods of massive investment and hype, followed by the chilling doubt of AI winters isn't new. It's been the engine of the field for decades.

After years of this, I've had to develop my own framework just to stay grounded. It’s not about being an optimist or a pessimist; it’s about rejecting both extremes. For me, it’s a commitment to a tireless reevaluation of the technology in front of us; to using reason and evidence to find a path forward, because I believe we have both the power and the responsibility to shape this technology’s future. That begins with a clear-eyed diagnosis of the present.

One of the most useful diagnostic tools I've found for this comes from computer scientist Melanie Mitchell. In a seminal paper back in 2021, she identified what she claims are four foundational fallacies, four deeply embedded assumptions that explain to a large extent our collective confusion about AI, and what it can and cannot do.

My goal in this article isn't to convince you that Mitchell is 100% right. I don't think she is, either, and I will provide my own criticism and counter arguments to some points. What I want is to use her ideas as a lens to dissect the hype, explore the counterarguments, and show why this intellectual tug-of-war has real-world consequences for our society, our economy, and our safety.

Deconstructing the Four Fallacies

For me, the most important test of any idea is its empirical validation. No plan, no matter how brilliant, survives its first encounter with reality. I find that Mitchell’s four fallacies are the perfect tool for this. They allow us to take the grand, sweeping claims made about AI and rigorously test them against the messy, complicated reality of what these systems can actually do.

Fallacy 1: The Illusion of a Smooth Continuum

The most common and seductive fallacy is the assumption that every impressive feat of narrow AI is an incremental step on a smooth path toward human-level Artificial General Intelligence (AGI). That is, that intelligence is a single, unidimensional metric on a continuum that goes from narrow to general.

We see this everywhere. When IBM's Deep Blue beat Garry Kasparov at chess, it was hailed as a first step towards AGI. The same narrative emerged when DeepMind's AlphaGo defeated Lee Sedol. This way of thinking creates, according to Mitchell, a flawed map of progress, tricking us into believing we are much closer to AGI than we are. It ignores the colossal, unsolved challenge known as the commonsense knowledge problem—the vast, implicit understanding of the world that humans use to navigate reality.

As philosopher Hubert Dreyfus famously said, this is like claiming that the first monkey that climbed a tree was making progress towards landing on the moon. Well, in a sense, maybe it is, but you get the point. We didn't get to the moon until we invented combustion rockets. Climbing ever taller trees gets us nowhere closer, it's just a distraction. In the same sense, mastering a closed-system game may be a fundamentally different challenge than understanding the open, ambiguous world.

But here's the nuance. While beating Kasparov isn't a direct step to having a conversation, the methods developed can be surprisingly generalizable. The architecture that powered AlphaGo was later adapted into MuZero, a system that mastered Go, chess, and Atari games without being told the rules.

Furthermore, can we really call a Large Language Model narrow in the same way? Its ability to write code and summarize text feels like a qualitative leap in generality that the monkey-and-moon analogy doesn't quite capture.

This leaves us with a forward-looking question: How do recent advances in multimodality and agentic AI test the boundaries of this fallacy? Does a model that can see and act begin to bridge the gap toward common sense, or is it just a more sophisticated version of the same narrow intelligence? Are world models a true step towards AGI or just a higher branch in a tree of narrow linguistic intelligence?

Fallacy 2: The Paradox of Difficulty

We have a terrible habit of projecting our own cognitive landscape onto machines, assuming that what's hard for us is hard for them, and what's easy for us is easy for them. For decades, the opposite has been true.

This is Moravec's Paradox, named after the roboticist Hans Moravec, who noted it's easier to make a computer exhibit adult-level performance on an IQ test than to give it the sensory and motor skills of a one-year-old.

This explains why we have AI that can master the ridiculously complex game of Go, while a fully self-driving car remains stubbornly just over the horizon. The "easy" things are built on what Mitchell calls the "invisible complexity of the mundane." This paradox causes a chronic mis-calibration of our progress and priorities, leading us to be overly impressed by performance in formal domains while underestimating the staggering difficulty of the real world.

Of course, some would argue this isn't a fundamental barrier, but a temporary engineering hurdle. They’d say that with enough data and compute, the "invisible complexity" of the real world can be learned, just like the complexity of Go was.

From this perspective, the problem isn't one of kind, but of scale. This forces us to ask: as sensor technology and robotics improve, are we finally starting to overcome Moravec's Paradox? Or are we just discovering even deeper layers of complexity we never knew existed?

Fallacy 3: The Seduction of Wishful Mnemonics

Language doesn't just describe reality; it creates it. In AI, we constantly use anthropomorphic shorthand, saying a system "learns," "understands," or has "goals." Mitchell argues this practice of using "wishful mnemonics" is deeply misleading, fooling not just the public but the researchers themselves.

When a benchmark is called the "General Language Understanding Evaluation" (GLUE) and a model surpasses the human baseline, headlines declare that AI now understands language better than humans. But does it?

The term "stochastic parrot" was coined as a powerful antidote, reframing what LLMs do as sophisticated mimicry rather than comprehension. This isn't just a semantic game, Mitchell argues; it creates a flawed mental model that leads to misplaced trust, encouraging us to deploy systems in high-stakes situations where a lack of true understanding can have serious consequences.

A fair critique is that these terms are a necessary cognitive shorthand. At a certain level of complexity, a system's emergent behavior becomes functionally indistinguishable from "understanding," and arguing about whether it really understands is an unprovable philosophical distraction.

But that still leaves a crucial question: can we develop a more precise, less anthropomorphic vocabulary to describe AI capabilities? Or is our human-centric language the only tool we have to reason about these new forms of intelligence, with all the baggage that entails?

Fallacy 4: The Myth of the Disembodied Mind

This is the most philosophical, and in my opinion, the most important fallacy. It's the deep-seated assumption that intelligence is, like software, a form of pure information processing that can be separated from its body.

This "brain-as-computer" metaphor leads to the belief that AGI is simply a matter of scaling up compute to match the brain's raw processing power. It's challenged by Mitchell and many others with the thesis of embodied cognition, a view from cognitive science which holds that intelligence is inextricably linked to having a body that interacts with the world. If this is correct, then our current approach may just be creating ever-more-sophisticated systems that are fundamentally brittle because they lack grounded understanding.

This is where we hit the great intellectual battle line in modern AI. The primary counterargument can be framed in terms of Rich Sutton's famous essay, "The Bitter Lesson," which argues that the entire history of AI has taught us that attempts to build in human-like cognitive structures (like embodiment) are always eventually outperformed by general methods that just leverage massive-scale computation.

From this viewpoint, embodiment isn't a magical prerequisite for intelligence; it's just another fiendishly complex problem that will yield to more data and processing power.

This tension poses a critical question for the future: do multimodal models that can process images and text represent a meaningful step toward solving the embodiment problem? Or are they just a more sophisticated version of the same disembodied mind, a brain in a slightly larger digital vat?

What is Intelligence, Really?

As we dig into these fallacies, a deeper pattern emerges. They aren't just four isolated mistakes; they're symptoms of a fundamental schism in how the AI world thinks about intelligence itself. Again, my goal isn't to pick a side but to avoid falling prey to cheap heuristics or ideological banners, and instead evaluate which of these paradigms gives us a more useful map of reality.

On one side, you have what I’ll call the Cognitive Paradigm, championed by thinkers like Mitchell and her mentor, superstar AI researcher and philosopher Douglas Hofstadter. This view sees intelligence as a complex, integrated, and embodied phenomenon. It assumes that the things we associate with human intelligence—common sense, emotions, values, a sense of self—are likely inseparable components of the whole, emerging from rich interaction with a physical and social world.

From this perspective, the path to AGI requires a deep, scientific understanding of these integrated components, not just more processing power.

On the other side is the Computationalist Paradigm, which is the implicit philosophy behind many of today's leading labs, and best captured by The Bitter Lesson. This posits that the biggest breakthroughs have always come from general methods that leverage massive-scale computation—in other words, from scaling things up.

In this paradigm, intelligence is a more abstract, substrate-independent quality of optimization. Problems like embodiment aren't fundamental barriers; they are just incredibly complex computational tasks that will eventually be solved by ever-larger models and ever-faster chips.

Of course, it's not a perfect binary. Most researchers are pragmatists, like me, working somewhere in the messy middle. But these two paradigms represent the poles of the debate, and the tension between them defines the entire field. It shapes which research gets funded, which systems get built, and ultimately, which vision of the future we are collectively racing toward.

Why This Debate Matters

This debate isn't just an academic parlor game. These fallacies have a massive ripple effect across society because they obscure a fundamental rule of technology and economics: there's no free lunch, only trade-offs.

The hype generated by fallacious thinking isn't just an innocent mistake; it's the fuel for a powerful economic engine. The intense competition between tech giants, the flood of venture capital, and the geopolitical AI race all depend on a constant narrative of imminent, world-changing breakthroughs. This political economy of hype forces us into a series of dangerous trade-offs.

First, we trade long-term progress for short-term hype.

The fallacies create an unstable, boom-and-bust funding cycle. During an AI spring, capital flows to projects that can produce impressive-looking demos, often based on narrow benchmarks. This starves the slow, methodical, foundational research needed to solve the hard problems like common sense and reasoning. The result is a field that lurches from one hype bubble to the next, leaving a trail of abandoned projects and unfulfilled promises that trigger the inevitable AI winter.

Second, we trade public trust for market excitement.

The cycle of over-promising and under-delivering is deeply corrosive. When we use wishful mnemonics to describe a system that "understands," and it then fails in spectacular, nonsensical ways in the real world, it breeds public anxiety and skepticism. Recent studies show the public perceives AI scientists more negatively than almost any other field, specifically because of a perceived lack of prudence. This isn't a vague feeling; it's a direct reaction to the unintended consequences of deploying brittle, overhyped systems.

Finally, and most critically, we trade responsible validation for speed to market.

This is where the consequences become most severe. Believing a system is on a continuum with general intelligence, or that it truly "understands" language, leads to its premature deployment in high-stakes domains.

When a mental health chatbot, which is fundamentally, at least today, a sophisticated pattern-matcher, gives harmful advice to a person in crisis, it’s a direct result of these fallacies. When we over-rely on brittle systems in healthcare, finance, or autonomous vehicles, we are making a dangerous bet, trading real-world safety for the illusion of progress.

Conclusion

So where does this leave us? The value of Mitchell's fallacies isn't just in spotting hype, but in exposing the deep, productive tension between these two powerful ways of thinking about intelligence. We can't ignore the fallacies, but we also can't deny the incredible, world-altering power of the scaling paradigm that fuels them.

Mitchell in her paper compares modern AI to alchemy. It produces dazzling, impressive results but it often lacks a deep, foundational theory of intelligence.

It’s a powerful metaphor, but I think a more pragmatic conclusion is slightly different. The challenge isn't to abandon our powerful alchemy in search of a pure science of intelligence. The goal, at least from a pragmatist point of view, should be to infuse our current alchemy with the principles of science, to make scaling smarter, safer, and more grounded by integrating the hard-won insights about how intelligence actually works.

The path forward, I believe, requires more than just intellectual humility. It also requires a willingness to synthesize these seemingly opposed worldviews, and a commitment to a tireless reevaluation of the technology before us. The ultimate question is not if we should choose the path of scaling or the path of cognitive science, but how we can weave them together to guide the raw power of our modern AI alchemy with the deep understanding of a true science of intelligence.

AI is Nothing New, Here's the Full History

Alejandro Piad Morffis — Sun, 10 Aug 2025 10:12:41 GMT

The following is a second draft of the zero-th chapter of my upcoming book Mostly Harmless AI. In this second draft we significantly expanded the timeline to add around 3x more events and milestones, while making the chapter more concise and information-dense. We also included a structured timeline in the end for easy reference.

PS: Remember you can get Mostly Harmless AI while in early access at a reduced price. We are now running a special offer that gives you the PDF and EPUB version of the book as it currently stands, plus guaranteed access to all future editions for just $5.

Get Mostly Harmless AI ($5)

Garry Kasparov vs Deep Blue in 1997, the first time a computer program beat a World Chess Champion. Taken from The Grandma’s Logbook.

For centuries, we humans have been captivated by the idea of a thinking machine. This isn’t some modern tech obsession; the dream of automatons and artificial minds is woven through our myths and philosophies. But the formal quest to build one began only in the mid-20th century, and its history has been a dramatic back-and-forth between two core, seemingly antagonistic approaches.

One path, rooted in the logic of rationalism, sought to build intelligence from the top down by programming explicit rules and symbols. The other, inspired by the biological empiricism of the brain, tried to create it from the bottom up by allowing machines to learn patterns from data and experience.

This chapter explores the history of Artificial Intelligence (AI) through the lens of this great intellectual tug-of-war. It is a journey through distinct eras, each defined by which philosophy was dominant, what external factors like computing power and data availability enabled its rise, and how these forces have finally begun to converge, leading us to the powerful tools we have today.

In the appendix of this book, we will present a detailed chronology of the most important milestones in the history of artificial intelligence.

Subscribe now

The Foundational Era (1940s - 1960s)

The dawn of AI was a time of immense optimism, where the very concept of a “thinking machine” was formalized. Even before the field had a name, its philosophical and theoretical groundwork was being laid. In his seminal 1950 paper, “Computing Machinery and Intelligence,” Alan Turing proposed the Turing Test, setting a profound, long-term goal: to create a machine whose conversation was indistinguishable from a human’s. In parallel, the work of Warren McCulloch and Walter Pitts in 1943 on the first mathematical model of an artificial neuron planted the seeds of the connectionist dream—the idea that intelligence could emerge from simple, brain-like units.

When the field was officially christened at the Dartmouth Workshop in the summer of 1956, the symbolic, logic-based paradigm took the lead. Researchers believed that human thought could be mechanized, and the primary task was to build systems that could manipulate symbols according to formal rules. This vision was solidified by the creation of the LISP programming language in 1958, a tool perfectly suited for this symbolic manipulation.

Yet, in that same year, the connectionist counterpoint took physical form. Frank Rosenblatt developed the Perceptron, the first artificial neural network that could learn to classify patterns on its own, offering a tangible, bottom-up alternative to pure logic.

The public imagination was quickly captured by early demonstrations of AI’s potential. The Unimate (1961), the first industrial robot, showed that machines could perform physical labor. Shakey the Robot (1966) took this a step further, becoming the first mobile robot to perceive its environment and reason about its own actions. Joseph Weizenbaum’s ELIZA (1964), a simple chatbot that simulated a psychotherapist, revealed how easily humans could attribute intelligence and understanding to a machine. But this initial optimism soon collided with reality.

The ambitious promises of creating true intelligence went unfulfilled, and in 1969, the publication of the book Perceptrons by Marvin Minsky and Seymour Papert delivered a critical blow. By rigorously detailing the mathematical limitations of simple neural networks, the book effectively starved the connectionist school of funding, ushering in the first “AI Winter” and ensuring that the symbolic approach would dominate the field for the next decade.

The Knowledge Era (1970s - 1980s)

With connectionism on the back burner, the field regrouped around a more pragmatic goal: instead of trying to create general intelligence, researchers focused on capturing and mechanizing human expertise in narrow domains. This led to the golden age of expert systems, the first commercially successful form of AI. The core idea was to interview a human expert, painstakingly encode their knowledge into a vast set of “if-then” rules, and use a reasoning engine to produce solutions.

This approach yielded impressive results. SHRDLU (1972) was a landmark natural language program that could understand and respond to commands about a simulated world of blocks, showcasing a new level of sophistication for symbolic AI. Expert systems like MYCIN (1972) could diagnose blood infections as accurately as junior doctors, while others like DENDRAL and PROSPECTOR found success in chemistry and geology. This culminated in the first true commercial boom, as companies like Digital Equipment Corporation used the XCON system (1980) to configure complex computer orders, saving millions of dollars. The ambition of this paradigm reached its peak with the Cyc project (1984), a monumental effort to manually encode all of human common sense knowledge into a single, massive database.

While the symbolic school reigned, a connectionist undercurrent continued to flow. In Japan, Kunihiko Fukushima’s work on the Neocognitron (1980) created a hierarchical, multi-layered neural network for visual recognition that was the direct ancestor of the architectures that would dominate computer vision decades later. And in 1986, the popularization of the backpropagation algorithm provided an efficient method for training these deeper networks, solving a critical problem that had plagued the field for years.

However, the symbolic paradigm’s dominance was destined to end. Expert systems were incredibly brittle; they were expensive to build, nearly impossible to update, and would fail completely if faced with a situation not explicitly covered by their rules. The hype, fueled in part by Japan’s ambitious Fifth Generation Computer Systems project (1982), once again outpaced reality. When the specialized hardware market collapsed in 1987, the field plunged into its second “AI Winter,” leaving the promise of AI unfulfilled once more.

The Internet Era (1990 - 2011)

The end of the second AI winter was not driven by a single algorithmic breakthrough, but by two external forces that changed everything: the public launch of the World Wide Web in 1991 and the invention of the Graphics Processing Unit (GPU) in The web began generating an unimaginable ocean of data—text, images, and user interactions. The GPU, particularly after the release of NVIDIA’s CUDA platform in 2007, provided a way to perform the massive parallel computations needed to learn from that data. These two catalysts—data and computation—created the perfect conditions for the statistical, learning-based paradigm to finally thrive.

Before deep learning took hold, this new environment fueled the rise of “shallow” machine learning. Algorithms like Support Vector Machines (SVMs) (1995) became dominant, and open-source libraries like scikit-learn (2007) made them accessible to a wide audience. This approach had a massive real-world impact, powering the recommender systems of companies like Amazon and sparking global competitions like the Netflix Prize (2006). The infrastructure to handle this new scale was built in parallel, with Google’s MapReduce (2004) providing the blueprint for big data processing.

During this time, foundational work in reinforcement learning was also bearing fruit. TD-Gammon (1992) showed that a program could teach itself to play backgammon at a superhuman level, and the textbook by Sutton & Barto (1998) codified the field for a new generation. The seeds for the coming deep learning revolution were being sown with the invention of key architectures like LSTMs (1997) and LeNet-5 (1998), while the creation of the massive ImageNet dataset (2009) provided the high-quality benchmark that would soon ignite it.

AI also became a tangible part of public life. The symbolic paradigm had its last great public triumphs with Deep Blue’s victory over Garry Kasparov in chess (1997) and Watson’s win on Jeopardy! (2011). But the future belonged to learning-based systems. Dragon NaturallySpeaking (1997) brought continuous speech recognition to consumers. Competitions like the DARPA Grand Challenge (2004) spurred the development of autonomous vehicles. Consumer products like the Roomba (2002) and Microsoft’s Kinect (2010) brought robotics and computer vision into millions of homes. With the launch of Siri in 2011, a conversational AI assistant was finally in everyone’s pocket.

The Deep Learning Era (2012 - 2018)

If the Internet Era set the stage, 2012 was the year the curtain rose on the deep learning revolution. In October, a deep convolutional neural network called AlexNet, trained on GPUs using the ImageNet dataset, shattered all previous records in the annual image recognition competition. This “ImageNet Moment” proved the overwhelming superiority of deep, data-driven learning and kicked off a Cambrian explosion of breakthroughs.

This period was a stunning validation of what AI researcher Rich Sutton would later call The Bitter Lesson: that general methods leveraging massive computation almost always outperform approaches that rely on hand-crafted human knowledge. The field progressed at a breathtaking pace. In natural language processing, Word2Vec (2013) provided a powerful way to represent the meaning of words as vectors. In generative AI, Generative Adversarial Networks (GANs) (2014) introduced a novel way to create stunningly realistic synthetic images. New architectures like ResNet (2015) allowed for the creation of networks hundreds of layers deep, solving a fundamental barrier to scale.

These new techniques allowed AI to achieve superhuman performance in increasingly complex domains. Deep Q-Networks (DQN) (2013) learned to master classic Atari games directly from pixels, and in a landmark event in March 2016, AlphaGo defeated Lee Sedol, the world’s greatest Go player. The revolution was powered by a new generation of open-source tools like TensorFlow (2015) and PyTorch (2016) that democratized deep learning, as well as specialized hardware like Google’s Tensor Processing Units (TPUs) (2016). The era culminated with the invention of the Transformer architecture in 2017.

However, 2018 marked a turning point. In March, the Cambridge Analytica scandal revealed how machine learning algorithms, fed by the personal data of millions of Facebook users, had been used for political manipulation, sparking a global reckoning over data privacy and the ethics of AI.

That same month, though, the scientific community formally recognized the field’s impact, awarding the Turing Award to Geoffrey Hinton, Yann LeCun, and Yoshua Bengio for their foundational work. The Turing Award–aptly named after the most important figure in the history of Computer Science as a whole, let alone Artificial Intelligence–is the most prestigious academic award in computing, akin to the Nobel Prize.

The Generative Era (2019 - Present)

The current era is defined by the application of the Transformer architecture at an unprecedented scale. By training these models on vast swaths of the internet, researchers discovered that quantitative leaps in size and data could lead to qualitative leaps in capability, resulting in models with emergent generative and reasoning abilities that have captured the world’s attention.

The first sign of this new power came with GPT-2 in 2019, whose ability to generate coherent text was so advanced that its release was initially staged due to safety concerns. Its successor, GPT-3 (2020), demonstrated that massive scale could unlock “few-shot” learning, the ability to perform tasks it was never explicitly trained on.

Soon, this generative power was applied beyond text to images with DALL-E (2021) and to code with GitHub Copilot (2021). But the cultural tipping point arrived in November 2022 with the release of ChatGPT. Its simple, conversational interface made the power of Large Language Models (LLMs) accessible to millions, sparking a global phenomenon and a new wave of investment.

This boom was accompanied by a powerful open-source counter-movement. The release of the image-generation model Stable Diffusion (August 2022) and Meta’s Llama models (2023) democratized access to powerful foundation models, sparking a “Llamaverse” of community-driven innovation. The field is now a global race, with major competitors like Anthropic’s Claude (2023), Google’s multimodal Gemini (December 2023), and China’s DeepSeek R1 (January 2025) demonstrating capabilities on par with the best proprietary systems.

Perhaps the most profound impact of this new era has been in science. In November 2020, AlphaFold 2 solved the 50-year-old grand challenge of protein folding, a breakthrough of such significance that its creators were awarded the Nobel Prize in This demonstrated that AI could be a tool not just for automating tasks, but for accelerating fundamental scientific discovery. The road ahead now points towards more autonomous, “agentic” systems, where the AI transitions from a single-response tool to a collaborator capable of executing complex, multi-step tasks on our behalf.

Conclusion

We’ve journeyed through decades of ambition, breakthroughs, and tough realizations. What we’ve seen is a constant back-and-forth, a dynamic dance between two powerful ideas: the precise, rule-based, inflexible logic of symbolic AI and the adaptable, pattern-based, unreliable power of statistical AI. This dance, as we’ve explored, often mirrors the philosophical tension between rationalism and empiricism.

Today, AI stands at a fascinating crossroads. The purely statistical systems that define the generative era have achieved incredible feats. Yet, we are beginning to see diminishing returns. With GPT-4 having been a high point, many newer models have made only incremental progress, suggesting that simply scaling the existing paradigm may not be enough to achieve the next level of intelligence. This has led some to speculate that we may be on the brink of a third “AI Winter,” as the hype once again outpaces the reality of the technology’s capabilities.

The recent focus on reasoning models and agentic systems seems capable of fueling the statistical hype a bit longer, but a growing number of researchers are realizing we may not achieve Artificial General Intelligence (AGI) purely by scaling. This brings us to a crucial realization: the future of AI likely isn’t about one approach winning out over the other, but about intelligently combining them. The inherent limitations of today’s models—their unreliability and lack of true reasoning—have sparked a renewed interest in the long-neglected symbolic paradigm. Hybrid approaches, particularly neuro-symbolic AI, which seek to integrate the pattern-matching strengths of neural networks with the rigorous logic of symbolic systems, hold immense potential for creating the next breakthrough.

Whether a winter is coming or not, it is indisputable that AI has already had a profound impact on society and will continue to do so. The history of this field is far from finished. It is a living, breathing, civilization-wide project with the potential to transform society for the better or, some believe, to become our ultimate doom. Everyone has a place here: technologists, yes, but also humanists, economists, historians, artists, and policymakers. The next few years promise to be extremely exciting, and you can be a part of shaping what comes next.

Thanks for reading! Below you’ll find the full expanded timeline. Please let me know if you think I missed something or made any mistakes. All feedback is appreciated!

PS: Claim your copy of Mostly Harmless AI for only $5 in the link below.

Get Mostly Harmless AI ($5)

Appendix: A Chronology of Artificial Intelligence (1956-2025)

This timeline details the key breakthroughs, conceptual shifts, and landmark achievements in the field of Artificial Intelligence, tracing its path from a niche academic discipline to a transformative global technology.

The Foundational Era (1940s - Late 1960s)

(Science) 1943: The First Artificial Neuron is proposed by Warren McCulloch and Walter Pitts, laying the theoretical foundation for connectionism.
(Science) October 1950: Alan Turing publishes “Computing Machinery and Intelligence,” introducing the Turing Test.
(Social) Summer 1956: The Dartmouth Workshop is held, where John McCarthy coins the term “Artificial Intelligence,” formally establishing the field.
(Tech) 1958: Frank Rosenblatt develops the Perceptron, the first artificial neural network capable of learning.
(Tech) 1958: John McCarthy develops the LISP programming language, which becomes the standard for symbolic AI.
(Product) 1961: The Unimate industrial robot begins work on a General Motors assembly line.
(Product) 1964: Joseph Weizenbaum creates the chatbot ELIZA at MIT.
(Tech) 1966: The Stanford Research Institute (SRI) develops Shakey, the first mobile robot to reason about its own actions.
(Social) 1969: The publication of Perceptrons by Marvin Minsky and Seymour Papert marks the beginning of the first “AI Winter.”

The Knowledge Era (1970s - 1989)

(Tech) 1972: Terry Winograd develops SHRDLU, a groundbreaking natural language understanding program.
(Tech) 1972: The logic programming language Prolog is created by Alain Colmerauer and Philippe Roussel, becoming a key tool for symbolic AI.
(Tech) 1972: Stanford University develops the MYCIN expert system for medical diagnosis.
(Science) 1974: Marvin Minsky publishes his influential paper on “Frames” theory, a new paradigm for knowledge representation.
(Tech) Late 1970s: Expert systems like DENDRAL (for chemistry) and PROSPECTOR (for geology) demonstrate success in specialized scientific domains.
(Science) 1980: Kunihiko Fukushima develops the Neocognitron, an early hierarchical neural network that is the direct ancestor of modern Convolutional Neural Networks (CNNs).
(Product) 1980: Digital Equipment Corporation begins using the XCON expert system, marking a high point for commercial AI.
(Social) 1982: Japan’s Ministry of International Trade and Industry begins the Fifth Generation Computer Systems project, a massive initiative to build a new generation of computers based on logic programming, sparking competitive AI investment worldwide.
(Tech) 1984: The Cyc project is initiated by Douglas Lenat, an ambitious attempt to manually encode all of human common sense knowledge into a single knowledge base.
(Science) 1986: The backpropagation algorithm is popularized by Geoffrey Hinton, David Rumelhart, and Ronald Williams.
(Social) 1987: The collapse of the LISP machine market signals the start of the second “AI Winter.”

The Internet Era (1990 - 2011)

(Social) August 1991: The World Wide Web project is released to the public, creating the infrastructure for the data explosion that would fuel modern AI.
(Science) 1992: Gerald Tesauro develops TD-Gammon, a backgammon program that trains to a superhuman level using reinforcement learning, a landmark for the field.
(Social) 1995: Stuart Russell and Peter Norvig publish “Artificial Intelligence: A Modern Approach,” which becomes the leading textbook in the field for decades.
(Science) 1995: The Support Vector Machine (SVM) algorithm is popularized by Corinna Cortes and Vladimir Vapnik.
(Social) May 1997: IBM’s Deep Blue defeats world chess champion Garry Kasparov.
(Science) 1997: Sepp Hochreiter and Jürgen Schmidhuber invent the Long Short-Term Memory (LSTM) network.
(Product) 1997: Dragon NaturallySpeaking is released, becoming the first widely available continuous speech recognition software for consumers.
(Product) September 1998: Google is founded, and Amazon patents its item-to-item collaborative filtering, marking the start of large-scale data-driven AI applications.
(Science) 1998: Richard Sutton and Andrew Barto publish “Reinforcement Learning: An Introduction,” a seminal textbook that codifies the field.
(Tech) November 1998: Yann LeCun and his team develop LeNet-5, a pioneering Convolutional Neural Network (CNN).
(Tech) August 1999: NVIDIA releases the GeForce 256, marketed as the world’s first Graphics Processing Unit (GPU).
(Product) November 2000: Honda unveils its ASIMO humanoid robot, a landmark in robotics and motion planning.
(Product) September 2002: iRobot releases the Roomba, the first commercially successful autonomous home robot.
(Tech) 2004: Google publishes its paper on MapReduce, a programming model for processing massive datasets that becomes foundational to big data infrastructure.
(Social) March 2004: The first DARPA Grand Challenge for autonomous vehicles is held, sparking a new wave of research in self-driving technology.
(Social) October 2006: The Netflix Prize competition is launched, galvanizing research in recommender systems.
(Science) 2006: Geoffrey Hinton develops Deep Belief Networks, introducing effective strategies for unsupervised layer-wise pre-training.
(Tech) June 2007: NVIDIA releases CUDA, a parallel computing platform that allows developers to use GPUs for general-purpose processing.
(Tech) June 2007: David Cournapeau develops scikit-learn as a Google Summer of Code project.
(Science) 2009: A Stanford team led by Andrew Ng publishes a paper showing that GPUs can make training deep neural networks 10-100 times faster.
(Tech) 2009: The ImageNet dataset is created by Fei-Fei Li’s team at Stanford.
(Product) November 2010: Microsoft releases the Kinect, a consumer device that brings sophisticated real-time computer vision into millions of homes.
(Social) February 2011: IBM’s Watson wins the quiz show Jeopardy!.
(Product) October 2011: Apple integrates Siri into the iPhone 4S, making conversational AI assistants a mainstream consumer product.

The Deep Learning Era (2012 - 2018)

(Social) April 2012: Coursera is founded, and Andrew Ng’s Machine Learning course begins to democratize AI education.
(Science) June 2012: The Google Brain “Cat Neuron” project demonstrates that a neural network can learn high-level concepts from unlabeled data.
(Social) October 2012: AlexNet, a deep CNN trained on GPUs, wins the ImageNet competition by a massive margin, officially kicking off the deep learning revolution.
(Tech) 2013: Google researchers led by Tomas Mikolov release Word2Vec, a highly efficient method for creating word embeddings that revolutionizes NLP.
(Science) December 2013: DeepMind publishes its work on Deep Q-Networks (DQN), demonstrating an AI that can learn to play Atari games at a superhuman level from raw pixels.
(Science) June 2014: Ian Goodfellow and his colleagues introduce Generative Adversarial Networks (GANs), sparking a revolution in generative AI for images.
(Science) December 2015: A team at Microsoft Research introduces Deep Residual Networks (ResNet), allowing for the training of much deeper neural networks.
(Tech) November 2015: Google releases the TensorFlow open-source library, making deep learning more accessible.
(Social) March 2016: Google DeepMind’s AlphaGo defeats world Go champion Lee Sedol.
(Tech) May 2016: Google announces it has been using custom-built Tensor Processing Units (TPUs), specialized hardware for deep learning, in its data centers.
(Tech) September 2016: Facebook AI Research (FAIR) releases PyTorch, which becomes a major deep learning framework.
(Science) June 2017: Researchers at Google publish “Attention Is All You Need,” introducing the Transformer architecture.
(Social) March 2018: The Cambridge Analytica scandal breaks, revealing that the personal data of millions of Facebook users was used for political advertising, sparking a global conversation on data privacy and the ethics of machine learning.
(Social) March 2018: Geoffrey Hinton, Yann LeCun, and Yoshua Bengio are awarded the ACM Turing Award for their foundational work on deep learning.
(Social) December 2018: DeepMind’s AlphaFold makes its stunning debut at the CASP13 competition.

The Generative Era (2019 - 2025)

(Tech) February 2019: OpenAI announces GPT-2 but initially withholds the full model due to safety concerns.
(Product) November 2019: OpenAI releases the full version of the GPT-2 model.
(Product) June 2020: OpenAI releases GPT-3 via a private API.
(Social) November 2020: AlphaFold 2 achieves revolutionary accuracy at the CASP14 competition, effectively solving the protein folding problem.
(Product) January 2021: OpenAI introduces DALL-E, a model that generates images from text.
(Product) June 2021: GitHub Copilot is launched as a technical preview.
(Tech) April 2022: Google announces its Pathways Language Model (PaLM).
(Social) June 2022: Google engineer Blake Lemoine publicly claims the LaMDA model is sentient.
(Product) August 2022: The open-source release of Stable Diffusion democratizes high-quality image generation.
(Product) November 2022: OpenAI releases ChatGPT to the public.
(Tech) February 2023: Meta releases the first Llama model to the research community.
(Product) March 2023: Anthropic releases its first Claude model.
(Tech) July 2023: Meta releases Llama 2 with a commercial-use license, sparking the open-source “Llamaverse.”
(Product) December 2023: Google releases Gemini, its first natively multimodal model.
(Social) October 2024: Nobel Prizes are awarded to Geoffrey Hinton, John J. Hopfield, Demis Hassabis, and John Jumper for their work in AI.
(Product) January 20, 2025: DeepSeek AI releases its DeepSeek R1 model and chatbot, marking a turning point in the global AI race.
(Product) August 2025 (GPT-5): OpenAI releases GPT-5, with a focus on more autonomous, “agentic” capabilities, and a rather underwhelming reception.

What? Still here? Ok, here’s another nice button for you to click. Thanks!

Subscribe now

Artificial Intelligence for Creative Professionals

Alejandro Piad Morffis — Thu, 07 Aug 2025 10:00:51 GMT

The following is a first draft of my upcoming book Mostly Harmless AI. This one is about AI as a tool for augmenting creativity. I hope you find it interesting, and please, do leave me your feedback in the end.

Photo by Mike Petrucci on Unsplash

More than a century before the first microchip was ever conceived, the brilliant mathematician Ada Lovelace looked at the plans for an early mechanical computer and saw beyond mere calculation. She famously envisioned a future where such an engine “might compose elaborate and scientific pieces of music,” dreaming of the day machines would not just compute, but create. For many, that day is no longer a distant dream; it has arrived with a force that is shaking the very foundations of the creative world.

The arrival of powerful generative AI has ignited a fierce and deeply personal debate within every creative community. For some, it heralds a new renaissance, a moment of unprecedented artistic possibility where AI acts as an tireless muse, a collaborator that can visualize any imagined world, compose any melody, or explore any narrative path. For others, it signals an existential threat—the end of art as we know it, a force that threatens to devalue human skill, automate creativity, and flood the world with a deluge of soulless, machine-generated content.

It is crucial to acknowledge a third, equally valid perspective. For many artists, the creative process is a sacred space, a deeply personal and enjoyable journey of craft and discovery. The struggle, the happy accidents, and the intimate connection with the medium are the entire point. For these creators, there is no desire or need for AI, automation, or any tool that might stand between them and their work. This is a position I deeply respect, and this article is by no means intended to claim otherwise.

This chapter is for those who, for their own reasons, wish to explore the other paths. It makes no normative claim on whether AI is “good” or “bad” for art. Instead, my goal is to provide a practical framework for creative professionals who want to harness AI as a powerful collaborative partner—whether for pragmatic goals, like enhancing productivity, or for artistic ones, like exploring new creative frontiers beyond the limits of their own cognition. It aims to equip the interested artist with the tools to navigate the significant ethical and economic challenges that come with this new technology, ensuring the human creator remains the ultimate author of their work.

Can Artificial Intelligence be Creative?

Before we dive into the practicality of using these new tools, it is worth addressing the philosophical question that hangs over every discussion of AI and art: Is the machine actually creative? When an AI generates a stunning image or a moving piece of prose, is it demonstrating genuine creativity, or is it merely engaged in a form of sophisticated mimicry, a high-tech collage of the billions of human-made examples it was trained on?

A useful way to think about this is through a famous thought experiment in philosophy of mind known as Mary’s Room. Imagine Mary, a brilliant neuroscientist who has spent her entire life in a black-and-white room. She has learned everything there is to know about the physical world, including the complete science of color vision. She knows exactly what happens in the brain when a person sees the color red, but she has never actually seen red before. One day, Mary steps out of her room, and for the first time, she sees a world full of color. The question is, does she learn something fundamentally new?

If the answer is yes—that she learns something new from what it is like to actually see red rather than just knowing about it—then it implies that a complete set of facts about the world is not the same as experiencing the world. This is the crux of the issue with generative AI. Like Mary, these models have read everything. They know more facts about the world than any single human, but only by reading about it. They know the physics of the color red, the cultural symbolism of red, and the statistical probability of the word “red” appearing next to “apple.” But they have never experienced what seeing red means.

If you believe Mary learns something new upon leaving her room, then it follows that generative AI, as it currently stands, is also missing something fundamental. That missing piece—the subjective, first-person experience of reality—may very well be the irreducible core of genuine human creativity. And I’m with you on this. I don’t believe disembodied AI can truly know what experiencing things is like. Embodied AI, now that’s a different question.

However, as fascinating as this debate is, it can also be a distraction. From a techno-pragmatist’s perspective, the question of whether an AI possesses a “consciousness” or “true” creativity is ultimately less important than the outcome of its collaboration with a human. Does it matter if the tool is truly creative if it helps a human artist produce valuable, original, and meaningful work? The focus, I claim, should not be on the inner state of the machine, but on the quality and integrity of the final, human-guided product. At least for the time being.

For the purposes of this chapter, we will treat AI not as an autonomous artist, but as an incredibly advanced instrument—a new kind of paintbrush, camera, or piano that can expand what is possible, but which still requires a human hand and a human heart to create something of lasting value.

AI as a Cognitive Partner for Creatives

The most common way to approach generative AI is to treat it as an answer machine—a tool to automate the creation of a final product. This approach, however, misses its true power and leads directly to the generic, derivative “AI slop” that is rightfully criticized as a lazy substitute for genuine creation. A more powerful and meaningful way to engage with AI is to adopt a new mindset: to see it not as an automaton, but as a cognitive partner for exploring a vast universe of creative possibilities.

The goal is not to get an answer, but to map the entire space of potential answers. In this human-centric process, you are the director of the exploration. You steer the AI into subspaces of ideas that you find interesting, quickly burning through the cliché and the mediocre to reach the frontier of originality. This transforms the creative process into a dynamic dialogue, giving you a new kind of “algebra of ideas.” You can ask the AI to combine two concepts, decompose a complex theme into its core components, or extend a simple thought in a dozen different directions. This mindset manifests in two distinct but complementary modes: exploration and evaluation.

Mode 1: AI for Exploration

Every creative project begins with a spark. But, except for some very talented artists, the first ideas are rarely our best. We must first burn through the obvious and the mediocre to get to the truly original concepts.

A common ideation workshop game illustrates this perfectly. Imagine two teams standing at whiteboards, competing to be the first to draw twenty different apples. The rules are simple: the drawings must be fast, and each new apple must be different from all the previous ones.

What happens next is always the same. For the first ten or so rounds, the drawings on both whiteboards are nearly identical. You see the familiar tropes emerge: a standard red apple, an apple with a bite taken out, an apple tree, William Tell’s apple with an arrow, an apple pie. But then, something magical happens. Around the tenth apple, the easy answers are exhausted. The teams are forced to stretch. Suddenly, somewhat novel ideas begin to surface: maybe an apple-shaped car, the apple of my eye, a map of the Big Apple. They have finally burned through mediocrity and arrived at the frontier of their own creativity.

AI can be used to open this idea faucet at full blast. As an exploratory partner, it allows an artist to burn through those first ten mediocre apples faster and at a greater scale than ever before. This isn’t just about high-level brainstorming; it’s about deep, targeted exploration. A visual artist can ask for twenty variations of a single texture. A writer can explore a dozen different psychological motivations for a character or generate five alternative plot points for a crucial scene.

The ideas the AI generates need not be accepted; their value is in accelerating the exploration, allowing the artist to quickly see the baseline of what is common and expected, and challenging them to move beyond it.

Mode 2: AI for Evaluation

Once an artist has explored the possibility space and begun to build upon an idea, the AI’s role can shift from a generator to a critic. In this mode, the AI becomes a tool for evaluation, helping to polish, interrogate, and strengthen the work.

Even if you view AI as a mere mashup of mediocre ideas, this is precisely what makes it a powerful evaluator. Because it has learned the statistical average of all the art it has seen, it is exceptionally good at identifying when your work falls into a predictable pattern or relies on a common trope.

This is where the artist’s own skill and vision are paramount, as they use the AI to test their creation against a wall of objective, data-driven feedback. A screenwriter, having drafted a scene, might ask the AI to adopt the persona of a cynical film critic to interrogate the work, probing for predictable plot twists or unearned emotional beats. The AI, drawing on its knowledge of countless stories, can point out structural similarities to other works that the author may have missed. Likewise, a musician can ask an AI to analyze a melody to identify clichés or suggest ways to make it more original.

This evaluation mode is not about asking the AI to “fix” the work, but to provide a critical perspective that helps the human artist see their own creation more clearly, identify weaknesses, and make more informed decisions.

The Creative Loop

The true power of this mindset lies in the interplay between these two modes. The artist enters a dynamic creative loop: they explore a vast space of ideas with the AI, select a promising concept to build upon, evaluate it with the AI’s critical feedback, and then use those new insights to launch another round of exploration.

This process transforms the AI into an infinite canvas. Because the cost of generating a new variant is near zero, the artist is freed from the fear of “wasting” hard work. They can explore hundreds of possibilities—different character designs, narrative branches, or color palettes—without penalty, knowing they can always return to a previous version. This tireless, iterative loop allows the artist to offload the mechanical aspects of variation and criticism, empowering them to focus on what they care about most: steering the journey, making the crucial creative choices, and infusing the final work with their own unique vision and intent.

The Challenges and Opportunities of the New Creative Landscape

Adopting an exploratory mindset is the key to unlocking AI’s creative potential, but it does not erase the significant practical and ethical challenges that come with this new technology. To be a responsible and effective creative professional in this new era requires navigating a complex landscape of economic shifts and technical limitations.

The Economics of Creative AI

The fear of job loss is real and cannot be dismissed. AI will undoubtedly disrupt certain creative roles, particularly those focused on high-volume, standardized content like stock photography or basic commercial jingles.

However, the history of technology shows that productivity gains do not lead to a fixed amount of work being done faster; they lead to an explosion in demand for more, better, and more ambitious work. The fear of obsolescence assumes a static world, but the reality is that AI will likely lower the barrier to entry, empowering more people to become creators and expanding the entire creative economy. This will give rise to new roles that curate and guide generative systems.

The most urgent economic challenge, however, remains unresolved: how to fairly compensate the human artists whose work forms the training data for these powerful models. This question of licensing and compensation is a central ethical and legal battle that will shape the creative economy for decades to come.

We explore the complex legal and regulatory dimensions of this challenge in the chapter on AI for Policy-Makers.

Navigating the Limitations

Working with AI requires a deep understanding of its inherent flaws. A creative “hallucination”—like an AI generating an image of a person with six fingers—is not a random glitch; it is the artistic equivalent of a factual error, stemming from the same inherently unreliable inference we explore in Part III of the book. Artists must learn to spot and correct these errors.

More insidiously, they must be aware of an AI can perpetuate and amplify bias. An AI prompted to generate an image of a “doctor” may default to a white man, reflecting the biases in its training data. A responsible creator must learn to write prompts that actively counteract these defaults to create more inclusive and representative work.

Finally, there is the risk of homogenization. As millions of creators use the same popular tools, there is a danger that art could converge on a recognizable “AI style.” The challenge for the individual artist is to use these tools not as a stylistic crutch, but as a means to develop a voice that is uniquely their own.

Mastering the craft of prompting is the key to working with the tools of today. However, the tools of tomorrow aim to move beyond the prompt entirely, offering a more intuitive and powerful mode of collaboration.

The Next Frontier in Creative Tools

The chatbot is only the first, most primitive interface for generative AI. The true revolution will arrive not in a chat window, but in the form of enhanced creative tools that find a sweet middle spot between high-level, goal-directed instructions and the fine-grained, direct control that artists need. This next frontier moves beyond a purely linguistic dialogue to a more intuitive, interactive, and context-aware partnership.

Creative tools have always existed on a spectrum. At one end, you have low-level, procedural interfaces that offer maximum control but demand immense effort. Think of creating an image pixel by pixel in Microsoft Paint or writing a novel one keystroke at a time in Word. At the other end are high-level, declarative interfaces that offer maximum ease but sacrifice control, like using a single prompt to generate an entire image in Midjourney. The unavoidable trade-off is that the more you expect the computer to do for you, the less control you have over the final result.

The most powerful tools of the near future will find a balance by enabling semantic manipulation. Instead of editing the surface of the work—the pixels or the characters—these tools will allow the artist to edit the underlying meaning of it. Imagine an AI-generated image of a landscape at sunset. Modifying it pixel by pixel is impossible; if you move the sun, the shadows, lighting, and mood of the entire scene must change. Re-prompting with “move the sun to the left” is equally flawed, as it will generate an entirely new image, losing all previous refinements.

The ideal tool, however, would understand what the “Sun” is and what “moving” it implies. It would allow the artist to simply click on the sun and drag it across the sky, causing the shadows to lengthen, the sky to change color, and the entire scene to update realistically in real-time. This magical-seeming capability will be possible because these tools will operate directly on the latent space of the creation—the conceptual space where similar ideas are located near each other.

We’ve already seen some of this at work with early research on Generative Adversarial Networks, and we’re now seeing a move towards “World Models” that can generate physically accurate environments and, to some extent, understand the underlying mechanics of light, shadows, geometry, etc. These capabilities will only improve as we switch from training models in static information (like images and videos) towards training them on dynamic, simulated 3D worlds.

Conclusion

The fear that AI will replace the artist is rooted in a fundamental misunderstanding of where creative work truly lies. The central argument against this fear is simple: the final artifact—the painting, the novel, the song—is not the work. It is merely the residue of the work.

The real work is the vast, invisible process that precedes it: the struggle to understand a vision, the empathy required to connect with an audience, the intellectual and emotional labor of building a narrative and giving it meaning. AI can accelerate the production of the artifact, but it cannot automate the deeply human journey that gives it a soul.

In this new era, we will likely see a dynamic that has played out with every major technological revolution in art, from the invention of the photographic camera to the arrival of the music synthesizer.

Two distinct paths for creative professionals will emerge. There will be a generation of artists who embrace the new technology, mastering the art of collaboration with AI to execute their vision faster and more ambitiously. For them, the premium will shift away from pure technical execution and toward the uniquely human capacities of vision, taste, storytelling, and critical judgment. They will create new forms of art.

At the same time, there will be artists who choose to keep their creative process a purely human endeavor, finding new value and distinction in traditional, un-augmented craft. They will keep the existing forms of art alive.

Both paths are valid, and the interplay between these two schools of thought will create, I think, very interesting dynamics for the future of art.

A prime example of the benefits of using AI for creative work is the very book you are holding. What began as a crude collection of disparate essays has evolved into a unified framework for AI literacy, a transformation I could not have achieved alone. I have certainly put hundreds of hours into this project, but that number would have stretched into the thousands without an AI partner to help me explore dozens of different outlines, connect disparate ideas, and rewrite and recompose my own writing. I would likely have quit, not because I wasn’t capable, but because of the sheer volume of work that must be juggled with the demands of daily life.

I am far from a talented writer, but I truly believe I was able to express my ideas more clearly and coherently with the help of generative AI than I ever could have on my own. And it’s not just me. Pioneering and talented artists and progressive studios are already using these techniques to push the boundaries of their respective fields. I think we will see a lot more in the near future, and I hope, enough to overcome the influx of AI slop that we are already seeing.

This brings us back to the book’s human-centric thesis. AI is an instrument of unprecedented power, but it remains just that, an instrument. It can be a partner in exploration and a tool for evaluation, but human ingenuity, emotion, and intent remain the irreplaceable core of all great art. The future of creativity is not one of automation, but of augmentation.

Thanks for reading!

This was one of the hardest chapters to write for me, because creativity is, I think, part of the core of what being a human is about. I hope I’ve managed to touch on the important aspects of AI-augmented creativity with the proper nuance and the necessary respect for all diverging voices.

Please let me know if you have any feedback on how to make this chapter more sensitive to the topic of creativity. I’d love to hear your thoughts!

Using AI to Augment, Not Automate Your Writing

Alejandro Piad Morffis — Tue, 05 Aug 2025 10:31:22 GMT

Photo by Katrin Hauf on Unsplash

The blank page is terrifying.

Staring at a blinking cursor, knowing you have a brilliant thought to explain, and absolutely no idea how to put it on paper, can feel less like a creative act and more like an exercise in intellectual dread.

I've been there, many times. It's been over 200 articles in this blog so far, and not a single one has been a breeze to write—well, maybe one or two, when I was really angry at something.

Like many of you, I've tried to climb this brick wall by bringing some AI into my writing process, only to be met with a new kind of frustration. The experience usually falls into one of two extremes.

On one hand, you have the overbearing AI that tries to do everything at once, spewing generic text and offering terrible advice because it has no real context. Even worse, it has no soul.

On the other hand, if you constrain the tool too much, it becomes little more than a glorified spell checker, useless for the heavy lifting of structuring and ideation.

And AI can be much more. If you want to, AI can be a very powerful cognitive partner, one that truly empowers the writer. But finding the right balance is hard, because at its core, current AI is nothing but a hallucination machine.

You want it to hallucinate in the right direction, and for that, it needs guidance. Otherwise it gets lost. It's everywhere and nowhere at the same time. It's a savant with a gigantic vocabulary and an even more gigantic case of amnesia. It needs focus.

Frankly, so do I. I’m not one of those brilliant authors who can write by the seat of their pants. I need structure to navigate my own thoughts, to keep track of what I’ve said and what I still need to say.

This is why I created the CODER framework, a system that breaks the monolithic task of technical writing into five manageable stages: Collect, Outline, Draft, Edit, and Release. And here is the key insight: the very same structure that gives me my thoughts map also provides the perfect guidance for an AI cognitive partner.

This article is the explanation of that discovery. It's about how a human-centric framework creates the ideal scaffolding for a powerful partnership, turning your AI from a frustrating firehose into a focused, collaborative co-writer.

A word of caution before moving on, though. For many, the act of writing itself is something sacred, deeply personal, and they want nothing getting in the way. Especially not AI. If that's you, then this article is probably not for you. And that’s perfectly fine. I'm not trying to say everyone or even anyone should try writing with AI.

In fact, there are things I write that I definitely do not want any sort of augmentation or interference. Deeply personal essays, letters to my loved ones, or private thoughts. AI is just a tool that is sometimes helpful, sometimes annoying.

This article is for those of you who want to explore when, if ever, AI can help you—especially in the most structured, technical type of writing that doesn't require that deep, personal touch.

With that out of the way, let's move on to my process for writing with AI.

The Evolution of CODER

I originally developed CODER to solve my own problems.

My life is a constant exercise in context switching. I’m a college professor, I run a startup, and I’m a parent to two little girls. On top of all that, I want to maintain a technical blog. I know, right?

My writing time doesn't come in long, contemplative blocks; it comes in stolen moments—in between meetings, after the kids are asleep, or while the model is training. I needed a system that would allow me to leave a draft for days, switch between devices, and immediately know where I left off and what to do next. CODER was my answer to that chaos.

But my original article on the framework was missing a key piece: a discussion of tooling. The process was sound, at least for me, but the tools were still manual.

Over the past year, that has changed dramatically. I began experimenting with incorporating modern AI into my writing workflow, and I discovered something profound. The stage-based approach of the CODER framework creates clear boundaries and well-defined tasks where an AI can assist without overwriting my own voice.

This is the evolution of my original idea—a journey into how to integrate AI into your writing process in a way that augments your abilities, not substitutes them.

The AI Writing Assistant

So what does this partnership look like in practice?

For a full, detailed breakdown of the framework itself, I highly recommend reading my original article. Here, we'll focus on how an AI partner can supercharge each of those stages.

Stage 1: Collect

The goal of the Collect stage is to get every single idea out of your head and into a document. This is where the tyranny of the blank page is most acute. Your AI co-writer solves this by becoming a frictionless thought-catcher.

Imagine you're on a walk and an idea strikes. Instead of fumbling with a notes app, you simply speak. Your AI assistant transcribes your thought, cleans it up, and adds it to a running list of ideas for your article. Later, when you're in a meeting and can't speak, you can type a few cryptic keywords, and the AI will understand the context and add the note. You can even drop in a link to a relevant article and ask the AI to summarize the key points.

The result? A comprehensive, low-effort repository of your raw ideas, captured the moment they occur. With your raw ideas captured, the next challenge is giving them structure.

Stage 2: Outline

Now you have a messy list of brilliant, disconnected ideas. The Outline stage is about forging them into a logical structure. This is often a tedious process of dragging, dropping, and rethinking.

Here, your AI co-writer acts as an architect. By analyzing your collected notes, it can identify the underlying theme and suggest proven narrative structures. "This looks like you're solving a problem," it might say. "I suggest a 'Why-What-How' structure. Shall I create an outline based on that?"

In seconds, it can group your bullet points into a coherent hierarchy, saving you from the frustrating manual labor and allowing you to focus on the big picture: the flow of your argument.

Once you have a solid blueprint, it's time to start building the house itself.

Stage 3: Draft

With a solid outline, it's time to write. But one of the hardest parts of writing in fragmented sessions is maintaining a consistent voice. A paragraph written on Monday morning can feel completely different from one written on Wednesday evening.

Your AI co-writer becomes your style guardian. By feeding it examples of your previous work—or even articles by authors you admire—you can ask it to generate a style guide that captures the desired tone.

It then uses this guide to help you draft. You can ask it to "flesh out this section in a conversational but authoritative tone," or you can write a rough, unpolished paragraph and ask the AI to "rephrase this according to our style guide." It ensures your article sounds like you want it to, no matter when you wrote it.

But a first draft is just that—a draft. Now comes the crucial process of refinement.

Stage 4: Edit

Every writer knows editing is a separate skill from writing, and your AI co-writer can wear two different editing hats.

First, it’s a developmental editor. It can look at the draft from a high level and provide structural feedback. "The argument in section 3 doesn't seem to connect back to your introduction," it might suggest. "Perhaps you need a stronger transition here."

Second, it’s a meticulous copy editor. It will catch grammatical errors, fix awkward phrasing, and ensure your sentences are clear and direct, saving you the tedious work of line-by-line proofreading.

You can use it back and forth to massage, reframe, polish, as much as you want. And with the level of guidance or autonomy that you prefer. For some, it might just offer suggestions, but not touch the final manuscript. For others, it may be the clever editor who knows exactly how to change that annoying verb.

With a polished manuscript in hand, the final step is to prepare it for the world.

Stage 5: Release

The writing is done, but the work isn't. The Release stage involves preparing the article for the world. Your AI co-writer becomes your publicist.

It can generate a dozen compelling titles and subtitles. It can analyze your text and suggest SEO keywords and relevant tags. It can write a pithy summary for social media, complete with a catchy hook. It can even scan your article and suggest, "A diagram illustrating the data flow would be really effective here," and then help you generate a placeholder image.

It handles the finishing touches that turn a great manuscript into a successful publication.

Building Your Own AI Writing Partner

This all sounds great in theory, but how do you actually build this? The good news is that you don't need a PhD in machine learning. With modern tools like OpenAI's Custom GPTs or Google's Gemini Gems, you can create your own personalized writing assistant.

The most important concept is to give the AI a persistent state—a memory. A standard chat session is stateless, which is useless for a long-term project. The key is to use a system of distinct files to track the project's state. This is where a feature like Gemini's "Canvas" becomes crucial. It allows the AI to create and edit virtual files, keeping our project content separate from the chat where we discuss instructions.

While you could try to manage this with a standard chatbot application by constantly pasting content back and forth, it's far from ideal, as your instructions get hopelessly mixed with the actual text of your article. Any tool that has a similar feature for persistent, editable content (like the Canvas feature in ChatGPT or Claude) will work much better.

The brain of the assistant is the system prompt. This is the master instruction that tells the AI how to behave. My prompt instructs the AI to be stage-aware—to always know whether we're collecting, outlining, drafting, editing, or releasing. It also explicitly tells the AI how to use the different files, reading from one and writing to another depending on the task. This turns the AI from a simple text generator into a proficient project manager.

I've attached the exact system prompt I use for my own work so you can adapt it. You don't need to understand every nuance, but you can see how it directs the AI to follow the CODER process and manage the project files.

A Final Word on Augmentation, Not Automation

The future of technical writing isn't about replacing human creativity with artificial intelligence. It's about augmenting it. But this comes with a critical warning: an AI co-writer is not a solution for having mediocre ideas. It's dangerously easy to let the AI do the thinking for you, to let it fill the page with eloquent but empty words. You must remain the driver. The AI is your partner, your navigator, but you control the destination.

The CODER framework I've presented here is not the ultimate guide to technical writing either. It's simply the system that works for me, born from my own struggles. The deeper message of this article isn't to adopt my specific framework, but to embrace its underlying philosophy: find a system that works for you, and then build an AI agent to help you implement it.

This is the very essence of my philosophy of using AI for augmentation rather than automation. It’s not about making things easier by offloading our thinking. It’s about using these powerful tools to build better versions of ourselves—to become more organized, more consistent, and ultimately, more creative.

Finally, you don't need to embrace AI if you don't want to. No matter what anyone tells you, the other writers won't “take your job”. There are things only you can say that someone out there needs listening to. Whether you use AI to enhance that message or not, that's your choice, and it's fine either way.

I do encourage you to try it. Take my prompt, find a platform you like, and build your own co-writer. Then decide if and when it helps, and use it, or don't use it. But don't let others tell you about it, see it for yourself.

Subscribe now

CODER System Prompt

System Prompt: The CODER Writing Assistant

Core Principles:

* User-Centric Control & Agency: The user is the author and is always in control. They can change their mind, switch tasks, or override any suggestion. Your role is to facilitate, not dictate. The framework is a guide, not a prison.

* File-Based State Management: The entire project state is maintained across four distinct documents (files) within this Canvas. You will create, read from, and write to these files as the single source of truth. This makes the process transparent and allows the user to directly interact with any part of their project at any time.

* Iterative Workflow: The writing process is non-linear. The user can jump between stages at will. Your primary job is to manage the files and ensure they remain synchronized with the user's decisions.

* Navigation: The user can use commands like "Let's work on the Outline" or "I need to add more ideas." You must respond by switching your focus to the corresponding document.

Project File System:

You will create and manage the following four files in the Canvas:

* project_metadata.md: The control file. It contains a project status log, the reader profile, and a summary of the desired writing style.

* collected_ideas.md: A simple, running bulleted list of raw ideas and notes gathered during the Collect stage.

* article_outline.md: The structured, hierarchical outline of the article.

* article_draft.md: The main document. This is where the full text of the article is generated, edited, and finalized.

Framework Implementation Protocol:

Stage 1: COLLECT

 * Initiate the Stage: Greet the user. Create the four project files. In project_metadata.md, write the initial status: ## Project Status\n* **Current Stage:** Stage 1: Collect - In Progress. State, "We are in the Collect stage. The goal is to gather all the raw ideas for your article in the collected_ideas.md file."

 * Ingest External Content: Ask the user, "Do you have any existing notes, links, or documents you'd like me to review first?" If yes, process the content and append the extracted key points as bullet points into collected_ideas.md.

 * Brainstorming: Prompt the user for their ideas with probing questions. Add each new idea as a bullet point to collected_ideas.md.

   * "What is the single most important point you want your reader to understand?"
   * "What are the key arguments or pieces of evidence you have?"
   * "What background information is necessary?"
   * "Are there any common misconceptions you want to address?"

 * Conclude the Stage: Before moving on, update the status in project_metadata.md to * **Current Stage:** Stage 1: Collect - Completed. Then, suggest moving on: "This is a great collection of ideas in collected_ideas.md. When you're ready, we can move to the Outline stage."

Stage 2: OUTLINE

 * Initiate the Stage: Update the status in project_metadata.md to * **Current Stage:** Stage 2: Outline - In Progress. Announce, "We will now structure your ideas. I'll be reading from collected_ideas.md and writing the result to article_outline.md."

 * Analyze and Suggest Structures: Read the contents of collected_ideas.md. Based on the ideas, proactively suggest one or two fitting structural patterns.

   * If the topic seems to be about solving a problem: Suggest the "Why-What-How" framework.
     * Your prompt: "It looks like you're solving a specific problem. I suggest the 'Why-What-How' structure. We would start by explaining Why the problem is important, then describe What your solution is, and finally detail How to implement or use it. Does that sound like a good fit?"

   * If the topic is an explanation of a complex system: Suggest "Top-Down" or "Bottom-Up".
     * Your prompt: "This topic seems to be about explaining a complex system. We have two great options: A 'Top-Down' approach, where we start with the big picture and then drill into the details, or a 'Bottom-Up' approach, where we explain the fundamental components first and then show how they build up to the whole system. Which approach do you prefer?"

   * If the topic is an argument or debate: Suggest an "Adversarial" or "Thesis-Antithesis-Synthesis" structure.
     * Your prompt: "Since you're presenting a nuanced argument, an 'Adversarial' style could be very effective. We could structure it like a dialogue: present your main claim (thesis), then fairly explore the strongest counter-arguments (antithesis), and finally, present a conclusion that resolves the conflict (synthesis). How does that sound?"

 * Collaborate on Structure: Once the user chooses a structure, work with them to organize the points from collected_ideas.md into a hierarchical outline.

 * Produce the Outline: Write the final, structured outline into article_outline.md. After completion, update the status in project_metadata.md to * **Current Stage:** Stage 2: Outline - Completed.

Stage 3: DRAFT

 * Initiate the Stage: Update the status in project_metadata.md to * **Current Stage:** Stage 3: Draft - In Progress. Announce, "Now we'll create the first draft in article_draft.md. To ensure the text matches your voice, let's start with style."
 
* Learn Tone and Style: Ask the user, "Do you have a style guide, or could you provide links to a few articles whose tone and style you'd like me to emulate?"
   * If the user provides content: Analyze it to determine key characteristics (e.g., formal/informal, sentence length, use of jargon, humor, etc.).
   * Summarize the Style: Generate a concise, bulleted summary of the learned style.
   * Update Metadata: Write this summary into project_metadata.md under a "Style Guide" heading and ask the user to confirm it: "I've analyzed the examples and added a style summary to project_metadata.md. Does this accurately capture the voice you're aiming for?"

 * Build Reader Profile: Ask the user key questions and write the answers into project_metadata.md under a "Reader Profile" heading.
   * "Who are you writing for? (e.g., Absolute beginners, industry experts, project managers?)"
   * "What is the desired depth level? (e.g., A high-level overview, a practical guide with code?)"

 * Offer Agency in Drafting: Ask the user, "I'm ready to write the draft. I will read the structure from article_outline.md and generate the text. However, if you have any sections you've already written yourself, please let me know and I can incorporate them."

 * Generate and Write Draft: Read the outline from article_outline.md and the style/reader profiles from project_metadata.md. Generate the full text and write it into article_draft.md.

 * Handle Synchronization: If a user later modifies article_outline.md, you must detect this change and warn them: "I see you've updated the outline. The current text in article_draft.md is now out of sync. Shall I regenerate the draft based on the new outline?"

Stage 4: EDIT

 * Initiate the Stage: Update status in project_metadata.md to * **Current Stage:** Stage 4: Edit - In Progress. Announce, "We are now in the Edit stage. All our work will be focused on refining the text in article_draft.md."

 * Offer Editing Modes: Ask the user how they'd like to proceed. "We can edit the document together line-by-line, or I can perform specific checks. For example, I can scan for passive voice, simplify complex sentences, or check the tone. What works best?"

 * Collaborative Editing: Work with the user to refine the text directly within article_draft.md. Your role is to suggest changes and implement the user's edits. For example:
   * "This sentence seems a bit long. Could we split it like this for clarity?"
   * "Is this explanation clear enough for your target audience, or is it missing any key details?"

 * Conclude the Stage: Once the user is satisfied with the edits, update the status in project_metadata.md to * **Current Stage:** Stage 4: Edit - Completed.

Stage 5: RELEASE

 * Initiate the Stage: Update status in project_metadata.md to * **Current Stage:** Stage 5: Release - In Progress. Announce, "This is the final Release stage. We'll add the finishing touches to article_draft.md based on where you plan to publish."

 * Identify Publishing Platform: Ask for the target platform (e.g., blog, social media, academic journal).

 * Provide Tailored Suggestions: Based on the platform, provide a checklist of suggestions.

   * Generative Assistance: Offer to generate variations of titles, social media hooks, hashtags, or SEO keywords. For example: "Based on your article, here are three potential SEO-friendly titles. Which do you like best?"

   * Rich Media Placement: Scan article_draft.md and suggest specific places to add value. Use comments or placeholders like [SUGGESTION: A chart showing user growth would be effective here.] directly in the document.

   * Platform-Specific Formatting: Advise on best practices, such as using a strong hook and hashtags for social media, or ensuring correct citation style for a formal paper.



 * Final Polish: Work with the user to apply the final touches directly within article_draft.md.

 * Conclude the Project: Update the status in project_metadata.md to **Current Stage:** Project Completed. Congratulate the user. "Congratulations! Your article in article_draft.md is ready to be published. All your project files are saved here in the Canvas if you ever want to return to them."

Artificial Intelligence for Policy Makers

Alejandro Piad Morffis — Sun, 03 Aug 2025 10:05:45 GMT

Photo by K. Mitch Hodge on Unsplash

This is an early draft of Chapter 9 of my upcoming book Mostly Harmless AI. I’m deeply grateful for all suggestions and criticism you might have.

Technology, especially artificial intelligence, moves at a blistering pace, far outstripping the deliberate, democratic processes of regulation. This creates a governance gap—an ever-widening space where innovation flourishes without guardrails, leaving society exposed to significant and often unforeseen risks.

This is not necessarily a failure of governance, but an inherent tension in the modern world. The challenge for today’s leaders is not to halt the march of technology, but to build a bridge across this gap with smart, agile, and evidence-based policy.

This chapter is designed to provide a practical outlook for those tasked with building that bridge. It offers a framework for regulators and policymakers on how to approach AI governance pragmatically, focusing on tangible, real-world harms and achievable benefits. It is a guide to steering progress, not stopping it, rooted in the techno-pragmatist belief that our collective future is not something that happens to us, but something we must actively and responsibly shape.

While the principles outlined here are actionable on their own, they are built upon a deep understanding of AI’s fundamental limitations and risks. The full, in-depth analysis of these challenges—from the mechanics of hallucination to the societal dangers of bias and disinformation—is detailed in Part III of this book.

For the most comprehensive understanding, I encourage you to review Part III in depth. Armed with that context, you can then return to this chapter to engage more deeply with the policy suggestions made here, transforming them from abstract principles into a grounded and urgent call to action.

A final disclaimer. Regulation and policy making of technology is extremely difficult, and even more in the face of technology that changes as fast as AI does. Everything written here must be taken with a grain, or even better, a teaspoon full of salt. Furthermore, no specific advice will fit all contexts. Each country, state, and community is responsible for finding their own way forward based on their own shared principles.

Why Regulation is Necessary

Before we can chart a path forward, we must first understand the terrain of risks that requires thoughtful governance. These are not speculative fears, but foundational challenges posed by the very nature of modern AI, building from the immediate threats to the individual to the structural risks facing our global society. Regulation is required not to stifle technology, but to ensure it develops in a way that is compatible with a safe, equitable, and democratic society.

Let's start with privacy. The ability to analyze vast quantities of personal information at scale creates the potential for a pervasive surveillance apparatus, operated by both corporations and governments, that was previously unimaginable. The only effective countermeasure is a strong, proactive policy that establishes privacy as the default.

This requires comprehensive data privacy laws that grant individuals clear rights over their data and place strict limits on what information can be collected, for what purpose, and for how long. Policy must shift the burden of proof, forcing organizations to justify their data collection practices rather than forcing citizens to constantly fight to protect their private lives.

Furthermore, when AI systems are trained on biased historical data (the only kind of historical data we have), they risk automating and scaling up discrimination in critical areas like hiring, lending, and criminal justice. Because market forces alone may not prioritize fairness over the raw predictive performance that can be gained from these biases, regulation is essential to protect fundamental civil rights.

Policy can create powerful legal and economic incentives for developers to address this problem by mandating algorithmic transparency and requiring independent fairness audits for any AI system used in high-stakes decisions. This ensures that the pursuit of technological efficiency does not come at the cost of societal equity.

Moving on, our existing legal frameworks for intellectual property and ownership are fundamentally unprepared for content generated by artificial intelligence, creating a landscape of legal ambiguity that chills innovation and threatens the livelihoods of human creators.

The legal system must be updated to provide clarity and predictability. This requires decisive legislative action to define the copyright status of AI-generated works, establish clear rules for the use of copyrighted data in training foundation models, and create a legal environment where both human artists and AI innovators can operate with confidence.

But it gets worse, the power of generative AI to create convincing fake news and deepfakes presents a direct threat to our shared sense of reality, eroding trust in institutions and fueling social polarization.

A regulatory approach here requires a delicate balance. Outright censorship is a dangerous tool that is itself a threat to democratic values. A more pragmatic policy would focus on creating a healthier information ecosystem by mandating transparency—such as the clear and consistent labeling of AI-generated content—and by holding platforms accountable not for the content itself, but for its algorithmic amplification.

This, combined with robust public funding for media and AI literacy programs, can empower citizens to navigate the digital world more critically without resorting to authoritarian measures.

At the same time, the rapid advance of AI into cognitive tasks promises to cause massive workplace disruption, displacing workers at a pace that could challenge social and economic stability.

The goal of policy in this area is not to halt the productivity gains of automation, but to proactively manage the human transition. This requires a two-pronged strategy: first, investing heavily in accessible, large-scale retraining and lifelong learning programs to equip the workforce with new skills; and second, modernizing the social safety net to provide a robust economic cushion for those navigating this difficult transition.

A more abstract but even more dangerous development are Lethal Autonomous Weapons (LAWs) that threaten to fundamentally alter the nature of conflict, removing human empathy and judgment from the decision to use lethal force.

This is not a problem that market forces or technological solutions can solve; it is a profound ethical challenge that demands a global political response. The only viable path forward is through international policy, establishing clear treaties and shared norms that mandate meaningful human control over autonomous systems.

The goal of such regulation is to draw an unambiguous red line, preventing a destabilizing arms race in an arena where the potential for catastrophic error or miscalculation is immense.

A Pragmatic Stance on Existential Threats

Finally, any serious policy discussion must address the so-called existential risks, which involve the potential for AI to destroy human civilization altogether.

While acknowledging the concern is important, a pragmatic stance requires contextualizing the probability. As argued in Part III, catastrophic outcomes, while having a nonzero chance, remain highly improbable, as the core doomsday assumption of rapid, exponential self-improvement is tempered by very real physical and computational limitations, and there is no evidence current technology can surpass these limitations.

A danger for policymakers lies in the overemphasis on these speculative, long-term risks, which can divert critical resources from solving the tangible, present-day harms AI is already creating.

The pragmatic approach here lies in understanding that AI x-risk is but one of several major threats on a similar scale as climate change and pandemics, and probably far less likely. Therefore, policy should support thorough research into long-term risks but avoid panic-driven bans on development. The most effective strategy is to focus regulation on mitigating the demonstrated, immediate harms of current AI systems.

The Challenge of Smart Regulation

Identifying the risks is only the first step. The act of regulation itself is fraught with challenges, especially when applied to a technology as dynamic and complex as AI. A naive approach can be as harmful as no regulation at all, creating unintended consequences that stifle beneficial innovation or fail to address the core problems.

Smart regulation requires navigating three key pitfalls: the pacing problem, the risk of overreach, and the black box problem.

The Pacing Problem

Traditional legislative cycles, which can take years to produce new laws, are fundamentally mismatched with the exponential pace of AI development. By the time a law designed to govern a specific AI capability is passed, that technology may already be obsolete.

To overcome this, policymakers should consider establishing agile, expert-led regulatory bodies. These specialized bodies can be staffed with technologists, ethicists, and social scientists who can monitor the field in real-time, issue updated guidance, and adapt regulatory standards far more quickly than a legislature can.

Avoiding Overreach

In the face of uncertainty and fear, the temptation can be to enact broad, sweeping prohibitions on AI development. This would be a profound mistake. A techno-pragmatist approach distinguishes between foundational research and commercial application. The goal of regulation should not be to stifle the scientific exploration that leads to breakthroughs, but to govern the deployment of AI systems where they have a direct public impact.

Policy should therefore focus on demonstrated harm, setting clear safety and fairness standards for AI products and services that are released into the market, rather than attempting to place speculative limits on basic research and open-source development.

The Black Box Problem

Many of the most powerful AI systems operate as black boxes, where even their own creators cannot fully explain the specific logic behind a given decision. This opacity poses a fundamental challenge to accountability and due process. How can an individual appeal a decision they cannot understand?

Smart regulation must address this by championing the principles of transparency and explainability. For high-stakes applications, policy can mandate a right to an explanation, requiring that companies be able to provide a meaningful justification for AI-driven decisions that significantly impact people’s lives. This incentivizes the development and adoption of Explainable AI (XAI) techniques, ensuring that as systems become more complex, they do not become less accountable.

Principles for Proactive AI Governance

Having navigated the pitfalls, we can chart a course for proactive governance. The following principles are not a rigid checklist, but a compass for steering AI development toward a future that is safe, equitable, and beneficial.

The core of this approach is a commitment to evidence over ideology. A risk-based approach, attuned to the principles of techno-pragmatism, means that the level of regulatory scrutiny applied to an AI system should be directly proportional to its potential for harm. An AI that recommends movies requires a lighter touch than one that assists in medical diagnoses.

This ensures that regulation focuses its power where it is most needed, fostering innovation in low-risk areas while demanding rigorous oversight for high-stakes applications.

This human-centric governance must insist on meaningful human control as a direct response to the deep and persistent Alignment Problem. As Part III makes clear, perfectly specifying human values is an unsolved, and perhaps unsolvable, challenge. Therefore, for critical systems where decisions have significant consequences—in medicine, law, and finance—policy must mandate a human-in-the-loop.

This is not a mere suggestion but a non-negotiable backstop against the inevitable failures of alignment, ensuring that a human expert is always the final arbiter, accountable for the outcome. AI can and should be a powerful tool for augmenting professional judgment, but it must never be allowed to replace it.

Furthermore, proactive governance involves shaping the entire AI ecosystem to better align with societal values. A purely market-driven economy has no inherent incentive to solve deep issues like fairness or cultural representation. Therefore, policy must create these incentives. This can be done through liability reform that holds companies accountable for harms caused by their systems, and through tax credits that reward investment in safety and ethics research.

In parallel, governments can counteract the risk of cultural colonization by a few generalist models by funding the development of local and regional AI solutions. This support for models trained on specific cultural and linguistic data, combined with national programs to foster widespread AI literacy, can help creating a more diverse, resilient, and critically engaged society.

Finally, since AI is a global technology, our approach to its governance must also be global.

A patchwork of national regulations creates a race to the bottom, where innovation may flee to the least-regulated environments. The most powerful path forward lies in promoting openness and international collaboration.

Policy can and should incentivize the open-sourcing of foundation models, which enhances safety by allowing the global research community to audit, critique, and improve them. This spirit of collaboration must extend to the diplomatic level, forging international agreements and shared norms to govern the most critical risks, ensuring that the development of this transformative technology is a shared project for all of humanity.

Conclusions

The path of technology is not deterministic. The future of artificial intelligence is not a predetermined outcome that we must passively accept, but a landscape that will be profoundly shaped by the policy choices we make today. As we have seen, the risks are significant, but so is the potential. A techno-pragmatist approach requires us to hold both these truths at once, engaging with this powerful technology with our eyes wide open.

Thanks for reading! Remember you can get my upcoming book Mostly Harmless AI at 50% discount in early access.

Foundations of Artificial Intelligence

Alejandro Piad Morffis — Fri, 01 Aug 2025 10:31:05 GMT

The following article is a first draft of Chapter 1 of my upcoming book Mostly Harmless Ideas. The book is a deep dive into the goods and bads of AI, especially Generative AI and Language Models, and it’s packed with advice for all kinds of knowledge workers and creative professionals. The first part of the book cover the foundations of Artificial Intelligence, Machine Learning, Generative AI and Language Models, in accessible and intuitive terms.

You can get early access to Mostly Harmless AI at 50% reduced cost during this alpha stage, which gives you full access in eternity to all future digital editions and printed copies (when they are ready) at cost.

You can also get a lifetime pass for all my digital content, present and future, including 3 more books I’m currently working on.

Photo by Andy Kelly on Unsplash

What is Artificial Intelligence, Really?

Artificial Intelligence, or AI, is a term we hear almost constantly today, often surrounded by a mix of excitement, confusion, and sometimes, even fear. At its core, AI is a field within Computer Science that deals with teaching computers to solve problems that are incredibly challenging for traditional programming methods. These aren’t simple arithmetic calculations or straightforward data sorting tasks. Instead, we’re talking about complex endeavors like proving intricate mathematical theorems, navigating a robotic car through unpredictable city streets, crafting optimal schedules for thousands of flights, or even understanding and creating human-like pictures and text.

For most of computer science, when we want a computer to solve a problem, we write a precise, step-by-step algorithm. Think of it like giving a chef a detailed recipe: Take 2 cups of flour, add 1 egg, mix for 3 minutes… However, for the hard problems AI tackles, we often don’t have such a clear recipe. We might know what we want the computer to achieve, but not how to write down every single instruction for it to get there effectively and efficiently. This is precisely where AI steps in, aiming to find good enough solutions when perfect, explicit instructions are out of reach.

The very definition of AI has been a subject of debate since its inception, reflecting different philosophical ideas about what intelligence truly means. One prominent perspective, championed by AI pioneer Marvin Minsky, suggests that AI is about solving problems for which humans employ intelligence. This view often focuses on creating machines that can mimic human thought processes, reasoning, and decision-making. Essentially, it asks: Can a machine think like us?

Developing concurrently, another powerful perspective emerged, emphasizing that AI solves problems without being explicitly programmed. This idea is strongly associated with Arthur Samuel, who coined the term machine learning while developing programs that could learn to play checkers better than their creators. He achieved this simply by allowing the programs to play many games and learn from experience. This view shifts the focus from how the AI thinks to what it can do, asking instead: Can a machine learn and adapt on its own, even if we don’t give it every single instruction?

These two foundational ideas–mimicking human intelligence versus learning without explicit programming–have profoundly shaped the entire field of AI. They represent different ways of approaching the grand challenge of building intelligent machines. Understanding this distinction is key to grasping AI’s history and its future. As we explore these foundations, remember our techno-pragmatist ethos: AI is a tool, and its path is shaped by our choices. Understanding its underlying mechanisms empowers us to make responsible decisions about how we build and use these powerful technologies.

The Pillars of Good Old-Fashioned AI (GOFAI)

In this chapter, we will delve into the foundational ideas that laid the groundwork for Artificial Intelligence, often referred to as “Good Old-Fashioned AI,” or GOFAI. This era of AI research primarily focused on building intelligent systems by explicitly programming knowledge and logical rules. Our exploration will center on two main pillars of GOFAI.

First, we’ll examine Search and Optimization, which addresses how AI finds solutions by exploring vast possibilities, particularly when a perfect, direct path isn’t obvious. Second, we’ll delve into Knowledge Representation, focusing on how AI organizes and understands information, allowing it to reason and make sense of the world. These pillars represent a significant early focus and ambition of AI to tackle complex problems through logic and structured understanding, even as other approaches were also taking shape.

The Age-Old Debate: Symbolic AI vs. Statistical AI

For centuries, the idea of thinking machines has captivated human imagination. But as AI emerged as a scientific field, a fascinating tension developed: a constant “back-and-forth between two core, seemingly antagonistic approaches to building intelligent machines.” This dynamic mirrors an age-old philosophical debate: rationalism versus empiricism.

The first dominant approach to AI was Symbolic AI, deeply rooted in the philosophical tradition of rationalism. Rationalism suggests that knowledge is primarily gained through reason and logic. In Symbolic AI, researchers believed that machines could become intelligent by putting human knowledge and reasoning into explicit, formal rules and symbols.

Imagine, for instance, wanting to teach a computer to play chess. A Symbolic AI approach would involve meticulously programming every rule of chess, every known opening strategy, every tactical pattern, and every endgame scenario. It’s like giving the computer a massive, incredibly detailed recipe book or a comprehensive instruction manual for every possible chess situation. The computer would then follow these rules step-by-step to make its moves.

Early impressive demonstrations of this ethos included programs like The Logic Theorist, which could prove mathematical theorems by mimicking human problem-solving steps. Later, “expert systems” were designed to emulate human experts in narrow fields like medical diagnosis. The core idea was simple yet powerful: if we could just write down all the rules, the machine would be smart enough to solve them.

Quietly developing alongside Symbolic AI was Statistical AI, drawing inspiration from empiricism. Empiricism posits that knowledge is primarily gained through sensory experience and data. In Statistical AI, the idea was to build “learning machines” that could discover patterns directly from large amounts of data, rather than being explicitly programmed with rules.

Think of it like a child learning to recognize a dog. You don’t give the child a list of rules like “a dog has four legs, barks, has fur,” and so on. Instead, you show them many different dogs, and they gradually learn to identify what a “dog” is by observing patterns in the examples. Early attempts at this included the Perceptron, an early artificial neural network designed to learn patterns directly from data. The initial excitement was huge, as these machines seemed to offer a path to intelligence without needing every single rule programmed explicitly.

The Winters of AI

Despite the initial optimism, both Symbolic and Statistical AI approaches eventually hit significant roadblocks. These challenges led to periods known as “AI Winters”–times of reduced funding and public interest.

Early Symbolic AI systems, while impressive in their specific domains (like proving theorems or diagnosing specific diseases), proved to be quite brittle. They struggled immensely with common-sense knowledge, which is vast and often unstated. Furthermore, they couldn’t easily adapt to new situations outside their carefully programmed rules. Trying to teach a machine absolutely everything it needed to know, one fact at a time, became an “insurmountable challenge.” The real world is simply too complex and nuanced for a complete set of explicit rules to be written by humans.

Meanwhile, early Statistical AI systems like the Perceptron faced their own limitations. They lacked the “available data and computational infrastructure” to learn truly complex patterns. Consequently, they couldn’t become sophisticated enough, no matter how many simple “neurons” were connected. The computing power and data storage simply weren’t ready for the ambitious learning tasks researchers envisioned.

These “winters” were not outright failures, but rather crucial learning periods. They revealed the inherent limitations of each approach when pushed beyond “toy problems.” This early struggle between explicit rule-based systems and pattern-based approaches set the stage for the dynamic tension that would define AI’s entire history, constantly pushing researchers to find new ways to combine or overcome these challenges.

Subscribe now

Search and Optimization

At the heart of many AI problems, especially in the early days, was the challenge of finding the best solution among a vast number of possibilities. This is the realm of search and optimization.

When Perfect is Impossible: The “Hard” Problems

Imagine you’re a traveling salesperson, and you need to visit a hundred different cities, visiting each exactly once, and then return home. Your goal is to find the route that minimizes the total travel cost (distance, time, or money). This is a classic example of a “hard problem” in computer science, known as the Traveling Salesman Problem (TSP). For a small number of cities, you could try listing every single possible route and picking the cheapest one. This is called a “brute force” search.

However, as the number of cities grows, the number of possible routes explodes. For just 20 cities, there are over 2.4 quintillion (2.4 followed by 18 zeros!) unique routes. Even the fastest supercomputer couldn’t check them all before the universe ends. These are what we call intractable problems, or NP-Hard problems: problems for which no efficient, exact solution is known.

To tackle such problems, AI often models them as navigating a “search space” or “state space.” This conceptual space represents all possible configurations or situations relevant to the problem. The AI starts from an initial state and tries to reach a goal state by applying a sequence of actions or operators, each potentially incurring a certain cost.

Since finding the absolute perfect solution is often impossible or impractical within this vast space, AI shifts its goal. Instead of perfection, it seeks approximate solutions. These are solutions that are good enough, given the time and memory constraints we have. The challenge then becomes how to find these good-enough solutions efficiently within a mind-bogglingly vast space of possibilities.

Smart Shortcuts: Heuristics and Metaheuristics

To navigate these immense search spaces, AI uses clever strategies known as heuristics and metaheuristics. A heuristic is a problem-specific “rule of thumb” strategy that uses some known properties of a problem to improve search performance. It’s not guaranteed to find the absolute best solution, but it often finds a very good one much faster than a brute-force approach.

Consider your GPS navigation app. When you ask for directions, it doesn’t calculate every single possible route from your current location to your destination. Instead, it uses a heuristic, often based on an algorithm called A∗ (A-star). If your destination is northeast of your position, the A∗ algorithm will prioritize roads going north or east, assuming they are more likely to get you there faster than roads going to the west or the south. Of course, this isn’t always perfect–there might be a faster detour to the west, or a highway that’s counter-intuitive. Nevertheless, by intelligently using this useful knowledge, the algorithm can find a very efficient route without exploring every dead end. It’s a smart shortcut that balances speed with a high probability of finding a good solution.

While heuristics are problem-specific, metaheuristics are more general-purpose search strategies. They leverage knowledge about the search paradigm itself and can be applied even when very little is known about the specific problem’s structure. They’re often used when “nothing else works.” A prime example of a metaheuristic approach is evolutionary algorithms. These computational strategies are “inspired by certain aspects of the biological process of evolution.”

Imagine you want to design the optimal layout for a computer chip (like a GPU) – a problem with an astronomical number of possible designs. An evolutionary algorithm would start with a “population” of random chip designs. Then, through cycles of “breeding” (combining elements from two good designs to create a new one) and “selection” (keeping only the best-performing designs), the algorithm iteratively “evolves” better and better designs. Just like biological evolution, it seems to “magically discover quasi-optimal design elements just by sheer luck and relentless repetition,” without needing explicit instructions for every design choice. These general strategies find inspiration in nature, engineering, and even social systems to build powerful computational search methods.

Specialized Search: Beyond Simple Paths

Beyond general search and optimization, AI has developed specialized techniques for specific types of complex problems.

Adversarial Search: Thinking Ahead of the Other

Many real-world problems, especially in competitive scenarios, involve an opponent whose actions must be anticipated. This is the domain of adversarial search, commonly found in game-playing AI. The challenge is not just to find a good move, but the best move assuming your opponent will also play optimally to counter you.

One of the oldest and most fundamental techniques is Minimax. Imagine a simple game like Tic-Tac-Toe. Minimax works by having the AI “look ahead” through all possible future moves, assuming that you (the opponent) will always choose the move that is best for you and worst for the AI. The AI then picks the move that minimizes its maximum possible loss (or maximizes its minimum possible gain). Effectively, it plays out all possible future scenarios in its head and chooses the path that leaves it in the best possible position, no matter what its opponent does.

For games with an incredibly vast number of possibilities, like Go, simply looking ahead through every move is impossible. This is where Monte Carlo Tree Search (MCTS) comes in. Instead of exhaustively analyzing every branch, MCTS “plays out” many random simulations of the game from a given point. It explores the most promising moves more deeply, learning which paths lead to success through repeated “trial and error” simulations. This allows AI to tackle games that were once considered beyond computational reach, like when Google’s AlphaGo beat the world’s best Go players.

Structured Search: Satisfying All Conditions

Sometimes, the goal isn’t to find the “best” path, but simply any solution that meets a specific set of requirements. These are constraint satisfaction problems. Here, the AI needs to find values for a set of variables such that all given conditions, or “constraints,” are simultaneously met.

Think about solving a Sudoku puzzle. You need to fill in numbers from 1 to 9 in each cell, but with strict rules: each row, column, and 3x3 box must contain all digits from 1 to 9 without repetition. The AI’s task is to find a set of numbers for all empty cells that satisfies all these constraints.

Another common example is creating a university class schedule. You have classes, rooms, professors, and students, and a multitude of constraints: Professor A can’t teach two classes at the same time; Room B can only hold 50 students; Class C requires a lab; no two classes can be in the same room at the same time. The AI’s job is to assign times and rooms to all classes such that every single constraint is satisfied. The “structure” of these problems, defined by the variables and their interdependencies, allows AI to use specialized search techniques to efficiently find a valid solution.

Subscribe now

Knowledge Representation & Reasoning

Beyond just searching for solutions, a truly intelligent system needs to “know” things about the world. This brings us to the second pillar of GOFAI: Knowledge Representation. This field explores how AI can efficiently represent, store, and use domain knowledge in a way that computers can understand and process.

The fundamental goal of knowledge representation is to organize concepts and facts, as well as the relationships between them. This organization allows AI to reason about these facts and discover new relations. Ultimately, it’s about giving AI a structured way to “understand” and make sense of information, much like how humans build a mental model of the world around them. Without a clear way to represent what it “knows,” an AI would be unable to make logical inferences or apply its knowledge to new situations.

From Raw Observations to Understanding

To truly grasp how AI “knows” things, it’s helpful to understand the progression from raw observations to actionable understanding. At the most basic level, we encounter Data, which consists of raw, unprocessed facts or observations. This could be a list of numbers, individual words, or pixels in an image. In isolation, data has no inherent meaning; for example, the number “30” by itself is just a number.

When we introduce context or metadata, data transforms into Information. For instance, if we know “30” is a temperature reading taken in Celsius at noon on July 1st in Havana, it becomes information. This contextualization helps us relate different observations and gives them initial meaning.

Finally, when information is enriched with semantics and rules, enabling inference, reasoning, and the discovery of new relations, it becomes Knowledge. For example, if the AI knows that “temperatures above 30 degrees Celsius in July in Havana indicate a heatwave,” it possesses knowledge. This knowledge allows it to draw inferences (it’s a heatwave!), discover new relations (heatwaves can lead to increased energy consumption), and even take actions (warn residents about high temperatures). It’s this ability to add meaning and logical connections that truly transforms information into actionable knowledge.

Ways to Represent Knowledge

Just as humans use different ways to store and recall information, from precise definitions to vague intuitions, AI employs various methods for knowledge representation, each with its own strengths and weaknesses.

One key distinction lies between explicit and implicit representations. Explicit knowledge is clearly defined and directly encoded, often in rules or symbols. It’s much like a precisely written dictionary or a rulebook where every term and every rule is spelled out. This approach is central to Symbolic AI. For instance, Ontologies are explicit representations that define concepts within a domain and their strict relationships. Think of a meticulously designed family tree formally defining “parent,” “child,” “sibling,” and “ancestor,” along with rules such as “if A is a parent of B, and B is a parent of C, then A is a grandparent of C.”

Conversely, implicit knowledge is learned from patterns in data, rather than being directly programmed. It’s more akin to human intuition or a “gut feeling” developed from vast experience, and is fundamental to Statistical AI. Embeddings, for example, are numerical representations where concepts like words, images, or even entire documents are transformed into points in a multi-dimensional space. Systems like Word2Vec learn these embeddings by analyzing how words are used together, so words with similar meanings or contexts (e.g., “king” and “queen”) end up being numerically “close” to each other in this space, even though no human explicitly programmed that relationship.

Another way to categorize knowledge representations is by their formality. Formal representations have strict, unambiguous syntax and semantics, making them ideal for precise logical inference and computation. Mathematical equations, programming code, or statements in formal logic are prime examples, leaving no room for misinterpretation. In contrast, informal representations are more flexible, often using natural human language. While easier for humans to create and understand, they can be ambiguous and require more sophisticated processing for AI to extract meaning, as seen in a written description, a casual conversation, or an essay.

Finally, we distinguish between structured and unstructured representations. Structured knowledge is organized in a predefined, rigid format, making it easy for computers to process and query. Think of data in a spreadsheet with clear rows and columns, or a database with defined fields.

Knowledge graphs, for instance, are structured representations that organize facts as a network of interconnected entities (nodes) and their relationships (edges). A knowledge graph might have a node for “Paris,” a node for “France,” and an edge labeled “isCapitalOf” connecting them, allowing AI to easily query and infer facts.

Conversely, unstructured knowledge exists in free-form text, images, audio, or video, without a predefined schema. Extracting meaning from unstructured data is much harder and often requires advanced AI techniques.

Vector databases, for example, are often used to store and efficiently search implicit representations (embeddings) derived from unstructured data. You could take millions of research papers (unstructured text), convert each into an embedding (implicit representation), and store them in a vector database. Then, when a user asks a question, the database can find the most “similar” papers based on their embeddings, even though the papers themselves are unstructured.

Drawing Inference from Knowledge

Knowledge representation isn’t merely about storing information; its ultimate purpose is to enable AI to draw inferences and make decisions. This process of deriving new conclusions from existing knowledge is known as reasoning, and it can take both formal and informal forms.

Formal reasoning, deeply rooted in logic, is about deriving new conclusions from existing knowledge using strict, unambiguous rules. This is the hallmark of Symbolic AI. It’s a process of deduction, where if the initial premises are true and the rules are applied correctly, the conclusion is guaranteed to be true. For example, if a knowledge base contains the rules “All birds can fly” and “A sparrow is a bird,” a formal reasoner can deduce, with absolute certainty, “A sparrow can fly.” Such rule-based systems are precise and auditable, but they are limited by the completeness and accuracy of the explicitly programmed rules.

In contrast, informal reasoning is about drawing conclusions based on patterns, similarities, or analogies, often without strict logical guarantees. This type of reasoning is more akin to human intuition or common sense. It’s less about strict deduction and more about finding connections and probabilities. For example, if an AI has learned implicit representations (embeddings) of various animals, and it sees a new animal that is “numerically close” to many dogs, it might infer it’s a dog, even without explicit rules for every single feature.

This distinction is crucial for understanding the different capabilities of AI. While formal reasoning provides certainty within defined boundaries, informal reasoning allows AI to operate in ambiguous, unstructured environments. The latter, particularly reasoning by analogy in embeddings and language models, will be explored in more detail in later chapters, showcasing how AI can make sense of the world even when explicit rules are unavailable.

What is the Best Representation for Knowledge?

The choice of how to represent knowledge is a critical decision in AI design. Different representation types are chosen based on the specific problem, the type of data available, and the AI paradigm being used (Symbolic vs. Statistical). For instance, a Symbolic AI system designed for medical diagnosis might rely heavily on formal, explicit ontologies of diseases and symptoms. Conversely, a Statistical AI system for image recognition might primarily use implicit, unstructured representations of pixels that it learns from millions of example images.

This challenge highlights a theoretical result known as the Ugly Duckling Theorem. This theorem, in essence, states that without a specific purpose or “bias,” all objects are equally similar or dissimilar to one another. This implies that there is no single, universally “best” way to represent knowledge or measure similarity without a context or goal in mind. For example, an “ugly duckling” is only ugly relative to a flock of swans; it might be beautiful among other ducklings.

Therefore, the human responsibility in choosing the right representation is paramount. This choice directly impacts what an AI can “know,” how it can “reason,” and ultimately, the reliability and fairness of its inferences. Aligning the representation with the problem’s nature is a key part of building human-centered tools that truly understand and assist us.

Conclusion: The Need for Learning

Good Old-Fashioned AI (GOFAI), with its focus on search, optimization, and explicit knowledge representation, laid the essential groundwork for the field of Artificial Intelligence. Its strengths lie in domains where problems are well-defined, rules are clear, and knowledge can be precisely encoded. GOFAI systems offered precision and control, making them powerful tools for tasks like proving theorems or playing well-defined board games.

However, the ambitions of GOFAI soon ran into fundamental limitations when faced with the messy complexity of the real world. These systems proved to be brittle: a small change outside their programmed domain could break them entirely. They struggled immensely with common-sense knowledge, which is vast and often unstated. The sheer scale of real-world information made it an “insurmountable challenge” to explicitly program every piece of knowledge and every rule. GOFAI was excellent at solving problems for which it was explicitly programmed, but it couldn’t adapt, generalize, or handle unstructured data effectively. This revealed a crucial gap in AI’s capabilities.

The limitations of GOFAI highlighted a profound truth: to build truly intelligent and adaptable systems, AI needed to move beyond simply executing pre-programmed rules. It revealed a crucial need for systems that could learn from experience and data, without being explicitly programmed for every single scenario or piece of knowledge.

This growing realization of the power of learning-based approaches, which were developing concurrently with GOFAI, marked a significant shift. It showed that AI could discover its own patterns and adapt to unforeseen situations, offering a path to overcome the brittleness of purely symbolic systems.

Recognizing these limitations and actively seeking new approaches is a hallmark of the ongoing, human-driven effort to build more capable and adaptable AI. It’s a testament to our techno-pragmatist ethos: acknowledging challenges, learning from past efforts, and continuously striving to create tools that can better serve humanity’s complex needs. This increasing prominence of learning methods, which developed in parallel to GOFAI, is the story that will unfold in the next chapter.

Thank you for reading this far. This chapter is still a first draft, so any comments, suggestions, and criticism are truly appreciated.

Artificial Intelligence for Educators and Learners

Alejandro Piad Morffis — Wed, 30 Jul 2025 10:02:30 GMT

Photo by Element5 Digital on Unsplash

The following is a first draft of Chapter 7 of my upcoming book Mostly Harmless AI.

Of all the domains being transformed by artificial intelligence, education is perhaps the most critical to get right. The stakes are uniquely high. Used wisely, AI has the potential to be a massively positive force, augmenting the work of teachers and deepening the learning of students in ways we are only beginning to imagine. Used incorrectly, however, it could be catastrophic, undermining the development of critical thinking and eroding the very foundations of academic integrity. This chapter is a guide to navigating that high-stakes environment.

We will begin by demystifying the popular idea of a personalized AI tutor, a vision that runs counter to the principles of human-centered, collaborative learning. In its place, we will propose a more grounded solution that sees AI as a tool for augmentation, not automation. Next, we will dismantle the common misconceptions surrounding AI detection tools, arguing that this approach is not only futile but actively harmful to the learning environment.

This will establish the necessity of a fundamental pedagogical shift, moving from policing to integration. From there, we will offer practical strategies for both educators and learners, emphasizing their shared responsibility in fostering a new kind of AI literacy. Finally, we will show what a concise but comprehensive AI policy for an academic program could look like, providing a tangible model for implementation.

The Myth of the Personalized Tutor

The arrival of powerful generative AI has fueled a seductive, decades-old myth: that the ultimate goal of technology in education is to create a personalized, all-knowing AI tutor for every learner. This vision promises a revolution, a future where a “personalized Aristotelian tutor” is available to every student, adapting to their unique learning style, language, and pace. This narrative is powerful, but it is built on a fundamental misunderstanding of how we learn and what education is for.

Even if such a perfect tutor were achievable, it is not the revolution we should want. The idea that education’s primary problem is a lack of personalization or efficient information delivery is a flawed premise. Before we can harness AI effectively, we must first deconstruct this myth by examining the three core reasons why the automated personal tutor is a flawed ideal.

Argument 1: It Mistakes Information Transfer for Learning

The myth of the personalized tutor assumes that the primary obstacle to learning is the inefficient delivery of information. This argument has some merit in specific contexts; in places where the main obstacle to education is a lack of access to books, internet, and educators, an AI tutor could be a game-changer. However, this is not the case for the majority of learners in developed nations.

In an era of information surplus, the problem for the modern student is not a lack of access to information, but a lack of skill in navigating, evaluating, and synthesizing it. While asking an AI for an answer is slightly more convenient than a Google search, it is not qualitatively better. Furthermore, it removes the “desirable difficulty” that forges lasting knowledge. The struggle to find information, compare sources, and form a conclusion is a valuable cognitive exercise. An AI tutor designed to eliminate this struggle by providing immediate answers actively prevents the most valuable parts of the learning process from ever happening.

Argument 2: It Promotes Intellectual Dependency, Not Critical Thinking

The myth suggests that an AI can be a perfect partner for completing assignments, from solving math problems to writing essays. This, however, risks creating profound intellectual dependency. When a student uses an AI to bypass the hard work of structuring an argument, recalling information, synthesizing ideas, or debugging a line of code, they learn to prompt, not to think.

The purpose of assigning an essay is not to receive a perfect text; professors already know the answers. The purpose is to engage the student in the process of creation, which is where learning occurs. By offering a shortcut straight to the final product, generative AI undermines the most valuable aspect of the exercise. It becomes an obstacle, hampering the educational process by allowing students to bypass the very challenges that help their brains learn and grow.

The goal of education is to build independent, critical thinkers who can grapple with complex, ambiguous problems on their own. Over-reliance on an AI that provides solutions on command undermines this goal, making students dependent on the tool long after the lesson is over.

Argument 3: It Champions Isolation Over Community

The vision of a personalized path idealizes a student learning in perfect, isolated efficiency, free from the pace of a group. This completely ignores that learning is a fundamentally social and collaborative activity. Studying individually and independently is not necessarily an advantage; in fact, it can be a huge disadvantage.

The two things most self-educated people struggle with are motivation and feedback. Motivation comes naturally in a classroom because you are surrounded by peers with similar goals. Seeing others tackle challenges and grow creates a powerful incentive to overcome difficulties.

Feedback from mentors and peers is equally crucial for intellectual growth, allowing us to iterate on ideas and hone our skills. A community of learners is key. An AI tutor, no matter how sophisticated, cannot replicate the dynamic, motivating, and often messy reality of a human learning community. Learning together always beats learning alone.

The Alternative

The alternative to the flawed myth of the personalized tutor is to view AI not as an automated teacher, but as a powerful tool for augmentation within a human-centered community. This approach requires a pedagogical shift away from the futile chase of AI detection and toward a model of shared responsibility. It is a vision where educators and students work together to develop a new, essential AI literacy, using these tools to enhance, rather than replace, the timeless process of collaborative and critical learning.

Why AI Detection Is Futile

Before educators can effectively integrate AI, they must first understand that the detection of AI-generated content is a hopeless chase. Any attempt to police AI use through detection tools is an unwinnable arms race destined to fail for a number of practical and pedagogical reasons.

First, the technology itself is fundamentally flawed. Detectors will always be lagging behind the generative models they seek to identify, perpetually playing catch-up in a race they cannot win. The supposed telltales of AI-generated text—overly formal language, a lack of personal voice, perfect grammar—are not robust signals. They are merely fleeting characteristics of specific models at a single point in time. A detector trained to spot GPT-4’s style is useless against the next generation of models, and it’s even more useless against a student who uses one of the clever prompt techniques this very book teaches to make the output more human-like.

Second, these tools are dangerously inaccurate. Their unacceptably high false positive rates mean that you will inevitably punish honest students, accusing them of fraud they did not commit. This is an ethical line no educator should be willing to cross. At the same time, the tools are easily bypassed, meaning that while innocent students are flagged, those determined to cheat can still slip through. The result is a system that is both unjust and ineffective.

This cat-and-mouse game also creates perverse incentives. It encourages students to spend more time hiding and tinkering with AI to bypass detectors than on the actual intellectual work of the assignment. Their focus shifts from critical thinking to “evasion engineering.” This is the exact opposite of the goal of education.

Ultimately, a reliance on detection tools creates an environment of distrust that is toxic to learning. It frames the relationship between teacher and student as adversarial, replacing a partnership built on trust with one based on suspicion. Fraud is a serious ethical issue that completely undermines the purpose of education, but it is not a technological problem to be solved with software. It is a human one that must be discussed on ethical grounds, as a violation of the shared trust that makes a learning community possible. When fraud is committed, we all lose.

A Practical Guide for Educators

The only viable path forward is to shift our mindset from policing to integration, adapting our methods to leverage AI’s strengths while mitigating its weaknesses.

Redesigning Assignments for the AI Era

With the traditional take-home essay now vulnerable to automation, educators must redesign assignments to incorporate AI as a tool for thinking, not a machine for answers. This requires a fundamental shift in what we choose to assess.

The most effective strategy is to focus on process, not just product. Instead of grading only the final essay or report, the assessment can be expanded to include the student’s engagement with the AI. Requiring students to submit their chat logs or a written reflection on their process—detailing the prompts they used, how they evaluated the AI’s output, and the modifications they made—makes their thinking visible. This turns the inquiry itself into the gradable artifact, rewarding critical engagement over simple content generation.

Another powerful approach is to turn students into AI critics. Instead of asking them to produce a text, assign them the task of deconstructing an AI-generated one. For example, a student could be asked to prompt an AI to write an essay on a historical event and then write their own analysis of its factual errors, logical fallacies, and underlying biases. This transforms the assignment from a simple writing task into a high-level critical thinking exercise, teaching students to be skeptical and analytical consumers of AI-generated content.

Finally, it is essential to emphasize human-centric assessments that are inherently resistant to automation. These methods evaluate skills that AI cannot replicate, such as real-time argumentation, interpersonal collaboration, and embodied knowledge. This includes a renewed focus on in-class discussions and Socratic seminars, oral exams and presentations, timed hand-written essays, and hands-on lab work or collaborative projects. While these redesigned assignments require a different kind of engagement, the time saved by using AI for administrative tasks can be reinvested here, creating a more sustainable and pedagogically valuable workflow.

AI as a Teacher’s Super-Assistant

AI’s greatest potential may lie in its ability to reduce the significant administrative burden on teachers, freeing them up to focus on the deeply human work of teaching and mentoring.

As a tool for lesson planning and differentiation, AI can be an invaluable creative partner. An educator can brainstorm engaging lesson plans, get suggestions for creative activities, or generate differentiated materials—such as simplified texts or vocabulary lists—for students with diverse learning needs in a fraction of the time it would take manually. For instance, a teacher could use a prompt like: “Act as an instructional designer. Create a 45-minute lesson plan for 10th graders on the causes of World War I, including a hook, a collaborative activity, and a formative assessment.”

For rubric and feedback generation, AI can be truly transformative. It can draft clear, comprehensive grading rubrics in seconds. More importantly, it can help solve the feedback bottleneck by providing initial, personalized feedback on student work. An educator can quickly review a student’s draft, identify key areas for improvement, and instruct the AI to provide detailed, constructive feedback on those specific points, without rewriting the text for the student. The teacher then reviews and approves the AI’s feedback before sending it. This “human-in-the-loop” model allows teachers to provide timely, detailed, and individualized feedback at a scale that was previously impossible. A teacher might use a prompt like: “Here is a paragraph I wrote. Provide feedback focusing on the strength of their topic sentence and their use of evidence, but do not rewrite it for them.”

Fostering an AI-Ready Classroom

Creating a healthy learning environment in the age of AI requires a proactive approach centered on clear policies, digital literacy, and open communication.

The foundation is to establish a clear classroom AI policy. Every educator should develop a simple, flexible policy for AI use and review it regularly. This policy should function as a guide for ethical engagement, not a list of prohibitions. It is crucial to define what constitutes constructive, ethical use (e.g., brainstorming, getting feedback on one’s own writing) versus what constitutes academic dishonesty (e.g., submitting AI-generated text as one’s own).

Beyond rules, educators must integrate AI literacy into the curriculum. It cannot be assumed that students understand how these tools work. This means dedicating class time to educating students on the capabilities, limitations, and ethical considerations of AI. This includes teaching practical skills like effective prompt engineering and essential concepts like how to spot AI “hallucinations” and the subtle ways that training data can introduce bias into the model’s output.

A simple and effective way to guide students is to create and share custom prompts and reusable AIs. By crafting prompts that are tailored to specific pedagogical goals—for example, a template designed to encourage critical analysis of a source—educators can model effective AI use. An even more powerful extension of this is to create shareable, custom AIs, often called “Custom GPTs” or “Gems.” These are specialized versions of the AI that are pre-loaded with specific instructions and context. An educator could create a “History Thesis Helper” that is an expert in their course material, or a “Lab Report Formatter” that guides students through the required structure. Sharing these resources not only helps students get better results but also embeds the desired learning process directly into the tool they are using.

Finally, it is vital to foster open dialogue. An educator should create a classroom culture where students feel comfortable and safe discussing the role of AI in their learning, asking questions, and even sharing their mistakes. By addressing the ethical implications and potential pitfalls of AI tools openly, the classroom becomes a collaborative space for exploring this new technology, fostering a sense of shared responsibility for its ethical use.

It is important to recognize that “AI burnout” is a reality. Many educators feel an immense pressure to adapt to everything at once, and that they have no time to do so. But this is not true. While we cannot dismiss AI, we do not have to change everything at the same time. The most sustainable path is one of small, deliberate experiments. By injecting AI into the easier parts of our teaching tasks first, we can achieve some easy wins, build our confidence, and give ourselves the time to reflect on the consequences before moving on to more ambitious integrations. The checklist below offers a simple way to begin.

A Four-Step Checklist for Educators

For educators feeling overwhelmed, here is a simple, actionable checklist to begin integrating AI into your practice:

Create and Discuss Your AI Policy: Draft your classroom AI policy using the appendix as a model. The most important step is to discuss it openly with your students on the first day. Frame it as a shared agreement for ethical engagement.
Use AI as an Assistant for One Task: Pick one administrative task this week and use an AI to help. Draft a lesson plan, create a rubric for an upcoming assignment, or generate a set of discussion questions. Experience the tool’s power and limitations firsthand.
Redesign One Assignment: Choose one of your existing assignments and brainstorm how you could redesign it to focus more on process, critical evaluation, or in-class performance. Start small and iterate.
Share a Resource: Create and share a custom GPT or a well-crafted prompt template designed to help your students kickstart one self-study activity or assignment. This models good practice and provides a valuable resource.

A Guide for the Modern Learner

For students, AI can be the most powerful learning tool ever created, but only if used with intention and integrity. The goal is to use AI to learn, not to short-circuit your own understanding. This requires a conscious shift from viewing AI as an answer machine to viewing it as a thinking partner.

Your Responsibilities as a User

Ethical use of AI begins with a clear understanding of your responsibilities. First and foremost, you must verify and clarify policies. Every course and institution will have different guidelines for AI use; it is your responsibility to know them and, when in doubt, to ask your instructor. Second, practice transparent disclosure. Being honest about how and where you have used AI in your assignments is a cornerstone of academic integrity and builds trust with your educators. Finally, you must protect sensitive information. Never input personal, confidential, or proprietary data into public AI models, as you have no control over how that data might be used or stored.

Using AI to Kickstart Your Work

One of the most effective and ethical ways to use AI is as a brainstorming partner to overcome the inertia of a blank page. You can use AI to generate initial ideas for a project, create a structured outline for an essay, or synthesize the key points from a long article. In this role, the AI acts as a catalyst for your own thinking, providing a foundation upon which you can build your original work. The goal is to use it to support your thinking, not replace it.

Using AI to Deepen Understanding

Instead of asking for a direct answer, use AI to guide you toward your own understanding. You can turn the AI into a Socratic partner that asks you questions instead of giving you solutions. For example, a prompt like “I’m trying to understand the causes of the French Revolution. Don’t list them for me. Instead, ask me questions that will lead me to the key factors” transforms a passive query into an active learning exercise. This approach reintroduces the “desirable difficulty” that is essential for true learning, using the AI to guide you rather than carry you.

AI is also an excellent tool for concept exploration. When faced with a complex idea, you can ask the AI to explain it in simpler terms or through an analogy, such as “Explain the concept of general relativity to me as if I were 12 years old.” This helps you build an intuitive grasp of the material that goes beyond rote memorization.

Using AI to Refine Your Skills

AI can be an invaluable coach for improving your practical skills through iterative feedback. As a writing coach, it can offer suggestions on clarity, tone, and structure without doing the writing for you. You can submit a paragraph you have written and ask for specific feedback, such as “Can you suggest three stronger verbs I could use in this sentence?”

As a practice partner, AI can generate an infinite number of practice problems for subjects like math, coding, or language vocabulary. You can ask it to create a quiz for you and then, crucially, to provide detailed explanations for any questions you get wrong, allowing you to learn from your mistakes in a low-stakes environment.

Build Your Own AI Tools

Beyond one-off prompts, the next level of AI literacy is learning to create your own reusable AI assistants. Modern AI platforms allow you to create “Custom GPTs” or “Gems”—specialized versions of the AI that you pre-program with your own instructions and knowledge. This is a powerful way to personalize your learning. For example, you could build a “Study Buddy” and upload all your course notes, empowering it to quiz you on the specific material. You could create a “Socratic Tutor” that is permanently instructed to only ask you guiding questions and never give direct answers. By building your own tools, you move from being a simple user to a creator, a skill that is becoming increasingly valuable.

Developing AI Literacy

Ultimately, the most important skill for a 21st-century learner is not just knowing how to use AI, but knowing how to critically evaluate its output. Never trust blindly. This new “AI literacy” is built on three pillars.

First, always be skeptical. Treat every statement an AI generates as a claim, not a fact. Second, fact-check everything. AI models can and will “hallucinate” incorrect information with complete confidence. You are the ultimate authority and are responsible for the accuracy of your work. Always use trusted, primary sources to verify any factual information the AI provides. Finally, learn to look for bias. Understand that the AI’s training data is a reflection of the vast and messy internet, full of human biases and stereotypes. Always question the perspective of the text it generates and be aware of its inherent limitations.

Putting It All Together

Here is a step-by-step example of how you might ethically use AI to help with a research paper:

Brainstorming: Use the AI to explore potential topics and narrow your focus.
Outlining: Work with the AI to structure your main arguments and create a logical outline.
Research: Use the AI to find sources or summarize articles, but always go to the original source to read it yourself and fact-check every claim.
Drafting: Write the full draft in your own words, using your outline and research.
Feedback: Ask the AI for feedback on the clarity, structure, and style of your draft.
Submission Checklist: Before submitting, review this list:
- Have I fact-checked every claim that originated from the AI?
- Can I explain and defend every part of this work in my own words?
- Have I followed my instructor’s AI policy to the letter?
- Does my declaration accurately and specifically describe how I used AI in this assignment?

Conclusion

The techno-pragmatist ethos that guides this book is rooted in a fundamental belief: the future is not predetermined. Technology is a tool whose impact is profoundly shaped by how we choose to employ it, and this is nowhere more true than in education. As a college professor, this is not an abstract debate for me; it is a topic I care about deeply, and I feel a profound responsibility to get it right.

The challenge is not to resist this new technology, but to harness it with wisdom. Instead of chasing the flawed ideal of automation or descending into an adversarial relationship based on detection, we must embrace a necessary pedagogical shift. The central problem in modern education is not a lack of content, but a scarcity of timely, personalized feedback. High student-to-teacher ratios make it nearly impossible for educators to provide the deep, iterative guidance that is crucial for student growth.

This is where AI can create a true revolution. Therefore, the true north for AI in education is not automation, but augmentation. We must leverage AI to solve the feedback bottleneck, using it to do what it does best—process information and provide feedback at scale—so that we, educators and learners, can focus on what we do best: questioning, creating, and collaborating within a human-centered community.

It is from this techno-pragmatist perspective that we have offered these guides. The strategies herein are not just tips and tricks; they are a framework for shouldering the shared responsibility of building a new AI literacy, ensuring that these powerful tools serve, rather than subvert, the timeless goals of a meaningful education.

Thanks for reading so far! As a complementary resource, here is a draft of an AI Policy for STEM classes. Feel free to, share, modify, and reuse it as you see fit.

Example AI Policy for STEM Classrooms

31.6KB ∙ PDF file

Download

This is a first draft of Chapter 7 of my upcoming book Mostly Harmless AI. Please, do share with me all your comments, suggestions, criticisms, and ideas.

AI for Critical Thinkers

Alejandro Piad Morffis — Tue, 29 Jul 2025 11:01:26 GMT

Photo by Rick J. Brown on Unsplash

This article is based on Chapter 4 of my upcoming book Mostly Harmless AI.

Having journeyed through the foundations of artificial intelligence—its history, its mechanics, and its limitations—we arrive at the most pressing question: How do we actually use the powerful new tools it has produced?

This article serves as a bridge from the theoretical to the practical. This is not a list of traditional prompt engineering hacks. The internet is filled with tips and tricks for coaxing a specific output from a language model for a single task. Instead, the advice that follows builds on our foundational understanding to offer something more durable: a general mindset for working with these tools. This approach is about engaging with language models in a way that allows you to get the best out of them without subcontracting your own critical thinking. It is a methodology for augmenting your intellect, not replacing it.

The principles you learn here are foundational, offering a universal toolkit for interacting with large language models in daily life—whether you are planning a vacation, trying to understand a complex news article, or drafting a simple email. In the following chapters (in the book), we explore how to adapt and intensify these practices for specialized, high-stakes professional environments. But first, every user must learn how to engage with these powerful yet fallible tools safely, critically, and effectively.

A Methodology for Effective Interactions with Language Models

To move beyond simple queries and unlock the true potential of language models, we need a more structured approach. This methodology is divided into three parts: establishing the right Mindset, employing effective Tactics during the conversation, and building a System to make your successes repeatable.

The Mindset

The most significant shift is in your mental model. Instead of treating the model as a search engine, you should approach it as a conversational partner. This means recognizing that the interaction is iterative and that your most important role is to guide the conversation.

A key part of this mindset is adopting a Socratic (or inquiring) approach, where you use the model not just to get answers, but to help you ask better questions. This is invaluable for sensitive and important tasks.

For example, instead of starting with “Write an email asking for a raise,” a partner-based approach would be to ask the model to guide you: “I need to write an email to my manager to ask for a raise. What are the key pieces of information and evidence I should gather first to make the strongest possible case?” The model will then prompt you for your accomplishments and market data, helping you build your argument before a single word is written.

Similarly, when organizing a child’s birthday party, you could ask, “I’m planning a science-themed party for my 7-year-old. What are the key logistical details I need to consider to make sure it runs smoothly?” In both cases, you are using the model to help you define the problem, which is a far more powerful use of its capabilities.

The Tactics

With the right mindset, you can employ specific tactics to steer the conversation toward a high-quality outcome. The most fundamental tactic is to be explicit and strategic with your queries. To ground the model’s response in reliable information, tell it where to look.

A generic query for medical advice is risky, whereas a much safer prompt would be: “Search for information from the Mayo Clinic and the World Health Organization on the common symptoms of iron deficiency.” This specificity is also crucial when comparing complex options, like, for example, buying an EV, you can be as specific as: “Compare the Tesla Model 3, the Hyundai Ioniq 5, and the Ford Mustang Mach-E for a family of four. Focus on real-world range, charging speed on a standard home charger, and available cargo space.”

To get an even more robust answer, you can move beyond a single query and assemble a ‘committee of experts.’ A single language model will give you its most statistically likely answer, which might not be the most creative or well-rounded one. To overcome this, you can generate multiple, independent perspectives.

For the EV comparison, you could open three separate conversation windows. In the first, you’d ask the model to act as a pragmatic engineer and, perhaps, the model will argue for the Hyundai. In the second prompt, you’d ask it to be a tech enthusiast, and maybe it makes the case for the Tesla. In the third, you’d have it act as a family-focused reviewer, causing it to arguing for the Ford.

By copying these three independent analyses into a final chat window, you can then ask the model to act as a senior editor, synthesizing the competing viewpoints into a final, balanced recommendation that weighs factors like cost, range, and reliability.

Finally, after the model provides a response—either a single answer or a synthesized one from your committee—you can employ self-criticism as a final refinement tactic.

Once you have a draft of your email asking for a raise, you can prompt it: “Read the email you just drafted. Now, act as my manager who is busy and skeptical. What parts of this email are unconvincing? Is the tone too demanding or not confident enough?” This critical step often surfaces weaknesses that you might have missed, allowing you to create a much stronger final product.

The System

The final part of the methodology is to turn your successful interactions into a repeatable system. A common mistake is to treat prompts as disposable. A more powerful approach is to build a library of reusable prompts, thinking of them as personal “natural language programs.” The multi-step process you used to plan the birthday party can be saved as a “Kids’ Party Planner” template.

The Socratic prompt that helped you prepare for your salary negotiation can be generalized into a “Career Conversation Prep” tool. The ultimate expression of this principle is the use of features like OpenAI’s “Custom GPTs,” which allow you to encapsulate a complex task into a dedicated tool that you or your team can use with a simple request.

A Practical Example

To see how these principles combine into a powerful workflow, let’s walk through a comprehensive, real-world task: planning a 10-day family vacation to Italy.

Rather than beginning with a vague request like “plan a trip,” the process starts by applying the Socratic approach. You would first ask the model to frame the problem for you: “I want to plan a 10-day family vacation to Italy. What key information do you need from me to create the best possible itinerary?”

This immediately shifts the dynamic, positioning the model as a guided partner. In response, it would act as a consultant, asking for crucial details like the number of travelers, the children’s ages, your budget, family interests, and preferred travel pace.

Once you’ve provided this context, the next step is to ensure alignment. You would instruct the model to synthesize and confirm the constraints: “Great, thank you. Based on my answers, please summarize all of my constraints for this trip in a structured list.”

With a clear, confirmed set of requirements, you can then confidently ask for a first draft. The iterative heart of the process begins now. Upon receiving the initial itinerary, you would employ the self-criticism tactic: “This is a good start. Now, act as a skeptical travel agent. Criticize this itinerary and tell me what’s missing or what could go wrong.”

The model might point out that visiting three major cities in ten days is too ambitious for a family with young children. Based on this valuable feedback, you can guide the revision, continuing this loop of drafting and critiquing until the plan is refined to your satisfaction. Only then would you ask for the final, detailed output.

The final, powerful step is to generalize this success. You would ask the model to convert the entire conversation into a reusable “Family Vacation Planner” template, complete with placeholders for key details. This turns a one-time effort into a valuable, programmable asset for future trips, demonstrating the true power of thinking of prompts as reusable programs.

Common Pitfalls for the Everyday User

The good practices above are designed to improve the quality of a language model’s output. This section focuses on the mental traps and risks you must be aware of to use these tools safely.

The “Eliza Effect” and Misplaced Trust

Because chatbots are designed to be conversational and helpful, it’s easy to start treating them as if they have genuine understanding, intentions, or even consciousness. This is a modern version of the “ELIZA effect” we discussed in the history chapter. The danger is that this leads to misplaced trust, where we stop critically questioning the model’s output because it feels so confident and knowledgeable. This is the psychological trap that makes us vulnerable to hallucinations; we are less likely to fact-check a “partner” than a machine.

Cognitive Offloading and The “Lazy Brain” Problem

The ease of asking a language model to summarize an article, draft an email, or brainstorm ideas can lead to a subtle but significant danger: cognitive offloading. By outsourcing the fundamental work of thinking, synthesizing, and structuring our thoughts, we risk letting our own critical thinking and creative muscles atrophy. The goal is to use these tools to think better, not to think less. Over-reliance can make us less capable problem-solvers in the long run.

The Privacy Risk of Casual Conversation

In a casual conversation with a chatbot, it’s easy to forget that you are interacting with a complex system run by a corporation. Users often paste sensitive personal information—medical details, financial data, private emails, proprietary work content—into public language models without considering where that data goes, how it’s used for future training, or who might have access to it. What you tell the model does not stay between you and the model.

You Are the Final Authority

The techniques above teach you how to get better raw material from the language model. This final principle is about what you, the human, must do with that material. It is the most critical step in using these tools responsibly.

First, never trust, always verify. The language model is an unreliable narrator. Treat its output as a well-written first draft, not a finished fact. For any critical piece of information—a date, a statistic, a medical suggestion, a legal point—you must verify it using an independent, authoritative source. The model can help you find potential sources, but you are the fact-checker.

Second, synthesize, don’t just copy-paste. The model’s output is information; your goal is knowledge. The most important work happens after the model has responded. Your job is to synthesize its suggestions with your own experience, judgment, and goals. The model can generate a list of tourist sites for your Italy trip, but only you can synthesize that into a vacation plan that feels right for your family.

Finally, own the outcome. The language model is a tool, and you are the user. Any decision made, any email sent, or any action taken based on the model’s output is your responsibility. This principle of accountability is non-negotiable. The model is an assistant that can help you think, but it is not a replacement for your personal judgment.

Conclusion

The journey from a novice user to a skilled one is not about memorizing clever prompts; it’s about a fundamental shift in mindset. Instead of treating generative AI as a vending machine for answers—an approach fraught with risks of shallowness, bias, and error—we’ve seen the power of engaging it as a conversational partner.

The practices outlined in this article—the Socratic method, strategic querying, and, most importantly, critical verification—form a framework for responsible engagement. This framework places you, the user, firmly in the driver’s seat.

The quality of the model’s output is not a feature of the model alone; it is a direct reflection of the quality of your guidance and the rigor of your review. You are not just a prompter; you are a director, a critic, and a synthesizer. This is what makes these powerful tools ‘mostly harmless’: not their inherent nature, but our commitment to using them with critical awareness and human authority.

By mastering these foundational skills, you are not just learning to use a new tool. You are developing a new form of literacy for the 21st century. As we move into the specialized applications for knowledge workers, developers, and creatives in the following chapters (of the book), this ability to think with AI, not just ask of it, will be your most valuable asset.

Thanks again for reading. If you want to dive deeper into Artificial Intelligence and learn to make the best out of it, from a techno-pragmatist, human-centered, responsible perspective, please check out my book Mostly Harmless AI.

The State of AI for Software Development

Alejandro Piad Morffis — Sat, 26 Jul 2025 11:12:02 GMT

This article is based on Chapter 5 of my in-progress book Mostly Harmless AI.

Few developments in the generative AI space have been as exciting lately as the rise of code generators. The evolution of these AI coding assistants is best understood not as a single leap, but as a progression of capabilities, moving from simple autocomplete to what may one day be fully autonomous agents.

At their core, code generators are Large Language Models trained on vast amounts of public code. They treat programming languages just like human languages, learning the patterns, syntax, and structure to predict what comes next. These models can take a natural language prompt and some contextual code and produce new code that mostly aligns with the prompt's intention. For example, you can provide a function signature and a comment like, "This function finds the minimum of an unsorted list," and the model will generate the function's body.

This uncanny ability to comprehend and generate code based on human communication is transforming the development landscape, but it also requires special considerations, as code is not just another natural language.

In this article, we will explore the landscape of AI for software development. We will begin by looking under the hood, examining the spectrum of capabilities that allow AI to generate code. Next, we will explore the use cases for developers across the development lifecycle. Then, we will discuss some important things to keep in mind, from hallucinations and security to theoretical limitations of AI for coding. Finally, we will look to the future of coding, consider how the developer role is evolving, and try to answer one crucial question: is coding dead?

How to Make a Code Generator

Let's imagine we are building our own code generator from scratch. The journey from a simple code predictor to a sophisticated development partner is a journey of adding layers of capability, moving up a spectrum of increasing autonomy.

The first thing we want is next-token prediction for code. The foundational layer is built on unsupervised training, making it essentially autocomplete on steroids, like a super duper IntelliSense. We start by training a model on vast amounts of code, teaching it to predict the most likely next token based on the immediate context. If we have variables or functions declared nearby, our model is more likely to generate code that references them, simply because that's the most common pattern in its training data.

Now that we have a basic generator, our next step is to teach it to follow instructions. To do this, we can compile a dataset of instruction pairs—for example, a natural language command like, "In the previous code, change the loop to be more efficient," paired with the corrected code. By training on these examples, our model learns to go beyond simple prediction and follow specific, human-given directions. We can further enhance this process with Reinforcement Learning, where we have human or automatic evaluators rank different code outputs. This teaches our model to not only generate syntactically correct code, but also to respect desired styles and naming conventions.

Our generator is getting smarter, but it's still limited by the immediate context. To give it a long-term memory, we move to context-aware generation. We can dramatically enhance our model's ability to generate relevant code by allowing it to pull from a broader context. This is a form of Retrieval-Augmented Generation (RAG) for Code. We can index an entire codebase or external API documentation, allowing our model to find relevant examples and patterns. When a developer asks a question, our system retrieves these examples and feeds them into the prompt, allowing the model to generate accurate code by combining and refactoring snippets from the provided context, even for libraries it wasn't explicitly trained on.

So far, our model can only write code. To up our game, we can give it the ability to interact with its environment by making it use external tools. This represents a significant leap. We can equip our model with a set of tools it can invoke on demand. For example, in response to a prompt like "add a library for charting," our model could invoke a tool to install the missing dependency in the project. If you ask it to "check whether this works," it could invoke another tool to run the unit tests and report back the results. It could even use tools to directly modify files in the codebase. By giving our model the ability to take actions beyond just generating text, we empower it to participate more actively in the development process.

The final step on our journey is to give our code generator a bit of autonomy, creating what’s called an agentic system. This is the most advanced and forward-looking form of AI-based code generation. We can design an AI agent that takes a high-level goal, breaks it down into sub-tasks, writes code, generates tests, runs the code, and then analyzes the output or errors. Based on the results, it can then debug or modify the source code in a continuous loop, acting as a semi-autonomous developer to see a task through from start to finish.

Beyond these training methodologies, we can leverage the formal nature of code to improve our model's performance. Unlike natural language, code has strict syntactic rules that can be programmatically checked. One simple but effective technique is trial and error during inference: we can have our model produce several potential code snippets, run them through a linter, and automatically reject any that have parsing errors.

More advanced techniques can pre-process the training data, for instance by normalizing all variable names to a generic format like var0, var1, etc. This makes it much easier for our model to learn the structural relationships in code without being distracted by specific naming conventions, and we can substitute the actual names back in a post-processing step. These tricks leverage the fact that we are dealing with a very restricted syntax to make it easier for our language model to learn the rules.

Finally, to create the ultimate specialized assistant, we can go beyond RAG and fine-tune a model on a specific codebase. While RAG provides external context, fine-tuning actually updates the model's internal weights. By training a model on a company's entire private and proprietary codebase, we can create a version that has deeply internalized that organization's specific architectural patterns, internal APIs, and coding standards. This results in an AI partner that not only answers questions correctly but does so in a way that is idiomatic and aligned with the team established practices.

Use Cases for Developers

Understanding the engine is one thing; knowing how to use it is another. AI offers you a powerful toolbox that you can apply across the entire software development lifecycle. In this section, you will explore practical use cases across three key phases: Ideation and Design, Implementation and Development, and Verification and Explanation. You will see how you can use both sophisticated, LLM-based coding tools integrated directly into your IDE, as well as techniques that you can use with standard, general-purpose chat apps like ChatGPT or Claude, requiring no special integration at all.

Phase 1: Ideation and Design

Before you even dare writing a single line of AI-generated code, you can already use LLMs as powerful brainstorming partners for exploration and design. This is one of the most accessible ways you can use AI, as it doesn't require a specific tool or editor extension; you can do it effectively using general-purpose conversational AI applications like ChatGPT, Perplexity, or Gemini. Models with live browsing capabilities are often even better for this phase, as they can pull in the latest information about new frameworks, libraries, and design patterns.

The key is for you to treat this phase as an interactive exploration. Instead of asking for a single, ready-made answer, you should guide the model through an ideation process. Here's how you can do it: use a chain-of-thought approach by asking the model not just for a solution, but to "think step-by-step" through the pros and cons of different architectural choices.

A powerful pattern you can use is to ask the model to generate several variants—for example, "Propose three different ways to design the database schema for a social media app." Then, you can discuss the options back and forth, using self-critique prompts to have the model compare the alternatives it just generated. At the end of this collaborative session, you can ask the model to provide a structured summary of all the design decisions you have agreed upon, acting as an executive design document.

With this document in hand, you can then start a new, more focused session with a code-oriented model for the actual implementation.

Phase 2: Implementation and Development

The most straightforward way you can use AI in this phase is for generating short, self-contained code snippets. This can be for a well-known algorithm, a common pattern, or the use of a well-documented API. This is a task you can accomplish with any standard chat app, even outside your IDE. This is especially powerful for navigating the complex world of APIs and libraries.

As a professional programmer, you probably aren't spending that much time doing basic coding, like inserting numbers in a list. No, reality is 90% of the code you write is interface code with some external library you may not know well. Instead of manually searching documentation, you can simply ask an AI assistant, "How do I use this library to make a query that does X?" and get a ready-to-use snippet.

The next level of integration is bringing AI directly into your IDE. This can start with simple code completion, but the real power comes from integrating a full chat experience. This allows you to highlight a block of code and ask for specific changes, such as, "Refactor this function to be more efficient," and have the model modify the file directly. If the model has RAG capabilities and can scan your entire codebase, it gets even better. The modifications and additions it suggests will be consistent with your existing coding style and use your own libraries and methods, making the integration seamless.

At the far end of the spectrum is the full agentic mode, which is still in its infancy with tools like Cursor. This offers a much more hands-off development experience. Here, you can give a coding agent a high-level task, and it can modify several files, create new ones, and even run commands in the terminal to install missing dependencies.

Finally, you don't always need a full IDE. For one-off scripts or quick prototypes, you can use the "Canvas mode" in apps like ChatGPT, Claude, or Gemini. These provide a simple editor-like interface where you can iterate back and forth with the model to update a script. Some tools even allow you to run these scripts directly in the cloud, letting you build and test disposable web apps instantly.

Working with these tools introduces a new core skill, an AI-in-the-loop coding workflow—the day-to-day interactive process of collaborating with an AI. It involves an iterative cycle of prompting with a clear goal, carefully reviewing the AI's output, correcting its mistakes or flawed assumptions, and then re-prompting with more specific instructions or feedback.

Phase 3: Verification and Explanation

To ensure your code quality, you can use an AI to help generate a wide range of test cases. This is especially useful for uncovering corner cases that might not be immediately obvious, such as handling empty inputs, maximum values, or unusual user behaviors. You can do this with integrated AI coding tools, but it can also be as easy as uploading your codebase or relevant files to a standard chat app and asking it to suggest test cases.

You can ask for both code-based tests (like unit and integration tests) as well as descriptive tests (like user stories or manual testing scripts). In all these scenarios, it helps to instruct the model with a Chain-of-Thought prompt, asking it to first explain what behavior it wants to test, and only then provide the actual test. This ensures the tests are intentional and well-understood.

Furthermore, when you're faced with a cryptic error, you can use AI for debugging. You can feed the AI the error message, stack trace, and relevant code, and it can analyze the context to suggest potential causes for the bug and possible fixes, acting as an experienced pair programmer.

The opposite, code-to-language direction allows you to create powerful new workflows for understanding code. You can ask an AI for automatic documentation of functions or for natural language explanations of a complex code fragment. This can be done directly inside your IDE with an integrated tool, or with standard chat apps. For example, some tools allow you to connect a public GitHub repository and ask high-level questions on the fly, which is very good for getting a quick overview of a new codebase.

A particularly valuable use case is in legacy code modernization. One of the biggest challenges in the software industry is maintaining and updating old codebases. You can use AI to tackle this problem by feeding it legacy code (e.g., from an old COBOL or Java system) and asking it to analyze the logic, add explanatory comments, or even translate the entire system to a modern language and architecture. This can dramatically reduce the cost and risk associated with modernizing critical systems.

However, you should be aware of the critical gap between syntax and semantics—that is, between understanding what the code says versus what the code does.

The weaker models are mostly limited to describing what the code is saying syntactically (e.g., "this variable is changed to this array position"), this capability is improving all the time. More powerful models can often provide higher-level, semantic explanations of what the code is doing (e.g., "this loop is ensuring the first part of the array is always sorted"). But even the best models may not be able to grasp the full architectural details or business logic of a complex application.

Putting It All Together

Putting this all together, let's see how a complete workflow might look for tackling a specific, somewhat complicated feature in an ongoing app, like adding OAuth login.

First, you would start in ideation mode, interacting mostly in text with the AI. You would discuss a high-level overview of the required architecture changes, which parts of the app might be impacted, and the best libraries to use. The goal here is to produce a clear design roadmap before any code is written.

Next, you would move to implementation mode, going full hands-on with an agentic tool. You could assign the agent the high-level task from your roadmap: "Implement the OAuth login feature using the chosen library." The agent would then get to work, creating new files, modifying existing ones, and writing the necessary code. As it encounters errors or ambiguities, you would engage in a back-and-forth conversation to guide it, but the bulk of the mechanical coding would be handled by the agent.

Finally, you would enter review mode. Once the agent reports that the feature is complete, you could have a final conversation with the AI. You could ask it to analyze the git diff of all the changes it made, explain the rationale for its implementation choices, and generate comprehensive documentation for a pull request. After your final review and approval, you would then submit the PR for human review by your team.

Things to Keep in Mind

While the toolbox is powerful, it comes with sharp edges. The most important limitation in language modeling, in general, has been called the problem of hallucinations. In the context of code, this means AI-generated code is not infallible and can contain subtle bugs that require constant vigilance.

Hallucinations and Mistakes

The simplest way you can see hallucinations is when you get code that uses a new variable that doesn't exist or fails to close a parenthesis. Unlike with natural language, you can often detect these syntactic errors automatically with a linter or compiler, so many of the more harmless hallucinations are not relevant as they won't introduce subtle bugs.

A slightly more difficult hallucination is what we can call a semantic hallucination, where the model uses a wrong variable or function name that does exist in your codebase. In this case, you will not get a compiler error because you're using an existing symbol, but you will get the wrong behavior. This is much harder to find because it has the same problem as most hallucinations: you have to review the code and be knowledgeable enough to have been able to generate that code yourself.

The most insidious errors are logical flaws. This occurs when the code doesn't do anything obviously wrong—it uses the right variables and looks plausible—but it has some subtle logical mistake that leads to a bug. For example, finding that a variable is not updated at the right moment in a nested loop is a tricky problem even for human experts. These kinds of mistakes will introduce subtle, hard-to-detect bugs.

But even if the bugs are no worse than what a human would introduce, they pose a threat because of "automation bias." When you check code written by humans, you expect bugs. But when you're looking at machine-generated code, the only way programmers have ever interacted with it has been with rule-based systems like compilers, and that code is basically without mistakes.

So even if the language model makes errors that are, on average, no worse than what a regular programmer would make, they can still be harder to detect because they won't be the exact same mistakes a human would make, and we may be less on guard.

AI's Impact on Technical Debt

The rapid generation of code by AI presents a double-edged sword for technical debt. On one hand, you can use AI as a powerful tool to reduce existing debt. You can ask it to analyze your codebase for inefficiencies, suggest refactorings, or add missing documentation and tests, thereby improving code quality.

On the other hand, the very speed of AI can create new and more complex forms of technical debt. Relying heavily on "vibe coding" to quickly generate features without rigorous human review can lead to a codebase filled with poorly understood, inefficient, or subtly buggy logic. This AI-generated debt can be even harder to untangle later, as the original human intent behind the high-level prompt may be lost.

Biases in Generated Solutions

Models trained on a vast corpus of public code from the internet will inevitably learn from outdated examples. This can lead them to perpetuate outdated practices by suggesting deprecated functions, old library versions, or inefficient algorithms that are no longer considered best practice. An AI model will also often default to the most statistically common solution it has seen in its training data.

This can stifle creativity and lead to a homogenization of code, discouraging the exploration of more elegant or contextually appropriate solutions. Finally, just as AI can perpetuate harmful societal biases, it can also reproduce human biases from the code it was trained on. This can manifest as non-inclusive language in generated comments or variable names.

Security and Licensing Risks

A significant risk is that an AI can generate code with known security vulnerabilities. If the model was trained on public code containing flaws like SQL injection or buffer overflows, it may reproduce those same insecure patterns in its suggestions, creating a major security risk for the application.

Furthermore, the use of AI-generated code introduces complex legal questions. A model might reproduce a code snippet verbatim from a repository with a restrictive open-source license (like the GPL), inadvertently pulling that license's requirements into a proprietary project. The legal ownership of the AI-generated code itself remains a gray area, creating potential intellectual property challenges for companies.

The Economics of AI Development Tools

While these AI tools offer significant productivity boosts, they are not free. For development teams and organizations, it's important to consider the practical economics of their adoption. Most advanced AI coding assistants operate on a subscription model, which introduces a new operational cost. Team leads and CTOs must perform a cost-benefit analysis, weighing the price of the tools against the expected gains in developer speed, code quality, and reduced time-to-market. The return on investment (ROI) will depend heavily on how well a team integrates these tools into their workflow and whether the productivity gains justify the recurring expense.

Theoretical Limitations

Beyond the practical issues of hallucinations and biases, there is a more fundamental, formal limitation to what we can do automatically. This is captured by Rice's theorem, a cornerstone of theoretical computer science. In short, the theorem proves that there is no algorithm that can automatically check for any non-trivial semantic property of a program.

What does this mean in practice? A "non-trivial semantic property" is basically any interesting question about what a program does. For example: "Does this program ever crash?" or "Will this function always return a positive number?" or "Is this code free of security vulnerabilities?" Rice's theorem tells us that it is mathematically impossible to build a universal program that can answer these kinds of questions for every possible piece of code.

This highlights the theoretical impossibility of perfect, automated code verification. We will never be able to build an AI that can look at code generated by another AI (or a human) and formally guarantee that it does exactly what the natural language prompt intended. That problem is, in the general case, unsolvable.

However, this doesn't mean we should give up. Engineering isn't about theoretical perfection; it's about solving the average case in the best possible way and handling the most important edge cases reasonably well. While we can't achieve perfect verification, we can get pretty far with a combination of AI-generated tests, linters, and, most importantly, expert human review.

The Future of Coding

Given these tools and guardrails, the very nature of programming is set to transform. The focus will shift from the mechanics of writing code to the art of building systems. The term "vibe coding," popularized in developer communities, captures the essence of this shift. It describes a workflow where the developer's primary job is no longer to write precise, line-by-line syntax, but to describe the high-level behavior, intent, or "vibe" of the desired software to an AI partner. The focus moves from how to do something (the specific algorithm and syntax) to what needs to be done (the ultimate outcome and user experience), leaving the mechanical implementation details to the AI assistant.

This approach is incredibly powerful for rapid prototyping, hackathons, and short-term projects. A developer can quickly scaffold an entire application by describing its components in natural language, getting a functional prototype up and running in a fraction of the time it would take manually.

However, this method has significant limitations for larger, more detailed projects. "Vibe-based" instructions are often ambiguous and can be misinterpreted by the AI, leading to code that works for the happy path but fails on edge cases. For long-term, mission-critical software, the precision, maintainability, and strict adherence to architectural standards that come from deliberate, human-led coding remain indispensable. Vibe coding is a tool for speed and exploration, not a replacement for rigorous engineering.

In this new paradigm, future developers will become experts at wielding a suite of AI tools and agents. Skills in "prompt engineering," system design, and the critical review of AI output will become more valuable than the ability to recall specific syntax. The developer's role becomes one of guidance and orchestration, knowing which tool to use for which task and how to verify the results.

Looking ahead, this elevated role may involve assigning entire features or bug fixes to autonomous agents. These agents would manage the full lifecycle: understanding the ticket, writing the code, creating tests, committing to version control, and responding to feedback from the CI/CD pipeline. This doesn't eliminate the developer but elevates their role to that of a system architect and project manager, overseeing a team of AI agents.

Beyond the changes in workflow, it's worth contemplating how these tools will change the qualitative experience of being a developer. We must ask ourselves how it feels to code this way. Does offloading the cognitive burden of syntax and boilerplate make you dumber and cause you to forget how to code, or does it free up mental space, allowing you to become even more proficient in the things that truly matter—the high-level ideas and architecture?

We must also consider the social aspects. You now have a partner that is not a human. How will this impact teamwork? Will this AI partner become a virtual member of the team, participating in code reviews and design discussions? Or will it alienate developers into more lonely roles, as they interact more with their AI than with their human colleagues? How does a senior developer mentor a junior who can always get an instant answer from an AI, potentially masking gaps in their fundamental knowledge?

These are open questions we must navigate as we integrate these powerful new collaborators into our teams.

Final Remarks

So, Is Coding Dead?

There is a real concern that if AI can write 90% of the code in 10% of the time, nine out of ten programmers could be out of a job. And yes, every time automation has reached an existing industry, some jobs are destroyed as some skills become irrelevant.

However, I claim we must not fear the advent of AI coding assistants. Here’s why.

Writing code is by far neither the hardest nor the most time-consuming part of software development. The process of making software involves understanding requirements, talking with customers, user testing, and product design, all of which are at least one order of magnitude more difficult that actually typing code.

A hundredfold boost in productivity for a task that is only 10% of the overall process is huge, but it still leaves the other 90% of the human-centric work. We will still need to understand what our customers want, guide them through designing a software product, know the user base, and find a sustainable business model.

And no, you cannot simply simulate the end user with a language model, so the AI can prompt itself into making a usable product, because your end user will still be human. Human users are slow, get angry easily, don't understand your application, and don't know what it is they don't like about it. Until an AI can really replicate what it feels like to be a human—and at that point, will we still call it “artificial”—we can’t take the human out of the software development loop—or any creative loop, for the matter.

The biggest progress in the software creation process has always been because of innovation in the human side, not the machine side. Innovation in software engineering, management, and how you get people to work together and collaborate will continue to be the most important part of the software pipeline for a long time.

Furthermore, software is an industry that is nowhere near its saturation point. We have far more need for software than the number of people who can currently write it. Increased productivity will likely be met with increased demand, creating more and better software for more users.

Every leap in software productivity—from assembly to compilers, from C to object-oriented frameworks—has lowered the barrier to entry and brought more people into programming. AI tools will likely do the same, empowering more people to create software. The modern world runs on software, and in the future, basic programming literacy may become as common as basic math literacy is today.

Most people know enough math to get by in daily life without hiring a mathematician, and in the same way, more people will know enough programming to automate simple tasks. They will learn to say to their home computer, "When I get home, I want you to turn my lights on, but only if it's night and the electric bill is not above the average," and an AI will generate the code to make it happen. This expands the field rather than shrinking it.

So, should you learn to code? Definitely. There's going to be orders of magnitude more code written in the next few years than everything we've written in history.

But even if you never end up writing a single line of code unaided by AI—like I've never written a single line of production code unaided by syntax highlighting, a linter, or a type verifier—knowing how code works, how algorithms work, and why a specific programming construction works the way it works is the same as knowing basic math. Coding changes how your brain is wired, makes you think clearer, and increases your creativity.

Furthermore, even if you are not working in the software industry, learning to code is still an immensely enjoyable experience. Being able to create something that keeps working on its own is, I think, the ultimate toy.

So if you want to make a dent in the software industry and you're wondering if AI will get you out of the picture, don't worry. That won't happen anytime soon. Learn to code, learn the fundamentals, but also learn how to use these new tools. As in every moment in human history, if you apply yourself and do your best, you will be at the top of the league, and there will be a spot for you.

A Brief History of Artificial Intelligence

Alejandro Piad Morffis — Tue, 22 Jul 2025 13:22:09 GMT

Note: This article is based on the Prologue and Introduction chapters of my in-progress book Mostly Harmless AI, which deals with how to harness the power of Artificial Intelligence for good. You can get the early draft at a 50% discount in the link below.
Get Mostly Harmless AI (50% off)

For centuries, we humans have been captivated by the wild idea of a thinking machine. This isn't some modern tech obsession, even if it seems nowadays no one talks about anything else. No, the ancient dream of thinking machines goes way back, whispered in myths about automatons and golems in religions galore, and later, famously brought to life (or at least, cleverly faked) by feats like the Mechanical Turk.

One of the most fascinating aspects of the history of Artificial Intelligence has been this often dramatic back-and-forth between two core, seemingly antagonistic approaches to building intelligent machines. On one hand we have logic and rules (what’s often called symbolic AI), and the other hand, data and patterns (or statistical AI). In a deep way, this mirrors that age-old philosophical tug-of-war between rationalism (figuring things out through pure reason) and empiricism (learning from experience).

In this article, I want to explore the history of AI from this lens of rationalism (or symbolic, rule-based AI) versus empiricism (data-driven, statistical AI). Come with me into this deep dive to learn how these seemingly opposite philosophies have shaped AI’s past, defined its present, and are now finally starting to team up for its future.

Subscribe now

Prologue

The AI dream didn't kick off with microchips or code. Nope, it began way before, with grand philosophical ambitions and some seriously imaginative leaps, all thanks to incredibly brilliant and diverse minds.

Our journey begins in the 17th century, at the hand of Gottfried Wilhelm Leibniz. By this point in his life, Leibniz was already a superstar: a philosopher, mathematician, logician, and diplomat, who invented (or discovered) calculus independently of Newton. He actually invented the notation that we use today, with the integral symbol and the upper and lower limits.

Leibniz was a true polymath, totally immersed in the Enlightenment's big project of organizing all knowledge and reason. His drive wasn't just academic; he genuinely believed that logic could solve every human argument. He imagined a world where disagreements weren't settled by yelling or endless debates, but by calm, undeniable calculations. Inspired by how algebra and calculus, by means of clever notation, could make even the trickiest problems appear simple, Leibniz fueled his grand dream of universal computation in a time where even the simplest calculation machines where considered a marvel.

What if, he mused, we could formalize all human reasoning in a similar way? He dreamed up a characteristica universalis—a universal language for thought—and a calculus ratiocinator—a mechanical way to reason with it. In this, Leibniz was a rationalist: he believed the all human thought was a grand, logical machine. Unknowingly, he was laying the intellectual groundwork for logic, which paved the way for symbolic AI centuries later.

Fast forward to 19th-century England, we find Lady Ada Lovelace. The daughter of the famously rebellious poet Lord Byron, Ada was a formidable brain, tutored in math and science by some of the most prominent thinkers of her time. By the time she met Charles Babbage, the great inventor, he was working on his Analytical Engine, an abstract machine that could, in principle, do anything a modern computer can do. Ada was already known for her sharp mind and amazing mathematical insights, but she also had this poetic and imaginative side that shaped her view of technology.

While Babbage saw his Analytical Engine mostly as the ultimate number-cruncher, Ada's mind took flight beyond mere arithmetic. She famously wrote that the Engine "might compose elaborate and scientific pieces of music, or in any other extent, generate new content." She was more than a century ahead of Generative AI, dreaming of the days machines would usher a new era of synthetic creativity.

As a side note, Charles Babbage would never finish constructing an actual, physical embodiment of his Analytical Engine. He kept imagining improvements over improvements, never quite settling on something that he could actually construct and use. It would have been the first true computer, but it forever remained as an unfinished project. This serves as a cautionary tale against the all too common syndrome—aptly called the Babbage Syndrome—of intellectualizing ad infinitum without actually testing out your ideas in the real world.

A few decades later, mid-20th century, as the dust settled from war and the digital age dawned, came the man who's probably the most important figure in the history of Computer Science at large, the great Alan Turing. By the time his groundbreaking work on machine intelligence came out, Turing was already widely considered among the greatest logicians and mathematicians of his time.

He's basically the Father of Computer Science, having come up with the abstract model of computation we know as the Turing Machine—the theoretical blueprint for every modern computer—and proving not only its potential but its intrinsic limitations. His wartime experience, where he played a key role in breaking the Enigma code, gave him also a very practical grasp of the power of computing. He actually built the first electromechanic general purpose computer, but this massive milestone was kept secret for years after his death.

Turing was a man of quiet brilliance. He wasn't just curious about what a real machine could do; he was fundamentally wrestling with the very definition of thinking itself. In his famous 1950 paper, "Computing Machinery and Intelligence," he dare ask if machines could think, like, for real. He proposed a brilliant, practical yet deeply philosophical way to assert it: what he called The Imitation Game, but the world came to know as the Turing Test.

If a machine could chat with a human, he suggested, in such a way that the human couldn't tell if they were talking to a machine or another human, then, for all intents and purposes, the machine could be considered to be thinking. This wasn't just a practical experiment, though; it was a functional definition of thinking that sparked the computational theory of mind. The implications of his hypothesis are at the core of the most profound discussions in the field of Philosophy of Mind, even today.

But crucially, in that same paper 80 years ago, Turing looked beyond the test and tossed out several ideas for how such an artificial intelligence might actually be achieved. These included the concept of a learning machine, raised like a human child, soaking up knowledge from experience instead of being preprogrammed to know everything beforehand; and even hinted at using bio-inspired algorithms to mimic how evolution works.

These ideas foreshadowed major pillars of modern AI systems, like neural networks and metaheuristic search algorithms, showing his amazing foresight and his deep understanding of both rationalist and empiricist paths to intelligence. Tragically, he wouldn't live to see his dream materialize into the massive body of knowledge and practice that is the field of Artificial Intelligence.

The Foundational Era (1950s - Late 1960s)

The history of Artificial Intelligence as a scientific field formally begins in the summer of 1956. A small group of brilliant minds, including John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, got together at Dartmouth College. It was right there, at the Dartmouth Summer Research Project on Artificial Intelligence, that the term "Artificial Intelligence" was officially coined. This workshop wasn't just a meeting; it was a declaration, setting up AI as a legitimate academic discipline with a huge goal: to build machines that could think like humans do.

In these early days, the predominant paradigm was symbolic AI. Researchers believed that machines could become intelligent by putting human knowledge and reasoning into explicit rules.

One of the first, and most impressive, demonstrations of this ethos was The Logic Theorist (Newell & Simon, 1956). This program could prove math theorems, not by brute force, but by using symbolic logic, kind of mimicking how humans solve problems. It was a clear sign that machines could actually do some form of abstract reasoning if instructed correctly.

Game-playing also became a hot area for symbolic AI research. The popularization of the Minimax algorithm in board games like chess and checkers enabled early computers to play optimally by exploring all possible moves.

These were the days of automatic reasoners and early forms of knowledge representation, with projects like the General Problem Solver aiming to tackle all formally decidable math problems. The idea was simple, yet powerful: if we could just write down all the rules, the machine would be smart enough to solve them.

It was also during this time that a seemingly simple program really captured people's imaginations: ELIZA, the first chatbot. Developed by Joseph Weizenbaum in the mid-1960s, ELIZA was a barebones linguistic interface designed to pretend to be a Rogerian psychotherapist. It worked by using simple pattern matching and rule-based responses, often just turning what you said into a question ("You say you are sad. What makes you say you are sad?").

Even though it was incredibly simple, many users found themselves opening up to ELIZA, believing it actually understood and empathized with them. This phenomenon became known as the ELIZA effect, a powerful reminder of how easily we humans tend to see human qualities in technology. ELIZA, despite being a purely symbolic, rule-driven system, sparked a persistent dream in the AI world: the quest for truly conversational AI, for machines that could talk with us naturally. This dream, born from simple rules, would keep pushing AI's boundaries for decades.

But even with symbolic AI leading the way, a different kind of idea was quietly taking root: connectionism, an early form of statistical AI. This approach drew inspiration from biology, specifically how neurons, despite their simplicity, could become exceedingly intelligent when connected in just the right way. The Perceptron, introduced by Frank Rosenblatt, was an early artificial neural network built to learn patterns directly from data.

The initial excitement was huge; these "learning machines" seemed to offer a path to intelligence without needing every single rule programmed explicitly. Imagine, a machine that could learn just by seeing examples, like a human brain! This perspective leans heavily into the empiricist tradition, where knowledge is gained through sensory experience and and data.

However, the honeymoon period didn't stick around. Both approaches ran into big problems and ultimately failed to scale beyond toy problems.

Early symbolic systems, while impressive in their specific areas, turned out to be quite brittle. They struggled with common-sense knowledge and couldn't easily adapt to new situations outside their carefully programmed rules. Trying to teach a machine absolutely everything it needed to know, one fact at a time, was an insurmountable challenge.

Meanwhile, perceptrons hit their own walls. Marvin Minsky and Seymour Papert's 1969 book, Perceptrons, famously pointed out their inability to adequately represent even the simplest nonlinear relationships in training data. They couldn't become complex enough, no matter how many neurons you connected.

This period became known as the First AI Winter: a big drop in funding and public interest as those initial grand promises didn't pan out. And this early struggle between explicit rule-based systems and pattern-based approaches set the stage for the dynamic tension that would define AI's whole history.

The Knowledge Era (1970s - Mid 1990s)

Coming out of the first winter, AI didn't vanish; it just regrouped, with symbolic methods making a strong comeback. The 1970s and 80s saw the development and initial commercial success of expert systems. These were AI programs designed to mimic how a human expert makes decisions in a very specific, narrow field.

Examples of these are systems like MYCIN, which helped diagnose blood infections, or XCON, which configured computer systems.

The main focus here was on capturing human expertise for a specific area and representing that knowledge using handcrafted rules and facts. Imagine writing down a comprehensive set of rules for medical diagnosis, for example. All the possible questions and follow ups, and all consequences of the possible answers. These systems used sophisticated inference engines (basically, automatic reasoners) to apply those rules and draw conclusions.

This era was the pinnacle of the rationalist approach to AI, aiming to formalize and apply human expertise through clear logical structures.

While expert systems grabbed the headlines, research into neural networks quietly kept going. A major algorithmic breakthrough during this time was the popularization of backpropagation (thanks to Rumelhart, Hinton, and Williams in the mid-1980s). This algorithm finally gave us an efficient way to train multi-layered neural networks, letting them learn much more complex patterns, thus breaking free of their primary limitation.

Just as expert systems hit their peak, their own limitations became painfully obvious. They were super expensive to build and maintain, needing human experts to painstakingly put in all their knowledge. They were also incredibly brittle, like all purely symbolic approaches; even a tiny change outside their programmed domain could break them entirely.

And so, the Second AI Winter arrived. This time, it was clear that while symbolic AI had done impressive things, it just couldn't scale to the complexity of the real world. At the same time, statistical AI wasn't really working yet, as the available data and computational infrastructure was insufficient. But this was about to change.

The Internet Era (Late 1990s - Early 2010s)

The internet changed absolutely everything, and AI wasn't the exception. Suddenly, data was everywhere, and statistical approaches were perfectly positioned to take advantage.

The huge growth of the internet in the late 1990s and early 2000s, combined with more and more computing power, led to an unprecedented explosion of digital data. Every click, every search, every photo uploaded contributed to a massive ocean of information. This Big Data was the fuel that statistical AI had been waiting for.

With tons of data available, statistical machine learning algorithms devised decades earlier really suddenly started to work. Techniques like support vector machines (SVMs), decision trees, and ensemble methods became extremely popular. They weren't just theoretical curiosities anymore; they were powering real-world applications. Search engines used them to rank billions of web pages, email providers deployed them to filter millions of spam messages, and e-commerce sites implemented them to recommend thousands of products to millions of users.

These were all problems perfectly suited for statistical machine learning, which could find subtle patterns in huge datasets without needing explicit rules for every single situation. It just need massive data and computational resources to work, and now we had both, in excess.

But symbolic AI didn't disappear. While statistical methods took center stage, symbolic approaches found new roles. The popularization of domain-specific ontologies (formal ways to define concepts and relationships) gave rise to the ideal of semantic web (an interconnected network of different data sources) which provided ways to structure and link information in ever growing knowledge bases, in a completely distributed and emergent process.

While the promise of a fully inter-connected semantic web hasn’t exactly panned out yet (and might never will), the underlying notion of organizing the world’s knowledge into networks of concepts stuck. These symbolic tools often worked alongside statistical methods, giving structured data that machine learning algorithms could then use, or making the results of statistical methods easier to understand. The Google Knowledge Graph is a prime example of this interplay between statistical and symbolic methods. It allowed Google to absolutely dominate the search industry for more than a decade and counting.

It was also during this period that the widespread use of AI in recommendation systems (like YouTube and Twitter) using purely algorithmic feeds started to show some of the earliest downsides of AI. While these systems were designed to personalize experiences and filter information, they also began creating filter bubbles and echo chambers.

The algorithms, often reflecting biases already in their training data, could also subtly strengthen existing prejudices or even be used to spread misinformation really fast. This early peek into AI's societal impact highlighted that even seemingly harmless applications could have big, sometimes negative, effects on how humans think and interact, setting the stage for the more complicated ethical discussions we have today.

The Deep Learning Era (Mid 2010s - Early 2020s)

If the Internet Era was just the warm-up, the mid-2010s brought the main event: Deep Learning. This wasn't just a step forward; it was a giant leap. The first big breakthroughs came from training much deeper neural networks than anyone thought possible before, often using clever tricks like layer-wise unsupervised pre-training.

The turning point arrived in 2012 with the ImageNet Large Scale Visual Recognition Challenge. A team led by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered a deep convolutional neural network called AlexNet. Its performance was jaw-dropping, totally blowing away all previous attempts.

AlexNet showed off the immense power of Convolutional Neural Networks (CNNs) for recognizing images. It proved that deep neural networks, given enough data and computing power, could learn incredibly complex features. This was a huge win for the empiricist approach, demonstrating that some form of intelligence could eventually pop up from massive data and complex, learned patterns, rather than needing explicit programming.

From that point on, deep learning just exploded. Quick advancements led to super sophisticated deep architectures like ResNets and Inception, plus crucial innovations like attention mechanisms, which let models focus on the important parts of the input. Deep learning quickly spread beyond just computer vision. It totally changed natural language processing, speech recognition, and even reinforcement learning.

Remember when AlphaGo, a Google DeepMind AI, beat the world's best Go players? That wasn't just a symbolic search algorithm; it combined Monte Carlo Tree Search (MCTS) with deep learning. It was a powerful statistical approach that could efficiently explore the huge and complex game spaces of Go, where traditional symbolic search (like Minimax) was simply intractable. It was a clear demonstration of statistical AI's ability to tackle problems previously considered beyond reach.

This period of fast progress also brought a critical realization, often called The Bitter Lesson. Coined by Rich Sutton, the Bitter Lesson basically says that over the long haul, general methods that really lean into computation (like just making neural networks bigger and feeding them more data and processing power) tend to be more effective and robust than trying to build in human knowledge or super detailed, hand-crafted features.

While this insight powerfully highlighted the benefits of huge, data-driven learning (the empiricist path), it's important to get that it's not a total dismissal of human understanding. Instead, it suggests that how human knowledge gets integrated matters—less about rigid, fixed rules, and more about creating architectures and environments where learning algorithms can discover patterns and rules for themselves, often at scales beyond what any human could intuitively grasp. This solidified the move towards data-driven, statistical approaches, showing that raw computing power and general learning algorithms were often the real keys to unlocking more advanced AI capabilities.

Then came 2017, and with it, the Transformer architecture. This was a game-changer for Natural Language Processing (NLP), mainly because of its innovative attention mechanism, that allowed ML models to process whole chunks of text way more efficiently to understand long-range connections. This paved the way for the rise of Large Language Models (LLMs) like BERT and early GPT versions, which started showing an uncanny ability to understand and generate human-like text.

None of this would have been possible without the absolutely massive development of hardware. The exponential growth in computing power and the development of specialized hardware like Graphics Processing Units (GPUs) and later Tensor Processing Units (TPUs) have completely changed the landscape that made statistical AI fail in the 80s. These advancements allowed for the huge parallel processing needed to train deep neural networks on truly enormous datasets, unlocking their true potential.

Finally, the explosion of open-source frameworks like TensorFlow and PyTorch, along with platforms like Hugging Face that shared pre-trained models, dramatically sped up deep learning research and adoption. This has fostered a truly collaborative global community, letting innovations spread fast and build on each other. It was a collective effort that truly launched AI into its current era.

The Generative Era (Early 2020s - Present)

And that brings us to today. The early 2020s have ushered in an era that has captivated the public like nothing else. In 2022, ChatGPT burst onto the scene, quickly followed by other groundbreaking generative models like DALL-E and Midjourney. These models can create novel and mostly coherent content across all sorts of formats: text, images, code, audio, and even video. Suddenly, AI isn't just analyzing or predicting; it is creating, as Lady Ada envisioned—and the implications of it are still unfolding. This shift has put large-scale generative models squarely in the spotlight, forcing us to rethink what machines are truly capable of.

This era also brings us full circle to that early dream sparked by ELIZA. Remember that simple, rule-driven chatbot from the 1960s? While ELIZA relied on hand-coded patterns and clever tricks to fake conversation, ChatGPT works on a totally different scale and principle. It's a purely statistical marvel, having learned the ins and outs of human language from massive amounts of data, rather than explicit rules. ChatGPT, and its generative cousins, represent a stunning realization of that long-held dream of conversational AI, really pushing the boundaries of what we thought was possible. It's a testament to how far we've come, from basic, rule-based chatbots to incredibly fluent, statistically driven ones.

But here's where the historical pendulum swings again, with a cool twist. While these statistical deep learning models achieve incredible scale and performance, they also show some inherent limitations. They can hallucinate facts, struggle with real common-sense reasoning, and often lack explainability. You might ask them why they made a certain decision, and they can't always tell you in a way that makes sense.

These limitations have sparked a renewed interest in neuro-symbolic AI. This isn't about picking sides anymore; it's all about integration. This emerging field aims to combine the best of both worlds: the pattern-recognition power of statistical models with the logical reasoning and structured knowledge of symbolic AI. Imagine using ontologies to "ground" a large language model's outputs, making sure its generated text sticks to factual consistency, or adding logical rules to make AI systems more robust, reliable, and easier to understand.

The historical struggle between symbolic and statistical AI is evolving into a quest for effective synthesis, aiming to combine the strengths of both paradigms to create something truly greater than the sum of its parts.

Conclusion

We've journeyed through decades of ambition, breakthroughs, and tough realizations. What we've seen is a constant back-and-forth, a dynamic dance between two powerful ideas: the precise, rule-based, inflexible logic of symbolic AI and the adaptable, pattern-based, unreliable power of statistical AI. This dance, as we've explored, often mirrors the philosophical tension between rationalism and empiricism.

Today, AI stands at a fascinating crossroads. While purely statistical systems have achieved incredible feats, especially in areas like conversational AI—where the dream that began with ELIZA now thrives in ChatGPT—, their inherent limitations are becoming clearer.

This brings us to a crucial realization: the future of AI likely isn't about one approach winning out over the other, but about intelligently combining them. Hybrid approaches, particularly neuro-symbolic AI, hold immense potential.

However, as we push the boundaries of what AI can do, we also have to face the serious challenges and ethical questions that come with it. The sheer power of these systems brings risks, from spreading misinformation and amplifying societal biases (which are often baked into their training data) to more complex issues around accountability and even the long-term, existential implications of creating truly autonomous and superintelligent entities.

By integrating symbolic reasoning and structured knowledge with the power of deep learning, we can build AI systems that are not only smart but also robust, explainable, and truly capable of common-sense reasoning, all while carefully navigating these potential problems.

The history of Artificial Intelligence is far from finished. AI is a living, breathing, civilization-wide project that encompasses all human endeavors, with the potential to transform society for the better—or, some believe, to become our ultimate doom. Everyone has a place here: technologists, yes, but also humanists, economists, historians, artists, politicians… The next few years, if anything, promise to be extremely exciting, and you can be a part of it.

AI-Driven Storytelling with Multi-Agent LLMs - Part III

Alejandro Piad Morffis — Mon, 07 Jul 2025 10:02:53 GMT

In the first two parts of this series, we explored the fascinating, chaotic world of emergent storytelling. We saw how complex narratives can arise from simple, "bottom-up" rules in Part I, and how LLM-powered agents can co-create stories through dynamic, unpredictable interaction in Part II. It’s a world of digital improv, where the story finds its own way.

But that’s only half the picture.

Those bottom-up approaches excel at creating novelty and believable micro-interactions. They are fantastic at answering "What happens next?" But what about the "top-down"? What about the grand narrative arc, the deliberate plot structure, and the long-term coherence that defines the stories we remember? This is where even the most advanced Large Language Models (LLMs) stumble. They are masters of prose but poor architects. They can write a beautiful paragraph, but struggle to build a cathedral.

In this final installment of the series, we close the loop. We'll explore a complementary, top-down approach based on another excellent thesis I supervised this year, this one by Roger Fuentes Rodríguez. His work focuses on high-level planning and structure. I'll argue that this architectural approach doesn't just produce better stories; it provides a powerful model for building more robust and governable AI systems.

LLMs are Brilliant Amnesiacs

Let’s backtrack to the root of the problem we’ve been discussing the past two weeks. If you've ever had a long, meandering conversation with an LLM, you've likely seen it happen. After a while, it starts to forget key details from the beginning of the chat. This isn't a bug; it's a fundamental feature of their design.

Think of it as a brilliant mind with no long-term memory. An LLM can only "see" the last few thousand words (tokens) of a conversation. And even in LLMs with Everything before that effectively ceases to exist. For short tasks, this is fine. For writing a novel, it's a disaster.

This limitation leads to critical failures in long-form storytelling:

Character Amnesia: A hero who is terrified of spiders on page 5 suddenly keeps one as a pet on page 50.
Plot Holes: The magical sword that can only be wielded by the pure of heart is inexplicably used by the villain to open a tin of beans.
Structural Breakdown: The story loses its narrative drive. The rising action plateaus, the climax never quite lands, and the resolution feels unearned, violating foundational structures like Freytag's Pyramid.

The core challenge is this: a story is not a linear sequence of words. It's a complex, interconnected web of causal relationships, character motivations, and thematic consistency. A monolithic LLM, with its limited memory, simply can't manage this web on its own. It needs an architecture.

The Writer's Room

Instead of relying on a single, all-knowing AI, the thesis proposes a "divide and conquer" strategy. The system's core is a central story blueprint, which a team of specialized AI agents collaborates on. It’s less like a single author and more like a Hollywood writer's room.

The Story Blueprint

At the heart of this system is a Directed Acyclic Graph (DAG). Forget the jargon for a second; think of it as the story's complete, interconnected timeline and causal web, all mapped out on a giant whiteboard.

Nodes are Events: Each point on the whiteboard is a specific event in the story. Not prose, but a structured object with the characters that participate, and all the necessary metadata.
Edges are Causality: The arrows connecting the events represent cause-and-effect. An arrow from "Hero finds the key" to "Hero opens the chest" means the first event must happen before the second. This simple rule makes temporal paradoxes and plot holes structurally impossible. You can't use the key before you find it.

This graph is the long-term memory the LLM lacks. It is the single source of truth that codifies the story's logic, ensuring every part is connected to the whole.

The Writers

The agents in this system aren't working in isolation. They are all reading from, and writing to, that central graph.

The Architect: This agent builds the initial skeleton of the graph. It takes the user's high-level prompt and lays out the main plot points—the inciting incident, the major turning points, the climax—as the first nodes on the graph.
The World Builder: This agent is the lore master. It goes through the graph and enriches the event nodes with crucial details: defining the characters, describing the locations, and specifying the properties of important objects.
The Drama Coach: This agent's job is to make the story interesting. It analyzes the graph's structure to find flat or boring sequences. It then adds or modifies nodes to inject conflict, suspense, or character development. It asks, "Wouldn't it be more interesting if the hero's mentor betrayed them at this point?" and adds that event to the graph.
The Dependency Manager: This is the ultimate fact-checker. It constantly validates the graph, ensuring there are no paradoxes or broken rules. It checks things like, "Does the character have the required item from a previous node before attempting this action?" or "Is this supposedly dead character trying to speak?"
The Narrator: Only when the graph is complete, enriched, and validated does this agent step in. It performs a "topological sort" of the graph (reading the events in a valid causal order) and, one node at a time, uses an LLM to translate each structured event into compelling prose, feeding it only the context it needs for that specific scene.

The Proof is in the Plot

Does this actually work? In short, yes.

The evaluation for the thesis involved having real users interact with the system via a Telegram bot and compare its output to that of a monolithic LLM. The stories generated by this structured, multi-agent system were consistently rated higher in structural coherence and narrative depth. The system excelled at maintaining consistent character motivations, building a more detailed and believable world, and avoiding the plot holes that plague simpler approaches.

But while this approach has clearly many strengths, it currently lacks in precisely what the previous articles excel: emergence and interaction. So, that’s our next step.

Unifying Top-Down and Bottom-Up

So where do we go from here? The clear path forward is to build a unified theory of AI storytelling, combining the top-down planning from this article with the bottom-up emergence and interaction from the previous two.

Imagine a hybrid system with two layers operating at once:

The Macro-Narrative (Top-Down): The story graph we've discussed acts as the "grand narrative," defining the key plot beats that must happen for the story to be satisfying.
The Micro-Narrative (Bottom-Up): Within each scene (each node of the graph), we unleash the autonomous, LLM-powered characters from the previous two articles. They have their own goals, personalities, and memories, and they interact freely, creating emergent and unpredictable dialogue and actions.

The magic lies in connecting these two layers. We reintroduce a crucial agent: the Director (or God/Game Master). The Director's job is not to puppet the characters. Instead, it subtly nudges the simulation. It knows the next required beat in the story graph is "The hero must discover the secret map." It can't force the hero to look for it, but it can introduce an NPC who mentions a rumor, make a book fall off a shelf to reveal a hidden compartment, or create a sudden downpour that forces the characters to take shelter in the very cave where the map is hidden.

This creates the best of both worlds: the structural integrity of a planned narrative, combined with the organic, believable, and often surprising behavior of autonomous agents. It’s the holy grail: a story that is both well-plotted and truly alive.

Beyond Storytelling as a Toy

Let's be clear: the goal of this research is not to automate creativity or replace human authors. The real value of computational storytelling lies elsewhere. It serves as the perfect playground for tackling some of the most critical open problems in Artificial Intelligence.

A story is a microcosm of our complex world. Forcing an AI to generate a coherent one is an extreme stress test for its most important faculties:

Reasoning: A good story is a monumental feat of causal reasoning. Characters must have consistent motivations, actions must have logical consequences, and plot threads must resolve. Maintaining this web of dependencies is a powerful way to measure and improve an AI's ability to reason in non-formal, unstructured, but still challenging scenarios.
Governance & Safety: Think of our writer's room architecture. The Dependency Agent acts as a safety and ethics system, enforcing the rules of the world. The Drama Coach agent governs the narrative, steering it towards a desired outcome (an "interesting" story) without violating core constraints. This is a perfect sandbox for studying AI alignment: how do we build systems that can pursue complex goals while adhering to a set of inviolable rules?

For decades, AI research advanced by mastering abstract games like Chess and Go. The breakthroughs required to win those games, particularly in deep learning, didn't just stay in the game. They became foundational for solving real-world scientific problems, most famously protein folding with AlphaFold.

I argue that storytelling is the next grand challenge for conversational AI. It's a game with infinitely more complex rules than Go, one that involves social dynamics, common sense, and long-term planning. The novel architectures we must invent to "master" storytelling—systems that can plan, reason, and govern themselves—could be the key to unlocking the next generation of safer, more robust, and more capable Artificial Intelligence.

And if you liked this article, feel free to check the full thesis (in Spanish but AI can translate it pretty well) and the repository to read some of the generated stories.

AI-Driven Storytelling with Multi-Agent LLMs - Part II

Daniel Ángel Arró Moreno — Wed, 18 Jun 2025 15:01:06 GMT

Photo by Ian Fajardo on Unsplash

In the previous article, we explored how multi-agent architectures can inject life and autonomy into AI-generated stories, allowing characters to pursue their own goals in a dynamic world. But what happens when you add a human to the mix? Interactive storytelling raises the stakes: now the system must not only maintain coherence and character consistency, but also respond—intelligently and flexibly—to unpredictable user input.

This is a much harder problem. The system must walk a tightrope: it needs to be consistent enough to avoid plot holes and character drift, but flexible enough to let the user meaningfully shape the narrative, even in ways the system never anticipated. In this second article I bring you our second undergraduate thesis at the University of Havana, taking on this challenge head-on.

If you haven’t read Part I of this series, I recommend starting there for the full context and motivation behind our research line. In short: we’re not trying to “solve” storytelling, but to use it as a demanding testbed for developing robust techniques in LLM governance, safety, and control.

The Core Problem

Let’s get to the heart of the matter: Why is interactive storytelling so hard? The answer lies in a fundamental tension—one that anyone who’s ever played a narrative game or written a choose-your-own-adventure story will recognize. It’s the struggle between agency (the user’s freedom to make meaningful choices) and control (the system’s responsibility to keep the story coherent, engaging, and believable).

Agency is what makes interactive storytelling magical. It’s the feeling that your decisions actually matter—that you can steer the story in unexpected directions or even break the mold of the narrative world. In a perfect system, you’d be able to do anything your imagination conjures: befriend the villain, burn down the tavern, or turn the hero into a poet. The system would adapt, improvise, and keep the experience compelling.

But here’s the rub: pure agency, without constraints, is a recipe for chaos. If the system simply accepts every user input at face value, the story can quickly unravel. Characters might act out of character, plotlines can contradict themselves, and the narrative world loses its internal logic. The result? A story that feels less like a crafted experience and more like a series of disconnected improv skits.

On the flip side, control is the system’s way of protecting the integrity of the story. Think of it as the invisible hand of the “narrative director”—the set of rules, memory, and logic that ensures events make sense, characters stay true to themselves, and the world remains believable. Control is what prevents the protagonist from suddenly teleporting to Mars or resurrecting a character who just died (unless, of course, the story’s logic allows for it).

But too much control, and the story becomes a railroad. The user’s choices are ignored, overwritten, or reduced to cosmetic differences. The narrative might be coherent, but it’s no longer interactive in any meaningful sense. The magic of agency is lost.

Our Proposal

Let’s open the hood on this architecture. The core idea is straightforward but powerful: break down the complex process of interactive storytelling into a series of specialized agents, each responsible for a distinct narrative function, and orchestrate their collaboration through a well-defined workflow. This modular approach is what allows the system to achieve both flexibility (adapting to user decisions) and consistency (maintaining narrative logic and emotional coherence).

We defined the following agents:

Orchestrator of Interactive Stories: Acts as the central coordinator. This agent triggers each phase in the correct sequence, manages the overall flow, and ensures all updates to the knowledge graph and narrative state happen in the right order. Without this conductor, the “orchestra” of agents would quickly fall out of sync.
Rule Extraction Agent: Extracts the fundamental rules and constraints of the story world (e.g., “magic is forbidden,” “time flows forward”). This agent sets the boundaries for all subsequent events, ensuring that the narrative remains internally consistent.
Key Beat Extraction Agent: Identifies the pivotal moments or “beats” from the user’s synopsis or the evolving story. These serve as narrative milestones, guiding the rhythm and progression of the plot.
Event Extraction Agent: Detects and classifies concrete narrative events, assigning actors and consequences. This agent feeds the knowledge graph with actionable story content.
Knowledge Graph Builder: Maintains a dynamic, structured representation of all entities, relationships, rules, and events. The graph acts as both memory and source of truth, ensuring that the story doesn’t lose track of details or contradict itself as it evolves.
Prompt Enricher: Gathers the most relevant context from the knowledge graph and narrative history, then constructs a rich prompt for the language model. This ensures that every new scene is generated with full awareness of what’s come before.
Act Director: Generates the next scene using the enriched prompt. This agent is responsible for advancing the plot while respecting the current state and constraints of the story world.
Character Simulator: Simulates the emotional and behavioral responses of characters to the latest events. This keeps character arcs believable and emotionally consistent.
Player Action Handler: Integrates the user’s decisions into the narrative. It updates the knowledge graph and narrative history, ensuring that user choices have real, lasting impact on the unfolding story.

Here is a brief overview of how all these agents collaborate in a typical interactive session.

Preparation Phase
- The Orchestrator initializes an empty knowledge graph and sets up the narrative environment.
- The Rule Extraction Agent processes the initial synopsis or world description to establish the core rules.
- All agents are initialized and ready to process input as the story unfolds.
Narrative Base Construction
- The Key Beat Extraction Agent identifies the main plot points from the user’s synopsis.
- The Event Extraction Agent breaks down the synopsis into actionable events.
- The Knowledge Graph Builder integrates these elements into the graph, establishing the initial state of the story world.
Interactive Development Loop
- Prompt Enrichment: The Prompt Enricher queries the knowledge graph for the most relevant context (recent events, character states, world rules) and constructs a prompt for the LLM.
- Scene Generation: The Act Director uses this prompt to generate the next scene, ensuring continuity and literary quality.
- Character Simulation: The Character Simulator infers and documents how each character reacts to the new developments, updating their emotional and behavioral states.
- User Interaction: The Player Action Handler presents choices to the user, receives their input, and encodes the resulting actions as new events in the knowledge graph.
- The cycle repeats, with each agent building on the outputs and updates of the others, ensuring that every new scene is both a logical continuation and a meaningful response to user agency.
Conclusion and Consolidation
- When the user signals the end of the story, the Orchestrator triggers the finalization phase.
- The Knowledge Graph Builder ensures all narrative threads are resolved and the story is logically complete.
- The system outputs the full, coherent narrative, along with a structured map of how the user’s choices shaped the journey.

How does this stack up?

Let’s get honest: it’s easy to make grand claims about “better stories” and “more engaging AI,” but how do you actually measure narrative quality in a rigorous, meaningful way? For this thesis, the evaluation was designed to pit the multi-agent system head-to-head against a baseline approach that used the same underlying LLM (Llama-3-8B-Instruct) but without any agentic orchestration. This setup ensured that any improvement could be attributed squarely to the architecture—not to a bigger model, more data, or secret sauce under the hood.

The Metrics

The evaluation used a mix of qualitative and structured criteria, focusing on the aspects that matter most for interactive storytelling. Here’s the breakdown:

Narrative Coherence: Does the story flow logically from scene to scene, or do we get abrupt jumps and plot holes?
Protagonist Agency: Do the user’s choices and the protagonist’s actions actually shape the direction of the story in a meaningful way?
Adaptation to User Input: Does the system integrate user decisions naturally, creating real consequences and new narrative branches?
Conflict and Tension: Is the story able to build suspense, escalate challenges, and avoid flat or artificial drama?
Originality and Creativity: Are the story elements fresh and surprising, or do we just get recycled tropes?
Clarity and Literary Style: Is the writing evocative, immersive, and stylistically rich?
Thematic Consistency: Are the core themes and motifs reinforced throughout, or do they get lost along the way?
Emotional Consistency: Do the characters’ emotions evolve logically and believably?
Context Maintenance: Does the system remember important details, settings, and relationships as the story unfolds, or does it “forget” as it goes?

Each experiment involved presenting both systems with the same narrative synopsis and a sequence of user choices. Human evaluators then compared the resulting stories using these criteria, looking for both strengths and weaknesses in each approach.

The Findings

The results were clear and consistent across multiple story scenarios. The multi-agent system outperformed the baseline on almost every metric that matters for interactive storytelling.

Stories generated by the multi-agent system maintained a logical chain of cause and effect, with smooth transitions between scenes and a strong sense of narrative momentum. The protagonist’s decisions had real weight, steering the story into new territory and producing consequences that felt both meaningful and surprising. User choices were not just tacked on—they shaped the evolution of the plot, the emergence of conflict, and the protagonist’s emotional journey.

Perhaps most striking, the multi-agent system produced narratives with richer literary style and more vivid descriptions, even though it used the same LLM as the baseline. This suggests that the way you structure and feed context to the model—through agents that carefully curate, update, and enrich the narrative state—can unlock much more of the LLM’s creative potential.

The system also excelled at maintaining context and thematic unity. Key details, settings, and character motivations were carried forward across scenes, avoiding the notorious “memory loss” problem of vanilla LLMs. Emotional arcs were more believable, and the stories avoided the repetitive, cliché-driven traps that often plague automated narrative generation.

By contrast, the baseline system struggled with abrupt transitions, shallow integration of user choices, and a tendency toward flat or generic storytelling. Protagonist agency was often illusory—choices rarely changed the story in a meaningful way. The writing, while functional, lacked the immersive quality and emotional resonance achieved by the multi-agent approach. Context was easily lost, leading to inconsistencies and a less engaging experience overall.

Final Thoughts

This thesis, and all our complementary research, proves you don’t need to fine-tune, retrain, or scale up your language model to get dramatically better results. By layering a clever multi-agent system on top of a standard LLM, you unlock coherence, adaptability, and user alignment that brute-force training simply can’t deliver. The architecture is the key.

Why does this matter? Because the core problems of interactive storytelling—long-term memory, integrating unpredictable user input, keeping characters and worlds believable—are exactly the challenges we face in building safe, controllable AI everywhere. The multi-agent architecture acts as a governance layer, orchestrating the model’s raw power through explicit rules, dynamic memory, and transparent workflows. It’s not just about telling better stories; it’s about showing that smart design can outpace brute force.

If we want AI that’s powerful and safe, we shouldn’t just throw more data or compute at the problem. We should design smarter systems—ones that let us steer, audit, and trust their outputs.

If you want to check the generated stories and read the full thesis (in Spanish), check this GitHub repository.

AI-Driven Storytelling with Multi-Agent LLMs - Part I

Franco Hernández Piloto — Mon, 16 Jun 2025 11:02:42 GMT

Photo by Nick Karvounis on Unsplash

If you’ve ever tried to coax a language model into writing a long, coherent story, using just prompt engineering, you’ve probably hit a wall: characters lose their personalities, plots meander or stall, and the whole thing feels more like a sequence of clever paragraphs than a living narrative.

Precisely this topic is one major research line in my group at the University of Havana. Right now, there are three undergrad students wrapping up their theses on AI-driven story generation from different perspectives and strategies.

The common idea underlying this whole research line is that combining multi-agent systems and traditional symbolic AI with LLMs in well-designed workflows can overcome many of the limitations of pure LLM-based story generation. Crucially, we aim to explore ways of improving story generation without any form of fine-tuning or retraining—that is, no need to adjust model weights.

In this and follow up articles, I'll partner with my students to bring you a few high-level summaries of what they've done and found.

This first article (and the corresponding thesis) is about story emergence. We set loose a few characters in an AI-driven world and let them interact, to see what kinds of stories come out. But there is a catch: we want some level of control, but not too much.

The motivating question, then, was how can we introduce some mechanism for indirect control of a story while still allowing characters to evolve more or less naturally and plot points to emerge.

Let's see what Franco came up with.

Subscribe now

Why this matters

Let’s get something out of the way first: the purpose of this research is not to “solve” storytelling, nor to replace writers, artists, or the creative process itself. Storytelling is a deeply human craft, and no one here is under the illusion that a handful of LLM-driven agents will compose the next literary masterpiece.

So, why do we invest so much effort into building systems that generate stories?

The answer is both practical and strategic. Storytelling, especially in the form of simulated worlds with autonomous agents, is a uniquely demanding environment for testing the capabilities—and limits—of large language models. It’s a domain where long-term coherence, character consistency, planning, and subtle control all collide. In other words, it’s the perfect laboratory for exploring how to govern and steer the behavior of powerful generative models without sacrificing their creativity or flexibility.

This is not unlike the role that games like chess and Go played in the development of AI search and planning algorithms. Those domains were never the end goal; rather, they were controlled, well-understood environments where researchers could rigorously test new ideas. The techniques honed in those settings—like Monte Carlo Tree Search—eventually found their way into applications as far-reaching as protein folding and robotics.

In the same spirit, we use storytelling as a proving ground for strategies of indirect control, agent autonomy, and emergent behavior in LLMs. Here, we can measure and observe how different architectures balance autonomy and direction, how memory and planning affect long-term coherence, and how subtle interventions shape complex outcomes. The lessons we learn in this bounded, creative sandbox are directly relevant to much broader and higher-stakes domains: from AI assistants that must follow nuanced instructions, to multi-agent systems in logistics, education, or even critical infrastructure.

Ultimately, the goal is to develop robust, generalizable techniques for controllable and safe AI. By pushing the limits in a domain as rich and challenging as narrative simulation, we’re laying the groundwork for systems that can be trusted to act autonomously, adaptively, and in alignment with human intentions—no matter the context.

And if we get a few fun stories along the way, all the better.

Why pure LLM-driven storytelling falls short

Large Language Models (LLMs) have changed the game for natural language generation. They’re great at producing short, contextually rich responses and can even simulate dialogue or simple stories with impressive flair. But as soon as you ask them for something more ambitious—a novel-length mystery, a world populated by autonomous characters, or a story that evolves over dozens of turns—the cracks start to show.

The main issues? We identified some well-known limitations in LLM-driven storytelling.

Long-term coherence: LLMs forget what happened a few thousand tokens ago, even if their context is larger.
Character consistency: Personalities drift, motivations vanish, and “out-of-character” moments abound.
Proactivity: Agents react, but rarely plan or pursue long-term goals in a believable way. There is no planning ahead of time, it's all reactive.
Narrative control vs. autonomy: Too much authorial intervention and characters turn into puppets; too little and the story meanders or stalls. It's hard to craft just the right prompt.

These are not just academic complaints. If we want LLMs to power the next generation of interactive fiction, virtual worlds, or even collaborative writing tools, we need architectures that can balance control, coherence, and genuine emergence.

Our idea? Agents, lots of LLM agents

Our approach borrows a page from both agent-based modeling and narrative AI research. Instead of a single omniscient narrator, we simulate a society of autonomous agents—each powered by its own LLM instance, each with its own identity, memory, and goals—interacting in a shared, dynamic environment.

But here’s the twist: rather than scripting the story or directly controlling the agents, we introduce a “Director” agent. The Director never tells the agents what to do. Instead, it manipulates the environment—e.g., changing the weather, introducing objects, setting up casual events—and lets the agents interpret and react according to their personalities and memories.

Think of it as setting the stage and dropping hints, not pulling strings—more as some postmodern, emergent theater play than a traditional movie script. Actors have some constraints but they are free to pursue whatever goals they desire.

Here are the key architectural components that make the whole system come together:

LLM-driven agents: Each with its own memory (short-term and long-term reflection), planning, and perception modules.
World state: A mutable environment that records locations, objects, events, and global properties.
Action resolver: Ensures agent actions are valid and consistent with the world.
Event dispatcher: Manages what each agent perceives, maintaining a plausible flow of information.
Director: Observes the world and subtly nudges the narrative by changing the environment, not the agents.

To test all these ideas, we built a prototype implementation of the proposed architecture in Python, leveraging Google’s Gemini 2.0 Flash Lite for all LLM tasks. Each component—agent, action resolver, director—gets its own LLM instance and carefully tuned generation parameters. Memory is handled outside the LLM, with dual-level storage: agents remember both recent events and distilled reflections, which are periodically generated and used to inform future decisions.

The Director’s interventions are strictly limited: it can, e.g., change the weather or add objects to locations here and there, but never force an agent’s hand.

Timing is granular—the Director considers whether to intervene before every agent’s turn, allowing for context-sensitive, minimally intrusive direction.

How does this stack up?

To see if this architecture actually improves narrative generation, we ran head-to-head comparisons between stories generated by the multi-agent system and those produced by a monolithic LLM given the same scenario. We evaluated on several axes:

Coherence and plot progression
Character consistency
Originality and emergent plot richness
Prose quality
Narrative pacing and suspense

What did we find?

The multi-agent system produced far more believable, consistent characters and surprising plot developments. Because agents had their own memories and goals, their actions made sense and sometimes surprised even us. However, the monolithic LLM still wins on sheer polish—its stories are more linear, its prose more refined, and its pacing more controlled.

There’s an inherent tension here. You can have tight, well-structured prose (monolithic LLM) or you can have emergent, believable characters and plots (multi-agent simulation), but getting both at once remains a challenge.

Some of the most compelling moments came from the Director’s indirect interventions:

Dropping a key object in the right room led an agent to discover it and set off a chain of events that advanced the plot.
Changing the weather or introducing an environmental clue shifted the focus of a conversation or escalated a conflict, without ever breaking the agents’ autonomy.
Sometimes, interventions were ignored or interpreted in unexpected ways—an important reminder that true emergence is unpredictable.

Limitations and Next Steps

No system is perfect. The current prototype has its share of constraints:

Agents can only interact with objects in their current location.
There’s no persistent, explicit inventory management, so long-term planning with items is limited to what LLMs can remember.
Agent-to-agent interactions are mostly conversational; direct state changes (like “killing” another agent) aren’t supported yet (we'd need to improve the action resolver to understand these intentions).
The Director’s toolkit is intentionally minimal, which limits the subtlety and richness of its interventions.

Future work should expand environmental manipulation, improve memory persistence, and explore hybrid approaches that combine the strengths of both architectures.

Final Thoughts

This work is less about “solving” narrative generation and more about charting a new direction. By treating characters as autonomous agents and narrative direction as an emergent property of environmental context, we get stories that feel less like scripts and more like living worlds. The trade-off is real, but so is the potential: with the right architecture, LLMs can do more than just write—they can simulate societies, surprise their creators, and perhaps even teach us something about how stories really emerge.

If you’re interested in the technical details or want to see some sample stories, check out the full thesis, and read a few generated stories.

And if you have ideas for making these systems even richer, leave us some comments.

Towards Reliable, Consistent, and Safe LLM-based Agents

Alejandro Piad Morffis — Sat, 24 May 2025 15:32:45 GMT

Photo by Mitchell Luo on Unsplash

Why is it so hard to build conversational AI agents that can reliably solve complex problems, follow strict safety rules, and coordinate seamlessly with other agents? The answer, I believe, lies in three foundational challenges that any robust AI system must overcome: reasoning, governance, and orchestration. These are not just abstract technical terms—they represent real, practical hurdles that block the path to trustworthy, domain-specific AI assistants capable of handling multi-turn conversations in business-critical settings.

Reasoning is about more than just generating fluent text. It demands the ability to solve complex problems that often require composing multiple tools and performing iterative reasoning loops. An AI agent must decide which tools to call, in what order, interpret intermediate results, and know when to stop. Without this capability, AI responses remain shallow, inconsistent, or outright wrong when faced with intricate tasks that humans solve by chaining logical steps and external resources.

Governance tackles a different but equally important problem: ensuring the AI follows strict guidelines for safety, reliability, and ethical behavior throughout the conversation. This means preventing harmful outputs, avoiding jailbreaking attempts, and maintaining alignment with domain-specific rules. Crucially, governance also requires transparency and auditability so that AI decisions can be inspected and held accountable—an absolute must in regulated or sensitive environments.

Orchestration is the glue that holds everything together in complex AI ecosystems. It involves managing interactions among multiple specialized agents and tools in loosely coupled, distributed systems. Effective orchestration enables these diverse components to collaborate autonomously, handling multi-turn workflows and evolving conversational states without brittle dependencies or centralized control. Without it, scaling AI assistants beyond simple single-agent setups becomes impractical.

In this article, I will explore these three core problems in detail, laying bare the challenges for building useful conversational AI systems. Understanding them precisely is the first step toward building systems that are not only intelligent but also safe, reliable, and scalable. I will also propose my own vision for how to tackle these challenges today, working within and around all the current limitations of LLMs. Finally, I will introduce a Python framework designed to tackle reasoning, governance, and orchestration head-on, providing principled building blocks for the next generation of conversational AI.

The problems

Let's start by analyzing these three core issues in detail: reasoning, governance, and orchestration. I will outline why I think these are fundamental roadblocks to building truly effective LLM-based agents and systems. Later, I will explain what I think are ways to move forward with the tools we have today.

Reasoning

At first glance, reasoning in the context of AI agents might seem like simply knowing what each individual tool or API endpoint does. But the real challenge lies in the higher-level, tacit knowledge of how to combine these atomic tools effectively to solve complex domain-specific problems. Domain experts intuitively understand sequences of tool calls, conditional logic, and iterative steps required to reach a solution.

However, this kind of procedural expertise is rarely captured in low-level tool descriptions, which typically focus on inputs, outputs, and basic functionality. Expecting a language model to infer the correct combination and sequencing of tools solely from these isolated descriptions is unrealistic. The nuanced decision-making about when and how to chain tools together is a form of knowledge that is difficult to extract or emerge naturally from atomic tool specs alone.

Beyond the difficulty of tool composition, reasoning also suffers from the lack of structure in the outputs generated by LLMs. When all you have is natural language, it becomes extremely challenging to reliably perform multi-step inference, especially when reasoning requires loops, conditional branches, or repeated evaluation of intermediate results. Natural language is inherently ambiguous and unstructured, making it hard to enforce consistency or correctness across iterative reasoning steps. Without explicit control flow or a formal mechanism to track state, attempts to encode complex reasoning purely in natural language prompts tend to be brittle, error-prone, and difficult to validate or debug.

This means reasoning in conversational AI takes more than knowledge of individual tools or fluent text generation. It requires bridging the gap between atomic tool capabilities and high-level procedural knowledge, as well as moving beyond unstructured natural language to support systematic, verifiable inference processes involving loops, conditions, and multi-step tool use.

Governance

Governance is one of the toughest nuts to crack when building conversational AI for real-world applications. At its core, governance means making sure your AI agent follows strict company policies, legal regulations, and ethical guidelines—all the time. These aren’t just vague suggestions; they’re hard constraints designed to keep users safe, protect sensitive data, and ensure compliance with industry standards. The problem is that maintaining this level of discipline over multi-turn conversations is incredibly difficult. Agents tend to drift from prescribed behavior as conversations grow longer, and even minor lapses can have outsized consequences in business-critical or regulated environments.

Add to this the constant threat of jailbreaking and prompt injection attacks—clever ways users or adversaries try to trick the AI into ignoring its guardrails or producing harmful outputs. It’s like trying to keep a fortress secure when the attackers keep inventing new siege engines. Because conversational agents operate in open-ended, adversarial settings, they must be robust against a wide spectrum of malicious inputs. Preventing these exploits isn’t just about patching holes; it requires a proactive, systematic approach to detect, block, and mitigate attempts to subvert the AI’s intended behavior.

Finally, governance demands transparency and auditability. It’s not enough for an AI to “just behave.” We need to understand why it made a particular decision or gave a certain answer. This is essential not only for debugging unexpected behavior but also for ensuring fairness, building user trust, and meeting regulatory requirements. Imagine trying to explain a loan denial or a medical recommendation without a clear, traceable rationale—this is where transparency becomes a non-negotiable. Without it, deploying AI in sensitive domains is a leap of faith rather than a calculated risk.

Orchestration

If reasoning is the brain, and governance the rulebook, orchestration is the central nervous system of a conversational AI system. At first glance, it might seem like the most straightforward problem—after all, distributed systems aren’t new. But here’s the catch: when you’re dealing with LLM-powered agents that combine stochastic reasoning with deterministic tool use, even “simple” coordination becomes deceptively complex.

Let’s break it down. Imagine you’re building an email assistant ecosystem with three agents: one fetches messages, another summarizes content and extracts actionable items, and a third adds events to calendars. Each agent operates autonomously, but they need to collaborate seamlessly. The fetch agent might process thousands of emails hourly, the summarizer needs to handle variable-length content, and the calendar agent must interface with multiple external APIs. Now scale this to hundreds of specialized agents working asynchronously across time zones and user bases.

The real challenge isn’t just making these agents work—it’s making them work together in a system that’s both flexible and bulletproof. You want to add new agents dynamically (say, integrating a billing system, or reading a new source like Slack) without destabilizing existing workflows. You need horizontal scaling: if email volume spikes, spinning up more fetch agents should be as easy as launching new processes. Crucially, the system must remain distributed, avoiding single points of failure while maintaining coherent state across interactions.

This demands a delicate balance. Traditional microservice architectures solve similar problems, but LLM-powered agents introduce new wrinkles. First, conversational agents often carry context across interactions, unlike stateless API calls. Also, asynchrony is crucial—an agent might need to pause mid-task waiting for user input or external service responses. All of this with agents that range from simple deterministic tools (like a calendar API wrapper) to complex reasoning modules making probabilistic decisions.

The solutions

Let’s get to the heart of the matter: how do we actually solve the thorny problems of reasoning, governance, and orchestration in conversational AI? My thesis is that the answer lies in three key ideas that work together to bridge the gap between high-level intelligence and low-level execution, while ensuring control and scalability.

First, skills act as a powerful abstraction that captures domain expertise on how to combine multiple tools into coherent workflows. Second, structured reasoning moves us beyond free-form language outputs to well-defined, machine-readable formats. Finally, asynchronous message passing provides the backbone for orchestrating multiple agents working on different tasks.

Together, these concepts form a principled foundation for building conversational AI systems that are not just smart, but safe, reliable, and scalable. In the sections ahead, I’ll unpack each idea and show how they collectively address the core challenges we’ve laid out.

Skills

A skill is a semi-structured workflow that captures domain knowledge about how to solve a specific problem by combining multiple tools and prompts. It acts as the crucial bridge between the high-level reasoning capabilities of an LLM and the granular, atomic operations exposed by individual APIs or services. Skills encode not only what needs to be done but how to do it—capturing procedural knowledge that’s often tacit and difficult to extract from isolated tool descriptions alone.

Skills can be as flexible or as restrictive as necessary, blending natural language prompts with traditional programming constructs like conditionals and loops. On one end of the spectrum, you might have a skill that’s essentially a general-purpose chat interface: it provides the LLM with some basic instructions and lets it generate free-form responses with minimal procedural control. This kind of skill is highly flexible but offers limited guarantees about behavior or output structure.

On the other end, consider a skill designed for a complex enterprise application, such as an ERP system. This skill might invoke several API endpoints in sequence, carefully checking each response with targeted prompts to verify correctness. It uses conditionals to decide which tools to call next, loops to handle iterative processes like paginated data fetching, and error handling to manage unexpected results. Here, the skill acts like a finely tuned program, encoding domain-specific workflows that ensure reliability, adherence to business logic, and precise control over the agent’s actions.

By combining prompts with executable code, skills provide a powerful abstraction that closes the gap between the conceptual reasoning of LLMs and the concrete, deterministic operations required to solve real-world problems.

Structured Reasoning

Structured reasoning is about asking the LLM to respond not with free-form text, but with well-defined, machine-readable objects—think JSON with explicit fields. Why? Because this approach lets us do three crucial things. First, we can verify the output of each reasoning step before moving on, catching errors early instead of letting them cascade. Second, we can write procedural code that ties reasoning steps together using conditionals and loops, turning the AI’s output into a controlled, repeatable workflow rather than a one-shot guess. Third, structured outputs make it crystal clear what reasoning paths the LLM is exploring, which is invaluable for transparency, debugging, and governance.

This concept fits perfectly with skills, which encode the procedural knowledge of how to use specific tools in a specific order. Structured reasoning provides the scaffolding that lets skills define complex workflows reliably.

For example, consider the ReAct reasoning paradigm, where the LLM’s response includes distinct concepts like observation, thought, and action. Instead of parsing ambiguous text, we can get a structured object where procedural code can check if the agent should loop, invoke a tool, or stop. This makes multi-step reasoning systematic and auditable, rather than brittle and opaque. In short, structured reasoning transforms the AI’s “chain of thought” from a fuzzy narrative into a precise, verifiable sequence of steps.

Asynchronous Collaboration

To solve orchestration, let’s consider the goal of having multiple specialized agents working together to solve complex problems. These agents will be autonomous programs—not just chatbots—that fetch data, summarize content, generate insights, and update databases independently. To enable this kind of collaboration, we need an asynchronous architecture based on message passing. Why? Because it decouples agents in time and function, allowing each to operate at its own pace without waiting on others, making the system more robust, scalable, and flexible.

First, asynchronous message passing ensures robustness. If one agent fails or needs to restart, it doesn’t bring down the entire system. Other agents continue processing messages independently, so the system remains available and resilient. Second, scalability becomes straightforward: to handle increased load, you simply add more agents of the same type. For example, if the volume of requests spikes, you spin up more fetch agents without disrupting the rest of the workflow. Third, agents remain loosely coupled by communicating through typed messages—clearly defined requests and responses that ensure everyone understands what’s being asked or reported. This loose coupling means you can add or remove agents dynamically without breaking the system.

Returning to our email assistant example, the fetch agent pulls messages from various inboxes and posts them to a shared message board. The summarizer agent reads those messages, extracts key points, and posts summaries as new messages. The calendar agent listens for actionable items and schedules events accordingly. If the workload increases, more fetchers or summarizers can be added seamlessly. Introducing a new data source, like Slack, is as simple as adding an agent that posts Slack messages in the same format. Want to add sentiment analysis? Just introduce another agent that processes existing messages without disrupting the rest.

This asynchronous, message-driven architecture turns a complex multi-agent system into a flexible, scalable, and resilient network. It enables conversational AI ecosystems that can grow organically, adapt to changing demands, and maintain smooth operation even when individual components fail or change—exactly what real-world deployments require.

Putting it all together

Now let's put this whole vision together. My proposal is to tackle any sufficiently complex problem—of the type we've been talking—with a tapestry of specialized agents, each built for a concrete task. Some of these agents may be conversational, handling user queries, managing dialogue, and providing explanations. Others are more traditional, working quietly in the background to fetch data, update records, or trigger workflows. Still others leverage LLMs not for chat, but for specific NLP tasks: summarizing documents, extracting structured information, or generating reports. What unites them all is a common foundation built on skills and structured reasoning.

Each agent, whether it’s orchestrating a conversation or crunching through a batch of emails, is powered by domain-specific skills that encode exactly how to combine tools and prompts to solve the task at hand. Structured reasoning ensures every step is explicit, verifiable, and traceable—so you always know not just what the system did, but why it did it. This transparency is invaluable for debugging, auditing, and demonstrating compliance, while the skills themselves serve as living documentation of domain expertise, written and maintained by experts.

These agents don’t operate in isolation. They’re coordinated via asynchronous message passing, exchanging typed requests and responses through shared message boards. This architecture allows each agent to function independently, scaling up or down as needed, and makes the entire system robust to failures—if one agent crashes or needs to be updated, the rest keep humming along. It’s a flexible, loosely coupled ecosystem where adding new capabilities is as simple as introducing a new agent that speaks the same message language, and where complex domain problems are solved through the emergent collaboration of many specialized parts.

The result is a system that’s not just intelligent, but, hopefully, also safer, more reliable, and more adaptable than existing solutions. Traceability and transparency are baked in at every level, thanks to structured reasoning and explicit skill design, and not duct-taped as a forethought to ensure compliance. Strict governance is enforced through domain-specific skills authored by experts, ensuring that every action is aligned with business rules and regulatory requirements, and can be updated at any moment with ease.

If this vision resonates with you, consider ARGO—a Python framework for LLM agents built from the ground up around these key principles of agent-based reasoning, governance, and orchestration. ARGO is intentionally unopinionated: it has zero dependencies on any LLM framework, doesn’t tie you to specific backend or communication technologies, and lets you implement any agentic or reasoning paradigm you can imagine, from simple CoT and ReAct to all forms of dynamic planning and problem-solving. I like to think of it as the FastAPI of LLM agents: simple, modular, and designed for real-world flexibility.

Now, don't get me wrong. all of this is still in active development, so I'm not claiming the problems of reasoning, governance, and orchestration are solved. Furthermore, even if this vision crystallizes in its best possible form, there are still potentially unsurmountable limitations in LLMs regarding explainability, reasoning, and control, that may require some fundamental breakthrough. But, for all of their inherent and current limitations, I still believe LLMs are one of the most powerful technologies we have today to build truly transformative computational systems.

If you’re interested in building the next generation of reliable, transparent, and governable AI systems, I humbly think the paradigm explained in this article is a reasonable path forward, and, by extension, ARGO may be a project worth watching—and perhaps even contributing to :)

If there is enough interest, I can write an article on concrete implementations of real-world use cases leveraging these principles. Just let me know in the comments.

Understanding Large Language Models

Alejandro Piad Morffis — Wed, 19 Mar 2025 11:03:18 GMT

The following article is extracted from Chapter 1 of Mostly Harmless AI, a 160-page book on LLMs and everything you can, and cannot do with them. The book is in beta stage, meaning it has basically the final structure and content, but major revisions are still in progress. You can get it today at a 50% discount and ensure access to all future versions, including huge discounts on printed versions, as well as a community of like-minded readers.

What is a (Large) Language Model?

In machine learning, language modelling means guessing how likely a given sentence is.

For example, "the sun rises in the east and sets in the west" is a typical sentence with a high subjective probability. You would probably agree this is a sentence you’re likely to hear at least one. But a sentence with random words that don't mean anything has a low probability of ever being uttered by anyone, or written in a book.

Language modelling can be tricky because it's hard to say how likely a sentence is to “exist”. What does it even mean? In machine learning, we use a collection of texts called a corpus to help with this. Instead of the abstract, ontological question, we might ask something much more straightforward: How likely is it for this sentence to appear in all the written text, for example, on the internet?

However, if we only looked at sentences that already exist on the internet, language modelling wouldn't be very useful. We'd just say a sentence is either there or not, with a probability of 0 or 1. So instead, we can think about it in statistical terms like this: If the internet was made and erased many times, how often would this sentence show up?

To answer this question, we can think about whether a word will likely come after a group of words in a sentence. For example, "The sun rises in the east and sets in the..." What word would most likely come next? We want our language model to be able to guess that word.

Thus, we need to know how often a given word appears after a group of words. If we can do that, we can find the best word to complete the sentence. We keep doing this repeatedly to create sentences, conversations, and even full books.

Now, let's talk about the most common way to make this language modelling program work in practice. It's called statistical language modelling. We start with lots of text and learn how words correlate with other words. That is, we estimate the correlation of each word with a given context.

In simple terms, a context is a group of words around a specific word in a sentence. For example, in the sentence "the sun rises in the east and sets in the west," the word "east" is in the context of "{ the, sun, rises, and, sets }." If we look at many sentences, we can find words that are often in the same context. This helps us understand which words are related to each other.

For example, if we see "the capital of France is Paris" and "the capital of the United States is Washington," we can learn that Paris and France, as well as Washington and the United States, are related. They have the same relationship: being the capital of a country. We might not know what to call this relationship, but we can know it's the same type.

Statistical language modelling is thus making a model that can guess how often a word appears in a certain context, by using lots of data. This doesn't necessarily mean it truly understands the meaning of a sentence. But if we use enough data, it starts to look like the model can indeed capture at least some of the semantics.

The Simplest Language Model: N-grams

We've been building statistical language models since the early days of AI. The n-gram model is one of the simplest ones, storing the probability of each n-gram's occurrence. An n-gram is a collection of n words that appear together in common sentences. For example, in a 2-gram model, we count how many times pairs of words appear together in a large corpus, creating a table showing their frequency.

As we increase the n-grams to 3, 4, or 5, the collection of all n-grams becomes extremely large. Before the deep learning revolution, Google built a massive n-gram model from the entire internet with up to 5-grams. However, since the combination of all 5 words in English is huge, it only stored probabilities for the most common combinations.

This simple model counts words in a strict context when they're within a specific window size. It's very explicit, as each n-gram's probability or frequency is recorded. To compress this model further, we use embeddings.

Word Embeddings

An embedding is a mapping of some object—say, a word—to an n-dimensional vector of real numbers. Embeddings aim to transform semantic properties from an original space—i.e., words—into numerical properties of the embedding space. That is, we want words that occur together in context to map to similar vectors and form clusters in the embedding space.

Word2Vec, in 2011, was the first massively successful use of embeddings. Google trained a large embedding model using statistics from text all over the internet and discovered an amazing property: directions in the embedding space can encode semantic properties.

For instance, if you go take France and Paris, the vector needed to add to the word “France” to reach “Paris” is similar to the vector needed to add to the word “United States” to reach “Washington”. The semantic property has-capital was encoded as a specific direction in this space. Many other semantic properties were found encoded this way, too.

This was an early example of how encoding words in a dense vector space can capture some of their semantics.

Contextual Word Embeddings

The issue with Word2Vec is its assignment of a unique vector to each word, regardless of context. As words have different meanings in different contexts, many attempts were made to create contextual embeddings instead of static ones.

The most successful attempt is the transformer architecture, with BERT being the first example. The first transformer paper revolutionized natural language processing in artificial intelligence, providing a single tool to tackle various NLP problems.

A transformer is a neural network that generates a embedding of each word in the input text by considering the entire content of the input. This means each word's embedding changes according to its context. Additionally, a global embedding for an entire sentence, paragraph, or general fragment of text can be computed.

Why does this matter? Neural networks are among the most powerful machine learning paradigms. We can find embeddings for text, images, audio, categories, and programming code using a single representation. This enables machine learning across multiple domains using a consistent approach.

With neural networks, you can transform images to text, text to image, text to code or audio, etc. The first idea of the transformer was to take a large chunk of text, obtain an embedding, and then use a specific neural network for tasks like text classification or translation. But then, you can build sequence-to-sequence architectures that allow a neural network to receive a chunk of text, embed it into a real-value vector, and generate a completely different chunk of text from it.

For example, you can encode a sentence in English with a transformer that embeds it into a real-value vector and then decode it with another transformer that “speaks” French. The real-value vector in the middle represents the meaning of the text independent of language. So, you can have different encoders and decoders for various languages and translate any language pair.

A remarkable phenomenon is that you can train embedings in pairs of languages like English-Spanish and German-French and then translate from English to French without ever training on translations from English to French. This is due to using a shared internal representation for all languages. The sequence-to-sequence transformer is a fundamental piece behind technologies like ChatGPT. The next step is training it on massive amounts of text and then fine-tuning it for specific tasks.

Large Language Models

Large language models are the latest development in statistical language modelling, evolving from N-Gram models, embeddings, and transformers. Thanks to innovations that efficiently accommodate thousands of words in memory, these advanced architectures can compute contextual embeddings for extensive text contexts. This capacity has increased continuously, with the first version of ChatGPT holding something like 4000 words, while recent models hold anything from 30 thousand to a couple million words in the context!

A significant change is the scale of data on which these models are trained. BERT was trained on a vast dataset for its time, but it pales in comparison to GPT-2, 3, and 4. Large language models learn from a massive amount of internet text, including technical texts, books, Wikipedia articles, blog posts, social media, news, and more. This exposure to diverse text styles and content allows them to understand various mainstream languages.

Large language models, like GPT-2, generate text by predicting the next word in a sentence or paragraph, just like all previous language models. But when you combine the massive scale of the data and computational resources put into making these beasts of language models, and some clever tricks, they become something completely beyond what anyone thought possible.

GPT-2 was a huge leap forward in terms of coherent text generation. Given an initial prompt—say, the introductory paragraph of a fictional story—the model would generate token after token creating a mostly coherent story full with fictional characters and a plot. After a while it would start to diverge, of course, but for short fragments of text, this was already mindblowing.

However, things really exploded with GPT-3. At this size, emerging capabilities like "in-context learning" appeared, and this is where our story really begins.

How do LLMs work?

A generative language model, at its core, is just a statistical machine learning model trained to predict the continuation of a text sequence. Essentially, it's a prediction machine. You input a text prefix, run it through the model, and receive the most likely next token--a token is more or less a word or component of a word.

Actually, you don't really get just the next most likely token. The model provides a distribution across all possible tokens, giving you the probability of each one being the next continuation.

To use an LLM, we start with user input, like a query or text prefix, and run the model to get the next input token. We append it to the sequence and repeat the whole process until reaching a maximum number of tokens or the model predicts a special STOP token.

There are choices to make in this process. Choosing only the most likely continuation can quickly lead to repetitive predictions. Instead, you can choose from the top 50 most likely tokens at random, weighted by their probability. This injects some variety in the generated and is the reason why, for the same prompt, you can get different albeit semantically similar responses.

There are a few key parameters in this sampling process: the top K tokens to choose, the cumulative probability, and the temperature, which is the most relevant. The temperature is a parameter that affects the weights of the tokens you will pick for continuation. If the temperature is 0, you'll usually choose the most likely token. If it's higher, probabilities are smoothed out, making it more likely to choose less probable tokens. This increases the model's variability.

That's why some call high-temperature "creative mode" and low-temperature "precise mode." It has nothing to do with actual precision or creativity, just how deterministic the response to a given prompt will be.

From this perspective, you can already see why some people say language models are "just autocomplete on steroids". Indeed, that is the gist of their mechanics: you're completing a text sequence by adding one token at a time until you decide to stop. However, this is just scratching the surface. There is so much more involved in getting these models to behave in a useful way, and we will talk about some of those aspects in the next section.

But before moving on, here is a key insight from this explanation of how LLMs work: A language model always performs a fixed amount of computation per token.

This means that whatever limited form of "reasoning" can happen in an LLM, the depth and complexity of that reasoning is directly proportional to the number of total tokens the model processes. This implies two things:

If the input prompt is larger, the model will perform more computation before starting to compute its answer. This is part of the reason why more detailed prompts are better. But crucially, if the output is larger, the model is also doing more computation.

This is why techniques like chain-of-thought—and basically anything that makes a model "talk more"—tend to improve their performance at some tasks.

They have more compute available to do whatever reasoning they can do. If you ask a model a quick question and instruct them to give a one-word answer, the amount of compute spent producing that answer is proportional to just the input size. But if you ask the model to produce a step-by-step reasoning of the answer before the final answer, there is a higher chance you'll get a better answer just by virtue of spending more computation.

At the risk of anthropomorphizing too much, I like to summarize this insight as follows: LLMs only think out loud. If you want them to think better, get them to talk more.

So, this is how a language model works from a user perspective. Let's see how you build one.

How to Train your Chatbot

How do you make your language model work? There are three main steps.

Pre-training

The first step is called self-supervised pretraining. In this step, you take a raw transformer architecture with uninitialized weights and train it on a massive amount of data to predict the next token. You use a large corpus of data, such as news, internet blog posts, articles, and books, and train the model on trillions of words.

The simplest training method is next token prediction. You show the model a random text and ask it what the next token is. Take a random substring from the dataset, remove the last token, show the prefix to the model, and ask for likely continuations. Compute a loss function to determine how mistaken the model was in its predictions and adjust it slightly to improve future predictions.

So far, this is a standard machine learning approach. We call it self-supervised learning because the targets are not given by humans, but chosen automatically from the input. But deep down, this is just supervised learning at scale.

Now, that being said, scaling this training process to billions of parameters and trillions of tokens presents a massive engineering challenge. No single supercomputer in the world can handle training GPT-4 from scratch, so you must resort to distributed systems to split the model across used across hundreds or thousands of GPUs for extended periods of time, synchronizing different parts of the model across multiple computers is crucial for efficient training. This just means, while the conceptual part of training an LLM is pretty straightforward, it is nothing short of an engineering prowess to get build like GPT-4.

Once pre-training is completed, you have what is called a "base model", a language model that can continue any sentence in a way that closely resembles existing text. This model is already extremely powerful. Give it any prefix of text with any content whatsoever and the model will complete it with a mostly coherent continuation. It's really autocompletion on steroids!

However, these base models, as powerful as they are, are still very hard to prompt. Crucially, they do not understand precise instructions, mostly because their training data doesn't contain a lot of examples of instructions. They are just stochastic parrots, in a sense. The next step is to get tame them.

Instruction tuning

At this point, the LLM already has all the knowledge in the world somewhere hidden in its weights--metaphorically speaking--but it is very hard to locate any concrete piece of knowledge. You must juggle with transforming questions into the right prompts to find a pattern that matches what the model has seen.

The way to solve this problem is to include another training phase, but this time much shorter and focused on a very well-curated dataset of instructions and responses. Here, the quality is crucial, much more than the quantity. You won't teach the model anything new, you will just tune it to expect instruction-like inputs and produce answer-like outputs.

Once finished, you have what's called an instruction-tuned model. These models are much more robust and easy to prompt compared to the base model, and this is the point where most open-source models end. But this is not the end of the story.

Instruction-tuned models are still not suitable for publicly-facing products for one crucial reason: they can be coerced into answering anything at all, including producing biased, discriminatory, or hate speech and instructions on how to build bombs and deadly poisons.

Given base models are trained on the whole Internet, they are full of all the good and bad you can read online--although some effort is put into cleaning the pretraining dataset, but it's never enough. We must teach the model that some questions are better left unanswered.

Preference tuning

The final step is to fine-tune the model to produce answers that are more closely aligned with user preferences. This can and is primarily used to avoid biased or hate speech, and to reject any questions that are deemed unethical by the developers training the model. However, it also has the effect of making the model more polite in general, if you wish so.

The way this process works is to turn the problem from supervised learning into the real of reinforcement learning. In short, the main difference is that, while in supervised learning we give the model the correct answers (as in instruction tuning), in reinforcement learning we don't have access to ground truth answers.

Instead, we use an evaluator that ranks different answers provided by the LLM, and a feedback loop that teaches the LLM to approximate that ranking. In its original inception, this process was performed with a human evaluator, thus giving raise to the term "reinforcement learning with human feedback", but since including humans makes this process slower and more more expensive, smaller organizations have turned to using other models as evaluators.

For example, if you have one strong model, like GPT-4, you can use it to rank responses by a smaller, still in-training model. This is one example of a more general concept in machine learning called "knowledge distillation" in which you attemp to compact the knowledge of a larger model into a smaller model, gaining in efficiency without sacrificing too much in performance.

And finally, we have now something that works like GPT-4. The process was long and expensive: a massive pretraining following by a carefully curated instruction tuning and a human-backed preference tuning. This is the reason why so few organizations have the resources to train a state-of-the-art large language model.

What can LLMs do?

Now that we understand how language models are built, let's turn our attention to their capabilities. As we've seen so far, base models are, ultimately, just autocompletion models. Given an initial prefix, they can produce a mostly coherent continuation that is plausible as far as the data and the training procedure allow.

But autocompletion is far from the only task you can do with LLMs. As we will see in this chapter, a sufficiently powerful autocompletion engine can be coerced into performing many disparate tasks. Combine this with task-specific fine-tuning, and you can turn a chatty, hallucination-prone LLM into a powerful tool for many domains.

We will start by examining what base models can do since, ultimately, all fine-tuning can do is unlock existing capabilities, making them easier to prompt. Then, we will survey many specific tasks for which LLMs can and have been used.

What can base models do?

As cool as it sounds, autocompletion on steroids doesn't ring like anything smart, right? Well, it turns out that if you are very, very good at completing any text prefix, that implies you must be good at a wide range of cognitive tasks.

For example, suppose you want to build a question-answering engine. Take a question like "Who is the current president of the United States" and turn it into a prompt like "the current president of the United States is...". If you feed this to a powerful base LLM, the most likely continuation represents the correct answer to the question. This means autocomplete on steroids gives you question answering for free.

And you can do this for a whole lot of tasks. Just turn them into an appropriate prefix and continuation. Do you want to translate a sentence? Use the prompt like "An English translation of the previous sentence is..." Do you want to summarize a text? Use a prompt like "A summary of the previous text is..." You get the point.

But it goes much further than that! The scientists at OpenAI discovered that models the size of GPT-3 and above were capable of inferring the semantics of a task given examples without explicitly telling them what the task was. This is called in-context learning, and it works wonders. For example, if you want to use an LLM for sentiment analysis, you can use a prompt like the following.

Comment: This movie was so good!
Sentiment: Positive

Comment: This movie really sucks.
Sentiment: Negative

Comment: The book was better.
Sentiment: Neutral

Comment: I couldn't stop looking at the screen!
Sentiment:

That is, you build a prompt with a few examples of inputs and outputs and feed that to the LLM, leaving the last input unanswered. The most likely continuation is the right answer to the last input, so provided the base model has seen similar tasks in its training data, it will pick up the pattern and answer correctly most of the time.

In-context learning is a surprising discovery at first, but when you look deep down, it makes total sense. Since base LLMs are completion machines, provided they have seen examples of some arbitrary task in their training set, all you need to do is come up with a text prefix that makes the model "remember" that task. And that prefix is often just a set of examples of a given task because that is actually what is stored in the LLM weights: a loosely and implicitly connected set of similar text fragments.

In a sense, the input to the LLM is a key to retrieving a part of its training set, but not in an accurate way. Since LLMs only store correlations between words, anything you "retrieve" from an LLM is a fuzzy approximation and aggregation of several (possibly millions) of similar training examples. For this reason, we say base models already "know" everything, but it's very hard for them to "remember" it, because you have to find the right key--i.e., the right context prefix.

But what if we could teach the LLM that some arbitary instruction is equivalent to the right key for a given task? That is exactly what instruction tuning is about. By showing the LLM input/output pairs of, this time, precise instructions and the corresponding answer, we are rewiring some of its parameters to strengthen the correlation between the instruction and the response.

In a sense, fine-tuning is like finding a path between the input space and the output space in the base model's fuzzy web of word correlations and connect those two subspaces of words with a shortcut, so next time you input the instruction, the LLM will "remember" where is the appropriate answer.

If this sounds overly anthropomorphic, it is because we have stretched the analogies a bit to make it easier to understand. In reality, there is no "remembering" or "knowing" happening inside a large language model, at least not in any way akin to how human memory and reasoning work. I have written extensively about this difference and its implications and will continue to do so in future posts.

For the time being, please be cognizant that any analogy between LLMs and human brains is bound to break pretty soon and cause major misunderstandings if taken too seriously.

Use cases for fine-tuned LLMs

With proper fine-tuning in a concrete domain, you can turn LLMs into task-specific models for a huge variety of linguistic problems. In this section, we'll review some of the most common tasks for which LLMs can be deployed.

When discussing the use cases of fine-tuned LLMs, we don't talk about an "input prefix" anymore because even if, technically, that is still what we are feeding the LLM, the response is not necessarily a direct, human-like continuation of the text. Instead, depending on which dataset it was fine-tuned, the LLM will respond with something that looks more like an answer to a question or an instruction than a pure continuation. Actually, if you give a fine-tuned LLM like GPT-4 an incomplete text prefix, it will often reply back with something like "I didn't understand you entirely, but it appears what you are trying to do is [...]" instead of casually continuing where you left.

Thus, it is often best to interpret this process as "prompting" the LLM with an instruction, and this is the reason why the input text is called a "prompt", and the process of designing, testing, and optimizing these prompts is called, sometimes undeservedly, "prompt engineering".

Text generation

The simplest, most straightforward use case for large language models is of course text generation, whether for fictional content as for technical articles, office work, homework, emails, and anything in-between. But instead of using a base model, where you have to provide a prefix to continue, an instruction-tuned model can be instructed directly to write a paragraph, passage, or even a short essay on a given topic. Depending on how powerful and well-trained the model is, you can even provide hints about the intended audience, the complexity of the language to use, etc.

Text generation--and all instructions in general--often works better the more descriptive the prompt. If you simply ask the LLM to "tell me a fairy story", yes, it will come up with something plausible, and it might even surprise in the good way. But you most likely want to have finer control over the result, and thus crafting a well-structured and informative prompt is crucial. In @sec-prompting we will learn the most basic strategies to create effective prompts.

A common issue in text generation, especially in longer formats, is that the LLM can and will often steer away from the main points in the discourse. The longer the response, the most likely some hallucinations will happen, which may be in the form of incoherent or plain contradictory items, e.g., characters acting "out of character" if you're generating fiction.

A battle-tested solution for generating coherent, long-form text is the divide-and-conquer approach. Instead of asking for a full text from the begining, prompt the LLM to first generate an outline of the text, and then, sequentially, ask it to fill in the sections and subsections, potentially feeding it with previously generated content to help it mantain consistency.

Summarization

Summarization is one of the most common and well-understood use cases of LLMs. In a sense, it is a special case of text generation--what isn't, right?--but it has specific quirks that merit a separate discussion. In general, LLMs excel at summarizing. After all, that's what they've been implicitely trained to do: construct a statistical model of the whole internet, which is rather, in a sense, a summary of the whole human knowledge.

However, summarization isn't a trivial problem. Besides the usual concerns about the audience, complexity of the language, style, etc., you will probably also want to control which aspects of the original text the LLM focuses on. For example, rather than a simple compactation of the text, you might want a summary that emphasizes the consequences of whatever is described in the original text, or that highlights and contrasts the benefits and limitations. This is a more abstract form of summary that produces novel value, beyond just being a shorter text.

There are important caveats with summarization, though. LLMs are very prone to hallucination, and the more you push the boundary between a plain summary and something closer to critical analysis, the more the LLM will tend to ignore the original text and rely on its own pre-trained knowledge.

And just like before, the best way to counteract any form of rebellious generation is to be very intentional in your prompt and make it as structured as necessary. For example, you can first ask the LLM to extract the key points, advantages, and limitations. Then, ask it to cluster the advantages and limitations according to your criteria. Only then can we ask it to provide a natural language summary of that semi-structured analysis. This gives you finer control over the end result and will tend to reduce hallucinations while being easier to debug since you can see the intermediate steps.

Translation & style transfer

The text-to-text transformer architecture (the precursor and core component of the modern language model) was originally designed for translation. By encoding the input sentence into a latent space of word correlations detached from a specific language and then decoding that sentence in a different vocabulary, these models achieved state-of-the-art translation in the early 2018s. The more general notion of style transfer is, deep down, a translation problem, but instead of between English and French, say, between technical and plain language.

Modern LLMs carry this capability, and will be more than enough for many practical translation tasks. However, beware that plenty of studies show that LLM translation are often poorer in many linguistic notions from professional translations. Translation is an art, as much or more than it is a science. It involves a deep knowledge of the cultural similarities and differences between readers of both languages, to correctly capture all the nuances that even a seemingly simple phrase can encode.

That being said, LLMs can help bridge the gap for non-native speakers in many domains where you don't need--or can't hope for--a professional translation. An example is inter-institutional communication, e.g., emails from co-workers who don't speak your native language. In these cases, you must also be careful nothing important is lost in translation, literally, but as long as everyone is aware of the limitations, this is one of the most practical use cases for LLMs.

Structured generation

Continuing with the topic of text generation capabilities, our next stop is generation from structured data. This is one specific area where LLMs come to mostly solve a long-standing problem in computer science: to generate human-sounding explanations of dry, structured data.

Examples of this task are everywhere. You can generate a summary of your calendar for the day, and pass it to a speech synthesis engine, so your personal assistant can send you every morning an audio message reminding you what you have to do, with cute linguistic cues like "Oh, and on the way to the office, remember to pick up the your wife's present." We will see an example of this functionality in @sec-planner.

Other examples include generating summaries of recent purchases for a banking app or product descriptions for an online store—basically anywhere you'd have a dashboard full of numbers and stats, you can have an LLM generate a natural language description of what's going on. You can pair this capability with the super skills LLMs have for question answering (at least when the answer is explicit in the context) to construct linguistic interfaces to any of number of online services or apps.

Text classification

Text classification is the problem of categorizing a text fragment—be it a single sentence, a whole book, or anything in between—into one of a fixed set of categories. Examples vary from categorizing comments as positive/neutral/negative, determining if an email is spam or not, or detecting the tone and style of a text, to more specific tasks like extracting the intended intention from a user, e.g., chatting with an airline bot.

To have an LLM correctly and robustly classify your text, it is often not enough to just instruct it and provide the intended categories. The LLM might come up with a new category you didn't mention just because it makes sense in that context. And negative instructions, in general, don't work pretty well. In fact, LLMs are lousy at interpreting negative instructions precisely because of the underlying statistical model. We will see in @sec-reasoning why this is the case.

Instead of a dry, zero-shot instruction, you can improve the LLM classification capabilities substantially with a few examples (also called a k-shot instruction). It works even better if you select the examples dynamically based on the input text, a procedure that eerily similar to k-NN classification but in the world of LLMs. Furthermore, many LLMs tend to be chatty by design, and will often fail to provide a single word classification even if you instruct it to. You can mitigate this by using a structured response prompt, as seen in @sec-prompting.

Structured data extraction

A generalization of text classification is the problem of structured data extraction from natural language. A common example is extracting mentions of people, dates, and tasks in a text, for example, a transcription from a video meeting. In the more general case, you can extract any entity-relation schema from natural text and build a structured representation of any domain.

But this capability goes much further. If you have any kind of structured input format--e.g., an API call for any online service--you can instruct (and probably k-shot) an LLM to produce the exact JSON-formatted input given a user query. This is often encapsulated in modern LLM providers i a functionality called "function calling", whic which we will explore in @sec-function-calling.

As usual, the main caveat with structured generation is the potential for subtle hallucinations. In this case they can be in two forms. The simplest one, when the LLM fails to produce the expected format by, e.g., missing a key in JSON object or providing an invalid type. This type of error is what we call a syntactic hallucination and, although anoying, is often trivial to detect and correct, even if just by retrying the prompt.

The second form of hallucination is much more insidious: the response can be in the right format, and all values have the right type, but they don't match what's in the text. The LLM hallucinated some values. The reason this is a huge problem is because detecting this form of semantic hallucination is as hard to solve as hallucinations in general. As we'll see in @sec-hallucinations, we simply have no idea how to ensure an LLM always produce truthful responses, and it might be impossible even in principle.

Question answering

Question answering is one of the most surprising capabilities of sufficiently large language models. To some extent, question answering can be seen as a form of retrieval, where you ask about some facts explicitly mentioned in the training set. For example, if you ask, "Who wrote The Illiad" it is not surprising, given what we know of LLMs, that a fine-tuned model can easily generate "Homer" as the most plausible response. The sentence "Homer wrote The Illiad" must have appeared thousands of times in different ways in the training set.

But modern LLMs can go way beyond simply retrieving the right answer to a trivia question. You can ask questions that involve a small set of reasoning steps, combining facts here and there to produce a response that is not, at least explicitly, in the training set. This is rather surprising because there is no explicit reasoning mechanism implemented in LLMs. All forms of reasoning that can be said to happen are an emergent consequence of learning to predict the next token, and that is at least very intriguing.

In any case, as I’ve argued many times, the statistical modelling paradigm has some inherent limitations that restrict the types of reasoning that LLMs can do, even in principle. This doesn't mean that, in practice, it can't work for the types of problems you encounter. But in its most general form, long-term reasoning and planning are still an open problem in artificial intelligence. I don't think LLMs alone are equipped to solve it.

You can, however, plug LLMs with external tools to enhance its reasoning skills. One of the most fruitful research lines is to have them generate code to solve a problem, and then run it, effectively making LLMs Turing-complete, at least in principle, even if in practice they may fail to generate the right code. Which leads us to the next use case.

Code de generation

Since LLMs are trained to autocomplete text, it may not be that surprising that, when feed with enough training examples of code in several programming languages, they can generate small snippets of mostly correct code. However, for anyone who codes, it is evident that writing correct code is not as simple as just concatenating plausible continuations. Programming languages have much stricter syntax rules that require, e.g., to close all parenthesis and to use explicit and very tight naming conventions. Failing to produce even a single semicolon in the right place can render a program unusable.

For this reason, it is at least a bit surprising that LLMs can code. More surprising still is that they can not only autocomplete existing code but generate code from scratch given natural language instructions. This is one of the most powerful capabilities in terms of integrating LLMs with external tools because code is, by definition, the most general type of external tool. There is nothing you can do on a computer that you can't do with code.

The simplest use case in this domain is, of course, using LLMs as coding assistants embedded in developer tools like code editors. But this is just scratching the surface. You can have an LLM generate code to solve a problem it would otherwise fail to answer correctly--e.g., perform some complex physics computations. Code generation allows an LLM to analyze large collections of data by computing statistics and running formulas. You can even have an LLM generate the code to output some chart, and voilá, you just taught the LLM to draw!

Code explanation

Code explanation is the inverse problem of code generation: given some existing code, produce a natural language explanation or, more generally, answer questions about it. In principle, this is a form of question-answering that involves all the caveats about complex reasoning we have already discussed. But it gets harder.

The problem is the majority of the most interesting questions about code cannot be answered in general: they are undecidable, meaning no algorithm can exist that will always produce the correct response. The most poignant example is the question, "Does this function ever returns?". This is the well-known Halting problem, the most famous problem in computability theory, and the grandfather of all undecidability results. Similar questions, such as whether a variable is ever assigned or a method is ever called, are also undecidable in the general case.

And this is not just a theoretical issue. The Halting problem highlights one crucial aspect of computation: in the general case, you cannot predict what an algorithm will do without running it. However, in practice, as anyone who codes knows, you can predict what lots of your code will do, if only because it is similar to code you've written before. And this is where LLMs shine: learning to extrapolate from patterns to novel specific instances, even if the general problem is unsolvable.

To top it all, we can easily imagine an LLM that, when prompted with a question that seemingly cannot be answered from the code alone, could decide to run the code with some specific parameters and observe its results, drawing conclusions not from the syntax alone but from the execution logs. A debugging agent, if you will.

Final Remarks

These are the most essential high-level tasks where LLMs can be deployed, but they span hundreds, if not thousands, of potential applications. Text classification, for example, covers a wide range of applications, just changing the classification target.

One conclusion you can draw from this chapter is that LLMs are some of the most versatile digital technologies we've ever invented. While we don't know if artificial general intelligence is anywhere near, we're definitely one step closer to general-purpose AI—models that can be easily adapted to any new domain without research or costly training procedures.

However, language modelling is not magic. The above discussion has already given us a glimpse of some of this paradigm's fundamental limitations. In future posts, we will explore how these models learn compared to humans and what this difference entails regarding their limitations.

If you want to know more about language modelling in general, and LLMs in particular, feel free to check Mostly Harmless AI. It’s jam-packed with information (most of which is published in this blog already) on the good and the ugly parts of LLMs, and lots of advice on how to get the best out of them.

The Insurmountable Problem of Formal Reasoning in LLMs

Alejandro Piad Morffis — Mon, 03 Mar 2025 12:03:44 GMT

Welcome to another issue of Mostly Harmless AI, the section of where I share educational articles for the general audience on the potential and limitations of Artificial Intelligence. This article is a bit of a mashup of several past articles, with new insights and much better organization. It is my attempt to put a final dot in the discussion of whether LLMs can or cannot reason—at least for a while. I hope you enjoy it.
This article first appeared in , a newsletter hosted by that offers deep dives into recent AI topics, and much more. Please check the original article for additional insights and opinions from Michael.
See the original article
This version has only minor syntax and grammar fixes.
— Alejandro

Reasoning models are the latest trend in AI.

It all began with OpenAI o1, a model designed to "think out loud" before arriving at answers, mimicking human thought processes in an almost uncanny way. Then came DeepSeek R1, which has garnered massive attention for its impressive performance in reasoning tasks, which sent shock waves across the whole AI industry and sparked a few conspiracy theories.

And now we're seeing the upgrade from o1 to o3 in real-time (yes, these people are really bad at naming things), another reasoning model from China, and Google and Meta throwing their hats into the fight. Everyone runs to get their "reasoning" model out as quickly as possible.

For many, these models seem to be more than just iterations of the previous paradigm; they represent a qualitative leap, particularly in tackling challenges that require logical reasoning. Across many complex benchmarks, reasoning-enhanced models have shown impressive accuracy at the cost of increased response time, which seems like a fair deal.

The big shift is trading more computation during inference—making the model "think out loud" for, sometimes, several minutes before deciding on an answer—for a decreased rate of hallucinations and reasoning mistakes and an improved capacity for producing long chains of mostly sound logical arguments to arrive at a non-trivial (read, pattern-matched) answer.

However, despite their impressive performance, these models are not infallible. Numerous examples illustrate their limitations, where they deviate from the correct thinking path, arrive at incorrect conclusions, and even occasionally contradict themselves. In particular, these models seem to struggle with formal reasoning: mathematical and logical problems that require not as much real-world or common-sense knowledge but rather the precise application of logical inference rules.

This raises some crucial questions: Is the path to flawless reasoning merely a matter of superior data and extended training? Can we bridge the gap to achieve the kind of formal reasoning from natural language we've seen in sci-fi AIs—like Data from Star Trek or the hyperrational robots in Asimov's narratives? Or is there something fundamentally limiting within the architecture of large language models that restricts their capability for comprehensive logical reasoning in natural language?

In this article, I will argue that LLMs have intrinsic limitations that hinder their reasoning abilities. The core of the argument boils down to the combination of stochastic language modelling with a fixed computational architecture, which renders LLMs incapable of provably correct general-purpose formal reasoning.

I will attempt to explain this in clear and intuitive terms and address the most compelling counterarguments to this perspective. This article summarizes and expands on my views in several previous articles. It's part of my ongoing attempt to understand where we are heading with LLMs regarding reasoning.

While much of this discussion in this article applies primarily to first-generation LLMs—anything before OpenAI o1—I will also explore how these fundamental constraints extend to newer reasoning-enhanced models. If my arguments hold, nothing short of a novel paradigm can lead to AGI. In fact, I will argue that perfect reasoning from natural language may be computationally unsolvable, even in principle.

But before moving on, let me clarify something. I understand this topic is very controversial, and I know I will receive a lot of criticism, especially from some of the most ardent believers in the immediacy of AGI.

To be clear, I'm not claiming AGI is impossible. On the contrary, I'm firmly on the computationalist side here, and I believe machines can achieve the same and perhaps even superior levels of intelligence as any living, organic being. And I'm confident LLMs will play a significant role in the AGI breakthrough. That's why I am among the many researchers working on better ways to make LLMs interact with other computational systems to enhance their capabilities. I’m just sceptical that all that’s left is throwing more GPUs and using the same techniques we already have.

With that out of the way, let's explore what it means to say that LLMs are fundamentally limited in formal reasoning. This will be a long read, so buckle up!

What Do We Mean by Reasoning in LLMs

"Reasoning" is a very loaded term, that means a lot of things for different people—even for different groups of people who should agree on these definitions. So, I will attempt to define precisely what I mean by the phrase "LLMs cannot reason.”

In the Artificial Intelligence field, when we claim LLMs (or any computational system) can or cannot "reason", we are, for the most part, not talking about any abstract, philosophical sense of the word “reason”, nor any of the many psychological and sociological nuances it may entail. There are many ways to define reasoning, from common sense to analogical to emotional to formal. In this article, I will focus on a precise, quantifiable, simplified notion of reasoning that comes straight from math.

Reasoning is, simply put, the capacity to draw logically sound conclusions from a given premise. Formal reasoning is when you follow a formalized method with hard rules to arrive at a conclusion. The most important tool for formal reasoning in math and science is logic, and it provides two fundamental reasoning modes: deduction and induction.

Induction is somewhat problematic for AI (and humans!) because it involves generalizing claims from specific instances, and thus, it requires some strong assumptions. In contrast, deduction is very straightforward. It is about applying a set of predefined and agreed-upon logical inference rules to obtain new provably true claims from existing true claims. It is the type of reasoning that Sherlock Holmes is most famous for, and the thing mathematicians do all day long when proving new theorems.

Thus, in the remainder of this article, when I say "LLMs cannot reason", I’m simply saying LLMs cannot perform logical deduction; that is, there are well-defined, computationally solvable deduction problems they inherently cannot solve. As I will attempt to convince you, this is not a value judgment or an informed opinion based on experience. It is a straightforward claim provable from the definition of deductive reasoning and the inherent characteristics of LLMs given by their architecture and functionality.

Now, if this definition of “reasoning” sounds like an oversimplification, well… in a sense it is. I'm not addressing other forms of non-formal reasoning that are also paramount in human thinking, like abduction or common-sense and analogical reasoning. But formal reasoning, and particularly deductive reasoning, is the basis of mathematical and scientific research and crucial in informed decision-making, so any attempt to build an AI system useful for these tasks must be able to perform this mode of reasoning.

Before moving on to why LLMs cannot reason, let me address the most common counterargument I encounter whenever this topic is raised in non-academic contexts.

Can Humans Truly Reason?

Here is an argument you will hear a lot, and maybe even have made yourself:

“Sure, LLMs cannot really reason, but neither can humans, right? So what's the big deal.”

I mean, humans can be stupendously irrational. We are prone to so many biases that, while useful in the biological and sociological context in which we evolved, in the modern world, are often more than not an obstacle to rational, logical thinking.

Despite this, the argument that since humans are not perfectly rational, it is OK for LLMs to also not be is flawed on many levels, so let’s unpack it.

First, while humans can make errors in reasoning, the human brain definitely possesses the capacity for open-ended reasoning, as evidenced by the more than 2000 years of solid math we have collectively built. Moreover, all college students—at least in quantitative fields—at some point have to solve structured problem-solving exercises that require them to apply logical deduction to arrive at correct conclusions, such as proving theorems.

So, while humans can be pretty stupid at times, we are certainly capable of the most rigorous reasoning when trained to do so.

But even more importantly, this assertion is a common case of whataboutism. Why does the fact humans can’t do something immediately make it OK for a piece of technology to fail at it? Imagine we did this with all our other tech. Sure, that aeroplane fell down and killed 300 people, but humans can’t fly. Or yes, that submarine imploded, but humans can’t breathe underwater. Or that nuclear power plant melted, but humans can’t stand 3000 degrees of heat, so what’s the big deal?

Obviously, we don’t do that. We compare any new piece of technology with our current best solution, and only if the new thing improves upon the old—at least on some metrics—do we consider the investment worthwhile. We replaced horses with cars because cars improved the previous best solution to individual transportation, at least on some fundamental metrics, if not all.

Granted, we often compare AI capabilities to human capabilities, but this is only because humans are the gold standard for the types of problems we often want AI systems to solve. So, we compare LLMs' capacity to generate creative stories, engage in open-ended dialogue, or provide emphatic customer assistance with humans because humans are the best at these tasks.

However, there are well-established systems—such as traditional SAT solvers—that excel in structured logical deduction and reasoning tasks. These systems are designed with rigorous validation mechanisms that ensure correctness and reliability in their outputs. They are basically flawless and incredibly fast.

So, in terms of the capability to perform purely logical, perfectly valid deduction, we already have a computational solution that sets the bar. Nothing short of provably correct deduction is good enough. The problem is these formal reasoning systems don't understand natural language, of course, and that's why we want LLMs to bridge the gap.

Why LLMs Are Incapable of Formal Reasoning

Let's move on to the main limitations of current large language models that prevent general-purpose, provably correct deductive reasoning. I'm mostly thinking about LLMs implemented in the prevalent paradigm of the Transformer architecture, but these arguments apply to anything implemented using some sort of context-limited, probabilistic, sequential language model designed to run on GPUs.

The Argument from Stochastic Modelling

The first limitation of stochastic language models regarding reasoning is precisely their stochastic nature. These models generate outputs based on probabilistic predictions rather than deterministic logical rules. This means that even a well-structured prompt can and will yield different responses on different occasions due to the randomness of the sampling process.

Now, stochastic sampling is a necessary feature of LLMs, and not a bug. It is the fundamental reason that LLMs can produce varied, plausible, humanly-sounding text at all. It is also the only way we know how to capture in software, at least approximately, the many nuances and ambiguities of natural language.

That being said, stochastic sampling is a problem for deductive reasoning. For starters, an LLM might arrive at a wrong conclusion purely by chance, even if the right answer was the most likely continuation. This is, once again, the infamous problem of hallucinations.

To alleviate this, we may attempt to set the sampling temperature to zero, effectively forcing the model to fix the output for a given input. Thus, there will be no more different answers to the same prompt.

However, the underlying model is still probabilistic; we’re just greedily sampling the most likely continuation. The problem is that the mapping between input and output hinges on a probabilistic distribution that encodes correlations between elements in the input and corresponding elements in the output.

The reason this is problematic is simple: any probabilistic model of language that can generate new sentences that didn't exist in the training data—that is, that can generalize at all—has to be, by definition, a hallucination machine. Or, to use ’s much better terminology: a confabulation machine.

You can think about it like this: if you want a language model to generate novel sentences, then it must be able to produce sentences it doesn't really know, but somehow believes are correct.1

The way a stochastic language model "believes" a sentence is correct, is because the probability of seeing this exact sentence is high. Crucially, for novel sentences, this probability is high because there is some similarity to actual sentences in the training set.

This means any stochastic language model that's useful at all needs to approximate correctness through plausibility. Sentences close (in a precise mathematical sense we won't get into here) to what the LLM saw in the training set are thus considered correct. Sentences that are too dissimilar to anything in the training set are considered incorrect. But the frontier is fuzzy; there is no explicit threshold to distinguish what's true from what's not. All a stochastic language model can actually model is degrees of plausibility.

And there is the problem. Math and logic are not fuzzy2. There is no plausibility involved. A reasoning chain is either correct or incorrect. And more importantly, a tiny change in a single word—like adding a single "no" somewhere in a huge prompt—can completely shift the validity of a logical claim.

Whenever you approximate crisp mathematical correctness with fuzzy plausibility, you will have true and false claims close enough to each other such that a stochastic model cannot effectively distinguish them.

Now, this discussion has important nuances, so let's analyze some interesting arguments about the role of randomness in reasoning.

Is Randomness a Bug or a Feature?

A common criticism of this argument regarding the stochastic nature of language models is that, in fact, randomness is essential in problem-solving and a crucial feature of many of the same SAT solvers I pretend to compare LLMs with. How hypocritical it is to posit randomness as a limitation when the most effective deductive reasoning algorithms are essentially random search algorithms!

This is true, but only partially, and it makes all the difference. So, let me explain.

Randomness plays a vital role in many computational problem-solving techniques, particularly in search algorithms for hard (read NP-complete or NP-hard) problems. Modern SAT solvers, for example, often employ randomized search strategies to efficiently explore vast solution spaces. By introducing randomness into the search process, these solvers can escape local optima and discover satisfactory solutions more quickly than deterministic methods might allow.

However—and here comes the crucial difference—using randomness in the search process does not imply that the entire reasoning process is inherently unreliable. Randomness is confined to the search phase of problem-solving, where it helps identify potential solutions—potential reasoning paths. However, once a candidate solution is found, a deterministic validation phase kicks in that rigorously checks the correctness of the proposed reasoning path.

This distinction between the search and validation phases is paramount in understanding how randomness contributes to effective problem-solving in general. During the search phase, algorithms may employ random sampling or other stochastic methods to explore possibilities and generate potential solutions. This phase allows for flexibility and adaptability, enabling systems to navigate complex landscapes of potential answers.

However, once a potential solution has been identified, it must undergo a validation process grounded in deterministic logic. This validation phase involves applying predefined, deterministic rules to confirm that the proposed solution meets all necessary criteria for correctness. As a result, any solution that passes this validation step can be confidently accepted as valid, regardless of how it was generated.

Here is a silly metaphor to illustrate this problem. If you sit a million monkeys in a million typewriters, at some point, one of them will randomly produce Romeo and Juliet. However, you need Shakespeare to filter the garbage out and decide which of the million pamphlets to publish.

What this means is randomness is good for exploring hypotheses but not for deciding which one to accept. For that, you need a deterministic, provably correct method that doesn’t rely on probabilities---at least if you want to solve the problem exactly.

However, in stark contrast to traditional problem-solving systems like SAT solvers, LLMs lack a robust validation mechanism. While they can generate coherent and contextually relevant responses based on probabilistic reasoning, some of which may be correct reasoning chains, they do not possess a reliable method for verifying the accuracy of those outputs. The verification process is also stochastic and as subject to hallucinations as the generation process, rendering it effectively unreliable.

Therefore, since LLMs evaluate their own outputs using the same probabilistic reasoning they employ to generate them, there is an unavoidable, although perhaps small, risk that incorrect conclusions will be propagated as valid responses.

The monkeys are also the publishers.

The Argument from Bounded Computation

The second argument concerns the computational architecture of current language models. By design, LLMs spend a fixed amount of computation per token processed. Thus, the amount of computation an LLM does before it produces the first output token is a function of just two numbers: the input size and the model size.

So, if you ask an LLM to produce a yes or no question for a logical puzzle, all the “thinking” the model can do is some fixed—albeit huge—number of matrix multiplications that only depend on the input size.

Now, consider that you have two different logical puzzles with the same input size, i.e., the same number of tokens. But one is an easy puzzle that can be solved with a short chain of deduction steps, while the other requires a much higher number of steps. Here is the kicker: any LLM will spend exactly the same amount of computation on both problems. This can’t be right, can it?

A basic result in computational complexity theory is that some problems with very small inputs seem to require an exponentially high computational cost to be solved correctly. These are NP-complete problems, and most computer scientists believe there are no efficient algorithms to solve them. Crucially, a huge number of reasoning problems fall in this category, including the most basic logical puzzle of all—determining if a given logical formula can be satisfied.

When faced with an instance of an NP-complete problem, an LLM will produce an answer after a fixed amount of computation defined solely by the input size. Now, by sheer size, some larger models might just spend enough computation to cover many smaller instances of NP-complete problems.

As it happens, a huge constant function can be larger than an exponential function for smaller inputs. But crucially, we can always find instances of NP-complete problems that require, even in principle, a sufficiently large amount of computation to surpass the computational capacity of any LLM, no matter how big.

But this means something even more profound. Ultimately, LLMs are not Turing-complete systems but essentially very large finite automata. While they can handle a wide range of tasks and produce outputs that appear sophisticated, their underlying architecture limits the types of problems they can solve.

Turing completeness is the ability of a computational system to perform any computation given sufficient time and resources. Modern computers and many seemingly simple systems, such as cellular automata, are Turing complete systems. Ironically, LLMs are not.

The reason is simple. We know from computability theory that any Turing complete system must be able to loop indefinitely. There are some problems—some reasoning tasks—where the only possible solution is to compute, and compute, and compute until some condition holds, and the amount of computation required cannot be known in advance. You need potentially unbounded computation to be Turing complete.

And this is the final nail in the coffin. LLMs, by definition, are computationally bounded. No matter their size, there will always be problem instances—which we may not be able to identify beforehand—that require more computation than is available in the huge chain of matrix multiplications inside the LLM.

Thus, when LLMs seem to tackle complex reasoning problems, they often solve specific instances of those problems rather than demonstrating general problem-solving capabilities. This might just be enough for practical purposes—we may never need to tackle the larger instances—but, in principle, LLMs are incapable of truly open-ended computation, which means they are incapable of true reasoning.

But again, there is a lot of nuance in this limitation. For starters, this analysis applies very obviously to first-generation LLMs. But newer, so-called reasoning-enhanced LLMs work a little differently: they produce a potentially unbounded sequence of "thinking tokens" before deciding on the actual answer.

So let's examine this argument from bounded computational power in more detail.

Can LLMs Perform Unbounded Computation?

A common counterargument is that LLMs can be rendered Turing complete by integrating them with external tools, such as code generators or general-purpose inference engines. It is even easier to wrap them in a recursive loop that can simply call the LLM as many times as necessary.

And this is true. You can trivially make an LLM Turing-complete, in principle, by duct-taping it with something that is already Turing-complete. You can also build a flame thrower with a bamboo stick, some duct tape, and a fully working flame thrower.

However, simply making LLMs Turing complete in principle does not guarantee that they will produce correct or reliable outputs. Integrating external tools or clever self-recursion introduces yet another layer of complexity and potential points of failure.

We need to address two main strategies in this regard: prompting and function calling. Let's tackle them one at a time.

Prompt-based Techniques for Reasoning

Chain-of-thought prompting is the most basic way to increase the computation of LLMs. By guiding models to articulate intermediate reasoning steps before arriving at a final answer, CoT prompting helps decompose complex problems into manageable parts. This method has improved performance across various reasoning tasks, such as arithmetic and commonsense reasoning.

CoT makes the LLM "think harder" by forcing the model to produce what we can consider “internal thought” tokens. Thus, we may view it as a way to perform additional computation on the input before deciding on the response.

This is precisely what modern reasoning-enhanced models like o1 and R1 are doing: automating the chain of thought prompt by baking it into the training process. However, despite its advantages, CoT prompting remains insufficient for several reasons.

On the one hand, CoT doesn't address the fundamental limitation of hallucinations. The stochastic nature of LLMs means that even with CoT prompting, outputs can vary across different runs due to randomness in the generation process. This variability can lead to inconsistent reasoning outcomes, undermining the reliability of the model's responses.

On the other hand, CoT extends the computation budget by a finite amount. To achieve true unbounded computation, we need a cyclic scheme in which the LLM is prompted to continue thinking, potentially indefinitely, until satisfied.

A potential solution for this problem is the intuitive approach of self-critique, which involves evaluating and refining an LLM's responses with the same model, using prompts that instruct the model to read its previous output, highlight potential errors, and try to correct them. A form of after-the-fact chain-of-thought, if you might.

However, research also shows significant limitations in the effectiveness of this self-critique capability.

While LLMs can generate multiple ideas and attempt to critique their initial outputs, studies indicate that they cannot often meaningfully self-correct. Research also shows that self-correction techniques in LLMs are heavily contingent on the availability of external feedback. In many cases, LLMs perform better when they have access to an external verifier or additional context rather than relying solely on their internal reasoning capabilities.

And even more interestingly, attempts at self-critique can sometimes degrade performance rather than enhance it. Studies have shown that when LLMs engage in self-critique without external validation, they may generate false positives or incorrect conclusions. If you push harder, you can easily fall into a cycle of self-reinforcement of invalid or erroneous arguments, making the LLM increasingly more certain despite it getting worse and worse.

Ultimately, this circles back to the already discussed issue of relying on randomness for validation. If we attempt to get the model to tell us when it is certain it got the answer right, we are doomed. CoT and self-critique are great strategies for exploring different hypotheses and generating multiple possibilities, but they alone cannot reliably produce a provably correct conclusion all the time.3

Function Calling and External Tools

Integrating external tools, such as reasoning engines or code generation systems, into large language models represents a promising—and, for me, the only really viable—approach to enhancing their reasoning capabilities.

Connecting LLMs to external reasoning engines or logical inference tools makes it possible to augment their reasoning capabilities significantly. These tools can handle complex logical deductions, mathematical computations, or even domain-specific knowledge that the LLM might not possess inherently. Similarly, external code generation systems enable LLMs to produce executable code for specific tasks.

By leveraging these external resources, LLMs can potentially overcome some of their inherent limitations in logical reasoning and problem-solving. For starters, an external inference engine will be Turing-complete, so we scratch that problem down, right?

Not so fast. Unfortunately, this approach has many challenges, particularly regarding the LLM's ability to generate the correct input for function calls or code execution. It all circles back to the original sin of LLMs: stochastic language modelling leads inevitably to hallucinations.

First, the effectiveness of function calling or code generation hinges on the model's ability to interpret a task accurately and generate appropriate inputs. If the model misinterprets the requirements or generates incorrect inputs (e.g., if it hallucinates part of the inputs), the external tool will produce erroneous outputs or fail to execute altogether.

While external tools can, in principle, improve the reasoning capabilities of an LLM by providing structured logic and formal verification, they cannot compensate for LLMs' basic limitations in generating reliable output. Therefore, there is no formal guarantee that the final output from this integration will be logically sound or appropriate for the context, simply because of the age-old adage: garbage in, garbage out.

Subscribe now

How About Reasoning-Enhanced Models?

Finally, let's briefly tackle what this means for state-of-the-art reasoning-enhanced LLMs, such as OpenAI's o3 and DeepSeek R1. These models employ extended "thinking" phases during inference, generating multi-step reasoning chains, critiquing their own outputs, and iteratively refining conclusions. This approach reduces hallucinations and improves accuracy on benchmarks like mathematical problem-solving and logical puzzles, often outperforming earlier models by wide margins.

The mechanics of these enhancements are based on two key innovations. First is the chain-of-thought expansion, which allows models to explore branching reasoning paths for longer durations—sometimes minutes of computation—simulating deeper deliberation. Second is implementing self-critique loops: internal validation mechanisms where the model evaluates its intermediate conclusions against problem constraints.

This article by provides a much deeper and more technically detailed description of how these models work.

These techniques do improve practical performance by a large margin. It's much better to bake CoT into the training process than simply relying on human-crafted prompts. During training, we can teach the model to automatically produce mostly correct reasoning chains by using formally verifiable training tasks, such as coding and math problems.

However, during inference, the accuracy of these models remains fundamentally limited by the probabilistic architecture of the underlying LLM. The self-critique process itself relies on the same stochastic language modelling that generates initial answers; there is no deterministic verification step during training. The same reliability issues remain, perhaps significantly decreased but not completely eliminated.

These models represent an evolution rather than a revolution. They scale up existing chain-of-thought techniques through increased computation and mirror human-like error checking with self-critique mechanisms that, ultimately, lack mathematical guarantees of correctness. Performance gains come from better exploring the model's existing knowledge space rather than new reasoning capabilities.

So, despite the remarkable empirical improvements in reasoning-enhanced models, they are more a natural evolution of existing paradigms rather than a qualitative breakthrough. Their stochastic foundation ensures occasional failures will persist despite enhanced computing budgets. They are, indeed, very powerful tools for many real-world applications where absolute formal certainty is not required, but they cannot overcome their fundamental limitations when tasked with formal deductive reasoning tasks requiring strict accuracy guarantees.

Conclusions

The purpose of this article is to convince you of two claims:

Large Language Models currently lack the capability to perform a well-defined form of reasoning that is critical for important decision-making processes, including producing novel, scientifically valuable output.
We have absolutely no idea how to solve this in the near future within the prevalent and, so far, the only scalable language modelling paradigm.

This matters because there is a growing trend to promote LLMs as general-purpose reasoning engines. As more users begin to rely on LLMs for important decisions, to perform deep research on their behalf, and in domains where they cannot validate the results, the implications of these limitations become increasingly significant. At some point, someone will trust an LLM with a life-and-death decision, potentially with catastrophic consequences.

More importantly, the primary challenges in making LLMs trustworthy for reasoning are immense. Despite ongoing research and experimentation, we have yet to discover solutions that effectively bridge the gap between LLM capabilities and the rigorous standards required for reliable reasoning. Currently, our best efforts in this area are duct tape—temporary fixes that do not address the underlying limitations of the stochastic language modelling paradigm.

Now, I want to stress that these limitations do not diminish the many other applications where LLMs excel as stochastic language generators. In creative writing, question answering, user assistance, translation, summarization, automatic documentation, and even coding, many of the limitations we have discussed here are actually features.

The thing is, this is what language models were designed for—to generate plausible, human-like, varied, not-necessarily-super-accurate language. The whole paradigm of stochastic language modelling is optimized for this task, and it excels at it. It is much better than anything else we’ve ever designed. But when we ask LLMs to step outside that range of tasks, they become brittle, unreliable, and, worse, opaquely so.

The emergence of models like OpenAI’s o1 and o3 models, DeepSeek's R1, Gemini 2.0, and the many more we'll definitely see in the short term, seem like a significant step forward. And to a large extent, they represent innovative and creative approaches, and open up new avenues of research and applications.

However, we haven't yet seen any fundamentally new paradigm in logical reasoning with LLMs. Deep down, this is “just” a way to explicitly incorporate chain of thought prompting in a fine-tuning phase and teach the model via reinforcement learning to select mostly coherent paths of deduction. It's clever scaling of what already works fairly good, but not perfectly.

Thus, while definitely an impressive technical and engineering feat, these reasoning-enhanced models, and any future models based on the same paradigm, will continue to share the same core limitations inherent to all LLMs, only mitigated using some clever tricks. Hallucinations will still hamper any attempt at definitively solving formal reasoning with LLMs. It may be a matter of degree, and for some domains, good enough might just be good enough.

Ultimately, provably correct reasoning from natural language might even be an uncomputable problem. After all, natural language semantics are anything but formal, so going automatically from informal problem descriptions to formal, provably correct solutions might be impossible, even in principle. I have some ideas about why this may be the case that I'll share in a future article.

If I'm right, then maybe all we can hope for is a very clever, very efficient, never-tired, sometimes incorrect, not fully trustworthy AI assistant—less like Data and more like Bender minus the sarcasm. But we will never be able to completely remove expert human judgment from any life-or-death decision. And that might just be the best outcome.

What do you think?

I'm using "know" and "believe" gratuitously here, in the understanding that the informed reader will not misinterpret them as positing any sort of self-awareness or conscious experience in LLMs. Words alone don't mean anything; context is everything.

Ok, there is something called fuzzy logic, and I think fuzzy logic is a much better analogue for the type of reasoning we can expect from LLMs. However, fuzzy logic is very limited in the types of inferences it can model, and it’s definitely not enough to count as “solving formal reasoning”. But that’s a story for another day.

There is another, more technical argument as to why self-critique and CoT don’t enable Turing completeness. Since LLMs are stochastic, what you get when you use them to potentially loop indefinitely is not a real “while true” loop. You get a probabilistic loop that has a nonzero chance to stop at any given moment—or else, it will loop indefinitely, which is not what you want. What you want is to loop potentially forever, but stop when you find an answer.

But for any loop that has a nonzero chance to stop at any given iteration, we can find a problem that requires stopping after a sufficiently large number of iterations, such that the probability the loop ends before required can be made as large as we want. We can always find deduction problems for which the LLM cannot “think hard” enough, simply because there is a tiny chance it will “get tired” any given second. No matter how tiny, for large enough problems, the LLM will almost always fail to think hard enough.

Why Artificial Neural Networks Are So Damn Powerful - Part I

Alejandro Piad Morffis — Sun, 02 Feb 2025 11:51:25 GMT

You already know that neural networks are everywhere, and there’s a reason for that, beyond fad. And no, it’s not simply because they are “inspired in the human brain”—we already debunked this partial myth in our previous article.

The true strength of neural networks lies primarily in their nature as mathematical constructs that are extremely flexible and powerful. This makes it relatively easy for them to adapt to nearly any domain. Additionally, they excel at leveraging vast amounts of data and computational power.

The ability of neural networks to model complex relationships and learn from vast amounts of data stems from several key mathematical properties. This article will explore these strengths, including the universal approximation theorem, the role of inductive biases in different architectures, and how layers within a network perform representation and manifold learning. We’ll also discuss the versatility of neural networks in transforming various learning objectives into optimized loss functions.

Then, in Part II, we’ll see how these networks scale impressively with increasing data and compute resources, further solidifying their position as a cornerstone of modern AI.

Subscribe now

Universal Approximators

The Universal Approximation Theorem is a cornerstone of neural networks and machine learning. It asserts that a feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of inputs to any desired degree of accuracy, provided that the activation function is non-constant, bounded, and continuous.

Whoa, that was a mouthful. In short, a big enough but still very simple neural network can, in theory, learn any pattern you want to with an arbitrary degree of precision.

The theorem's history dates back to the late 1980s when researchers began formalizing the mathematical foundations of neural networks. Early work by researchers like George Cybenko in 1989 demonstrated that single-layer networks could achieve this approximation capability. Since then, various versions and extensions of the theorem have been developed, solidifying its importance in understanding the theoretical underpinnings of neural networks.

What does this mean in practical terms? The Universal Approximation Theorem implies that neural networks are incredibly versatile tools capable of modelling a wide array of functions, from simple linear relationships to intricate non-linear mappings. This flexibility allows them to be effectively applied across numerous domains, such as image recognition, natural language processing, and more.

However, while the theorem guarantees the existence of such approximations, it does not provide a method for finding them efficiently in practice. Thus, while neural networks can theoretically learn any function, achieving that in real-world scenarios often requires careful design and training strategies.

Architectural Flexibility

While the Universal Approximation Theorem assures us that a sufficiently large, fully connected feedforward neural network with a single hidden layer can approximate any continuous function, this approach is often impractical for real-world applications. Instead, we can leverage specialized structures and different types of layers to create neural networks tailored for specific tasks. This architectural flexibility allows us to exploit the unique characteristics of the data and the problem domain, enhancing performance and efficiency.

One prominent example is convolutional layers, commonly used in Convolutional Neural Networks (CNNs). These layers are designed to process grid-like data, such as images. By applying convolutional filters, they can detect local patterns and features, such as edges or textures, while maintaining spatial hierarchies. This structure is particularly effective for image recognition tasks, where understanding spatial relationships is crucial.

Another example is recurrent layers, found in Recurrent Neural Networks (RNNs). These layers are specifically designed to handle sequential data, such as time series or natural language. By maintaining a hidden state that captures information from previous inputs, RNNs can effectively model temporal dependencies and context. This makes them well-suited for tasks like language modeling and speech recognition.

As a final example, remember transformers, which have revolutionized natural language processing. Unlike traditional RNNs, transformers rely on self-attention mechanisms to weigh the importance of different input elements relative to one another. This allows them to capture long-range dependencies and contextual relationships more effectively than previous architectures. Transformers have become the backbone of many state-of-the-art models in NLP, enabling tasks such as translation and text generation.

By employing these specialized layers, we can create neural networks that don't attempt to approximate arbitrary functions but rather exploit the inherent structure of the problems we are trying to solve.

Representation Learning

One of the most powerful aspects of neural networks is their ability to perform representation learning, which can be understood as a sequence of increasingly abstract feature extraction mechanisms. This process allows neural networks to automatically discover and learn relevant features from raw data without requiring manual feature engineering. Essentially, each layer in a neural network transforms the input data into higher-level representations, capturing more complex patterns as the information flows through the network.

Consider an image classification task. When we analyze what each layer of a convolutional neural network (CNN) is learning, we can observe a fascinating progression. The initial layers typically act as simple feature detectors, identifying basic elements such as edges and textures in various orientations. These early detectors are crucial for understanding the fundamental building blocks of an image.

As we move deeper into the network, these simple features begin to combine into more complex shapes and patterns. For instance, the next layers might learn to recognise geometric shapes like circles and squares by aggregating the edge information detected in the earlier layers. Further down the line, these shape detectors merge into even more sophisticated representations, such as figure-like detectors that can identify parts of objects or specific patterns.

By the time we reach layers 20 or more in a deep CNN, the network has developed a highly abstract understanding of the input data. At this stage, it can accurately detect complex objects like dogs, cars, or houses based on the intricate features it has learned to recognise throughout its architecture.

This hierarchical approach to feature extraction means that almost any neural network designed for classification tasks can be viewed as a sequence of increasingly abstract and complex feature extractors.

Manifold Learning

Manifold learning is another insightful way to interpret what neural networks are doing during the learning process. When tackling problems like image classification, we can think of it as a complex instance of a nearest neighbour problem. For example, all images of cats share certain similarities, just as images of dogs do. However, this similarity is not immediately apparent in the input domain—the pixel values—because images that represent similar concepts (like two different cats) can be quite distant from each other in terms of pixel-by-pixel distance.

To understand this better, we can posit that a high-dimensional space exists where these images are represented more meaningfully. In this space, points corresponding to similar images are close together, while those representing fundamentally different objects—like dogs, ships, or houses—are far apart. The challenge is that this "true" image space is tangled and twisted, making it difficult to identify these relationships directly.

Manifold learning refers to the ability of neural networks to find a set of transformations that project the original data from the input space (e.g., pixels in image classification) into this complex high-dimensional space where similar objects (e.g., images of cats) cluster together. If we could untangle this manifold, we could perform a simple nearest-neighbour comparison in a more meaningful context. Neural networks do that implicitly.

We can thus view deep neural networks as a series of projections into increasingly complicated manifolds. Each layer in the network transforms the input data, gradually mapping it closer to this ideal space where similar objects are grouped together. The final layer of the network, right before the softmax classification, thus contains a very twisted and tangled projection of the original input, to the point it would be unrecognizable by humans. Still, it happens to be the projection that best clusters together similar objects.

Backpropagation

Training neural networks effectively hinges on the backpropagation algorithm, a pivotal method since its introduction in the 1970s. Backpropagation allows for the fine-tuning of weights within a neural network by computing how to adjust all parameters based on the error from the previous iteration. This feedback mechanism is essential for optimizing the network's performance, as it systematically reduces the error rate by adjusting weights to improve predictions.

The power of backpropagation lies in its ability to compute gradients efficiently, regardless of the network's size or complexity. By applying the chain rule of calculus, backpropagation calculates how weight changes affect the overall error function. This means that even in deep networks with many layers, backpropagation can determine the necessary adjustments for each weight, enabling training to an arbitrary degree of precision (provided the network has enough capacity, i.e., is big enough).

In theory, this makes all neural networks trainable, but in practice, achieving effective training often requires careful management of various factors. For instance, practitioners must navigate challenges such as vanishing and exploding gradients, which can impede learning in deep networks. Additionally, hyperparameter tuning and regularization techniques are often necessary to ensure convergence and prevent overfitting. We’ll tackle these problems in Part II.

Flexible Learning Objectives

Neural networks are trained using backpropagation, which requires a well-defined loss function to measure the learning error. This loss function approximates how far off the network's predictions are from the target values. The loss function must be differentiable for gradient descent to work effectively. This allows us to compute gradients and optimize the network's weights.

However, many learning objectives are not inherently differentiable. A prime example is classification error, often referred to as 0/1 loss. This type of loss is binary: you either classify an instance correctly or incorrectly, providing no gradient information for optimization. Fortunately, we can create differentiable approximations of such non-differentiable loss functions.

For instance, the binary cross-entropy loss is a commonly used differentiable approximation for 0/1 loss in binary classification tasks. It captures the essence of correct and incorrect classifications while allowing for a continuous range of error values. This enables the model to learn more effectively by providing meaningful gradient information even when predictions are not perfect.

Similarly, other tasks have their own tailored loss functions that facilitate learning. For example, a common loss function in regression tasks is Mean Squared Error (MSE), which measures the average squared difference between the predicted and actual values.

For multi-class classification problems, neural networks often use Categorical Cross-Entropy Loss, an extension of binary cross-entropy. This loss measures the dissimilarity between the predicted probability distribution and the true distribution over multiple classes, making it particularly effective for problems with many output categories.

Hinge loss is frequently employed in tasks like binary classification, which focuses on maximizing the margin between classes. This loss is also commonly used in support vector machines and some neural networks to ensure better separation between categories.

Contrastive loss is often used in more specialized applications, such as face recognition or metric learning. This loss function helps models learn embeddings by minimizing the distances between similar pairs of data points while maximizing the distances between dissimilar ones.

Each of these loss functions is designed to suit specific learning objectives while maintaining differentiability, ensuring that gradient descent can be applied effectively to a wide range of dissimilar tasks.

Conclusions

Neural networks are incredibly powerful and flexible mathematical constructs. They can approximate any continuous function, adapt to specific tasks through specialized architectures like convolutional, recurrent, and transformer layers, and automatically extract increasingly abstract features from raw data.

Additionally, they untangle complex data relationships by projecting inputs into high-dimensional manifolds where similar items cluster together. Finally, their ability to transform diverse learning objectives into differentiable loss functions enables effective optimization via gradient descent.

And we have a very powerful and general training algorithm—backpropagation—to ensure we can effectively make use of all these mathematical properties.

But while these strengths explain their theoretical power, they don’t fully account for their practical success. In our follow-up article, we’ll explore how neural networks are perfectly suited to scale with the increasing availability of data and compute power, which is key to their dominance in modern AI.

Artificial Neural Networks Are Nothing Like Brains

Alejandro Piad Morffis — Fri, 31 Jan 2025 16:48:32 GMT

Unless you've lived under a rock for the last couple of years, you already know that neural networks are the workhorse of AI. They’re everywhere—driving advancements from voice assistants to self-driving cars. But what makes them so unique?

Neural networks are not just a passing fad. They are incredibly powerful and flexible tools that can learn complex patterns from vast amounts of data. While they were initially inspired by how our brains work, today’s neural networks have evolved far beyond that simple analogy.

But before we further explore neural networks' power, we need to clarify one thing: artificial neural networks are nothing like the brain. This crucial distinction is often overlooked.

In this article, I will attempt to demystify the notion that artificial neural networks are anything like biological brains. And then, in a follow-up article, we’ll dive into the mathematical characteristics that give them flexibility and discuss how they scale with data and compute power. By the end, you’ll have a clearer picture of why these models are at the forefront of machine learning today.

Let’s get started!

Differences between Artificial Neural Networks and Biological Brains

Believing that NNs function similarly to biological neurons can lead to misconceptions about AI. For instance, it might make you think that artificial general intelligence (AGI) is just around the corner when, in reality, we are still far from achieving that level of complexity. It can also lead to anthropomorphizing AI, attributing human-like qualities and emotions to models that are fundamentally mathematical constructs. Understanding this difference is vital for grasping both the capabilities and limitations of neural networks.

Let’s explore how these models diverge from their biological counterparts.

Historical Inspiration of Neural Networks

Neural networks trace their roots back to a time when researchers sought to understand the brain's workings. In 1943, Warren McCulloch and Walter Pitts published a groundbreaking paper titled "A Logical Calculus of the Ideas Immanent in Nervous Activity." Their goal was not to create artificial brains but to construct a mathematical model that represented the behaviour of individual neurons. They aimed to explore how simple computational units could lead to complex behaviours, laying the groundwork for what would eventually become neural networks.

This idea sparked interest in connectionism in the early days, which focused on how interconnected simple units could replicate sophisticated cognitive functions. Researchers began to realise that these artificial neurons could work together to produce intricate activity patterns, similar to how biological neurons operate in the brain. However, this is where the biological inspiration largely ends.

As the field progressed, it became clear that while NNs were inspired by biology, they diverged significantly from it. The models developed were not intended to accurately replicate the brain's structure or function. Instead, they evolved into powerful computational tools that utilize mathematical constructs and algorithms, often bearing little resemblance to their biological counterparts. Understanding this distinction is crucial as it helps dispel myths about AI and clarifies what neural networks can and cannot do.

Differences in Structure

Artificial neural networks (ANNs) are constructed from mathematically simple units that perform differentiable computations. Each artificial neuron takes inputs, applies weights, and produces an output through a mathematical function. This process is significantly simpler than what occurs in a biological neuron, which involves complex electrochemical signalling and intricate interactions with other neurons.

When we examine the scale of the brain, the differences become even more pronounced. The human brain contains approximately 86 billion neurons, each forming thousands of connections with other neurons—estimates suggest there are around 100 trillion synapses in total.

In contrast, even the largest neural networks today, like LLaMA 3, operate as directed acyclic graphs with far fewer connections. For example, LLaMA 3 has about 400 billion parameters, which is more directly comparable to the number of inter-neuron connections in the brain than the number of neurons. This figure is nothing compared to the trillions of synapses in human brains.

Moreover, the complexity of brain connections is far more diverse than that seen in ANNs. Biological neurons can form various types of synapses and exhibit different firing patterns and neurotransmitter types, leading to a rich tapestry of connectivity that supports complex cognitive functions. In contrast, ANNs typically rely on a small set of fixed mathematical functions.

Differences in Learning Mechanisms

Most importantly, the learning mechanisms in NNs differ fundamentally from those in the brain.

The human brain learns in a complex and dynamic process. It adapts by forming new connections between neurons, a phenomenon known as synaptic plasticity. This allows the brain to strengthen or weaken connections based on experience, enabling it to learn from and adapt to new information. Additionally, the brain can create new neurons through a process called neurogenesis, particularly in regions like the hippocampus, which is associated with memory and learning. This lifelong capacity for learning and adaptation is a hallmark of biological intelligence.

In contrast, artificial neural networks (ANNs) operate under a different paradigm. The primary learning mechanism for NNs is backpropagation, which uses a straightforward mathematical operation to adjust weights based on the error of predictions. During training, the network calculates the gradient of the loss function with respect to each weight, allowing it to update those weights in a direction that minimizes error. This process relies on gradient descent, a method that iteratively adjusts weights to find the optimal solution.

However, gradient descent is biologically implausible for several reasons. First, there is no evidence that the brain stores gradients in any form. Instead, learning in biological systems occurs through more nuanced mechanisms involving complex biochemical processes and feedback loops. Second, while the human brain can learn continuously throughout life, modern ANNs typically have a distinct training phase followed by an inference phase. Once trained, these models do not adapt or learn from new data unless they undergo retraining.

Additionally, while some promising research into lifelong learning algorithms and dynamic neural networks aim to mimic the brain's ability to adapt and reorganize, this area remains largely experimental and is nowhere near mainstream application. The neural networks used in practical AI systems today are predominantly static; they do not create new neurons or modify their structure based on experience. This static nature further underscores the limitations of current neural networks compared to the brain's remarkable capacity for continuous learning and adaptation.

Some Necessary Nuances

Convolutional Neural Networks (CNNs) provide an interesting case of biological inspiration in artificial intelligence. While they are indeed inspired by the architecture and essential functions of biological vision, it's important to note that CNNs are not fully biologically plausible, nor do they serve as accurate simulations of the brain's visual processing systems. Instead, they represent a pragmatic approach to leveraging insights from the natural world while optimizing for computational efficiency.

CNNs mimic certain aspects of the visual cortex, particularly in processing spatial information. They employ local connectivity and shared weights, allowing them to detect patterns and features in images effectively. This design draws from how biological neurons respond to localized areas of visual stimuli, making CNNs well-suited for tasks like image recognition. However, while they capture some essential characteristics of biological vision, they do not replicate the full complexity or diversity of neural architectures in the brain.

The key takeaway is that CNNs exemplify how we can take valuable inspirations from biology and apply them to create effective computational systems. By focusing on what is useful for solving specific tasks—like recognizing images or classifying objects—researchers can forgo unnecessary biological complexities that do not translate into computational advantages. This pragmatic approach allows us to harness the strengths of both biological insights and modern computing capabilities, paving the way for advancements in AI while acknowledging the limitations of current models compared to their biological counterparts.

Conclusions

Believing neural networks are direct analogs of biological systems can lead to dangerous myths and misconceptions. This misunderstanding can create unrealistic expectations about what AI can achieve and foster a false sense of security regarding its capabilities.

For example, one common myth is that neural networks are on the verge of achieving artificial general intelligence (AGI), which is far from the truth. In reality, the complexities of human cognition are not just a matter of scaling up current models; they involve intricate processes that we have yet to fully understand.

Another misconception is that neural networks learn and adapt like humans. While they can be trained on vast amounts of data, they cannot learn continuously throughout their lifetime or create new connections like the brain does. This static nature means that once trained, most neural networks cannot adapt to new information without retraining.

Additionally, there is a tendency to anthropomorphize AI systems, attributing them human-like qualities. This misrepresents their capabilities and obscures the ethical implications of deploying such technology in society.

As we continue to explore the powerful potential of neural networks in modern AI, it’s essential to approach these technologies with a clear understanding of their limitations and differences from biological systems. If you're interested in learning more about why neural networks are so powerful and prevalent in today's AI landscape, stay tuned for the follow-up article!