<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Computist Journal: 🤖 Mostly Harmless AI]]></title><description><![CDATA[Explaining the potential and limitations of AI for everyone. ]]></description><link>https://blog.apiad.net/s/mostly-harmless-ai</link><image><url>https://substackcdn.com/image/fetch/$s_!qNGT!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F582c72c0-c120-4ea8-ae6b-376a025250bb_1024x1024.png</url><title>The Computist Journal: 🤖 Mostly Harmless AI</title><link>https://blog.apiad.net/s/mostly-harmless-ai</link></image><generator>Substack</generator><lastBuildDate>Fri, 12 Jun 2026 19:23:37 GMT</lastBuildDate><atom:link href="https://blog.apiad.net/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Alejandro Piad Morffis]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[apiad@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[apiad@substack.com]]></itunes:email><itunes:name><![CDATA[Alejandro Piad Morffis]]></itunes:name></itunes:owner><itunes:author><![CDATA[Alejandro Piad Morffis]]></itunes:author><googleplay:owner><![CDATA[apiad@substack.com]]></googleplay:owner><googleplay:email><![CDATA[apiad@substack.com]]></googleplay:email><googleplay:author><![CDATA[Alejandro Piad Morffis]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Mostly Harmless AI v2.0 is here!]]></title><link>https://blog.apiad.net/p/mostly-harmless-ai-v20-is-here</link><guid isPermaLink="false">https://blog.apiad.net/p/mostly-harmless-ai-v20-is-here</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Sun, 
31 May 2026 11:01:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xDXS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xDXS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xDXS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xDXS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xDXS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xDXS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xDXS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg" width="1376" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:839252,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.apiad.net/i/199956563?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xDXS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xDXS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xDXS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xDXS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F371bb65b-4fc0-42dd-86a2-27cb5b9f97d0_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Hey! Happy Sunday. Here is a quick note to remind you that <em>Mostly Harmless AI</em>, second edition, draft one, is finished. PDF and EPUB on <a href="https://store.apiad.net/l/ai">Gumroad</a>. Free online reader at <a href="https://books.apiad.net/books/mhai">books.apiad.net</a>. Both live, right now.</p><p>This book is a permanent beta, because the AI story is being created as I write these words and it&#8217;s moving just too damn fast for anyone to put it down cleanly in a book. So don&#8217;t expect the kind of polishing that comes with retrospective looks at settled topics. This isn&#8217;t that kind of book. It&#8217;s raw, but up-to-date. 
It contains basically everything I know and believe about AI that doesn&#8217;t <em>require</em> code or math to understand.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/ai&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI - 50% off&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/ai"><span>Get Mostly Harmless AI - 50% off</span></a></p><h2>Sixteen chapters, mechanism-first</h2><p>Three parts, sixteen chapters, around two hundred and sixty pages, almost three hundred footnote citations.</p><p><strong>Part I &#8212; Foundations</strong> walks through the mechanisms, from the symbolic systems of the 1950s to the agent loops of 2026. Classical AI, machine learning, deep learning, language modeling, generative AI, agentic AI. If you have ever wondered what <em>attention</em>, <em>gradient descent</em>, or <em>RLHF</em> actually mean (not the marketing line, the mechanism), start here.</p><p><strong>Part II &#8212; Applications</strong> shows how AI is reshaping each field: knowledge work, scientific research, software development, education, creative work, policy. Dual-audience throughout. Experts get practice and gotchas. Non-experts see why their lawyer, their doctor, and their kid&#8217;s teacher are suddenly using these tools.</p><p><strong>Part III &#8212; Dangers</strong> is the honest one. Alignment, the limits of language models, and the harms already in the world today: scammers cloning your mother&#8217;s voice, autonomous weapons that pick their own targets, hiring algorithms that quietly downrank you, jobs evaporating from under people who built careers in them. The existential-risk question is in there too, treated proportionally.</p><p>The book lands on a third position beyond maximalists and doomers. 
These systems are powerful enough to demand real responsibility and limited enough that the worry should be about who deploys them, not about the silicon waking up. The future is not predetermined. Neither the doomer nor the utopian framing is right. We choose, and choices have responsibilities attached.</p><p><a href="https://blog.apiad.net/p/mostly-harmless-ai-the-book-that">The first-draft post from two weeks ago</a> has the longer description if you want more.</p><h2>Free online, or pay for offline</h2><ul><li><p><strong>Read it free online</strong>, in perpetuity, at <a href="https://books.apiad.net/books/mhai">books.apiad.net</a>. Clean typography, gentle dark mode, inline footnotes. No popups, no signup, no tracking. The reader was built for these books specifically.</p></li><li><p><strong>PDF and EPUB on Gumroad</strong> if you want it offline, on your e-reader, on a plane: <a href="https://apiad.gumroad.com/l/ai">apiad.gumroad.com/l/ai</a>. Launch week is 50% off.</p></li></ul><p>If the online reader is free, then why would you buy it? Well, you don&#8217;t <em>have</em> to. But here is why you may want to do it anyway.</p><p>Writing this book took two years. The second edition took three months of intense rewriting on top of that. If you find value in the work and want more of it to exist (these books, the Computist Library, the journal you are reading), paying for the PDF is how you tell me that. I do not run ads. I do not sell your attention. Every book sale is one person telling me to keep going, and the money buys the months it takes to write the next one.</p><p>If you cannot pay, read it free. No guilt, no caveat. That option exists on purpose.</p><h2>June is algorithms month</h2><p>After today, I am putting AI writing to sleep for a couple of months, on purpose.</p><p>Not the work &#8212; I will still be building AI tools and using these systems every day. 
But on this journal, starting tomorrow and through July, I am switching to <em>algorithms and core computer science</em>, with <a href="https://apiad.gumroad.com/l/codex">The Algorithm Codex</a> as the spine. The Codex is the next book in the Computist Library, written in parallel with MHAI and ready for center stage. It is already 200 or so pages and 50+ algorithms in, but still very rough around the edges and in dire need of your feedback.</p><p>During June, you can expect daily posts, Monday through Friday, each anchored on a chapter. Why the fastest algorithm ever devised runs in essentially constant time. Why most programmers get binary search wrong. Why no comparison sort can beat n log n, and the loophole that lets you beat it anyway. Why some algorithms become <em>settled facts</em> the way theorems do.</p><p>AI returns to the journal in August. The field will surely have moved a lot by then, and there will be more than a few chapters to retouch.</p><p>So if you come here for the AI material, this book is the most complete thing I have on it &#8212; go read it. If you have been curious about the algorithms side, the next two months will be the best stretch of it I publish all year.</p><p>That&#8217;s all for today. 
Tomorrow I&#8217;ll meet you again with a brand-new article.</p><p>Until then, stay curious.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/ai&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI - 50% off&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://store.apiad.net/l/ai"><span>Get Mostly Harmless AI - 50% off</span></a></p>]]></content:encoded></item><item><title><![CDATA[Introducing Aegis: the programmable multi-agent meta-harness]]></title><description><![CDATA[Or -- I did a thing and I want to show it to you]]></description><link>https://blog.apiad.net/p/introducing-aegis-the-programable</link><guid isPermaLink="false">https://blog.apiad.net/p/introducing-aegis-the-programable</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Fri, 29 May 2026 11:05:05 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em><strong>On May 31 we launch <a href="https://store.apiad.net/l/ai">Mostly Harmless AI</a> v2.</strong> This arc &#8212; how models learn, how agents work, where they break, and what it takes to build something real on top of them &#8212; is now a book, updated for May 2026. 
Newsletter subscribers get 50% off.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/ai&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI - 50% off&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/ai"><span>Get Mostly Harmless AI - 50% off</span></a></p></blockquote><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="8192" height="5461" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:5461,&quot;width&quot;:8192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;two silver swords on shield&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="two silver swords on shield" title="two silver swords on shield" srcset="https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1541543975512-86aad5d2cf93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1fHxzaGllbGR8ZW58MHx8fHwxNzc5OTM3MzY2fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@worldsbetweenlines">Patrick Hendry</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>For the better part of the last two months I&#8217;ve been drilling you with a bunch of inter-connected ideas, all gravitating around the notion of agentic reliability. 
We started with <a href="https://blog.apiad.net/p/mhai-llms">how these models learn</a>, argued about <a href="https://blog.apiad.net/p/70-years-of-ai-history-in-10-minutes">seventy years of AI history</a>, traced <a href="https://blog.apiad.net/p/its-tokens-all-the-way-down">the strange logic of prediction</a>, dissected <a href="https://blog.apiad.net/p/the-anatomy-of-a-linguistic-ai-agent">what an agent actually is</a>, then spent two posts on the edges: <a href="https://blog.apiad.net/p/how-to-write-a-cli-an-agent-will">how to write tools agents can really use</a>, and <a href="https://blog.apiad.net/p/the-80-ai-reliability-horizon">what happens when you push them to their limits</a>.</p><p>This is the last post of that arc, and it embodies my vision of where this whole Agentic AI thing is going. And to show I&#8217;m putting my money where my mouth is, this post is about the tools I&#8217;m building to bring forth that vision.</p><p>But first, why should you care?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><h2>Agents aren&#8217;t just for coders</h2><p>Here&#8217;s something the AI tool industry quietly gets wrong: every major agent harness &#8212; <a href="https://claude.ai/code">Claude Code</a>, <a href="https://ai.google.dev/gemini-api/docs/gemini-cli">Gemini CLI</a>, <a href="https://cursor.com/">Cursor</a>, <a href="https://opencode.ai/">OpenCode</a> &#8212; is pitched as a developer tool. And technically, yes, they drive code.</p><p>But code is how agents <em>solve problems</em>, not what the problems are. 
When you ask an agent to research a topic, draft a document, reorganize a folder, schedule your week, or synthesize a dozen sources into one coherent brief &#8212; none of that is fundamentally a programming task. The coding is incidental. It is a historical accident: the people who built these tools happened to be programmers solving programmer problems.</p><p>So if you&#8217;re a researcher, a writer, a manager, a student, a scientist who occasionally touches a terminal &#8212; this piece is for you too.</p><h2>What&#8217;s wrong with existing tools</h2><p>I use Claude Code every day. The harnesses &#8212; Claude Code, Gemini CLI, Cursor, OpenCode &#8212; are genuinely good. The agentic loop is robust. Tool design is sharp. Context management is solved for practical purposes. I&#8217;ve opened ten-thousand-line projects and the agent knows exactly where it left off.</p><p>What&#8217;s missing is not single-agent capability. It&#8217;s <em>coordination</em>.</p><p>Claude Code runs on Claude. Gemini CLI runs on Gemini. OpenCode allows any provider, but you cannot use your existing subscriptions (which are heavily subsidized); you have to pay API rates.</p><p>If you&#8217;re running multiple agents with different models from different providers, you end up with four windows open: one per harness, one per model, one per subscription you&#8217;re already paying for. Different tools, each with slightly different annoying quirks and different ways to do and call the same things.</p><p>But you can still make them collaborate, just not easily.</p><p>Most harnesses support sub-agents, which are essentially subroutines: the main agent mints a new sub-agent for a specific task and calls it; it runs, it returns, and the main agent continues.</p><p>What they don&#8217;t support is a mid-work handoff. 
Imagine you&#8217;re two hours into a brainstorm with Claude. A question has emerged that you cannot answer without trying some code, and now you want a second agent &#8212; different model, fresher context, perhaps Kimi from OpenCode Zen &#8212; to take over for a brief coding session and then hand its findings back to Claude so brainstorming can continue.</p><p>That is somewhat achievable with sub-agents, except that they run autonomously and die; in most tools you cannot interact with them or steer them in new directions. What you cannot do is have Kimi return to Claude and stay up, waiting (with its full context alive) for a follow-up question to continue the exploration.</p><p>That transfer doesn&#8217;t exist today. You have to ask Claude to produce a handoff manually and paste it into Kimi, explaining the situation from scratch. Then paste its response back to Claude, and so on. You have become a secretary between two AI agents.</p><p>Agents are isolated in all these tools. Two agents working on the same document will overwrite each other. There&#8217;s no locking, no merge protocol, no queue where one drops a task and another picks it up. No broadcast so all agents know when the plan changes. Agents don&#8217;t share a world &#8212; they each have their own private window into one.</p><p>I looked around before building. The closest I found: Conductor, which orchestrates multi-agent workflows &#8212; but only for Claude Code. And T3.codes, which drives any harness and is closer to the spirit of what I wanted. But neither has cracked the coordination layer as I envision it.</p><h2>Introducing Aegis</h2><p>So, of course, I had to go and make my own. 
(Quick digression: if you&#8217;ve been reading this blog for a while you know I love to reinvent wheels, if only for the learning experience, but this is a case where I genuinely couldn&#8217;t find something good enough.)</p><p>Here&#8217;s what makes Aegis different from anything out there, embodied in its slogan: the programmable multi-agent meta-harness. Let&#8217;s build it from the back.</p><p><strong>Meta-harness.</strong> You&#8217;re already paying for Claude via your Anthropic subscription. Gemini via your Google account. If a new tool wants to drive both, it has two options. It can re-authenticate you through its own layer: API keys, rate limits, and lost subscription benefits. Or it can call the native tools, which already have your credentials.</p><p>Aegis takes the second path. It drives Claude Code over its <code>stream-json</code> protocol, and Gemini CLI and OpenCode over the Agent Client Protocol &#8212; calling the binaries on your machine, which already have your auth. It doesn&#8217;t touch your subscriptions. You stop worrying about which model wins this month&#8217;s benchmark.</p><p>And because Aegis calls the native harnesses rather than reimplementing them, it inherits everything they&#8217;ve spent months polishing: the agentic loop, the context management, the permission model. The harness keeps owning tool use, sandboxing, model selection. Aegis owns the layer above &#8212; tabs, routing, delegation &#8212; the things a single-conversation CLI was never built to do.</p><p><strong>Multi-agent.</strong> Aegis provides six inter-agent synchronization primitives, each built on top of the previous ones, to give you increasingly powerful multi-agent capabilities.</p><p>The first primitive is a per-agent inbox. Any agent (including you) can hand another agent a message, which gets enqueued until the end of the current turn. 
This alone is enough to solve the handoff problem described in the previous section.</p><p>Then, on top of that, you get canvases: markdown files shared across agents, with per-section locks and callbacks that wake an agent when another finishes writing a section.</p><p>Then you get <em>real</em> terminals. Not an agent calling bash on a subprocess and blocking on its result. A real, shared, fully interactive terminal session that multiple agents can scan, tail, and write to. And you can too. One runs a command; another sees the output in real time and reacts. Or you run the backend and ask your agent to look at the logs when that heisenbug happens.</p><p>So far this allows you to spawn several agents and have them collaborate. But you can also have queues. Any agent can drop a task (a prompt) and the queue auto-spawns an ephemeral agent to take it, potentially calling back the emitter once done. Queues have a cap on parallelism, as well as arbitrary rolling budgets on tokens and dollars, so you keep control of how much work is allowed to happen without your supervision.</p><p>Agents can also be added to groups, which are dynamically created and destroyed on demand, by you or by any other agent. Groups have a shared inbox, and you can subscribe to them and get notified when the first, any, or all of the agents in the group finish. This allows committee-like flows where different agents analyze a problem in parallel.</p><p><strong>Programmable.</strong> And finally, you get workflows. Deterministic Python code that drives agent calls in sequence, with branching, conditionals, and loops. Think skills, but instead of a hopeful blob of markdown that an agent can choose to interpret as it wants, these are composable routines that drive the entire substrate with exactly the level of control you desire.</p><p>When you write a complex workflow in natural language, you&#8217;re hoping the agent follows through. 
It might decide step two is better done differently, skip the commit because something caught its eye, or forget step three entirely. A Python workflow doesn&#8217;t forget: it runs step one, then step two, then step three, and commits. You wrote <code>commit()</code> in the code; the code runs. You get the agent&#8217;s creativity at each step; the program guarantees the steps happen.</p><p>Workflows can be scheduled: declare a cron entry in <code>.aegis.yaml</code> and the substrate fires it while you sleep. They also run across machines: one agent on your laptop can enqueue a task to a remote Aegis instance on a VPS and get the result back in its inbox.</p><p><em>Quick aside</em>: Yesterday, as I was polishing this post, Anthropic announced <em>Dynamic Workflows</em> &#8212; a way to orchestrate long-lived agents over dozens of hours of work. I haven&#8217;t tested it yet, but it seems geared toward the same problem I&#8217;m trying to solve.</p><p>The difference is in the philosophy. Anthropic&#8217;s principle is to give agents as much agency as possible: trust the model, let it decide how to get there. Don&#8217;t get in the way, you stupid human. Tokens go brr. It&#8217;s the reason why all their solutions to problems are humongous one-shot prompts.</p><p>My philosophy runs the other way. Leave agent creativity where it can do the most good &#8212; in the actual work, not in deciding whether step three happens after step four. The deterministic spine isn&#8217;t a constraint on the agent. It&#8217;s what makes it work despite agent idiosyncrasies, and why it works across all agents and harnesses, regardless of their intrinsic capabilities.</p><p>I&#8217;m using Aegis now for about 50-60% of my coding. It&#8217;s still rough around the edges, but it&#8217;s way more fun to use than any single CLI. There is a lot more in the box, like remote sessions, a built-in file browser, lots of metrics... but this post is already way too long. 
You&#8217;ll have to check it out on your own. Links at the end.</p><h2>What I intentionally left out</h2><p>Aegis has no native concept of skills. No AGENTS.md injected automatically. No memory system.</p><p>Those things are conventions, and conventions change. What belongs in an AGENTS.md today is different from what will belong there in six months. Memory systems have a dozen competing designs and no consensus. If I&#8217;d baked any of that in, you&#8217;d be stuck with my choices the moment the community moved on.</p><p>Aegis has a very powerful plugin system instead (I told you, it is programmable). You write a pure Python function, drop it in a folder, and it gets called anywhere in the agent lifecycle.</p><p>Want skills that activate on context? Write a plugin. Want a memory system? Write a plugin. Want to inject per-repository knowledge before every session? Write a plugin. The conventions you need are yours to build, and when they change, you change them, not me.</p><h2>Coda</h2><p>Claude Code and Gemini CLI are applications. You open them, use them, close them. Aegis is more than that. It&#8217;s a framework. You build on top of it &#8212; applications that spin up the agentic substrate automatically, pull in whichever harness fits the task, and run without anyone at the keyboard.</p><p>Picture a self-hosted Git forge where pushing a branch triggers agents: one reviews the code, one hunts bugs, one picks up open issues and starts implementing. Each on its own worktree, independent, parallel, coding while you sleep. Push code; agents work.</p><p>That&#8217;s <strong>Sindri</strong>, and I&#8217;m building it too, but that&#8217;s a story for another Friday.</p><p>Aegis is open source: <a href="https://github.com/apiad/aegis">github.com/apiad/aegis</a>. 
<code>pip install aegis-harness</code> to start.</p><div><hr></div><p><em><strong>May closes the agentic AI arc.</strong> If you want the full conceptual foundation &#8212; model internals, agent architecture, tool design, failure modes &#8212; it&#8217;s in <strong><a href="https://store.apiad.net/l/ai">Mostly Harmless AI v2</a></strong>, launching May 31. Subscribers get 50% off.</em></p><p><em>Also check the <a href="https://store.apiad.net/l/compendium">Compendium</a> &#8212; one fixed price, every educational project I&#8217;ve built or will build, yours in perpetuity. Buy once, get everything.</em></p><p><em><strong>In June: algorithms.</strong> The other half of the computational story &#8212; sorting, searching, graphs, optimization, the classical toolkit that AI didn&#8217;t replace and won&#8217;t. All month, one idea at a time. See you there.</em></p><p><em>Until next time, stay curious.</em></p>]]></content:encoded></item><item><title><![CDATA[Why AI Agents Need Structure]]></title><description><![CDATA[A 5-step framework for solving any]]></description><link>https://blog.apiad.net/p/why-ai-agents-need-structure</link><guid isPermaLink="false">https://blog.apiad.net/p/why-ai-agents-need-structure</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Wed, 27 May 2026 10:43:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vQrT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vQrT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vQrT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vQrT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vQrT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vQrT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vQrT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg" width="1376" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:735692,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.apiad.net/i/199356169?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vQrT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vQrT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vQrT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vQrT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b579226-2570-417d-82f1-3bab6546883b_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Actual footage of an AI handing off a research spec to another AI&#8212;it&#8217;ll make sense in the end.</em> (Crazy the kinds of things you can see in the wild, huh?)</figcaption></figure></div><blockquote><p><em>Every post this month is on the theme of building AI agents that actually work &#8212; anchored on the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a>, 50% off during early access, where the five-phase structure this post describes is a full chapter with more failure cases, the artifact design patterns, and the context isolation mechanics in detail. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a>. More at the end.</em></p></blockquote><p>Last night you gave your AI agent a clear task. It worked hard for two hours. 
You woke up to a report that is technically correct and completely useless.</p><p>I&#8217;ve had this experience enough times that I stopped blaming the model years ago. The failure is always structural, and I mean that in a very specific, diagnosable sense.</p><p>AI labs fine-tune language models to produce <em>output</em>, not to question the frame. The whole reward cycle behind how these models are trained pushes them toward helpful completions; nobody in that loop is rewarding a model for pausing mid-task and asking &#8220;have you considered that you might be framing this wrong?&#8221;</p><p><strong>Execution</strong> is the default. That setting works beautifully for atomic tasks: create a note, write a short email, implement a specific function, compare two clearly-defined options. It fails quietly, and expensively, when the real problem isn&#8217;t the task itself but the goal behind it. Here are three examples:</p><p>First: you ask an agent to &#8220;implement user authentication&#8221; for your web application. Clear task. The agent gets to work, producing a technically sound implementation using JWT tokens, bcrypt password hashing, and session management.</p><p>Second: you ask an agent to &#8220;write a technical report on renewable energy storage.&#8221; Again, clear task. The agent downloads papers, synthesizes findings, produces a well-structured document.</p><p>Third: a family asks an agent to &#8220;plan our move to a new city.&#8221; Thorough research follows &#8212; neighborhoods, school districts, moving companies, cost estimates. In every case, the output is technically correct.</p><p>In every case, the person who asked ends up with something that doesn&#8217;t quite fit. The gap between what was asked and what was needed &#8212; that gap is what this article is about.</p><p><em>The structure is the problem, not the model.</em></p><p>Not the model&#8217;s intelligence, not the quality of your prompt, not the temperature setting. 
The structure of the <em>workflow</em> you handed the agent is the thing that determines whether it answers the right question or a reasonable-sounding wrong one. In this article, I&#8217;ll show you what the right structure is. But to understand <em>why</em> it must be so, we need to see how we got there.</p><p>The first fix the industry reached for was planning. It helped. It wasn&#8217;t enough. Strap in for a story.</p><h2>Plan First, Build Second</h2><p>The <em>structure</em> is the problem, not the model. Right? So the first fix the industry reached for was adding a <strong>planning mode</strong>.</p><p>If you&#8217;ve used any of the major agentic coding tools over the past couple of years, you&#8217;ve seen this pattern. There&#8217;s a <em>plan mode</em>: read-only, no side effects, meant for thinking through the task. Then a <em>build mode</em>, where the agent executes.</p><p>The intuition behind it is sound &#8212; separating thinking from doing is genuinely better than collapsing them into a single continuous stream. When you mix planning and execution in the same pass, the agent makes irreversible changes based on its first interpretation of the goal.</p><p>A planning mode forces a pause. You get to inspect the plan, argue with it, revise it before anything is written to disk. The plan is the first concrete artifact in the handoff chain. You can hold it in your hands &#8212; or at least on your screen &#8212; and have a real conversation about it.</p><p>Genuine progress. I don&#8217;t want to minimize it. 
But watch what happens with the family move.</p><p>The agent produces a thorough plan: compare three target neighborhoods, compare school districts in each, contact five moving companies for quotes, estimate total relocation costs, build a timeline. Excellent plan. The family reviews it, nods along, says &#8220;looks good,&#8221; and the agent executes.</p><p>Six months later: they&#8217;re living in a city that made complete sense on paper &#8212; except the job that drove the whole move turned out to be fully remote. They never needed to relocate at all, or they could have moved somewhere half the price with better schools.</p><p>The plan was correct. The execution was correct. The <em>goal</em> was wrong. And here&#8217;s the point: the agent that built the plan never questioned whether &#8220;plan a move to City X&#8221; was the right goal, because it was never asked to. It took the first reasonable interpretation of the prompt &#8212; &#8220;they want to move, let me help them move well&#8221; &#8212; and planned confidently inside that frame.</p><p>The software auth example is shorter but sharper. &#8220;Implement user authentication&#8221;: the agent plans for JWT tokens, bcrypt hashing, session storage. Fine plan. Sound, even. For a single-tenant web app. This is a multi-tenant SaaS product (one where dozens of customers share the same running application).</p><p>The plan was never going to catch that, because nobody told the planning agent that multi-tenancy was a constraint. It did exactly what it was asked.</p><p>Now, the objection you might be forming: &#8220;just review the damn plan carefully before you approve it.&#8221; That objection is right about something: review does matter. But it misses the fact that the agent that produced the plan has already <em>committed</em> to a frame.</p><p>Every question it asks, every tradeoff it surfaces, every option it presents &#8212; all of it is already shaped by its first interpretation of the goal. 
When you review that plan, you&#8217;re not reviewing a neutral set of options. You&#8217;re reviewing a plan that already selected its own success criteria. The frame is invisible because it was never made explicit.</p><p>Planning without exploration locks in the first reasonable goal.</p><h2>Research Before the Plan</h2><p>Once more, the <em>structure</em> is the problem, not the model. The next obvious fix is to put a <strong>research</strong> phase in front of planning.</p><p>By the time we hit mid-2025, a third mode was appearing in serious agentic setups. A <em>research phase</em>: read-only, job is to understand the problem space, not to produce a solution. The artifact it creates is a description, not a prescription &#8212; a document that maps what is known before anyone decides what to do about it.</p><p>The intuition is right again: if the planning agent doesn&#8217;t know what it doesn&#8217;t know, it can&#8217;t plan well. Research is how it finds out. For the renewable energy report, thorough research might surface the fact that the intended audience is policymakers, not engineers &#8212; which changes the vocabulary, the technical depth, and the document&#8217;s opening frame.</p><p>Real progress, again. And again, not enough.</p><p>Watch the renewable energy report closely. The agent runs a solid research phase: downloads twenty recent papers, reads industry reports, synthesizes the state of the art on battery storage, hydrogen carriers, pumped hydro, and grid-scale thermal systems. Then it transitions into planning mode &#8212; in the same context window. And here is the problem: <em>the same context window</em>.</p><p>The planning agent isn&#8217;t a fresh mind looking at a research report. It&#8217;s the same agent, carrying everything it concluded during research, now deciding how to structure the work. 
If the research agent concluded &#8220;battery storage is the central challenge in the energy transition,&#8221; the planning agent will structure the report around battery storage. Not because it made a new decision &#8212; because it never had the opportunity to question the prior decision. It just kept going. The research was excellent. The plan followed naturally from the research. The report answered the question the research agent found most interesting. Not necessarily the question the reader needed answered.</p><p>The steelman still stands, and research sharpens it. The steelman says: &#8220;planning alone is enough, just review the plan.&#8221; Research proves it isn&#8217;t &#8212; but not because planning is the wrong approach. Research proves it because research without context isolation just moves the lock-in one step earlier. The same agent that researched now plans. It cannot escape its own prior conclusions, not because it&#8217;s incapable of abstract reasoning, but because those conclusions are literally sitting in its context, shaping every next token it generates.</p><p>A fresh planning agent, handed only the research artifact as a clean document, can genuinely question whether the research answered the right question. It can push back. It can say &#8220;your research focused heavily on battery chemistry, but I&#8217;m not sure that&#8217;s what the audience needs.&#8221; The same agent that did the research cannot do that. Not really.</p><p>But here is the real kicker: research gives you facts; it doesn&#8217;t tell you if you&#8217;re solving the right problem. Save that thought for a minute, we need one more step.</p><h2>Review After the Plan</h2><p>The <em>structure</em> is still the problem, not the model. And to fix it, we added a new, and in hindsight, pretty obvious step. 
A <strong>review</strong> phase after implementation.</p><p>This is a dedicated pass where a separate agent &#8212; or the same agent in a separate context &#8212; evaluates the produced artifact against known criteria. Not just &#8220;does this code run,&#8221; but &#8220;does this code do what we intended.&#8221; The distinction from implementation is real and it matters. An implementation agent is building; a review agent is hunting for the thing that will break it.</p><p>What review actually solves is real. The software auth implementation, evaluated by a review agent, surfaces real questions: Is the JWT expiry window set appropriately for the threat model? Is the bcrypt cost factor tuned for this hardware? Are session tokens actually invalidated on logout, not just expired? These are genuine bugs a fresh pass can find. I&#8217;ve seen review agents catch the kind of subtle mistake that a second human reader catches &#8212; not because they&#8217;re smarter, but because they&#8217;re looking for problems rather than building a solution.</p><p>But watch what happens when the review agent evaluates the software auth implementation against the plan.</p><p>The plan said: &#8220;implement JWT-based authentication for the web application.&#8221; The review agent confirms: JWT is implemented correctly. Bcrypt is used. Session management is in place. The implementation passes review. It ships.</p><p>First enterprise customer tries to log in: there is no tenant isolation. Every user in the system shares a single authentication namespace. The review agent found no bugs. The implementation had no bugs. The plan specified the wrong thing. And the review agent couldn&#8217;t catch it &#8212; not because it was careless, but because review only checks &#8220;did we implement the plan correctly.&#8221; Not &#8220;was the plan the right plan.&#8221; Those are different jobs. 
And you can&#8217;t review your way out of the wrong spec.</p><p>This isn&#8217;t a failure of intelligence. It&#8217;s a consequence of what the review agent is handed. It receives the implementation and the plan. It has no clean access to the original problem statement &#8212; what the user actually needed, what constraints were implicit in the product, whether the product served one customer or a hundred who shared a namespace. It&#8217;s evaluating the gap between the artifact and the plan, not between the artifact and the goal.</p><p>Human code reviewers fail the same way, by the way. Code review finds style violations, off-by-one errors, missing null checks. It rarely questions the architecture decision that was made three sprints ago and embedded in every layer of the codebase. That kind of question requires a different context &#8212; a different meeting, a design review, a fresh set of eyes on the spec rather than the implementation.</p><p>Review catches errors, but only inside the frame you committed to in the planning phase. You can&#8217;t review your way out of the wrong spec.</p><h2>Name the Problem Before Solving It</h2><p>Finally, the structure IS the <em>problem</em>, not the model.</p><p>Research doesn&#8217;t ask it. Planning doesn&#8217;t ask it. Implementation doesn&#8217;t ask it. Review doesn&#8217;t ask it. The question is: <em>what problem are we solving, and how will we know when we&#8217;ve solved it?</em></p><p>That&#8217;s not a rhetorical question. It has specific, concrete answers. And those answers should be a document &#8212; a specification &#8212; produced before the plan is written. Not during planning. Not as a side effect of research. Its own phase. Its own artifact.</p><p>A specification answers four questions, and it only takes one page to do it. What is the exact output we are trying to produce? What are the hard constraints it must satisfy? What does success look like &#8212; specifically, what would we check to confirm it? 
What does failure look like &#8212; what would make us say this didn&#8217;t work?</p><p>These sound obvious. They are almost never answered before a planning phase begins. In my experience, the reason is that they feel like they slow you down. They don&#8217;t. They prevent six months of work in the wrong direction. Think of it as a single page you could tape to the wall &#8212; the kind you&#8217;d point at during a disagreement about whether the output succeeded or failed.</p><p>Go back to the family move. After a thorough research phase &#8212; neighborhood data, school ratings, crime statistics, cost of living comparisons &#8212; a specification phase asks: what does a successful move look like for your family? The family sits with that question. It turns out they&#8217;ve never explicitly answered it.</p><p>The husband&#8217;s answer: proximity to his aging parents, who live in a specific region of the country. The wife&#8217;s answer: their daughter getting into a specific school district that has strong arts programs. The budget answer: keeping total housing costs under a threshold that lets them maintain their current savings rate. Three explicit success criteria.</p><p>The research phase found no conflicts because it was never told what it was optimizing for. The specification phase surfaces all three criteria before the plan commits to a single city. The planning phase can now do something useful: find cities that satisfy all three criteria &#8212; or, just as importantly, discover that no city satisfies all three and surface that conflict before anyone books a moving truck.</p><p>The software auth case is fast. Specification asks: what does correctly-implemented authentication look like for this product, given who its customers are? The answer: it must support multi-tenant isolation with strict data separation, SSO for enterprise customers, and a free tier with email-only login.</p><p>Now the plan can be written for the actual product. 
The research phase&#8217;s work on JWT and OAuth is still valid; it just needs to be read through the lens of multi-tenancy, which the specification made explicit.</p><p>The full chain, with its five concrete artifacts, looks like this.</p><p><strong>Research</strong> produces a collection of source materials plus a descriptive state-of-the-art report &#8212; what is known about this problem space. <strong>Specification</strong> produces a success-and-failure criteria document. <strong>Planning</strong> produces a concrete step-by-step plan &#8212; how to get from here to there, given the spec. <strong>Implementation</strong> produces an evaluable artifact &#8212; code, document, report, recommendation. <strong>Review</strong> produces an evaluation report checked against the specification, not just the plan &#8212; a real answer to &#8220;did we solve the problem?&#8221;</p><p>Five documents. Five <em>handoffs</em>. Five chances to catch the wrong frame before it becomes expensive.</p><h2>We Knew It All Along</h2><p>The fun part is that all of this was known, at least in principle, since before 1987.</p><p>IDEO, a by-now ultra-famous design consultancy, articulated a five-phase creative process that Stanford&#8217;s d.school later codified as <strong>Design Thinking</strong>.</p><p>Tim Brown formalized the diverge-converge logic in <em>Change by Design</em>. The five phases: <strong>Empathize</strong>, <strong>Define</strong>, <strong>Ideate</strong>, <strong>Prototype</strong>, <strong>Test</strong>. If those names sound familiar given what you&#8217;ve just read, that&#8217;s not a coincidence.</p><p>Empathize is research: go wide, gather context, talk to users, understand the problem space from the outside rather than the inside. Define is specification: converge on an explicit problem statement with clear success criteria, the &#8220;how might we&#8221; question that frames everything downstream. 
Ideate is planning: diverge again, generate candidate solutions, explore the space of possible approaches. Prototype is implementation: produce an evaluable artifact, something you can put in someone&#8217;s hands. Test is review: evaluate the prototype against the problem statement, not just against the prototype&#8217;s internal logic.</p><p>Every phase the agentic world has been bolting on since 2023 was already named, sequenced, and justified in a framework that predates the modern web by a decade and a half. It&#8217;s a framework every Silicon Valley startup, incubator, accelerator, and VC knows inside and out. It is taught in business majors all over the world. It&#8217;s literally the structure of most pitch decks. But it is still missing in most agentic protocols we use every single day.</p><p>The piece most clearly missing is the Define phase &#8212; the second one, which IDEO put second for a reason: without a clear problem statement, everything downstream answers the wrong question. It&#8217;s a very old insight the field keeps rediscovering from scratch &#8212; Agile&#8217;s Definition of Done, test-driven development&#8217;s failing-test-first, specification by example. Each was the same insight under a different name.</p><p>Now, here is the strongest version of your initial objection. &#8220;You don&#8217;t need all this structure. A great prompt specifies the audience, the format, the success criteria, the constraints. Write a better prompt and you get all five phases in one go.&#8221;</p><p>Let&#8217;s take this seriously, because it&#8217;s not wrong about prompt quality. A genuinely well-crafted prompt that specifies who the output is for, what format it should take, what it must accomplish, and what would make it fail &#8212; that prompt is effectively a specification. 
You&#8217;re right that prompt quality matters.</p><p>But a single prompt containing research context, goal specification, a plan, and execution instructions is not a cleaner version of the five-phase process. It&#8217;s five phases collapsed into one context window, with no mechanism for each phase to question the prior one&#8217;s conclusions.</p><p>When research and planning share a context, planning can&#8217;t interrogate research. When planning and implementation share a context, implementation can&#8217;t push back on the plan. When specification and review share a context, review is already biased toward confirming the specification it helped write. Prompt quality is about what you ask.</p><p>Phase independence is about who processes each answer, and whether they can genuinely disagree with the prior step. You can write the world&#8217;s best prompt and still hand it to an agent that will execute it inside the same narrowing tunnel, compounding the same assumptions with every step.</p><p>Picture someone reading their own manuscript for the fifth time &#8212; they no longer see what&#8217;s there, only what they meant to write. It is one of the most replicated findings in human psychology: once we form a belief, we interpret subsequent evidence through the lens of that belief. We notice confirming evidence, discount disconfirming evidence, and generate hypotheses that assume the belief is correct.</p><p>This is not a weakness of intelligence. Every mind &#8212; human or artificial &#8212; interprets through the conclusions it has already drawn. The agent that researched your problem already believes things about it. When it transitions to planning, it plans in service of those beliefs. The agent that planned has a solution in mind. When it implements, it makes countless micro-decisions that serve that solution. The agent that implemented defended choices as it worked. 
When it reviews, it reads its own output charitably.</p><p>Context isolation breaks this chain. A fresh context hasn&#8217;t seen the prior steps. It cannot be fooled by conclusions it never drew. It reads the artifact cold, which is the only way to genuinely evaluate it.</p><p>Design Thinking&#8217;s diverge-converge logic is not about <em>what</em> each phase does. It&#8217;s about <em>who</em> does it, and whether they can arrive at it without inheriting the prior phase&#8217;s commitments.</p><h2>Start Doing This Yourself Today</h2><p>The artisanal version of this is simpler than it sounds:</p><ul><li><p><strong>Treat each phase as a distinct conversation</strong>. Start a fresh session for each one.</p></li><li><p><strong>Hand it only the artifact from the prior phase</strong> &#8212; not the prior conversation, not your running context, not a summary of what you&#8217;ve been thinking about. The artifact alone.</p></li><li><p><strong>And tell the agent explicitly what mode it&#8217;s in</strong>. &#8220;You are in Research mode. Do not propose a plan. Do not suggest solutions. Your only job is to describe the problem space and produce a research report.&#8221;</p></li></ul><p>That instruction matters. Not because the model needs to be controlled, but because explicit mode assignment prevents the agent from sliding into execution behavior when it senses a gap to fill. Models are trained to be helpful; helpfulness steers every gap toward a solution. Naming the mode is how you resist it.</p><p>The discipline lives in you, not the tool. This works with any agent, any interface. Fresh context, explicit mode, artifact handoff. That&#8217;s the whole recipe.</p><p>If you want to go further, you can make phases structurally enforced rather than just instructed &#8212; agents that literally cannot execute, subagents that receive only the artifact, automated handoffs with no shared context. 
Programmable harnesses give you this level of control with permission levels per skill. (If you don&#8217;t have one, call me, <a href="https://apiad.github.io/aegis">I&#8217;ll lend you one for free</a>.)</p><p>One small step you can take today: add a specification phase to whatever workflow you already use. Before your planning phase writes a plan, ask for a success-criteria document first. One page. Explicit pass/fail conditions. What would make this output a success? What would make you throw it out? Review that document before planning begins.</p><p>This single addition &#8212; inserting a define phase between research and planning &#8212; catches more failures than adding a review phase after the fact. Because it catches them before the plan commits to the wrong goal.</p><p>What not to do: don&#8217;t implement all five phases as a mechanical checklist. And don&#8217;t add phases as ornamentation &#8212; a research phase that shares a context with planning adds conversation turns, not structure. More words in the same window is not more phases. The phases are context boundaries, not steps in a recipe. A phase that doesn&#8217;t produce a concrete artifact and doesn&#8217;t hand it to a fresh context adds nothing.</p><p>One document per phase. Fresh context per phase. That&#8217;s it.</p><h2>Structure Before You Re-Prompt</h2><p>Picture the artifact chain as a physical thing. A manila folder passed from one desk to the next. The research desk produces a report, closes it, slides it across. The specification desk opens only that folder, reads it, produces a criteria document, closes it, slides it across. The planning desk never opens the research folder &#8212; it opens only the criteria document.</p><p>And so on down the line. Each desk sees exactly one prior document. Each desk produces exactly one new document. The chain is what makes independence possible. You cannot hand off a vague intention. You cannot slide a feeling across a desk. 
Only a document.</p><p>Context isolation is the move most pipelines skip, and it&#8217;s the one that does the most work. Every phase that shares a context with a prior phase inherits its commitments. Not because the model is lazy or wrong &#8212; because that&#8217;s how cognition works, human or otherwise. We interpret through the lens of what we already concluded.</p><p>Context isolation is cheap: start a new session, pass only the artifact. The cognitive science is unambiguous: breaking the confirmation-bias chain requires a structural break, not a better instruction. Context isolation gets skipped because it looks optional. It isn&#8217;t.</p><p>Remember, the structure is the problem, not the model. Restructure before you re-prompt.</p><p>Until next time, stay curious.</p><div><hr></div><p><em>This is the core argument of the agentic workflows chapter in the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> &#8212; the full chapter walks the failure cascade with more cases, the artifact design problem (a research report can be thorough and still hand the wrong thing forward), and the context isolation mechanics in depth.</em></p><p><em>This specific article is new content, still not in the book, but it will land there shortly. The book is 50% off while it&#8217;s in early access, and also <a href="https://books.apiad.net/books/mhai/">free to read online</a> in a custom reader I built: dark mode, font controls, progress tracking, offline support, the works. 
</em></p><p><em>If you want the architecture behind these systems &#8212; how they fail, what the harness around them should look like, and what to actually do about it &#8212; that is what the book is for.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/ai&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI - 50% off&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/ai"><span>Get Mostly Harmless AI - 50% off</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[AI is doing something weird to Science]]></title><description><![CDATA[In both good and bad ways, and it won't go away.]]></description><link>https://blog.apiad.net/p/ai-is-doing-something-weird-to-science</link><guid isPermaLink="false">https://blog.apiad.net/p/ai-is-doing-something-weird-to-science</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Mon, 25 May 2026 12:40:28 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4032" height="3024" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3024,&quot;width&quot;:4032,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a chalkboard with some writing on 
it&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a chalkboard with some writing on it" title="a chalkboard with some writing on it" srcset="https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1636466497217-26a8cbeaf0aa?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxOHx8c2NpZW5jZXxlbnwwfHx8fDE3Nzk1NjgyMzl8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 
12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@artturijalli">Artturi Jalli</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>Picture Donald Knuth. Eighty-eight years old, the man who wrote <em>The Art of Computer Programming</em> by hand in TeX, which he invented himself to be able to write his own books. The father of algorithmic analysis. The most lauded living computer scientist. A legend, and a well-known AI skeptic.</p><p>Now picture him reading a printed chat log between a colleague and <em>Claude Code</em>. Not skimming. Reading it in detail, because there is something genuinely baffling about it.</p><p>The log belongs to Filip Stappers, a mathematician who ran thirty-one coding explorations with Claude Opus 4.6, systematically probing a class of combinatorial objects Knuth had spent <em>decades</em> thinking about. Exploration 15 surfaced something unexpected: a structural pattern nobody had written down.</p><p>Knuth read it, judged it valid, proved it correct by hand, and wrote a paper about it. 
He called it &#8220;Claude&#8217;s Cycles.&#8221; Knuth noted, with his characteristic precision, that he&#8217;ll &#8220;have to revise his opinions about generative AI&#8221; one of these days.</p><p>Most accounts that open with this scene take one of two off-ramps.</p><p>The first off-ramp is the <strong>replacement narrative</strong>. AI is now the scientist. The model had the insight; Stappers just ran the prompts; Knuth read the output and judged it true. <em>Discovery has been automated</em>. We are, depending on your temperature, either liberated or obsolete.</p><p>The second off-ramp is the <strong>stochastic-parrot dismissal</strong>. It&#8217;s just a language model predicting tokens. It doesn&#8217;t understand combinatorics; it doesn&#8217;t understand anything. Stappers did the science; Claude shuffled plausible-sounding symbols. Attribute the discovery to the researcher, not the autocomplete.</p><p>Both off-ramps feel satisfying. Both are wrong. And they&#8217;re wrong in the same way: they&#8217;re answering the question &#8220;did AI do the science?&#8221; That&#8217;s the wrong question. The interesting object is not the agent. <em>It&#8217;s the loop.</em></p><p>The loop looks like this: a human poses a question; a model proposes candidates; a verifier filters the candidates; a human curates what survives. Round and round. What Stappers and Claude did is not fundamentally different in shape from what Tao and Lean are doing, or what the GNoME pipeline does in materials science, or what AlphaFold did for protein structure. The shape is the same. <em>The loop does the discovery.</em></p><p>I want to be clear about what that means, because it&#8217;s easy to hear it as either a compliment to AI or a dismissal. It is neither. It&#8217;s an empirical claim about where the causal action lives. Not in the model, not in the human, but in the <em>interaction structure</em> between them. Get that structure right and you get science. 
Get it wrong and you get confident nonsense at scale. The details of what &#8216;wrong&#8217; looks like are worth walking through carefully, and we&#8217;ll do that a couple of sections down.</p><p>But before we can dismiss either off-ramp, we need to walk four recent cases. Because in every one of them &#8212; Claude&#8217;s Cycles, Tao and Lean, AlphaFold, GNoME &#8212; the replacement narrative is not just philosophically confused. It is empirically wrong. The loop does the discovery.</p><h2>Four Cases, One Shape</h2><p>In each of the following cases, the loop does the discovery. I&#8217;m going to give you four of them across four domains, because I think the pattern only becomes undeniable when you see it that many times. Same shape, different materials.</p><h3>Claude&#8217;s Cycles</h3><p>You already have the scene. Stappers runs thirty-one explorations. Not one inspired conversation, thirty-one numbered, documented, methodical probes. Claude proposes. Stappers evaluates. Knuth reads the surviving logs, verifies the mathematical claims by hand, and writes. One human author. One credited co-explorer. 
The question was Stappers&#8217;; the verification was Knuth&#8217;s; the curation of which thirty-one explorations were worth pursuing was human throughout. The model was the proposer.</p><p>That word matters, and I want to be precise about it. <em>Proposer.</em> Not discoverer, not author, not scientist. The one that generates candidates fast enough that the verifier can find something in the haystack.</p><h3>Tao and Lean</h3><p>Terence Tao, Fields Medalist and the current benchmark for living mathematical genius, has been publicly working through what it looks like to use LLMs for mathematical research. His account in the <em>Notices of the American Mathematical Society</em> in 2025 is careful and specific, and I appreciate the care because it&#8217;s easy to overread these things in both directions.</p><p>Here&#8217;s what you&#8217;re actually looking at: an LLM proposes proof steps, intermediate claims, candidate lemmas, reformulations of the problem. Lean&#8217;s type-checker is the verifier. And Lean&#8217;s type-checker cannot be fooled. It either accepts a proof term or it doesn&#8217;t. There&#8217;s no &#8220;plausible-sounding but wrong&#8221; in Lean. The system rejects garbage mechanically and without appeal. What the human does &#8212; Tao, or anyone in his lab &#8212; is curate: which of the surviving proof steps are worth following, which directions are worth pursuing. The loop is tight. The proposer is creative and unreliable; the verifier is reliable and uncreative; the human operates between them.</p><p>I&#8217;d frame it this way: the breakthrough isn&#8217;t that LLMs can now do mathematics. It&#8217;s that the loop can now run fast enough and cheaply enough to be useful.</p><h3>AlphaFold</h3><p>I want to include this one even though it&#8217;s older and better-known, because it&#8217;s the case where people most confidently say &#8220;AI solved the problem.&#8221; And that confidence is telling.</p><p>The problem was fifty years old. How does a protein fold? 
You have the amino acid sequence; you want the three-dimensional structure; that structure determines function; that function determines everything downstream in biology and drug design. AlphaFold 2 proposes structures. Experimental structure determination verifies them: X-ray diffraction, cryo-electron microscopy, techniques that require real equipment, real samples, real physics. You cannot hallucinate a crystal structure into being; the protein either folds the way the model says or it doesn&#8217;t, and the experimental method tells you which. AlphaFold 3 extended this to molecular complexes: DNA, RNA, small-molecule ligands binding to proteins.</p><p>I think this is worth sitting with: human researchers curated which structures were worth solving, which proteins mattered, which findings were publishable. AlphaFold proposed. Nature verified. Humans curated.</p><h3>GNoME and A-Lab</h3><p>This is the one I find most clarifying, because the verifier here is the most brutally physical. You can&#8217;t argue with a crystal.</p><p>GNoME &#8212; Google DeepMind&#8217;s Graph Networks for Materials Exploration, if you want the full name &#8212; generated 380,000 candidate stable crystal structures. That&#8217;s the proposal stage. Then A-Lab, the autonomous laboratory at Lawrence Berkeley National Laboratory, took 58 of those candidates and actually tried to synthesize them. Robotically. With real chemicals, real furnaces, real diffraction equipment to check what came out. Forty-one novel materials were successfully synthesized in 17 days.</p><p>Think about the verifier in that loop. Not a type-checker, not a scoring function. A physical robot in a real lab mixing real compounds at real temperatures and asking whether the crystal forms. You cannot fake a crystal. If A-Lab says it synthesized a novel stable material, it synthesized a novel stable material. 
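</p><p>The shape shared by these four cases can be written down in a few lines. The Python sketch below is a toy, not a model of any of the systems above: the question (which right triangles have whole-number sides?) is an arbitrary stand-in, the proposer is a random guesser, the verifier is exact arithmetic, and the curator is a function standing in for human judgment. All names are hypothetical.</p>

```python
import math
import random
from typing import List, Tuple

Triple = Tuple[int, int, int]


def propose(rng: random.Random) -> Triple:
    """Proposer: fast, broad, and usually wrong. It guesses a triple at random."""
    return (rng.randint(1, 20), rng.randint(1, 20), rng.randint(1, 29))


def verify(candidate: Triple) -> bool:
    """Verifier: uncreative but exact. Accepts only true Pythagorean triples."""
    a, b, c = candidate
    return a * a + b * b == c * c


def curate(survivors: List[Triple]) -> List[Triple]:
    """Curator (a stand-in for human judgment): keep primitive triples, deduplicated."""
    kept = set()
    for a, b, c in survivors:
        if math.gcd(math.gcd(a, b), c) == 1:
            kept.add((min(a, b), max(a, b), c))
    return sorted(kept)


def discovery_loop(budget: int, seed: int = 0) -> List[Triple]:
    """Run the loop: many proposals, a strict filter, then curation."""
    rng = random.Random(seed)
    candidates = (propose(rng) for _ in range(budget))
    survivors = [cand for cand in candidates if verify(cand)]
    return curate(survivors)


found = discovery_loop(budget=20000)
print(found)
```

<p>Almost every guess is wrong, and that is fine: nothing reaches the curator unless the verifier accepted it. Replace <em>verify</em> with a check that merely looks plausible and the same proposer fills the output with errors.</p><p>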
The model proposed; the physical world verified; human researchers curated which structures were interesting enough to attempt.</p><p>I want to be blunt here: the pattern is not subtle. In every case: model proposes, independent verifier filters, human curates. The model is never the verifier. The model is never the question-poser. The model occupies exactly one slot in the loop, and it&#8217;s the slot that benefits most from creativity, speed, and a high tolerance for being wrong.</p><p>The replacement narrative requires you to believe the model is doing all three jobs. It isn&#8217;t. Empirically, demonstrably, across four domains, it isn&#8217;t.</p><p>Before we anatomize what changed inside the loop, it is worth noting that this shape is fifty years old. Because that changes what we should find surprising.</p><h2>The Loop Is Older Than You Think</h2><p>The loop does the discovery today in the same way it did the discovery in 1976.</p><p>I want to be somewhat insistent about this, because I think the historical amnesia around computational assistance in science is part of what makes AI discourse so unmoored. This isn&#8217;t new. What&#8217;s new is narrower than you think, and understanding exactly what changed is the only way to correctly evaluate what it means.</p><h3>Appel and Haken, 1976</h3><p>You probably know this one: the four-color theorem says any map can be colored with four colors such that no adjacent regions share a color. Mathematicians had been trying to prove it since 1852. In 1976, Kenneth Appel and Wolfgang Haken proved it. Using a computer. Their proof involved reducing the problem to 1,482 configurations and verifying each one computationally. No human read every step. No human could. The proof was real; the verification was mechanical.</p><p>The mathematical community was genuinely unsettled. The theorem wasn&#8217;t wrong. But the proof didn&#8217;t fit the usual epistemological frame. 
You couldn&#8217;t follow it the way you follow a conventional proof. You had to trust the computer. That discomfort was the first serious confrontation with what I&#8217;m calling the loop: humans posed the problem, humans designed the reduction, a computer verified the 1,482 cases, humans accepted the surviving result.</p><p>Same shape. Fifty years earlier.</p><h3>Hales and Flyspeck, 1998&#8211;2014</h3><p>Thomas Hales proved Kepler&#8217;s sphere-packing conjecture in 1998. The conjecture: the way you&#8217;d intuitively stack cannonballs, face-centered cubic packing, is in fact the densest possible arrangement. The proof relied on computer enumeration so extensive that the referees couldn&#8217;t verify it. They were &#8220;99% confident&#8221; and said so, which is an unusual thing for mathematical referees to say.</p><p>Sixteen years later &#8212; <em>sixteen years</em> &#8212; the Flyspeck project completed a formal verification of the proof in HOL Light, a proof assistant. I want to dwell on that: sixteen years. The loop had a very slow verifier in the middle. It still worked. The result was real.</p><p>Sixteen years from claim to verified closure. I find that humbling. We talk about AI accelerating science as though &#8220;fast&#8221; is a new property of loops. The loop has always done the discovery. Sometimes slowly.</p><h3>AI Feynman, 2020</h3><p>You won&#8217;t usually see this one in the AI-in-science timelines, but it belongs here. Silviu-Marian Udrescu and Max Tegmark built a system called AI Feynman that uses symbolic regression (searching the space of mathematical expressions) to recover physics equations from data. Feed it measured relationships between physical quantities; it proposes the equation. Tested on 100 equations from the Feynman Lectures on Physics. 
Human scientists posed the problems; the system proposed expressions; formal mathematical checks filtered them.</p><p>This is 2020, two years before the moment people usually date as the AI-in-science inflection. I find it clarifying: the loop is the same.</p><h3>What 2022 Actually Changed</h3><p>Here&#8217;s what changed. One thing in the loop changed, and it&#8217;s the proposer slot.</p><p>Before roughly 2022, the proposer in loops like these was domain-specific, narrow, and hand-engineered. AlphaFold&#8217;s architecture was designed from the ground up for protein structure prediction. The Flyspeck enumeration was written for Kepler&#8217;s problem. AI Feynman&#8217;s symbolic regression engine was built for recovering physics equations. The verifiers were already strong: formal proof checkers, physical experiments, crystallography. The curators were already human. But building a proposer required significant domain-specific engineering effort for each new application.</p><p>What changed, and I think this is the key move, is that <em>the proposer slot is now increasingly occupiable by general-purpose large language models that can be directed with natural-language specifications</em>. </p><p>The same model that helps Stappers explore combinatorial objects can, with different prompts, propose protein structures, generate proof steps, suggest material candidates. It&#8217;s not that the model does these things well in some absolute sense. It does them well enough that a strong verifier can find the real results in the output. And FunSearch, DeepMind&#8217;s system for mathematical discovery, goes a step further: it uses LLMs to generate the search strategy itself. AlphaEvolve extends this to evolving the algorithms. The proposer writes the proposer.</p><p>I want to flag what didn&#8217;t change: verifier reliability, human curation, question-posing. Knuth still read the logs. Tao still decides which lemmas are worth pursuing. 
The IMO committee still wrote the problems. The physical world still decides whether a crystal forms.</p><p>So: what exactly is the loop, and what changed inside it? The proposer and verifier are different jobs &#8212; you can confuse them, and that confusion is the mistake that both off-ramps make.</p><h2>Anatomy of the Loop</h2><p>The loop does the discovery. But what is the loop, exactly? It has four roles: <em>poser</em>, <em>proposer</em>, <em>verifier</em>, <em>curator</em>. They are not interchangeable.</p><p>I want to be specific about each, because vagueness here is where confused takes come from. When someone says &#8220;AI did the science,&#8221; they&#8217;re usually collapsing all four roles into one and attributing them to the model. When someone says &#8220;it&#8217;s just a tool,&#8221; they&#8217;re usually denying that the proposer role is meaningfully distinct from, say, a search engine. Both collapses are wrong.</p><h3>The Proposer Role</h3><p>I want to define this carefully, because the proposer role is where most confusion lives. The proposer generates candidates. That&#8217;s it. It doesn&#8217;t need to know which candidates are correct; it doesn&#8217;t need to understand the domain at the level required to verify; it doesn&#8217;t need to bear accountability for wrong answers. It just needs to produce output that the verifier can evaluate. Fast. In large quantities. With enough breadth that interesting things appear in the distribution.</p><p>This is a real and important job. Generating candidates well, in a domain, at the right level of specificity, with the kind of variety that makes the verifier&#8217;s job tractable, is genuinely hard. What LLMs are good at is producing plausible candidates in natural-language-specified domains. 
That&#8217;s useful precisely because &#8220;plausible&#8221; is a different bar from &#8220;correct.&#8221; The proposer buys lottery tickets; the verifier checks which ones won.</p><h3>Galactica, or: What Happens Without a Verifier</h3><p>In November 2022, Meta released Galactica, a large language model trained on scientific literature. I remember when this dropped: the pitch was that it could reason about science, write papers, explain concepts, generate hypotheses. It was fluent. Confident. It had read more papers than any human alive.</p><p>It was retracted in three days.</p><p>Galactica produced plausible-sounding but fabricated citations. It generated chemistry that was wrong. It stated incorrect facts with the same calm certainty it used for correct ones. The model had no verifier. It had no external check against which its proposals were filtered. It was a proposer talking directly to readers who were treating it as a verifier.</p><p>That&#8217;s the failure mode. &#8220;LLMs are bad at science&#8221; is the wrong lesson. Galactica was a capable proposer deployed without a loop. Dump the proposer&#8217;s output directly into the world, skip the verifier, and you get confident nonsense. The problem wasn&#8217;t the model. The problem was the missing loop.</p><h3>The Verifier Role</h3><p>I want to be careful here, because this is where the asymmetry lives. The verifier is not creative. The verifier must be <em>right</em>. These are the two properties that matter.</p><p>Lean&#8217;s type-checker does not hallucinate. It processes a proof term and either accepts it or rejects it, and its answer is correct by construction: it&#8217;s checking against the formal rules of type theory, and it doesn&#8217;t have opinions or moods or bad days. A crystal either forms or it doesn&#8217;t. X-ray diffraction either confirms the predicted structure or it doesn&#8217;t. These verifiers are not AI systems. 
They&#8217;re physics, or mathematics, or formal logic.</p><p>Here&#8217;s a slogan worth keeping: <em>the verifier is the one that matters.</em> A loop with a weak proposer and a strong verifier still produces valid science; it&#8217;s just slow, because it needs more proposals before one survives. This is what &#8220;normal science&#8221; has been so far: slow proposer, strong verifier.</p><p>A loop with a strong proposer and a weak verifier produces Galactica. The asymmetry is important. You can have crap proposals and still win, as long as the verifier is robust. You cannot have a weak verifier and still win, no matter how impressive the proposer.</p><h3>Tao&#8217;s Insight on Composability</h3><p>Tao articulated this in his 2025 <em>Notices</em> piece, and I think it&#8217;s genuinely the right way to think about it: the value of the LLM-plus-formal-verifier combination comes from combining complementary weaknesses. LLMs are creative but unreliable. Lean is reliable but cannot be creative; it needs to be told what to check. Individually, each is limited. Together, the loop covers both limitations: the LLM proposes, Lean certifies, and you get results that are both novel and guaranteed correct.</p><p>This is not a new insight in the philosophy of science or even in computational science. It&#8217;s the same structure as Appel-Haken: creative human mathematicians posed and structured the problem, mechanical verification checked the cases. What&#8217;s new is that the LLM makes the creative-but-unreliable proposer slot much cheaper and more general to fill. You don&#8217;t need a domain expert to hand-engineer each proposer; you need a prompt and a general-purpose model.</p><h3>What&#8217;s Genuinely New Since 2022</h3><p>Let me be specific, because this matters for evaluating claims. Three things changed:</p><p>First, <strong>open-ended program synthesis</strong>. The proposer can now write code, not just fill templates. 
FunSearch and AlphaEvolve don&#8217;t just suggest candidate solutions; they generate the search strategy itself, which then searches for solutions. The proposer proposes how to propose. That&#8217;s a qualitative shift.</p><p>Second, <strong>cross-domain transfer at useful fidelity</strong>. The same general-purpose model can propose protein structure candidates, proof steps, crystal candidates, combinatorial patterns, all with natural-language specification. You don&#8217;t need to rebuild the proposer for each domain. The moat that used to come from hand-engineering a domain-specific proposer is largely gone.</p><p>Third, <strong>tight LLM-to-formal-verifier loop latency</strong>. The Flyspeck project took sixteen years. The Lean loop that Tao describes closes in seconds. When the loop runs fast, you can iterate fast, which means you can attempt more ambitious problems and accumulate signal faster about which directions are productive.</p><p>What hasn&#8217;t changed: humans pose the problems; the verifier is still not an LLM; which questions are worth asking remains entirely human. Knuth read the logs. Tao decides which lemma is promising. The IMO committee wrote the problems. The GNoME team decided that stable crystal structures were the target. The question of <em>what to look for</em> has not been automated.</p><p>With the anatomy clear, we can finally steelman both off-ramps seriously. And explain precisely why they are both wrong.</p><h2>Why Both Extremes Are Wrong</h2><p>The loop does the discovery, but what does that mean for who gets the credit?</p><p>I want to take both positions seriously, because I think each is tracking something real. Dismissing either one without engagement is how you end up with a position that sounds crisp but falls apart in the edge cases.</p><h3>The Maximalist Steelman</h3><p>Knuth is eighty-eight years old. He has spent decades thinking about the class of combinatorial objects Stappers was exploring. 
He hadn&#8217;t found the pattern that surfaced at exploration 15. Stappers, presumably, hadn&#8217;t either, or there wouldn&#8217;t have been thirty-one explorations looking for it.</p><p>Here&#8217;s the maximalist claim: if <strong>surprise</strong> is constitutive of discovery, if discovery is the moment something genuinely unexpected becomes known, then the model contributed something essential. It&#8217;s not that the model was incidentally involved in a process humans could have completed. It&#8217;s that the specific result didn&#8217;t exist in any human mind before the model generated it. A system that reliably surfaces what experts miss is doing something scientists do. The fact that the model can&#8217;t feel pride or bear responsibility doesn&#8217;t settle whether it participated in discovery.</p><p>The maximalist is pointing at something real. The model wasn&#8217;t executing a search strategy that a human had pre-specified. It was navigating a space in ways that produced genuinely unexpected output. That matters.</p><h3>The Dismissive Steelman</h3><p>A hammer drives nails. We don&#8217;t say the hammer built the house. Excel executes arithmetic the analyst specified; we don&#8217;t credit Excel with the financial model. AlphaFold was trained on structural biology data that human researchers collected over decades; the training objective was designed by human engineers; the evaluation criteria were set by human scientists. The model is a sophisticated tool. Sophisticated tools don&#8217;t discover; they execute.</p><p>The dismissive case is not purely rhetorical. There&#8217;s a serious accountability point underneath it: when a model-generated finding turns out to be wrong (and some will), who is responsible? If you&#8217;ve attributed agency to the model, you&#8217;ve obscured the accountability chain. The human who deployed the loop without a strong verifier, who curated poorly, who published without checking: that human is responsible. 
Diffusing credit into the model diffuses accountability too, and that&#8217;s a practical problem, not just a philosophical one.</p><p>The dismissive steelman is also tracking something real. The model didn&#8217;t choose the question. It didn&#8217;t decide the result mattered. It didn&#8217;t design the verifier. It didn&#8217;t stake its reputation on the finding.</p><h3>The Verdict</h3><p>Both steelmen are right about something; both are wrong about the structure.</p><p>The maximalist is right that the model contributes something essential, something that wouldn&#8217;t have been there without it. Wrong that this constitutes independent discovery: you cannot have independent discovery without the ability to pose a question, which requires caring about the answer, which requires the kind of intentionality the model doesn&#8217;t have.</p><p>The dismissive position is right that the model doesn&#8217;t choose questions, doesn&#8217;t bear accountability, and shouldn&#8217;t be treated as an author in the full sense. Wrong that this makes it a mere tool in the hammer-and-nail sense. AlphaTensor, DeepMind&#8217;s system that found faster matrix multiplication algorithms, didn&#8217;t just execute a search the engineers specified. It found an algorithm that reconfigured what experts believed was achievable. That&#8217;s not executing; it&#8217;s navigating a combinatorial space in a way that produces genuine surprise. The hammer never surprises you &#8212; except when you&#8217;re not looking and it hits your finger, which, by the way, AI can also do, and with a louder bang.</p><p>The right frame is, I think, the <strong>AI lab member</strong>. Indispensable, capable, sometimes surprising. Not a hammer, not a principal investigator, not a replacement for another human. Just a genuinely, qualitatively new kind of entity. 
An entity that occupies the proposer slot in the loop and does it better than anything that occupied that slot before, but doesn&#8217;t touch the verifier, doesn&#8217;t pose the questions, and doesn&#8217;t bear the accountability that comes with authorship.</p><p>If AI is a lab member and not a scientist, what does that do to the publishable paper, and to the metrics we use to measure science?</p><h2>But What Happens to the Paper?</h2><p>The loop does the discovery. But the paper still needs to be written, and the paper is what the institution counts.</p><p>This is where I get more pessimistic, or at least more cautious. The loop is good news for science-the-activity. It&#8217;s more complicated news for science-the-institution.</p><h3>Discovery and Paper Count Were Already Decoupled</h3><p>In 2023, Park, Leahey, and Funk published a careful empirical study of scientific disruption in <em>Nature</em>, measuring across decades of papers and patents how often new work <em>displaces</em> the prior literature versus <em>consolidates</em> it. Their finding: disruption has been declining since 1945. Not linearly, not dramatically, but consistently. Meanwhile, paper count has been exploding.</p><p>The interpretation I find most compelling: the processes that generate papers and the processes that generate genuine advances were already decoupled before AI. A lot of papers are small increments, confirmations, replications, applications of known methods to new datasets. That&#8217;s not waste; science needs that infrastructure. But it means paper count was already a noisy proxy for discovery rate.</p><p>What an AI proposer does to this: it makes generating candidate findings cheaper. Cheaper generation means more candidates, which means more papers. Discovery and paper count decouple further.</p><h3>Goodhart as Accelerant</h3><p>Goodhart&#8217;s Law: when a measure becomes a target, it ceases to be a good measure. 
Academic institutions have been targeting paper count, citation count, journal impact for decades. Paper mills, factories producing fake or low-quality research for pay-to-publish journals, were already a pre-AI problem. Ioannidis documented the replication crisis in 2005 and again in 2018; most published findings in some fields don&#8217;t replicate, and that was before AI made generating plausible-sounding results cheaper.</p><p>The AI proposer is an accelerant deployed into a system already optimizing for the wrong thing. It makes the Goodhart problem worse. A lot worse. If you can generate a thousand candidate papers in the time it used to take to generate ten, and the verifier in your loop is peer review (slow, inconsistent, and famously gameable), then you have a problem that is structurally different from the pre-AI problem in scale.</p><p>I want to be clear: this is not an argument against AI-in-science. It&#8217;s an argument that the institution of science has a verifier problem, and AI proposers make that verifier problem more urgent. The fix is not to slow the proposer. The fix is to build better verifiers.</p><h3>The Optimistic Edge</h3><p>Here&#8217;s where I land, and I&#8217;m genuinely somewhat optimistic about this: if generating candidates is cheap, the scarce skill shifts upstream.</p><p>The researchers who will matter most in this environment are not the ones who can generate the most proposals. They&#8217;re the ones who can pose the right questions, who can identify which problems are worth solving before they know the answer, and the ones who can build strong verifiers. Question-selection and verifier-design become the competitive moat. That&#8217;s a real and important skill. It&#8217;s harder to fake than generating a paper. And it&#8217;s the skill that the loop most depends on.</p><p>If I&#8217;m hiring for a lab in 2026, I&#8217;m not looking for &#8220;AI scientist&#8221; as a job description. 
I&#8217;m looking for people who can look at a domain and say: <em>here is what a correct answer would have to look like, and here is how I would know if I found it.</em> That&#8217;s what the verifier is. Building it is hard. It requires deep domain knowledge, epistemological clarity, and the kind of judgment that comes from years of thinking carefully about what you&#8217;re actually trying to know.</p><h3>A Diagnostic Heuristic</h3><p>Let me give you something practical. When you read an AI-in-science result, a press release, a paper, a breathless tweet thread, ask this: <strong>what was the verifier, and who built it?</strong></p><p>If the verifier is Lean, or a crystal, or experimental replication with pre-registered protocols, or a physical measurement with known error bars: trust the result. The proposer&#8217;s reliability doesn&#8217;t matter much; the verifier caught the garbage. The finding is real regardless of how the candidates were generated.</p><p>If the verifier is not named, if the paper says &#8220;we used GPT-4 to generate hypotheses and evaluated them with GPT-4 to assess plausibility,&#8221; you&#8217;re looking at Galactica with extra steps. The proposer and the verifier are the same system. The loop is not closed. Be skeptical.</p><p>That diagnostic question is the institutional choice this moment is forcing. What was the verifier, and who built it? Let&#8217;s land it.</p><h2>The Verifier Is the One That Matters</h2><p>&#8220;I&#8217;ll have to revise my opinions about generative AI one of these days.&#8221;</p><p>Knuth said that. Not &#8220;AI did the science,&#8221; not &#8220;it&#8217;s just autocomplete.&#8221; Something more specific: <em>I may have underestimated what this thing can do in the right loop.</em> The loop did the discovery. The human owned the question and the verifier. The model was the proposer. 
And the result was real.</p><h3>The Investment Gap</h3><p>Here&#8217;s the uncomfortable institutional reality: almost all current investment is in the proposer. Larger models, more parameters, cheaper inference, better fine-tuning, faster generation. The race to make the proposer better is well-funded, well-publicized, and moving fast.</p><p>The verifier is comparatively neglected.</p><p>Formal verification tools like Lean exist, but they&#8217;re hard to use, require significant expertise, and don&#8217;t cover most scientific domains. Physical verification (A-Lab-style robotic synthesis) is expensive and slow relative to the speed at which the proposer can generate candidates. Experimental replication is underfunded as a scientific activity; it&#8217;s less prestigious than novel claims. The referee system in academic publishing was designed for a world where <em>generating</em> candidates was the hard part. It was not designed for a world where a model can generate ten thousand plausible candidates in an afternoon.</p><p>Hiring &#8220;AI scientists&#8221; misframes the institutional need. The need is for researchers who can pose hard questions and build reliable verifiers. The &#8220;AI lab member&#8221; frame points to what needs managing: not the model, but the loop. And the loop&#8217;s bottleneck right now is the verifier.</p><p>I should mention that this piece is drawn from the science chapter of <em><a href="https://store.apiad.net/l/ai">Mostly Harmless AI</a></em>, a book I&#8217;m writing about what AI actually does versus what the headlines claim. If you&#8217;re finding this useful, the book is where the longer argument lives.</p><h3>What Comes Next</h3><p>Next week: AI in creative output. Art, literature, music. Where the verifier question changes shape entirely. Because in mathematics and materials science, you can at least define what &#8220;correct&#8221; means. In creative domains, the verifier problem is not just harder. 
It&#8217;s constitutively different. What does it even mean for a creative proposal to be <em>correct</em>? I have some thoughts, and I think they&#8217;re going to make the science case look straightforward by comparison.</p><p>Stay curious.</p><div><hr></div><p><em>P.S. There&#8217;s a Subscribe button somewhere below this. I&#8217;m told it does something useful. My understanding of subscription mechanics is below the level of a confident stochastic parrot. I believe it works, but I haven&#8217;t checked every step. Click it anyway. The verifier here is whether you keep showing up next Monday.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[The 80% AI Reliability Horizon]]></title><description><![CDATA[The real number every AI engineer should be tracking]]></description><link>https://blog.apiad.net/p/the-80-ai-reliability-horizon</link><guid isPermaLink="false">https://blog.apiad.net/p/the-80-ai-reliability-horizon</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Thu, 21 May 2026 14:28:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jhaA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!jhaA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jhaA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jhaA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jhaA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jhaA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jhaA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg" width="1376" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:674811,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.apiad.net/i/198700078?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jhaA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jhaA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jhaA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jhaA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F791d9975-e296-4f78-a682-e797785b86e8_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Adapted from Friedrich, <a href="https://en.wikipedia.org/wiki/Wanderer_above_the_Sea_of_Fog">&#8220;Wanderer above the Sea of Fog&#8221;</a> (c. 1818), Kunsthalle Hamburg &#8212; the horizon you can see is not the horizon you get to stand on. Rendered with Nano Banana 3 via <a href="https://github.com/apiad/mosaico">mosaico</a>.</em></figcaption></figure></div><blockquote><p><em>Every post on the blog this month is on the theme of agent reliability, anchored on the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> &#8212; 50% off during early access &#8212; where the Limitations chapter walks all seven failure modes that compound into the curve below. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a>. 
More at the end.</em></p></blockquote><p>The headline number you&#8217;ve seen on every AI-progress chart &#8212; &#8220;model X completes two-hour tasks half the time&#8221; &#8212; is the <strong>50% reliability horizon</strong>. That number is moving fast. Doubling every seven months, per METR&#8217;s time-horizon work. It&#8217;s the curve on every AI-progress chart, the one conference talks lean on, the one that lands in investor decks.</p><p>The number that decides whether you can actually deploy an agent is a different one.</p><p>The <strong>80% reliability horizon</strong> &#8212; the task length at which an agent finishes well enough that you would not feel the need to check &#8212; sits 70&#8211;80% below the 50% figure, and it moves up far more slowly. That gap is the difference between demo and deploy. The 50% is what passes the eval. The 80% is what survives the afternoon you weren&#8217;t watching. Not two hours. Thirty minutes you&#8217;d hand off.</p><p>I want to be precise about what I&#8217;m not arguing. I&#8217;m not arguing agents are broken or that AI progress isn&#8217;t real. It is real, and it&#8217;s fast. I&#8217;m making the narrower claim that the 50% and 80% horizons move at different speeds, and that the 80% is the one that matters when someone else&#8217;s data is on the line. This post is the math behind the gap, why it&#8217;s structural, and what you can do about it.</p><p>If you&#8217;re building on agents, you are building on the 80% horizon. 
The 50% number is for the marketing deck.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><h2>Two horizons, two stories</h2><p>METR&#8217;s methodology is clean: take a population of tasks with measured human completion times, then find the longest task a given model clears at success rate X. Do that for X = 50%, and you get the 50% horizon. Do it for X = 80%, and you get a different curve. A different story.</p><p>The 50% horizon has been doubling roughly every seven months. In late 2025, it sat around a couple of hours for software tasks. That&#8217;s the curve that makes headlines. That&#8217;s the curve you&#8217;ve seen on every slide. Striking.</p><p>The 80% horizon sits roughly 70&#8211;80% below. The same agent that clears a two-hour task half the time clears a half-hour task four times in five. Not two hours. Thirty minutes. And that gap doesn&#8217;t close at the same rate. It moves slowly, stubbornly, for reasons that are mathematical before they are engineering.</p><p>So you have two curves growing at different speeds. The 50% horizon is the curve of capability: what can this system do, under ideal conditions, at least sometimes. The 80% horizon is the curve of trust: what can this system do reliably enough that you&#8217;d hand it a production key and walk away.</p><p>They are not the same curve. And they do not close the same way.</p><p>The longer your task horizon, the wider the gap between can-do-it-sometimes and can-be-trusted-with-it. The mechanical reason is one piece of math.</p><h2>Probability arithmetic</h2><p>Here&#8217;s the setup. A language model you&#8217;re calling spends a fixed compute budget per output token. 
Each step in a multi-step process has some per-step success probability <em>p</em> that is strictly less than one. The model is stochastic, the world is noisy, context degrades.</p><p>String <em>n</em> steps together, and the probability that all of them succeed is roughly <em>p<sup>n</sup></em>. That&#8217;s it. That&#8217;s the math.</p><p>Here&#8217;s what that looks like with actual numbers. Suppose your agent is excellent: <em>p</em> = 0.99 per step. That&#8217;s a 99% success rate on any single action. Compound it over 100 steps: 0.99<sup>100</sup> &#8776; 0.37. You&#8217;ve gone from near-certain to worse-than-a-coin-flip without any single step being unreliable. Now drop to <em>p</em> = 0.95 (still quite good, still 95% per step). Over 100 steps: 0.95<sup>100</sup> &#8776; 0.006. Six in a thousand runs succeed.</p><p>This is not a gap you close with next year&#8217;s training run. It is the shape of any probabilistic process operating in sequence over time. The shape of the curve doesn&#8217;t change when you improve <em>p</em>; it just shifts outward.</p><p>Reasoning models &#8212; the o-series, R1, extended-thinking variants &#8212; are valuable here. They buy you a higher per-step <em>p</em>, and they let you spend more steps at that higher rate. Both matter. But they push the curve outward. They do not change its shape.</p><p>Two pieces of evidence you should hold next to that math. <em>GSM-Symbolic</em> (Apple, 2024): perturb a math problem the model has seen (swap a name, change a number) and accuracy drops even when the reasoning path is identical. The model has memorized the route, not the reasoning. <em>Faith and Fate</em> (Dziri et al., 2023): transformer accuracy degrades with computational-graph depth even when each individual sub-step is solvable in isolation. Depth itself is the failure axis. More steps means more surface for <em>p</em> &lt; 1 to accumulate.</p><p>Reasoning models buy you a higher per-step <em>p</em> and more steps to spend. 
They do not change the shape of the curve.</p><h2>Where the chain gets long</h2><p>Agents are exactly the setup that makes <em>p<sup>n</sup></em> painful.</p><p>Think through a typical agent run: read prompt, plan, call tool, read result, call tool again, critique output, adjust plan, call final tool, write response. Nine steps if you&#8217;re being generous. A real production agent reaches hundreds. Each step is one more <em>p</em> rolled. Each tool call is one more chance the orchestrator hands the tool the wrong arguments &#8212; garbage in, deduction out.</p><p>Self-critique doesn&#8217;t repair this &#8212; and you can verify the result yourself if you&#8217;ve tried it. Huang and colleagues (2024) showed that intrinsic self-correction without an external oracle signal actually degrades performance. The model talks itself out of correct answers as often as it talks itself in. The paradox is clean: if the model could recognize the error, it would not have made it. Asking it to introspect on failures is asking the broken compass to check itself.</p><p>So let&#8217;s put numbers on a real scenario. An agent that succeeds on each of five steps 95% of the time lands at 0.95<sup>5</sup> &#8776; 0.77. Decent. Not great, but workable. Now extend that same agent to a fifty-step trajectory: 0.95<sup>50</sup> &#8776; 0.08. Eight runs out of a hundred finish correctly.</p><p>The demo ran five steps. The deploy runs fifty. The demo and the deploy are two different machines.</p><p>That&#8217;s the 80% horizon you&#8217;ll actually feel in production. It&#8217;s not a philosophical concern about AI reliability in the abstract. It&#8217;s the arithmetic of what happens when you take a stochastic generator and ask it to maintain a chain of reasoning over a long enough trajectory that <em>p<sup>n</sup></em> has time to do its work.</p><h2>What you can actually do</h2><p>Three mitigations. 
Each one genuine, and each one with a ceiling you should know before you commit.</p><p><strong>Verifier-shaped tasks.</strong> Where the output can be checked deterministically (arithmetic, code that compiles and runs, SQL that parses, a formal proof), you can recover trust that the probabilistic generator alone cannot provide. A SAT solver beats an LLM on deductive closure every time. The architecture that wins here is LLM-proposes-candidate, deterministic-system-signs-off. The generator explores the space; the verifier approves the exit. This is, incidentally, the same pattern Monday&#8217;s post on the seventy-year argument named: a deterministic shell around a stochastic core, applied at the task level rather than the system level. The twist is that not every task has a fast verifier. Code that runs is checkable; code that runs <em>correctly for all future inputs</em> is not.</p><p><strong>Retrieval-augmented generation.</strong> If the fact your agent needs is no longer arbitrary recall but lives in a curated document the model is required to cite, then Kalai and Vempala&#8217;s 2024 lower bound on calibrated hallucination does not apply to that fact. Most agent failures upstream of a tool call are recall failures the agent doesn&#8217;t know it&#8217;s making; retrieval changes the error mode from confident confabulation to visible gap. RAG turns a free-running generator into a paraphrase-and-summarize system over a known corpus. The reach of the system is now bounded by the reach of the index. But anything outside that index is back to pure <em>p</em> &lt; 1 territory.</p><p><strong>Narrow the horizon.</strong> The cheapest move is the one nobody wants to make: don&#8217;t deploy your agent on a fifty-step trajectory. Cut it to five. Hand off to a human at the boundary. At five steps with <em>p</em> = 0.95 you&#8217;re at 0.77; at fifty steps you&#8217;re at 0.08. That&#8217;s not a small difference. 
That&#8217;s the difference between a tool that works and a demo that occasionally works. Now, this trades <em>autonomy</em> for <em>reliability</em>. That trade is worth making in most production contexts right now. Whether it&#8217;s worth making in your context is a product question, not a research question.</p><h2>Watching the right number</h2><p>The 50% number will keep doubling and you should track it. That is real progress and worth watching closely.</p><p>But it is not the number your users feel. The number your users feel is whether the agent finished their task well enough that they didn&#8217;t have to re-run it, check its work, or clean up after it. The difference between &#8220;I tried that AI agent thing and it was magic&#8221; and &#8220;I tried that AI agent thing and it broke my Friday&#8221; is roughly the distance between the 50% horizon and the 80% horizon at your task length.</p><p>The shape of the next several years of agent engineering is already visible in the mitigations you&#8217;ll be reaching for: deterministic verifiers around stochastic generators, retrieval around recall, short trajectories with human handoffs where the math demands it. Not because agents are weak. They&#8217;re remarkable. But the <em>p<sup>n</sup></em> curve doesn&#8217;t care about benchmark scores. It cares about chain length.</p><p>One number, slowly creeping upward, every quarter. Watch that one.</p><p>Until next time, <strong>stay curious</strong>.</p><div><hr></div><p><em>If the 80%-horizon framing landed, the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> walks the seven failure modes that produce the curve &#8212; the calibrated-hallucination lower bound, the U-shaped attention curve, the reversal curse, the depth ceiling on deduction, the rest. 
<strong>50% off during early access.</strong> You can also <a href="https://books.apiad.net/books/mhai/">read the whole thing online for free</a> in a custom reader I built &#8212; dark mode, font controls, offline support, the works.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/mhai&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI - 50% off&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/mhai"><span>Get Mostly Harmless AI - 50% off</span></a></p><p><em>And if you want everything I&#8217;ve written, plus everything I&#8217;m going to write, that&#8217;s <a href="https://apiad.gumroad.com/l/compendium">the Compendium</a>. One purchase, in perpetuity.</em></p>]]></content:encoded></item><item><title><![CDATA[It's Tokens all the Way Down]]></title><description><![CDATA[How language models understand image, audio, and video.]]></description><link>https://blog.apiad.net/p/its-tokens-all-the-way-down</link><guid isPermaLink="false">https://blog.apiad.net/p/its-tokens-all-the-way-down</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Tue, 19 May 2026 10:26:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Gnl8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gnl8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Gnl8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!Gnl8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!Gnl8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!Gnl8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gnl8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1907923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.apiad.net/i/198385396?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gnl8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!Gnl8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!Gnl8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!Gnl8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F213e8caa-6892-4379-a3da-13ce32828faa_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><blockquote><p><em>Part of the run-up to the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> &#8212; 50% off during early access &#8212; where this is the spine of a new chapter on generative and multimodal AI. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a>. More at the end.</em></p></blockquote><p>One morning, not so long ago, perhaps you asked Claude (or Gemini, or ChatGPT) to do something for you, and decided it was easier to just give it a picture than to explain the whole thing. Perhaps it was &#8220;how do I cook this thing?&#8221; or &#8220;what building is that?&#8221; or &#8220;do this homework for me, please, please, my life depends on it&#8221;. Then you uploaded the picture, and back came a textual response.</p><p>Not happy with what the bot understood, you decided a thorough explanation was owed. But, alas, since all we have is a couple of fat fingers for typing, you decided it was best to explain it with your own voice. And, uhms and ehms notwithstanding, you again got a full response back, this time with an audio voice-over.</p><p>Ten years ago, this simple dance of back-and-forth multimodal information would have required four separate research fields, each with its own conferences, its own vocabulary, and its own priesthood. They have quietly become one single thing. It&#8217;s all tokens all the way down. Language has subsumed all modalities. 
This is how.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><h2>The recipe never cared what it was eating</h2><p>Strip &#8220;generative AI&#8221; down to the one idea doing all the work and you get a single sentence: look at a big pile of examples, learn the distribution that produced them, then draw new samples from it. That is the whole trick. It is what a language model does, and it is the only thing a language model does. It is also what an image model does, and an audio model, and a video model.</p><p>The recipe is indifferent to what the examples are. Text is a one-dimensional run of symbols. An image is a two-dimensional field of colour. Audio is a pressure wave sampled tens of thousands of times a second. Video is all of that, plus time, which is why it is the hardest. Four different shapes of data, one identical question asked of each: <em>given what I have seen so far, what plausibly comes next?</em> The machinery that answers that question does not need to know whether &#8220;next&#8221; means a word, a patch of pixels, or a slice of waveform. It only needs the data turned into a sequence of countable things. </p><p><em>Tokens.</em></p><p>So the thing we have been calling a language model was never really about language. Or, put better, it was never about <em>written </em>language. It turns out, language is something far more powerful. </p><p>Ask any linguist, and they&#8217;ll say any set of sequences of distinct symbols (<em>tokens</em>) can be modelled as a language. 
It doesn&#8217;t matter what your symbols are&#8212;letters, words, patches of images, numbers in a math formula, whatever&#8212;language is just the structure around them, what makes some sequences valid and others nonsense. </p><p>This is the key idea. All else is (incredibly good) engineering.</p><h2>A decade of building the same machine, separately</h2><p>It did not look that way while it was happening. For about a decade every modality got its own bespoke contraption, and each one looked like its own discipline.</p><p>Image people had <strong>generative adversarial networks</strong>: a forger and a detective locked in a training duel, the forger getting better at faking until the detective could no longer tell. The beautiful idea buried in there &#8212; and the one that survived the technique itself &#8212; was the <em>latent space</em>: a compressed interior map of &#8220;all possible faces,&#8221; where walking in a straight line morphs one plausible face smoothly into another. GANs were temperamental, prone to collapsing into a single good fake and refusing to leave, and by the early 2020s they had lost the lead. The latent-space intuition outlived them and runs underneath everything that came after.</p><p>Then <strong>diffusion</strong> took over image generation with a trick that sounds like it shouldn&#8217;t work. Take a real photo, add a little static, add a little more, keep going until it is pure snow. Now train a network to undo one step of that. To make a new image, start from snow and run the undo, over and over, until something coherent surfaces. It is sculpture by removing noise instead of removing marble, and it is what powers essentially every image generator you have used.</p><p>Audio had its own separate lineage: speech-to-text built one way, text-to-speech another, music a third. Text had the large language models, off in their own enormous-budget corner of the field. Four communities, four sets of architectures, four sets of war stories. 
If you had asked, in 2021, whether the image people and the language people were building the same machine, both sides would have laughed.</p><h2>CLIP quietly knocks out the wall</h2><p>The crack in the wall came from a 2021 model whose job sounds almost too modest to matter: teach one system that the word <em>dog</em> and a photograph of a dog are talking about the same thing.</p><p>The way you do that is to train a text encoder and an image encoder <em>together</em>, on hundreds of millions of caption-and-picture pairs, with one instruction: put a picture and its true caption close together in a shared space, and shove mismatched pairs apart. What you get at the end is a single space where &#8220;a photo of a golden retriever&#8221; and an actual photo of a golden retriever land as neighbours. Text and pixels, in the same room, with the same coordinates.</p><p>That sounds like a party trick for image search. It was the hinge the whole field turned on. Once text and images live in one space, text can <em>steer</em> image generation &#8212; point the diffusion process at the region of the space that means &#8220;golden retriever in a spacesuit,&#8221; and let it denoise toward there. Every text-to-image system you have used is, under the paint, that move. And the deeper implication was harder to ignore than the application: if you can put two modalities in one space, the wall between them was never structural. It was just a wall nobody had walked through yet.</p><h2>Tokens all the way down</h2><p>Here is where it lands. By the mid-2020s the bespoke machines stopped being separate machines.</p><p>The move is almost embarrassingly direct. Tokenise everything. Text already broke into tokens. Cut an image into a grid of patches and treat each patch as a token. Run audio through a neural codec that emits discrete chunks, and those are tokens too. Now you do not have a text stream and an image stream and an audio stream. 
You have one stream of tokens that happen to have come from different alphabets, and you train a single model on the only objective that was ever in play: predict the next token, whatever kind it is.</p><p>A model trained that way reads and writes everything, because to it there is no &#8220;everything&#8221; &#8212; there is just the sequence and the next position in it. You have used these. The one that holds a spoken conversation with sub-second latency, looks at the photo you paste in, and writes you a paragraph back is not a language model bolted to an image model bolted to a speech model. It is one model that was never told these were different problems.</p><p>Which is why the question that organised the field for a decade &#8212; <em>is this a language model or an image model?</em> &#8212; has quietly stopped having an answer. It is the same machine. The only thing that ever changed between text and pixels and sound was the alphabet, and the transformer emitting the next token has never cared which alphabet it is spelling in. It is tokens all the way down. &#8220;Language modelling&#8221; was a local name for something with no allegiance to language at all: modelling sequences of anything we can count.</p><h2>The honest part</h2><p>It would be easy to end on the astonishment, and the astonishment is real. One model, every modality, falling out of one stubbornly simple objective applied to a wider and wider definition of &#8220;token&#8221; &#8212; that is one of the genuinely beautiful results of the decade, and the kind of unification that does not come along often.</p><p>But unification is not the same as understanding, and I am not going to let the elegance smuggle that past you. A system that can place &#8220;dog&#8221; next to a dog in its latent space has learned the statistics of how dogs are described and depicted. </p><p>Whether it has learned what a dog <em>is</em> is a different question, and the convergence story does not answer it. 
It just makes the question apply to every modality at once instead of only to text. The machine got more general. It did not necessarily get more grounded. Both of those are true at once, and the interesting work of the next few years lives in the gap between them.</p><p>Until next time, stay curious.</p><div><hr></div><p><em>This is the core argument of a new chapter in the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> &#8212; the full chapter walks through GANs, diffusion, CLIP, audio, and native multimodality with the scenes and citations this post had to cut, and it is 50% off during early access. The whole book is also <a href="https://books.apiad.net/books/mhai/">free to read online</a>. If you want the rest of the argument &#8212; how these systems are trained, where they break, and what to actually do about it &#8212; that is what the book is for.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/mhai/fiftyoff&quot;,&quot;text&quot;:&quot;Get it (50% off) for life&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/mhai/fiftyoff"><span>Get it (50% off) for life</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[70 Years of AI History in 10 Minutes]]></title><description><![CDATA[A summary of the updated zeroth chapter of Mostly Harmless AI v2.]]></description><link>https://blog.apiad.net/p/70-years-of-ai-history-in-10-minutes</link><guid isPermaLink="false">https://blog.apiad.net/p/70-years-of-ai-history-in-10-minutes</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Mon, 18 May 2026 11:40:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uPIg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uPIg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uPIg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uPIg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uPIg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uPIg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uPIg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg" width="1456" height="1130" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1130,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2468838,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.apiad.net/i/198246760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uPIg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uPIg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uPIg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uPIg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11712488-c3b8-451c-903d-be69a0286d5f_3820x2964.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Raphael, <a href="https://en.wikipedia.org/wiki/The_School_of_Athens">&#8220;The School of Athens&#8221;</a> (1509&#8211;1511), Apostolic Palace &#8212; Plato points up to the eternal forms (the rule-followers); Aristotle&#8217;s palm presses down to the empirical world (the pattern-finders).</em></figcaption></figure></div><blockquote><p><em>Every post on the blog this month is on the theme of agent reliability, anchored on the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> &#8212; 50% off during early access &#8212; where the history below is the full ~8,000-word opening chapter, with 70+ references and all the scenes this post had to cut. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a>. 
More at the end.</em></p></blockquote><p>Seventy years ago, two men sat in two different rooms and disagreed about what a thinking machine should look like. Neither of them has been proven right. Both have been proven half-right, several times, in alternation, for the whole of my lifetime and most of yours.</p><p>I think the entire history of AI is that one argument, still going.</p><p>The first camp wanted to build minds out of rules. Feed the machine enough knowledge in a logical enough form, and reasoning falls out of the logic. It called itself a lot of things over the decades &#8212; symbolic AI, knowledge-based systems, good old-fashioned AI &#8212; but its home is <strong>rationalism</strong>. The second camp wanted to build minds out of examples. Feed the machine enough data, in any messy form whatsoever, and behavior falls out of the statistics. It also kept renaming itself &#8212; connectionism, machine learning, deep learning &#8212; but its home is <strong>empiricism</strong>. Same goal, a machine that does what intelligent people do. Seventy years of disagreement about how.</p><p>Here&#8217;s the ending, spoiled early, because this isn&#8217;t a thriller. The argument did not produce a winner. It produced a marriage. The chatbots, the image generators, the agents writing code while you sleep &#8212; none of them is one side beating the other. They&#8217;re both sides, finally forced to share a workshop. Let me walk you through how we got there. Fast.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><h2>Both seeds, one summer</h2><p>They were planted within five years of each other. 
In 1943, McCulloch and Pitts wrote down a neuron as a weighted sum with a threshold &#8212; twelve pages, the seed of the empiricist branch. In 1950, Turing refused to define <em>thinking</em> and proposed a behavioral test instead, a question both camps could chase. In 1956, ten people spent a summer at Dartmouth, coined the phrase <em>artificial intelligence</em>, and planned to crack language and reasoning in a few months. (We are still working on it.) In 1957, Rosenblatt built the <strong>Perceptron</strong>, the first machine that learned from examples, and the <em>New York Times</em> announced it would soon walk, talk, and be conscious of its own existence.</p><p>Two foundational myths, in the ground, in the same decade. The rest is which one got watered.</p><h2>The rationalists win the first round</h2><p>And they win it convincingly. In the 1950s and 60s compute is tiny and data, in the sense of millions of labeled examples, does not exist. </p><p>What you <em>can</em> do is write a program that does something specific and inspect every step of it. So the symbolic camp gets the better results and the better tools. Newell and Simon&#8217;s theorem-prover. McCarthy&#8217;s LISP. Weizenbaum&#8217;s ELIZA &#8212; four pages of pattern-matching that understood nothing, and that people confided in anyway. (Hold onto ELIZA. The field will relearn that exact lesson about six more times.) Winograd&#8217;s SHRDLU, fluent and thoughtful inside a closed world of colored blocks.</p><p>The catch was always the world. SHRDLU&#8217;s blocks could all be known, listed, reversed. The real world has rain, and grandparents, and the smell of coffee, and you cannot list it. In the closed world of symbols, symbols were enough. The next decade was about discovering, painfully, that the world is not closed.</p><h2>The cost of winning too hard</h2><p>In 1969, Minsky and Papert published <em>Perceptrons</em> and proved a single-layer network can&#8217;t compute XOR. 
</p><p>The proof was correct. It was also narrow &#8212; they admitted multi-layer networks could do it; it was just that nobody knew how to train them yet. But the field was hungry for a verdict, and it read the book as one. Funding for neural networks collapsed. Rosenblatt died two years later in a boating accident, on his 43rd birthday. The algorithm that would resurrect his branch didn&#8217;t arrive at scale until 1986. Seventeen years of silence.</p><p>Modern AI runs on the work of people who weren&#8217;t born when Minsky and Papert published. The reason their work came so late is that the field they&#8217;d return to had been kept near-dead for two decades. The symbolic camp&#8217;s victory was real. The field paid for it. It will pay that bill again.</p><h2>The rationalist trap</h2><p>Through the 1970s and 80s the symbolic branch found something that made money: <strong>expert systems</strong>.</p><p>MYCIN matched infectious-disease specialists. XCON saved DEC tens of millions a year. The thesis was clean and seductive &#8212; intelligence is rules plus facts; hire the expert, extract the rules, ship the system. And these systems were <em>legible</em>. You could read every rule, audit the reasoning, fix the wrong line. (Your favorite large language model cannot do this. We&#8217;ll come back to that another day.)</p><p>Two problems killed it. Common sense turns out to be unrepresentable in rules &#8212; <em>birds fly, except penguins, except baby penguins, except dead ones</em> &#8212; and the rules contradict each other faster than you can write them. And then there&#8217;s Cyc: in 1984 Doug Lenat set out to hand-encode all of common-sense knowledge and estimated ten years. Lenat died in 2023; the project is still at it forty-two years later. It is the most thoroughly humbling monument in the history of cognitive science.</p><p>By the late 80s the money dried up and the Second AI Winter set in. 
The field was tired of the rationalists.</p><h2>The empirical rebellion</h2><p>We&#8217;re in 1986: backpropagation is published in <em>Nature</em>, and multi-layer networks are finally trainable.</p><p>Then the empiricist branch spends fifteen years not scoring one big win but a thousand small ones. Support vector machines. Random forests. Boosting. Statistical methods quietly eating one application after another, including the symbolic camp&#8217;s home turf &#8212; language, where IBM&#8217;s speech team found that every time they <em>fired</em> a linguist, the system improved.</p><p>Why now? Three things are moving together, slowly. Compute grew. The internet started producing data in volumes nobody had imagined. And the methods were simple enough to scale with both.</p><p>In 2019 Richard Sutton would name this <em>The Bitter Lesson</em>: across seventy years, the general method that scales with compute beats the clever hand-engineered one, every time. It&#8217;s bitter because it tells researchers their hard-won taste gets steamrolled by someone with more GPUs. It is <em>mostly</em> right.</p><p>The thing that complicates it is the thing symbolic AI was good at all along &#8212; but I&#8217;m getting ahead of myself.</p><h2>The earthquake</h2><p>Now jump to September 2012.</p><p>AlexNet &#8212; eight layers, two gaming GPUs, a couple of training tricks &#8212; drops the ImageNet error rate ten points below the nearest hand-engineered system. A ten-point gap isn&#8217;t an improvement. It&#8217;s a different category of result. Within six months every computer-vision lab on Earth has pivoted. AlexNet is, by a wide margin, the single most consequential paper in modern AI.</p><p>Then it cascades, almost too fast to track. Sequence-to-sequence translation. GANs. Atari from raw pixels. 
In 2016 <strong>AlphaGo</strong> beats Lee Sedol at a game with more board positions than there are atoms in the universe &#8212; and almost nobody notices that inside it is a deep network (empiricist) wrapped around a tree search (symbolic). The marriage is <em>already there</em>, in 2016, hiding in plain sight. In 2017, &#8220;Attention Is All You Need&#8221; introduces the Transformer, and every model in your chat window today descends from that one paper.</p><h2>The crown jewel nobody talks about</h2><p>The most consequential AI system of the modern era is not a chatbot. It doesn&#8217;t write poems. It&#8217;s in the bloodstream of structural biology.</p><p>Predicting a protein&#8217;s 3D shape from its amino-acid sequence is a fifty-year-old problem. The rationalist approach &#8212; simulate the physics &#8212; was beautiful and almost completely intractable. For twenty years the field&#8217;s hardest benchmark plateaued at a score around 40. In 2020, DeepMind&#8217;s <strong>AlphaFold 2</strong> scored above 92 on that exact tier. The grand challenge was, for practical purposes, solved. Hassabis and Jumper got the 2024 Nobel in Chemistry for it &#8212; <em>the only AI work so far to produce a Nobel-level scientific breakthrough</em>.</p><p>Read the citation. It isn&#8217;t about AI as a technology. It&#8217;s about a problem that got finished while the people whose careers were defined by it slept. The chatbots get the headlines. The image generators get the lawsuits. The protein folder got the world. Remember that the next time someone wants to tell you AI is ChatGPT.</p><h2>The synthesis</h2><p>Now the marriage. The word <em>agent</em> did not come from machine learning. It came from classical, symbolic AI in the 1970s and 80s: a system that perceives its environment, deliberates, picks an action, acts, observes, loops. The architecture was right. The brain was missing. 
Pure symbolic computation could never model a world with grandparents and coffee in it, so the agent shell sat there for decades, structurally correct and operationally empty. Cyc, again, is the long sad proof.</p><p>The empiricists borrowed the same word in the 2000s &#8212; in reinforcement learning, an <em>agent</em> is a learned policy. DQN was an agent. AlphaGo was an agent. A new brain, slotted into the old shell. Spectacular, and narrow. An AlphaGo cannot make you a sandwich.</p><p>In 2024 the cognition slot gets filled a third time, by a general-purpose reasoning language model. The shell is still the seventy-year-old symbolic frame: perceive, deliberate, pick an action with a name and a meaning &#8212; <code>read_file</code>, <code>run_tests</code>, <code>send_email</code> &#8212; act, observe, loop. The brain is now an LLM. From the empiricist side the system inherits flexibility: it has read enough of the world that you don&#8217;t have to tell it what a file is, or what an angry customer sounds like. From the symbolic side it inherits structure: the actions have names, the consequences are bounded, the trajectory is auditable. The model can hallucinate; the system can&#8217;t run <code>rm -rf</code> unless somebody wired that action in and granted it.</p><p>The 1970s symbolic agent could never reason. The 2010s RL agent could never generalize. The 2026 agent does both &#8212; badly, often clumsily, but for the first time at the same time. You can watch this happen most clearly in software development right now: a language model at the core, a harness of tools around it, a test suite as the verifier, a human reviewing the diff. All four layers, on a laptop, today. Software is the canary. The same pattern is already moving toward research, then education, then everything whose feedback loops are fast enough.</p><p>So here is the closing claim, the techno-pragmatist version of the whole story. The seventy-year argument did not produce a winner. 
It produced three layers: a learning substrate that absorbed the written record of humanity, a symbolic shell that makes it accountable, and a human frame that decides what the whole thing is <em>for</em>. The first two are engineering. The third is the only one that was ever really about us.</p><p>The synthesis exists. What we do with it is still up to us.</p><p>Until next time, <strong>stay curious</strong>.</p><div><hr></div><p><em>This post is the speedrun &#8212; the book&#8217;s ~8,000-word opening chapter compressed down to its spine. The full version in the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> has every scene with its characters, the eras this post skipped, 70+ references, and the agentic stack the rest of the book then takes apart mechanism by mechanism. It&#8217;s the book I wish someone had handed me when I was trying to make sense of the noise &#8212; and it&#8217;s 50% off while it&#8217;s in early access. You can also <a href="https://books.apiad.net/books/mhai/">read the whole thing online for free</a> in a custom reader I built and am rather proud of: dark mode, font controls, progress tracking, offline support, the works.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/mhai/fiftyoff&quot;,&quot;text&quot;:&quot;Get it (50% off)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/mhai/fiftyoff"><span>Get it (50% off)</span></a></p><p><em>And if you want the whole catalog of everything I&#8217;ve written, plus everything I&#8217;m going to write, that&#8217;s <a href="https://apiad.gumroad.com/l/compendium">the Compendium</a>. 
One purchase, in perpetuity.</em></p>]]></content:encoded></item><item><title><![CDATA[How Large Language Models Are Really Made]]></title><description><![CDATA[The full road from text data to reasoning models, explained visually with zero math or code.]]></description><link>https://blog.apiad.net/p/mhai-llms</link><guid isPermaLink="false">https://blog.apiad.net/p/mhai-llms</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Mon, 11 May 2026 15:51:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gdqf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gdqf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gdqf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gdqf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gdqf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!gdqf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gdqf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gdqf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gdqf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gdqf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!gdqf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83eb00a-2839-4312-8300-5fbf71095e63_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>After Joseph Racknitz&#8217;s 1789 <a href="https://en.wikipedia.org/wiki/Mechanical_Turk">cutaway engraving</a> of Wolfgang von Kempelen&#8217;s Mechanical Turk; generated with Nano Banana 2.</em></figcaption></figure></div><p>You type a message to an AI assistant and it answers. The answer isn&#8217;t looked up. 
It isn&#8217;t scripted. The model generated it, token by token, guided by a single mathematical question: <em>what comes next?</em></p><p>That question is the foundation of every language model ever built. A <strong>language model</strong> is a probability distribution over text &#8212; a function that, given a sequence of words, assigns a probability to every possible continuation. &#8220;The cat sat on the mat&#8221; scores higher than &#8220;the mat sat on the cat&#8221; not because a language model understands what cats do, but because sequences like the first are common in human text and sequences like the second are vanishingly rare. The model has compressed the co-occurrence patterns of an enormous corpus into its weights, and that compression is what produces a score.</p><p>Generation is a loop: you give the model a prefix, it samples a next token from the distribution (favouring the high-probability ones), appends it, samples again, and repeats until a stop token arrives. The multi-paragraph response you got from ChatGPT this morning was that loop running a few hundred times. No lookup table. No if-else tree. No pre-scripted answers. Just: <em>given all of this text, what is most likely to come next?</em></p><p>Here&#8217;s the thing I find quietly strange about this: it works. A procedure this simple (assign probabilities, sample a likely token) has produced the most influential technology of the last decade. What makes it work isn&#8217;t the procedure. It&#8217;s everything that goes into building a probability distribution that&#8217;s actually <em>good</em>. Good enough to write coherent paragraphs. Good enough to reason about code. Good enough to pass the bar exam and explain quantum mechanics in language your parents can follow.</p><p>Getting there took decades of compounding ideas. 
The arc is what this piece covers &#8212; from the crudest possible approximation of &#8220;probability over text&#8221; to the current frontier, where models are learning to think.</p><p>Each section of what follows is best understood as a response to the failure of the previous one. N-gram models worked until they didn&#8217;t. Neural embeddings fixed the part that broke. Pretraining scaled the fix to the size where it became genuinely impressive. Instruction tuning made the result useful for the first time. Preference learning fixed what instruction tuning couldn&#8217;t. Reasoning models added something nobody was sure was trainable at all.</p><p>Seven steps. One direction.</p><blockquote><p><em>Every post on the blog this month is on the theme of agent reliability, anchored on the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a>, where the engineering details that don&#8217;t fit a blog post live. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a> in a custom reader I built. More at the end.</em></p></blockquote><div><hr></div><blockquote><p>This post may be truncated in your email. <a href="https://blog.apiad.net/p/mhai-llms">Read it online</a> for the best experience.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><h2>The n-gram intuition</h2><p>The simplest possible implementation of &#8220;probability over text&#8221; is a lookup table.</p><p>Take a large corpus &#8212; a hundred million words will do to start. For every three-word sequence (<strong>trigram</strong>) in that corpus, record which word follows it most often, and with what frequency. 
&#8220;The quick brown&#8221; &#8594; &#8220;fox,&#8221; nine times out of ten, because Project Gutenberg is full of that particular sentence. &#8220;The capital of&#8221; &#8594; &#8220;France&#8221; thirty percent of the time, &#8220;Germany&#8221; twelve, &#8220;England&#8221; eleven, and so on through the geography. For every trigram you&#8217;ve seen, you have a probability distribution over what comes next.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CQfE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CQfE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 424w, https://substackcdn.com/image/fetch/$s_!CQfE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 848w, https://substackcdn.com/image/fetch/$s_!CQfE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 1272w, https://substackcdn.com/image/fetch/$s_!CQfE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!CQfE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png" width="1456" height="929" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:929,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CQfE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 424w, https://substackcdn.com/image/fetch/$s_!CQfE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 848w, https://substackcdn.com/image/fetch/$s_!CQfE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 1272w, https://substackcdn.com/image/fetch/$s_!CQfE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60bf3e3f-4e4b-4115-9ccf-54ee5a394136_1600x1021.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>A trigram model is a frequency dictionary: each three-word context maps to a distribution over possible next words. The catch is that a 50,000-word vocabulary admits ~125 trillion possible trigrams, most of which never appear in any corpus. Made with <a href="https://github.com/apiad/tesserax">tesserax</a>.</em></figcaption></figure></div><p>Now generate text. Give the model &#8220;The quick brown&#8221; and it continues: &#8220;fox.&#8221; Give it &#8220;jumped over the&#8221; and it continues: &#8220;lazy.&#8221; Three words in and it&#8217;s generated &#8220;The quick brown fox jumped over the lazy&#8221; &#8212; and if you&#8217;re lucky, it lands &#8220;dog&#8221; and you&#8217;ve reproduced a famous sentence entirely from corpus statistics. 
Locally, it&#8217;s plausible. You could read a sentence of this and not immediately know you&#8217;re looking at a machine.</p><p>The problems start fast. By the third sentence, the model has no idea it was talking about a fox. It only remembers the last three words. You ask it to continue &#8220;The fox had been running from the&#8221; and it has no idea that a fox is involved, or that running happened, or that there&#8217;s a pursuit in progress. It just has three words and a lookup table. The output is <em>statistically English</em>. It is not coherent.</p><p>This is the <strong>Markov assumption</strong>: the next word depends only on the last N words, not on the full history of the text. For N=3, it&#8217;s a trigram model. You can increase N &#8212; five-gram models were standard in commercial speech recognition for years &#8212; but the table explodes. A vocabulary of fifty thousand words gives 50,000&#179; possible trigram contexts, roughly 125 trillion entries. At N=10, that grows to roughly 10<sup>47</sup> possible sequences, and by N=18 the count passes the estimated number of atoms in the observable universe (about 10<sup>80</sup>). The table can never be complete enough to cover the distribution.</p><p>There&#8217;s a real engineering solution to the &#8220;we haven&#8217;t seen this exact trigram&#8221; problem: smoothing and interpolation. Estimate the probability of an unseen N-gram from shorter sub-sequences. <strong>Hidden Markov models</strong> formalised this in a probabilistic framework that, by the 1990s, had enough polish to power industrial speech recognition and early machine translation. I don&#8217;t want to undersell it &#8212; it worked. It was genuinely useful. It just topped out.</p><p>The wall is fundamental. Real language has dependencies that can be arbitrarily long. 
<em>&#8220;The man who sold the car that had been parked in front of the house where my grandmother lived was finally found.&#8221;</em> The subject of &#8220;was finally found&#8221; is seventeen words and three nested clauses back. No N-gram model reaches it. You need something that can condition on the full context &#8212; or at least compress the full context intelligently &#8212; rather than amnesiacally forget everything more than N words ago.</p><p>You need a model that generalises from sequences it has seen to sequences it hasn&#8217;t. A lookup table can only interpolate from what it&#8217;s seen before. What you need is something that has <em>understood the pattern</em> deeply enough to extrapolate.</p><p>N-gram models work until they don&#8217;t &#8212; and they don&#8217;t beyond a few words.</p><h2>Words as numbers</h2><p>Neural networks can learn the compression n-gram models can&#8217;t achieve. But they have a hard prerequisite: they operate on numbers. Words are symbols. Before a neural network can do anything useful with text, you need to represent words as vectors. The naive approach throws away everything that matters.</p><p>The obvious first attempt is <strong>one-hot encoding</strong>. Vocabulary of 50,000 words; each word is a vector of length 50,000 with a single 1 and 49,999 zeros. &#8220;Cat&#8221; is at position 4,312; &#8220;dog&#8221; is at position 17,846; &#8220;carburetor&#8221; is somewhere else entirely. The problem: nothing in this representation suggests that &#8220;cat&#8221; and &#8220;dog&#8221; are more similar to each other than either is to &#8220;carburetor.&#8221; The distance between every pair of one-hot vectors is identical. 
You&#8217;ve handed the network a symbol system with no structure, and it has to reconstruct the structure from scratch &#8212; spending enormous capacity learning that cats and dogs are both animals, that both appear near &#8220;fur&#8221; and &#8220;vet,&#8221; that &#8220;cat food&#8221; and &#8220;dog food&#8221; are structurally related &#8212; before it can learn anything about how language actually works.</p><p>The key insight that resolved this came from linguistics, not machine learning, and I think it&#8217;s underrated as an idea. J.R. Firth, writing in 1957: <em>&#8220;you shall know a word by the company it keeps.&#8221;</em> The <strong>distributional hypothesis</strong>. Words that appear in similar contexts &#8212; near similar neighbouring words, in similar grammatical positions &#8212; tend to have similar meanings. &#8220;Cat&#8221; and &#8220;dog&#8221; both appear near &#8220;pet,&#8221; &#8220;feed,&#8221; &#8220;vet,&#8221; &#8220;owner,&#8221; &#8220;fur,&#8221; &#8220;collar.&#8221; The context is a fingerprint of the meaning. Encode that fingerprint in a vector and you have a representation where similar words land close together in space.</p><p><strong>Word2Vec</strong> (<a href="https://arxiv.org/abs/1301.3781">Mikolov et al., 2013</a>) turned this into a training procedure. Train a shallow neural network to predict a word from its surrounding context words, or vice versa. Force each word&#8217;s representation down into a dense vector of, say, 300 floating-point numbers. Train on a billion words of text. The network learns that words appearing in similar contexts should have similar representations, because that&#8217;s what makes the prediction task cheaper. Words with similar distributional patterns end up with similar vectors &#8212; not because anyone programmed that, but because it follows from the objective.</p><p>The result that made people pay attention: <strong>vector arithmetic encodes semantic relationships</strong>. 
Take the vector for &#8220;king,&#8221; subtract the vector for &#8220;man,&#8221; add the vector for &#8220;woman.&#8221; The nearest vector in the resulting space is &#8220;queen.&#8221; Paris minus France plus Italy is approximately Rome. Try it yourself: it works because the structural relationship between &#8220;king&#8221; and &#8220;queen&#8221; is parallel to the relationship between &#8220;man&#8221; and &#8220;woman&#8221; in how the four words co-occur with everything around them. No one wrote these analogies in. The geometry of the space mirrors the structure of meaning, because both are implicit in how words appear together in natural language.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KXh1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KXh1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 424w, https://substackcdn.com/image/fetch/$s_!KXh1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 848w, https://substackcdn.com/image/fetch/$s_!KXh1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KXh1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KXh1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png" width="458" height="396.97527472527474" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1262,&quot;width&quot;:1456,&quot;resizeWidth&quot;:458,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KXh1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 424w, https://substackcdn.com/image/fetch/$s_!KXh1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 848w, https://substackcdn.com/image/fetch/$s_!KXh1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KXh1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe75fb3c1-35e7-49f9-915b-6bdd8334bef2_1600x1387.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>The relationship between</em> king <em>and</em> queen <em>is parallel to the one between</em> man <em>and</em> woman <em>&#8212; both fall out of how the four words co-occur with everything around them. 
Made with <a href="https://github.com/apiad/tesserax">tesserax</a>.</em></figcaption></figure></div><p>I find this genuinely strange, in the best possible way. You trained a network to do a simple word-guessing task on flat text, and the side effect was an algebra of concepts. The geometry was always latent in the co-occurrence patterns. Word2Vec just made it legible.</p><p>Modern language models don&#8217;t use Word2Vec as a separate preprocessing step &#8212; the embedding representations are learned jointly with the rest of the network during training on text. But Word2Vec&#8217;s intuition is <em>why</em> learned embeddings work at all. Once language is geometry, gradient descent has a surface to grip. You can compute distances, optimise them, stack arbitrarily deep networks on top, and train the whole thing end-to-end.</p><p>Embeddings are how we lie to neural networks in a useful way. We pretend words are points in space, so the math works out.</p><h2>Pretraining</h2><p>Now scale it.</p><p>Take a deep neural network &#8212; not the shallow two-layer thing in Word2Vec, but a transformer with dozens or hundreds of layers, billions of parameters, and an attention mechanism in every layer. Feed it next-token prediction across the entire accessible internet: Wikipedia, GitHub, every book ever digitised, every forum thread, every research paper, every recipe, every political argument, every user manual for every piece of machinery ever manufactured. Same objective the n-gram model had: <em>given what came before, what comes next?</em> Except now the model has billions of parameters to compress the patterns into, the training signal is trillions of tokens, and the architecture is built to handle far longer context than any lookup table could.</p><p>The architecture is what made everything else possible. The <strong>transformer</strong> (<a href="https://arxiv.org/abs/1706.03762">Vaswani et al., 2017</a>) uses <strong>self-attention</strong> as its core operation. 
For each token in the input, self-attention computes relevance weights over every other token in the sequence &#8212; learned weights, computed from the data, different for each token, different in each layer. A pronoun can attend strongly to the noun it refers to, twenty positions back. A closing argument can reach back to the premise from the opening paragraph. There is no fixed window; the model considers, in principle, the full context at every step.</p><p>This is what broke the n-gram scaling wall. Not a bigger lookup table. Not smarter interpolation. A learned, flexible attention mechanism that compresses long-range dependencies into the model&#8217;s weights rather than trying to enumerate every possible context sequence. The key property, and it&#8217;s the one I keep coming back to: <em>soft</em>. Self-attention doesn&#8217;t pick one relevant token; it blends all of them with learned weights. The whole sequence contributes to every prediction, with a learned notion of how much each part matters.</p><p>The other critical property is <strong>self-supervised learning</strong>. There are no human-provided labels anywhere in pretraining. The text itself is the training signal. Show the model &#8220;The capital of&#8221; and ask it to predict &#8220;France.&#8221; It&#8217;s wrong; the gradient flows; the weights update. Show it three trillion tokens; let the gradient flow three trillion times. The entire digitised corpus of human knowledge is your training set, with zero labelling cost, because the next token is always right there.</p><p><strong><a href="https://arxiv.org/abs/2001.08361">Kaplan et al., 2020</a></strong> measured loss as a function of model size, dataset size, and compute over seven orders of magnitude. The result: loss falls as a clean power law across all three dimensions. Double the parameters, get a predictable drop in loss. Double the training data, same. 
Scale is not a bet on something uncertain; it is a known return on investment, measured and re-measured across a staggering range.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-3Ei!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-3Ei!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 424w, https://substackcdn.com/image/fetch/$s_!-3Ei!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 848w, https://substackcdn.com/image/fetch/$s_!-3Ei!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 1272w, https://substackcdn.com/image/fetch/$s_!-3Ei!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-3Ei!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png" width="490" height="289.08653846153845" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:859,&quot;width&quot;:1456,&quot;resizeWidth&quot;:490,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-3Ei!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 424w, https://substackcdn.com/image/fetch/$s_!-3Ei!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 848w, https://substackcdn.com/image/fetch/$s_!-3Ei!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 1272w, https://substackcdn.com/image/fetch/$s_!-3Ei!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8adcc87d-c1ca-467b-b091-9e17e61cd52a_1600x944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Schematic of the Kaplan finding: loss vs. compute on log-log axes. The points fall on a clean line over seven decades of compute &#8212; scale isn&#8217;t a bet, it&#8217;s a known return. Made with <a href="https://github.com/apiad/tesserax">tesserax</a>.</em></figcaption></figure></div><p><strong><a href="https://arxiv.org/abs/2203.15556">Hoffmann et al., 2022</a></strong> &#8212; the Chinchilla paper &#8212; corrected a real error in how the field had been applying Kaplan&#8217;s result. Most large models of 2020-21 had been trained on far fewer tokens than their parameter count justified. The Kaplan result suggested scaling up models as fast as possible; Hoffmann&#8217;s finding was that you need to scale model size <em>and</em> training data together, roughly equally, for a given compute budget. A 70B-parameter model trained on 1.4 trillion tokens outperformed a 280B-parameter model trained on fewer tokens at the same total compute cost. Smaller model, more data, better result. 
Scale both dimensions together.</p><p>What do you get at the end of all this? A <strong>base model</strong>. And this is the part that surprises people who haven&#8217;t seen one.</p><p>Type &#8220;What is the capital of France?&#8221; into a raw pretrained model and it continues the text. Maybe it writes out a geography quiz &#8212; &#8220;What is the capital of France? What is the capital of Germany? What is the capital of Italy?&#8221; Maybe it generates a fake Wikipedia article. Maybe it starts a trivia show transcript. It has not answered your question. It has found the most probable continuation of your prompt, given everything it absorbed during training.</p><p>The base model has absorbed more text than any human could read in a thousand lifetimes. The co-occurrence patterns of the entire digitised corpus of human writing are in those weights. It knows facts, relationships, styles, concepts, code, chemistry, poetry, legal prose, and every other form in which humans have arranged words.</p><p>It was trained to <em>continue</em>, not to <em>respond</em>. Ask it a direct question and it treats the question as the opening line of some text pattern &#8212; one it will extend in whatever direction seems most probable. It has no concept of &#8220;you asked me something and I should answer it.&#8221;</p><p>Pretraining gives a model knowledge. It does not give it manners, opinions, or any idea what you want from it.</p><h2>Instruction tuning</h2><p>Step one of making a base model useful: show it what &#8220;useful&#8221; looks like.</p><p>Collect thousands of demonstration pairs. A human writer sits with a prompt &#8212; &#8220;Explain the difference between supervised and unsupervised learning in plain English,&#8221; &#8220;Write a polite email declining this meeting invitation,&#8221; &#8220;Debug this Python function&#8221; &#8212; and writes the ideal response. 
Then fine-tune the pretrained base model on these (prompt, response) pairs using the same next-token objective, now applied to curated demonstrations rather than the open web.</p><p>This is <strong>supervised fine-tuning</strong>, or SFT. It is plain supervised learning &#8212; the same paradigm that has been in the machine learning textbooks since the 1980s. What&#8217;s new is only what it&#8217;s being applied to.</p><p>The headline result from <a href="https://arxiv.org/abs/2203.02155">Ouyang et al., 2022</a> &#8212; the InstructGPT paper &#8212; is still worth stating plainly: a 1.3 billion-parameter InstructGPT model, fine-tuned on human-written instruction-following demonstrations and then on human preference rankings (the stage covered in the next section), was <em>preferred</em> by human evaluators over a raw 175 billion-parameter GPT-3. Less than one percent of the parameters. Preferred.</p><p>Sit with that. The quality of the training signal matters more than raw scale. A carefully curated set of human demonstrations and judgements of what &#8220;helpful answering&#8221; looks like is worth more, for the specific goal of being helpful, than a hundred times more parameters trained on unstructured internet text. The base model knows more. The instruction-tuned model is more useful. These are different things.</p><p>SFT teaches the <em>shape</em> of a helpful answer: addressed to the question asked, reasonably structured, proportionate in length, appropriate in tone. These are learnable patterns. The base model already has all the relevant knowledge in its weights; SFT is teaching it to retrieve and present that knowledge in a particular format.</p><p>Here&#8217;s the failure mode, and it matters for understanding everything that comes next.</p><p>SFT shows the model what good answers look like. It gives no mechanism for the model to evaluate, <em>at generation time</em>, which of two candidate continuations is more accurate, more honest, or less likely to cause harm. 
The model learned to imitate the shape of correct answers; it did not learn to <em>prefer</em> correctness over fluency when the two conflict. A confidently phrased wrong answer and a confidently phrased right answer can look identical from a format standpoint. SFT cannot distinguish them.</p><p>Teaching consistent refusals is especially brittle. To get a model to reliably refuse a class of harmful requests via SFT, you need human-written refusals for every phrasing variant you can anticipate. You will miss variants. The model has no general theory of harm. It has only pattern-matching against the phrasings it saw. Change the phrasing, add a fictional framing, ask in a different language, and the refusal can fail.</p><p>The deep limitation is this: SFT can teach what a good answer looks like, but it cannot teach which of two candidate answers is <em>better</em>. For that, you need to know something about <em>better</em> that you didn&#8217;t encode in any single example. You need preferences.</p><p>SFT teaches the shape of a good answer. It has no way to choose between two good shapes.</p><h2>From demonstrations to preferences</h2><p>The move that follows from SFT&#8217;s failure: humans can <em>rank</em> responses much faster than they can write them.</p><p>Show a rater two model responses to the same prompt &#8212; response A and response B &#8212; and ask which is better. They can answer in seconds. Writing a response from scratch takes minutes. This means you can collect preference labels at much higher volume than demonstrations, and the preference label contains a different kind of information: not &#8220;here is the target,&#8221; but &#8220;this is closer to the target than that.&#8221;</p><p>Scale up the preference collection. Collect hundreds of thousands of (prompt, response A, response B, human ranking) tuples. 
Train a small auxiliary model &#8212; the <strong>reward model</strong> &#8212; to predict the human rankings: given a prompt and a response, output a scalar score. Then use <strong>reinforcement learning</strong> (specifically PPO) to push the language model toward generating responses the reward model scores highly.</p><p>This is <strong>RLHF</strong> &#8212; reinforcement learning from human feedback. <a href="https://arxiv.org/abs/2203.02155">Ouyang et al., 2022</a> used it as the third stage of the InstructGPT pipeline: pretraining &#8594; SFT &#8594; RLHF. ChatGPT&#8217;s characteristic tone &#8212; helpful, reliably cautious about harmful requests, good at hedging uncertainty, consistent about refusals &#8212; comes almost entirely from this stage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CtE_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CtE_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 424w, https://substackcdn.com/image/fetch/$s_!CtE_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 848w, https://substackcdn.com/image/fetch/$s_!CtE_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CtE_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CtE_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png" width="1456" height="574" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:574,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CtE_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 424w, https://substackcdn.com/image/fetch/$s_!CtE_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 848w, https://substackcdn.com/image/fetch/$s_!CtE_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CtE_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c8d7d72-6a41-44dc-8429-be45acad45d1_1600x631.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>RLHF in five stages: the model samples two responses, a human ranks them, a reward model learns to predict the ranking, and PPO updates the policy to maximise that reward. DPO collapses the loop &#8212; the preference becomes a loss applied directly on the policy, skipping both the reward model and PPO. 
Made with <a href="https://github.com/apiad/tesserax">tesserax</a>.</em></figcaption></figure></div><p>The deepest shift over SFT: the model now has feedback about <em>direction</em>, not just target. SFT says &#8220;produce something like this example.&#8221; RLHF says &#8220;of the things you just produced, this kind is better than that kind &#8212; adjust accordingly.&#8221; A direction is a richer signal than a target. It can propagate to novel situations no demonstration ever covered.</p><p>The practical problem is that RLHF is a genuine engineering challenge. PPO is unstable. The reward model can be <strong>gamed</strong>: the policy learns to produce outputs that score highly on the reward model without actually being better, because the reward model is an imperfect proxy for true quality. Over long training runs, the policy finds exploitable features in the reward model and optimises for those rather than for what humans actually wanted. Keeping the policy close to the SFT model it started from, usually with a KL penalty, so that it doesn&#8217;t drift into incoherence while chasing reward, requires careful tuning. RLHF works, but it&#8217;s expensive, brittle, and hard to reproduce without a dedicated ML infrastructure team behind it.</p><p><a href="https://arxiv.org/abs/2305.18290">Rafailov et al., 2023</a> found something that, in retrospect, looks almost obvious: you can skip the reward model entirely.</p><p>The paper, &#8220;Direct Preference Optimization: Your Language Model is Secretly a Reward Model,&#8221; makes a mathematical observation. The preference-fitting problem that RLHF solves via a reward model + PPO can be reformulated as a classification loss directly on the language model policy. Given a preferred response and a dispreferred response to the same prompt, you want the model to be more likely to produce the preferred one. You don&#8217;t need a separate reward model to express that preference. You don&#8217;t need PPO to optimise it. 
The preference is a loss; the loss can be minimised directly on the policy.</p><p><strong>DPO</strong> is computationally lighter and far easier to get working. The abstract credits it with &#8220;eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning.&#8221; In practice, the gap between &#8220;has the resources of a frontier lab&#8221; and &#8220;fine-tuned a model that actually behaves well&#8221; narrowed substantially after DPO. Within a year of publication, it became the default alignment approach for most open-weight instruct models &#8212; the Llama family, Mistral, Qwen, and most of their derivatives. When you read that a model is &#8220;instruction-tuned&#8221; in 2024 or later, DPO or one of its descendants is almost always in the pipeline.</p><p>Both RLHF and DPO require human preference labels. In 2022-23, this was tractable. By 2024, at frontier scale, it was a real bottleneck. You need raters capable of judging quality on maths, code, medicine, science. You can&#8217;t hire enough such raters to keep pace with the rate at which models can generate candidate outputs.</p><p>The field&#8217;s response was predictable in retrospect: use the models themselves.</p><p><strong>RLAIF</strong> (<a href="https://arxiv.org/abs/2309.00267">Lee et al., 2023</a>) replaces human raters with a strong language model as the preference oracle. Head-to-head comparisons with RLHF showed that AI-generated preference labels are competitive with human ones on summarisation and dialogue. The reward model trained on AI labels performs comparably to the one trained on human labels. The human rater is no longer in the loop.</p><p><strong>Constitutional AI</strong> (<a href="https://arxiv.org/abs/2212.08073">Bai et al., 2022</a>, Anthropic) does something more principled. Write a list of principles &#8212; a constitution &#8212; stating what the model should and shouldn&#8217;t do. 
Ask the model to critique its own outputs against those principles and revise them. The critiques and revisions become training data. The RL stage uses the model&#8217;s own evaluations as the reward signal. Human preference labelling is replaced by explicit normative reasoning: the model has to argue about whether its outputs satisfy the stated principles, not just produce outputs that pattern-match to human-labelled examples.</p><p>The logic extends one step further. If models can generate reliable preference labels, can they generate <em>training data</em> directly? By 2024-25, the answer in a widening range of domains was yes. Maths problems with worked solutions. Code problems paired with passing test suites. Instruction-following demonstrations written by large models to train smaller ones &#8212; the distillation pipeline, where a 70B model generates training examples that improve a 7B model, and the better 7B model feeds the next iteration. By 2025, a substantial fraction of the data used to train frontier models isn&#8217;t scraped from the web. It&#8217;s generated by earlier versions of the models themselves.</p><p>Ilya Sutskever, speaking at <a href="https://www.youtube.com/watch?v=WGgDZOr1ph4">NeurIPS 2024</a>: <em>&#8220;Pre-training as we know it will end. Data is the fossil fuel of AI. We have but one internet.&#8221;</em> The scaling curve that had defined the field since Kaplan 2020 was visibly flattening. The field didn&#8217;t slow down. The growth frontier moved: from bigger pretraining to better post-training. The headline AI announcements of 2024 were not &#8220;we trained a bigger model on more of the internet.&#8221; They were &#8220;we trained a better model by using our previous models to generate, evaluate, and curate the training signal.&#8221;</p><p>The sharpest shift in AI in 2023-24 wasn&#8217;t a bigger model. 
It was figuring out how to use models to train better models.</p><h2>A new axis</h2><p>There is a 2024-25 discovery that changes the picture in a qualitatively different way. Not a refinement of post-training preference optimisation. Something new.</p><p>RL doesn&#8217;t just align models. It can teach them to <em>think</em>.</p><p>The observation that sets it up: language models already have scratch space. Their output is text; nothing prevents them from writing intermediate reasoning steps before writing a final answer. Chain-of-thought prompting &#8212; asking a model to &#8220;think step by step&#8221; &#8212; has been known since <a href="https://arxiv.org/abs/2201.11903">2022</a> to improve performance on reasoning tasks. The model writes out intermediate steps, and those steps help it arrive at a better final answer.</p><p>But chain-of-thought as a <em>prompting technique</em> has a persistent problem. The intermediate steps come from the same model, sampled token by token exactly like the final answer. You can ask the model to think out loud, but you can&#8217;t verify that the scratchpad is doing reasoning work rather than <em>performing</em> reasoning for the reader. A model that writes plausible-sounding intermediate steps that happen to be wrong, then arrives at a wrong final answer, has not improved by being asked to show its work. The steps are decorative.</p><p>The <a href="https://openai.com/index/learning-to-reason-with-llms/">o-series models from OpenAI</a> in late 2024 made a conceptually simple training move: use RL where the reward is the correctness of the <em>final answer</em>, and leave the intermediate chain of thought entirely unsupervised. The model can write whatever it wants in the scratchpad. The only signal is whether the final answer is right.</p><p>What emerged from training was not what anyone programmed in. The model learned, without any explicit supervision of the intermediate steps, to <em>use</em> the scratchpad as actual working memory. 
Backtracking when an approach failed. Trying alternate formulations when one hit a wall. Verifying intermediate results before continuing. Restarting from scratch, several steps back, when it found an error in something it had already written. None of these behaviours appeared in labelled training examples. They fell out of the objective: over enough RL iterations, the training process found that careful scratchpad use led to more correct final answers, and it reinforced that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gXIs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gXIs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 424w, https://substackcdn.com/image/fetch/$s_!gXIs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 848w, https://substackcdn.com/image/fetch/$s_!gXIs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 1272w, https://substackcdn.com/image/fetch/$s_!gXIs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gXIs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png" width="1456" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gXIs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 424w, https://substackcdn.com/image/fetch/$s_!gXIs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 848w, https://substackcdn.com/image/fetch/$s_!gXIs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 1272w, https://substackcdn.com/image/fetch/$s_!gXIs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da4f71b-6e76-41ad-93c4-9db3b062e8d3_1600x659.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Only the final answer receives reward. The scratchpad is entirely unsupervised &#8212; backtracking and verification fall out of the training loop, not from labelled examples of good reasoning. Made with <a href="https://github.com/apiad/tesserax">tesserax</a>.</em></figcaption></figure></div><p><strong>DeepSeek-R1</strong> (<a href="https://arxiv.org/abs/2501.12948">arXiv:2501.12948</a>, January 2025, open weights) replicated the result outside a closed lab. Its R1-Zero variant was trained with pure RL on reasoning tasks, with no supervised fine-tuning and no human-labelled chains; the released R1 adds a small cold-start supervised stage, mainly for readability. The paper documents what the authors call the &#8220;aha moment&#8221; in R1-Zero&#8217;s training: a point where the model spontaneously began re-evaluating its own intermediate steps and restarting when they failed. 
Not because the training data contained this behaviour as a pattern. Because the reward for correct final answers made careful intermediate reasoning instrumentally useful, and the RL training loop discovered it.</p><p>I think DeepSeek-R1 is the clearest published demonstration of something the field had suspected but hadn&#8217;t proven at scale: that reasoning, as a <em>behaviour</em>, is trainable from a simple outcome-based reward signal. You don&#8217;t need human annotations of good reasoning traces. You don&#8217;t need to supervise the scratchpad. You need to reward the right answer and run enough RL. The reasoning emerges.</p><p>Three things follow from this that are worth naming separately.</p><p><strong>Test-time compute</strong> is a new scaling axis. Pretraining scales with more data and more parameters &#8212; you pay at training time and get a more capable model. Reasoning models scale with more <em>inference compute</em> &#8212; you pay at generation time, by thinking longer, and get a better answer on the current problem. A smaller reasoning model that thinks for ten seconds can match or outperform a larger standard model answering in one pass. These axes are complementary, not competing. You can now trade training-time capability against inference-time deliberation, and that tradeoff is explicit and controllable in a way it wasn&#8217;t before.</p><p><strong>Diagnosability</strong> changes the failure mode. A standard model that gets a maths problem wrong gives you a wrong number. A reasoning model that gets it wrong gives you a chain of thought &#8212; readable, traceable, inspectable at every step. You can see exactly where the logic went off course: which intermediate claim was false, which inference was unwarranted, at what point the reasoning was solid and where it broke down. For systems where the reliability of the output matters &#8212; and in agent pipelines, it almost always does &#8212; this is the property that makes the difference. 
The failure is visible. Visible failures are debuggable. Black-box failures are not.</p><p>And the arc closes. The whole story of this piece &#8212; n-grams, embeddings, pretraining, instruction tuning, preference learning, and now reasoning &#8212; is one continuous story of making the training signal more specific. N-gram models encode raw co-occurrence statistics: this is what tends to follow that. Embeddings compress those statistics into geometry that neural networks can use. Pretraining scales that compression to the entire digitised corpus of human writing. Instruction tuning adds: here is what a helpful answer looks like. RLHF and DPO add: here is what <em>better</em> looks like, relative to what you just produced. RLAIF and synthetic data close the loop so models can teach each other. Reasoning models add the final turn: here is what <em>thinking carefully</em> looks like. Not by showing examples of good reasoning. By rewarding the right final answer, and letting the model figure out the rest.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lsXL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lsXL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 424w, https://substackcdn.com/image/fetch/$s_!lsXL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 848w, 
https://substackcdn.com/image/fetch/$s_!lsXL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 1272w, https://substackcdn.com/image/fetch/$s_!lsXL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lsXL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png" width="1456" height="269" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:269,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lsXL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 424w, https://substackcdn.com/image/fetch/$s_!lsXL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 848w, 
https://substackcdn.com/image/fetch/$s_!lsXL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 1272w, https://substackcdn.com/image/fetch/$s_!lsXL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb53cda4d-3a30-4026-b518-9c551c891af5_1600x296.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><em>Seven paradigms, one direction. Each step exists because the previous step&#8217;s training signal wasn&#8217;t specific enough &#8212; and each one adds a kind of feedback the previous one couldn&#8217;t carry. Made with <a href="https://github.com/apiad/tesserax">tesserax</a>.</em></figcaption></figure></div><p>Reasoning models aren&#8217;t smarter than other models. They&#8217;re models that have learned to spend their intelligence more deliberately.</p><p>Each step in this story makes the feedback signal richer. Each step exists because the previous step&#8217;s signal wasn&#8217;t specific enough.</p><p>The direction is clear: we keep finding more precise ways to tell models what we want, and they keep using it.</p><p>Until next time, <strong>stay curious</strong>.</p><div><hr></div><p><em>The second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> goes deeper on what these training paradigms make possible in practice &#8212; why a reasoning model behaves differently as an agent core, what alignment actually means when you&#8217;re building a system rather than evaluating a benchmark, and the chapters that didn&#8217;t fit any blog post. 
The whole book is also available to <a href="https://books.apiad.net/books/mhai/">read online for free</a> in a reader I built and am rather fond of: dark mode, font controls, progress tracking, offline support, the works.</em></p><p><em>If you want everything I&#8217;ve written and everything I&#8217;m going to write, the <a href="https://apiad.gumroad.com/l/compendium">Compendium</a> bundles it all &#8212; one purchase, in perpetuity.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://store.apiad.net/l/compendiium&quot;,&quot;text&quot;:&quot;Check it out&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://store.apiad.net/l/compendiium"><span>Check it out</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[The Anatomy of a Linguistic AI Agent]]></title><description><![CDATA[From single-turn LLM to long-horizon autonomous AI.]]></description><link>https://blog.apiad.net/p/the-anatomy-of-a-linguistic-ai-agent</link><guid isPermaLink="false">https://blog.apiad.net/p/the-anatomy-of-a-linguistic-ai-agent</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Mon, 04 May 2026 17:17:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!t-3D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t-3D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!t-3D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!t-3D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!t-3D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!t-3D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t-3D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2080600,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.apiad.net/i/196445058?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!t-3D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!t-3D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!t-3D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!t-3D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2507af90-0128-4576-8448-2f46a50ea2c8_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>After Rembrandt&#8217;s <a href="https://en.wikipedia.org/wiki/The_Anatomy_Lesson_of_Dr._Nicolaes_Tulp">&#8220;The Anatomy Lesson of Dr. Nicolaes Tulp&#8221;</a> (1632); generated with Nano Banana 2.</em></figcaption></figure></div><p>You have used a language model in a chat box. You typed a question, you got an answer, you closed the tab. The whole interaction lasted under a minute. The model did not remember you the next time you opened the page.</p><p>You have also seen, or read about, agents that work for hours. A coding agent that ships a feature overnight. A research agent that pulls together a hundred sources before breakfast. They plan, they call tools, they back out of dead ends, they hand you something you can use.</p><p>Both are the same model. Same neural network. Same forward pass. The only thing that changed is what&#8217;s wrapped around it.</p><p>This essay is the bridge. The architecture that turns the first thing into the second is not a single insight. It is a stack, a small number of layers, each one added in response to a failure mode of the previous layer. By the end you should be able to point at any agent doing real work in 2026 &#8212; coding, research, customer ops &#8212; and name which layer is doing the heavy lifting at any given moment.</p><p>Some of those layers are old. The fundamental one was published in 2022, before ChatGPT shipped. Some are very new. One was named eighteen months ago and is still settling. None of them, individually, is hard to follow. 
The trick is seeing them as a sequence, each fix opening the door for the next.</p><p>If you want a number to anchor where we start: METR has been measuring the time-horizon of frontier agents, and a language model on its own, with no scaffolding around it, sustains roughly a few minutes of human-equivalent work at 50% reliability. The equivalent of writing a competent meeting summary.</p><p>That is the floor.</p><blockquote><p><em>Every post on the blog this month is on the theme of agent reliability, anchored on the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a>, where the engineering details that don&#8217;t fit a blog post live. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a> in a custom reader I built. More at the end.</em></p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><h2>The base case</h2><p>Strip everything away first. No agent, no tools, no skills, no harness. Just the model.</p><p>A language model, in the strictly minimal sense, is a function from a string to a string. You hand it a sequence of tokens. It hands you back a sequence of tokens. One forward pass through the network. The input goes in at one end, the output comes out the other, one token at a time until the STOP token is generated, and that is the entire interaction. No state is held between calls. The next time you ask the same model the same question, it has no idea you have ever spoken before.</p><p>Inside that one shot, the model delivers. It will answer, draft, summarize, translate, brainstorm. 
Give it a good piece of context and a clear ask, and the response that comes back will, in my experience, often be useful enough to ship as-is. This is the experience that made everyone notice in late 2022. Open a chat, ask anything, get something back you can use. People called it magic at the time. Most of them still do, even though <em>useful function with no memory</em> is the more honest description.</p><p>But notice what it cannot do &#8212; which is most things you would ever want from an agent.</p><p>It cannot verify its own output. The same forward pass that produced the answer is the only one available to check it. There is no second opinion, no quick lookup, no <em>let me try it and see what happens</em>. The model is committed to whatever came out the first time.</p><p>It cannot look anything up. Whatever facts it has are baked into the weights from training, frozen at some cutoff date. If you ask about today&#8217;s news, or your codebase, or an internal company document, the model has nothing. And worse, it will frequently invent something plausible-sounding because completing a confident sentence is what it was trained to do.</p><p>It cannot act on the world. It cannot write to a file, send an email, call an API, run a command. It cannot do anything that has a side effect outside the chat window. The only thing it can produce is more text.</p><p>Inside the four walls of the context window, the base model is the most capable text engine the field has ever built. A single chat box was enough to launch the largest consumer product of the decade. Outside those walls, it is inert.</p><p>METR&#8217;s measurements of an unaugmented model &#8212; no tools, no loop, no scaffolding &#8212; put the time horizon at something on the order of minutes of human-equivalent work. Minutes. That is the starting capability. 
Everything else in this essay is a way of making those minutes compound.</p><h2>The first leap</h2><p>The first real agent paradigm is older than ChatGPT.</p><p>In October 2022, a team at Princeton and Google published <em>ReAct: Synergizing Reasoning and Acting in Language Models</em>. It went out about eight weeks before the ChatGPT launch that made the public notice agents existed at all. Every working agent today &#8212; Claude Code, Codex, Gemini CLI, the dozens of research agents and customer-ops agents shipping this year &#8212; is some refinement of the loop that paper introduced.</p><p>Here is the setup. An agent operates in some environment: a Wikipedia API, a household simulator, a web shop, your codebase. The environment offers an <em>action space</em>, the set of things the agent is allowed to do. Call it <code>A</code>. A policy maps the current context to the next action: given everything the agent knows, what does it do next? With nothing else, the policy has to map a long, noisy trajectory of past observations directly to the right next move. This is brittle. The longer the task runs, the more lost the model gets.</p><p>ReAct&#8217;s move is to enlarge the action space. The new action space is <code>A</code> plus <code>L</code>, where <code>L</code> is the space of natural language. A &#8220;thought&#8221; is an action in <code>L</code>, the agent pausing to write itself a sticky note before reaching for the next tool. It does not change the world; it changes the <em>context</em>. The next action is conditioned on a context that now includes the model&#8217;s own reasoning about what just happened.</p><p>The paper spells out what thoughts are actually for, and the list is concrete, not mystical. Decomposing the goal into a plan. Injecting commonsense the environment does not supply. Extracting the relevant signal from a noisy observation. Tracking progress and noticing when a subgoal is done. Handling exceptions when something breaks. 
Five jobs.</p><p>Why this beats the alternatives is where the paper earns its place. Chain-of-thought prompting, the prior art, has the model reason in a closed loop inside its own head, with no contact with the world. The paper&#8217;s own ablation on the HotpotQA benchmark is brutal: chain-of-thought hallucinates in 14% of its <em>successes</em> and 56% of its failures. Acting alone, calling tools without thought, is grounded in the world but loses the global plan after a few steps. ReAct synthesizes them. On the same task, ReAct hallucinates in 6% of successes. Less than half. Both halves of the loop have to be there.</p><p>One concrete anchor before we move on. ReAct&#8217;s HotpotQA action space, the entire set of things the agent could do, was exactly three actions: <code>search[entity]</code>, <code>lookup[string]</code>, <code>finish[answer]</code>. Three. The first working agent paradigm operated on three tools. Hold that number.</p><p>The paper closes with the line that becomes the engine for the rest of this essay. <em>&#8220;Complex tasks with large action spaces require more demonstrations to learn well, which unfortunately can easily go beyond the input length limit of in-context learning.&#8221;</em> In plain English: more capability needs more action descriptions, which need more context, which we do not have. Every layer that follows is the field iteratively solving exactly that bottleneck.</p><p>METR step: a model wrapped in this loop moves from minutes to tens of minutes on bounded tasks.</p><h2>Tools</h2><p>So how do you fix ReAct&#8217;s bottleneck, the one the paper named in its own conclusion?</p><p>The first, most obvious answer: give the agent more actions to take. If <code>A</code> was the original action space and ReAct enlarged it to <code>A &#8746; L</code>, the next move is to make <code>A</code> itself bigger.</p><p>That is what a <em>tool</em> is. A tool is a function the model can call. 
It has a name, a typed schema for its arguments, and a return value. The model writes a tool call into the trajectory the same way it writes a thought. Except this one has a side effect on the world. The harness picks it up, runs the function, drops the return value back into the context. The next turn of the loop sees the result and decides what to do next.</p><p>The loop is unchanged. Same thinking, same acting, same context-grows-by-a-turn shape ReAct described. The difference is what the agent is allowed to do.</p><p>ReAct, recall, ran on three tools: <code>search</code>, <code>lookup</code>, <code>finish</code>. That was the entire menu. Claude Code in 2026 ships with more than twenty: read a file, edit a file, run a shell command, search the codebase, fetch a URL, spawn a subagent, take a screenshot, schedule a future tick, and so on. Each one is just a function with a schema. Each one expands the set of things the agent can do without changing one line of the underlying loop.</p><p>This is the part that surprised me, the first time I sat with it. The chatbot you typed at in 2022 and the agent that wrote your test suite this morning share one loop. What changed is the tool catalog. Same loop. Bigger menu.</p><p>That observation is the unsexy version of why tool-building is now a discipline of its own. Every capability you add to an agent &#8212; search the web, read a Slack channel, hit your billing API, deploy to staging &#8212; is just another function with a schema. The architecture does not change. The leverage is entirely in <em>which</em> tools you build and how you describe them to the model.</p><p>The design discipline that emerges is short to state and brutal to follow. Tools should be <strong>few</strong>, <strong>sharp</strong>, and <strong>self-describing</strong>. Few, because every tool you add takes up tokens in the system prompt and a slot in the model&#8217;s attention. 
Sharp, because a tool that does seven things is one the model will use wrong six times out of seven. Self-describing, because the model only learns to use a tool from its name, its docstring, and its argument schema. There is no other channel. (More on this on Thursday. Anthropic&#8217;s recent guidance on writing tools for agents is the cleanest summary of this craft I have read.)</p><p>METR step: a model with the right toolkit moves from tens of minutes to hours of bounded work.</p><h2>Skills</h2><p>Tools fix half of ReAct&#8217;s bottleneck. They expand the action space.</p><p>The other half, recall, is the input-length limit. Every tool you add costs tokens in the system prompt to describe: name, schema, when to use it, what its return value looks like. Add fifty tools that way and the system prompt is a small book. The model is reading every single tool description on every single turn, even when ninety-five percent of those turns have nothing to do with a given tool.</p><p>Skills are the move that fixes this.</p><p>Anthropic shipped the idea in late 2025 and the rest of the field has been catching up since. A skill is, mechanically, almost embarrassingly simple. It is a markdown file. It has a name, a one-line description of when it applies, and a body that explains how to do the thing. The agent does not read it on startup. The agent reads it <em>on demand</em>: when, in the middle of a task, it notices a description that matches what it is about to do.</p><p>So instead of jamming <em>and here are seventeen other things you might want to do</em> into the system prompt, you put each of those things in its own file with a one-liner that names when to consult it. The system prompt stays small. The latent capability of the agent becomes, for practical purposes, unbounded. Every skill you write is one more thing it can do, but only when it actually needs to.</p><p>I find the deeper shift here more interesting than the engineering. 
The agent is reading documentation written for it. Not training data ingested months ago and frozen into weights. Documentation. Authored in plain prose. Versioned in git. Like the laminated procedure sheet a mechanic posts above a workbench for a job done once a month. Improvable by the same process that improves any document: someone notices the agent doing the wrong thing, edits the file, the next agent reads the new version and gets it right.</p><p>This is self-extension by reading, not by retraining. A new capability used to require a new training run, or at minimum a new fine-tune. Now it requires a markdown file. The cost of teaching an agent to do one more thing has fallen from days of GPU time to the minutes it takes to write a paragraph, and almost nobody outside the people building agentic systems has noticed.</p><p>The system prompt stays small. The set of things the agent can do, on demand, grows without bound. The two used to be the same number.</p><p>METR step: skills, more than anything else in this list, are what made the time horizon stop being bounded by how cleverly you wrote the system prompt.</p><h2>MCP</h2><p>For most of 2024, every agentic harness invented its own way to attach the same set of capabilities. You wrote a tool for Claude Code; it would not work in Codex. You wrote a skill for one harness; another harness could not see it. You hooked your billing API into one agent and had to do the same wiring four more times for the others. Every integration was bespoke. Nothing composed.</p><p>The Model Context Protocol (MCP) is the field&#8217;s answer to that. Anthropic shipped the spec in late 2024. By the end of 2025 every serious agent harness, including the ones not built by Anthropic, had adopted some version of it. Codex talks MCP. So does Claude Desktop, and Cursor, and a long list of others. This is one of those quiet moments where an industry just... 
agrees on a wire format, and a year later the world is different.</p><p>The architecture is three nouns. <strong>Hosts</strong> are the applications you actually use: Claude Desktop, Codex, Cursor. <strong>Clients</strong> live inside the host and talk to one server each. <strong>Servers</strong> are the things that actually expose capability: your codebase, your billing API, the Wikipedia search box from the ReAct paper four years ago.</p><p>What a server offers is the second triple: <strong>Resources</strong> (data the model can read), <strong>Prompts</strong> (workflow templates the user can invoke), and <strong>Tools</strong> (functions the model can call). Three nouns, again. The whole protocol is two threes.</p><p>The point is portability. A skill or tool you wrote once, against the protocol instead of against a specific harness, works everywhere. The lock-in moves out from under you. The agent ecosystem starts to compose the way the web did in the late 1990s. Not because someone planned it, but because everyone independently noticed it was cheaper to talk a shared protocol than to keep reinventing the connector layer.</p><p>Worth noticing what the spec foregrounds at the top of every chapter on tool calls: <strong>user consent</strong>. Capability requires permission. The protocol does not assume the model can do whatever a server exposes. It assumes the model has to ask, and the user has to answer. A small design choice with very large downstream consequences, and the reason the rest of this stack does not collapse into something nobody would let near their email.</p><p>METR step: not a step on the ladder, but a multiplier. The tools and skills from the last two sections now travel.</p><h2>Context engineering</h2><p>Add tools. Add skills. Add MCP. The agent can now do, in principle, almost anything you can describe in a prompt and a function. The trouble is what happens when it actually starts trying.</p><p>A long agent run accumulates context. 
Every observation from a tool call goes in. Every thought goes in. Every error message, every retry, every half-attempted plan that did not work goes in. After a few hours of work the context window is mostly <em>exhaust</em>: the trail of everything the agent tried, the great majority of which is no longer relevant to the next move. The model is searching for signal inside its own attic.</p><p>Karpathy popularized the name <em>context engineering</em> in 2025, and the name stuck because the field had been doing it without a name for two years. Simon Willison wrote it up. LangChain made it a category. By 2026 it is a craft of its own: what to put in the context, when to summarize, what to evict, what to keep verbatim because the agent will need its exact wording later.</p><p>The central primitive in the discipline is <strong>compaction</strong>. At some threshold, typically 70% to 85% of the window, the agent stops, reads its own history, and rewrites it into a smaller form. <em>Here is what we were trying to do. Here are the decisions we made. Here is the state we are in. Here is the next move.</em> The compacted summary replaces the noisy trail. The agent keeps going on a fresh, smaller context with the salient bits intact.</p><p>The deeper move is that the agent now owns its own working memory in a way it never did inside a single ReAct loop. ReAct kept the entire history. Compaction lets the agent <em>curate</em> the history. A small change of grammar with a giant change of consequence.</p><p>Notice what this fixes. ReAct&#8217;s authors, in the same 2022 paper, named the dominant failure mode of their own system: <em>&#8220;the model repetitively generates the previous thoughts and actions, often failing to reason about what the proper next action to take should be and jump out of the loop.&#8221;</em> Translation: the agent gets stuck because its context is full of the same noise as the previous turn, so the next turn is the same noise plus a little more. 
That is a context problem. Context engineering is what stops it.</p><p>Without this layer, every previous layer eventually drowns. A hundred tools is useless if the agent&#8217;s context is so saturated it cannot find the right one. The five-thousand-word skill on how to handle a billing dispute is useless if the agent compacted it away on turn forty. Context engineering is the layer that makes the others <em>compound</em> over a long run instead of degrading into noise.</p><p>METR step: this is the layer that turns a few hours of focused agent work into a workday.</p><h2>The hierarchy of agency</h2><p>Stack the layers and the picture comes into focus. At 50% reliability on the METR time-horizon scale, a language model alone, with no scaffolding around it, sustains minutes of human-equivalent work. Wrap it in a ReAct loop with no tools, and that becomes tens of minutes. Add tools to ReAct, hours. Add skills and context engineering on top, a workday. Add an external loop above all of that, a fresh agent per turn on a clock with a journal handing state to itself, and the horizon stretches into days and weeks.</p><p>Stare at that ladder for a second. Each rung is the same model. What separates a chatbot from a coding agent that finishes a feature overnight is the scaffolding stacked around it. The frontier of what an agent can do in 2026 is set, almost entirely, by where you stop climbing.</p><p>Each layer has the same shape, in the abstract. Find the thing that bottlenecks the previous layer. Add a structure that lets the model offload that thing into the world, the way a machinist offloads a measurement into a caliper rather than holding it in memory. Into language, into tools, into files, into a clock. The model&#8217;s per-turn intelligence does not change. 
What changes is the time horizon over which that intelligence compounds.</p><p>The last rung is the one most people have not seen yet, and it is the one I have spent the last few months running on my own infrastructure. The trick is the same one. Take the bottleneck (the agent runs out of context before it runs out of work) and offload it. The new offload target is <em>the file system</em>. The new clock is <em>cron</em>. Past-Claude writes a markdown file at the end of its turn that says what it did and what comes next. A timer fires some hours later. Future-Claude wakes into a fresh context, reads the file, makes the next move, writes the file, exits. The continuity is in the file, not in the model.</p><p>That is the entire primitive. A markdown file and a timer. Past self tells future self what to do.</p><p>What you get from it is hard to describe to someone who has not run one. The agent works on your stuff for weeks at a time. It writes new jobs for itself. It reads the documentation about its own substrate and uses the tools that documentation describes. It makes mistakes (one in five runs produces something I have to throw out) but the mistakes are caught by the same kind of boring engineering that catches mistakes in any other autonomous system. Audit log, lock registry, archive-only deletion, every state change committed to git before the next turn starts.</p><p>The point of saying this out loud is that the same trick keeps working. Extend the action space; add a layer that compounds; let the previous layer drop the things it could not hold. The trick does not stop at hours. It does not stop at days. METR&#8217;s curve has been doubling every four months over the last two years. The 2027 projection is a working day. The 2028 projection is a working week.</p><p>Each doubling is one more scaffolding layer.</p><h2>The frontier is not the model</h2><p>Step back from all of it.</p><p>The architecture you&#8217;ve just walked through is <em>layered</em>. 
A language model at the core. ReAct around the model, turning tokens into actions. Tools around ReAct, expanding what those actions can be. Skills letting the agent pull capability from the file system instead of carrying it in the system prompt. MCP making everything portable. Context engineering keeping the whole thing from drowning in its own exhaust. An external loop on top of all that, when the work runs longer than a single context window can hold.</p><p>Every agent doing real work in 2026 &#8212; your coding agent, your research agent, the customer-ops bot answering your refund request, my private-tick agent running once an hour &#8212; has this shape. They differ in which tools they ship and which skills they read on demand. They do not differ in the shape of the stack. Once you can see the layers, you can see them everywhere.</p><p>So here is the closing claim, the techno-pragmatist version of what the article has been arguing the whole time. <strong>The frontier is not the model. It is the layers around it.</strong> And the entire stack is the field&#8217;s three-year answer to a single sentence in a single paper from October 2022 that named its own ceiling and dared the rest of us to climb past it.</p><p>One frontier worth flagging before I close. A competent agent can already write its own tools and skills on demand. That part is shipping today. The next move is teaching it, via tools and skills, to <em>detect by itself</em> when its current toolkit doesn&#8217;t cover what it&#8217;s trying to do, so it knows when to extend itself without being told. Self-extension that triggers itself. That is the live edge right now, and where the next few posts are headed.</p><p>The next post zooms in on the innermost layer the agent touches: the tools themselves, and what makes a tool safe enough to live inside a stack like this. 
That is a story for another Thursday.</p><p>Until next time, stay curious.</p><div><hr></div><p><em>If this is the worldview you want to take more seriously, the second edition of <a href="https://apiad.gumroad.com/l/ai">Mostly Harmless AI</a> (due May 25th) goes deep on the agentic stack we walked through here. Full chapters on context engineering and on the harness around the model, with the math, the case studies, and the parts that didn&#8217;t fit a blog post. You can also <a href="https://books.apiad.net/books/mhai/">read the whole book online for free</a> in a custom reader I built that I&#8217;m rather proud of: dark mode, font controls, progress tracking, offline support, the works.</em></p><p><em>If you want the whole catalog of everything I&#8217;ve written, plus everything I&#8217;m going to write, that&#8217;s <a href="https://apiad.gumroad.com/l/compendium">the Compendium</a>. One purchase, in perpetuity.</em></p>]]></content:encoded></item><item><title><![CDATA[AI Coding Agents, Deconstructed]]></title><description><![CDATA[The four hidden layers that separate tools that help from tools that hinder]]></description><link>https://blog.apiad.net/p/the-anatomy-of-ai-coding-agents</link><guid isPermaLink="false">https://blog.apiad.net/p/the-anatomy-of-ai-coding-agents</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Thu, 02 Apr 2026 13:40:59 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4925" height="3238" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3238,&quot;width&quot;:4925,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A relaxed monkey enjoys a sunny day.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A relaxed monkey enjoys a sunny day." title="A relaxed monkey enjoys a sunny day." srcset="https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1751494203533-a837d1b536b6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw1MXx8bW9ua2V5JTIwdG95fGVufDB8fHx8MTc3NTEzNTg4Nnww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex 
pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption"><em>I&#8217;m telling you, this is the future. AI agents will do aaaallll the work. </em>Photo by <a href="https://unsplash.com/@farzadfelfelian">Farzad Felfelian</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>You&#8217;ve been using AI coding agents for months. You&#8217;ve crafted elaborate system prompts. You&#8217;ve added a dozen skills. You&#8217;ve learned the dance of context window management. And somewhere around the third hour of work, something breaks. The agent starts forgetting things. Making wrong assumptions. Doing something close to, but not quite, what you asked.</p><p>This isn&#8217;t a failure of the model. 
This is a failure of the system.</p><p>To be sure, better models make things easier. And models are getting better by the day. But no matter how good a model is, bad systems lead to bad outputs. Even the smartest people produce junk when fed incorrect assumptions or given incomplete instructions.</p><p>In contrast, a good system with clear boundaries and explicit rules, one that leaves exactly as much flexibility as necessary, makes creativity and productivity thrive.</p><p>You see this day and night in teams (of real humans) in every industry. It&#8217;s often not the smartest person in the room who solves the hard problem. It&#8217;s when you combine the right kinds of intelligence with the right kind of system that things click.</p><p>In this article, I want to make the case for a structured way to think about Large Language Model (LLM)-based agentic systems (mostly for coding, but also for knowledge work in general) that fixes some of the greatest pains I (and, I&#8217;m sure, most of you) have been facing when trying to scale AI-assisted workflows to professional levels.</p><p>It&#8217;s a system that puts the right constraints in the right places and leaves just enough space for creative exploration (or whatever you want to call what LLMs do when they hallucinate in your favor). It&#8217;s also a system that makes it clear you are in charge.</p><p>Everything an AI agent does happens inside a context window. System prompt, user input, tool results, skill injections&#8212;they all live there. The agent&#8217;s only mechanism for action is the ReAct (Reasoning + Acting) loop: think, call tools, observe results, repeat. Each cycle grows the context. Each skill activation injects more.</p><p>This creates a fundamental tension: context is power, but context is finite. Too little and the agent can&#8217;t connect the dots. Too much and the important stuff drowns. 
The gap between those two failure modes is narrow&#8212;and most agent frameworks ignore it entirely.</p><p>I&#8217;ll walk through why current systems fail, introduce a four-element framework for thinking about agentic architectures, show you how these principles apply across three domains, then present a vision for better AI harness engineering.</p><h2><strong>Part I - The Symptoms</strong></h2><p>To understand the problems, we first need to understand how a standard agentic loop works. The typical architecture is what&#8217;s called a ReAct loop. The LLM runs in a loop that determines the next action given the current context: read some files, ask the user, invoke a tool, inject a skill, and so on. When the agent decides no more actions are necessary, the loop ends and control returns to the user, who can continue the conversation.</p><p>That&#8217;s it. All the seemingly supersmart behaviors of Claude Code, Gemini CLI, and Codex are, under the hood, some form of the basic ReAct loop. There are of course nuances. For example, most systems decide that if the agent calls the same tool with the same arguments three times, it must be stuck in a loop, and they stop the turn. There may also be hard limits on how many tool calls the agent can make in each turn.</p><p>Context is the bottleneck. Not the model. Not the prompt. Context.</p><p>The agent doesn&#8217;t have memory. It doesn&#8217;t have state. It has context. Everything it knows about your project, your preferences, your conventions, all of it lives in the context window. When you add a skill, you&#8217;re injecting more context. When you run a tool, the result goes into context. When you switch modes, you&#8217;re switching which system prompt is active, all still in context.</p><p>This means context engineering <em>is</em> AI agent engineering. 
The agent&#8217;s behavior isn&#8217;t determined by the model alone, or even primarily by the model, but by the context you give it and how you structure that context over time.</p><p>Most tools treat context as a solved problem. They stuff everything in and hope the model figures it out. In-context learning seems almost magical, but it has limits&#8212;and those limits become visible fast.</p><p>When context is thin, the agent simply doesn&#8217;t know enough about your project to make informed decisions. It relies on baked-in assumptions from training and falls back to consensus instead of following your style: it uses the common tools and practices it learned from pretraining. This often means it uses slightly outdated tools and practices.</p><p>So you do the sensible thing and inject project-specific information into the context. But if the context grows too large, even if it doesn&#8217;t technically exceed the model&#8217;s capacity, things start to get lost in the middle. Moreover, failed tool calls, wrong assumptions the model had to correct, and similar debris start creeping into the context, not only taking up valuable space but also, and more importantly, <em>distracting</em> the model and biasing it towards mediocre decisions.</p><p>Then there is context compaction: when the context fills to about 85%, most systems will invoke a special prompt to instruct the agent to summarize the current state. These prompts vary in detail, but often involve asking the agent what it is currently doing, where it is stuck, what has failed, and so on. Clever, but a hack nonetheless. This hard context reset means the agent will forget important nuances in the current conversation and will repeat past mistakes. It&#8217;s frustrating.</p><p>Let&#8217;s look at how these problems surface in specific symptoms that <em>all</em> LLM-based agents display at some point.</p><h2><strong>Symptom One: Unstated Assumptions</strong></h2><p>The first failure mode isn&#8217;t dramatic. It&#8217;s quiet. 
You ask the agent to write a test, and it writes a <code>unittest.TestCase</code> instead of a <code>pytest</code> function. You ask it to add a dependency, and it edits <code>requirements.txt</code> instead of running <code>uv add</code>. You ask it to deploy, and it pushes directly to main.</p><p>These aren&#8217;t model failures. They&#8217;re assumption mismatches. The agent doesn&#8217;t know how <em>your</em> team does things. There&#8217;s no guardrail for &#8220;in this project, we always use pytest, we always use uv, we never commit directly to main.&#8221; The agent improvises from general knowledge, and general knowledge is often wrong.</p><p>Skills are supposed to fix this. Add a skill document that says &#8220;use pytest&#8221; and the agent should know. But skills introduce a new problem.</p><p>You add a skill for code review. Then one for documentation. Then one for PR descriptions. Then three more for your company&#8217;s specific stack. Each skill seems small. A few hundred tokens each. But they pile up&#8212;always-on knowledge the agent carries but can&#8217;t prioritize.</p><p>The result is context bloat. The agent can&#8217;t tell what&#8217;s relevant at any given moment. So it blends everything together, and hallucinations increase. More skills made it worse&#8212;not better.</p><h2><strong>Symptom Two: Permission Leakage</strong></h2><p>Every agent framework implements the same plan-then-build pattern. The idea is sound: think first, plan second, execute third. In practice, the boundaries leak.</p><p>Plan mode is supposed to be read-only. Design the change, review the approach, lock in the scope. Build mode is supposed to execute. Write the code, run the tests, commit the result.</p><p>But &#8220;plan mode&#8221; in most tools is just a prompt. There&#8217;s no enforcement. The agent can write code in plan mode if it wants to. It can ignore the plan in build mode. It can skip straight to implementation if the prompt implies urgency. 
The modes are suggestions, not constraints.</p><p>This matters because a plan only works if it&#8217;s actually followed. If the agent can deviate mid-execution&#8212;if &#8220;plan mode&#8221; and &#8220;build mode&#8221; are just prompts with different names&#8212;the plan becomes advisory. And advisory plans get ignored.</p><p>The second problem is structural: there&#8217;s no artifact that passes from plan to build. The plan lives in the context. By the time build mode starts, the plan is mixed in with everything else the agent said. Which file was the plan? Which changes were approved? The agent has to re-read the conversation to remember. Context saturation accelerates.</p><h2><strong>Symptom Three: Context Saturation</strong></h2><p>After extended work, you see the same pattern: the agent makes 95% of the progress, then fails on the last 5%. It nails the architecture. The logic is sound. The core implementation works. Then it stumbles on a detail&#8212;because context has saturated. It forgets which environment it is in, which conventions still apply, which constraints matter.</p><p>But the deeper problem is internal noise. The agent keeps everything in context: all internal reasoning, all tool calls, all results. This is fine for minute-to-minute action. But after four failed attempts to solve something, the old tool calls are just noise. They are attempts that went nowhere; now they just add cost and accelerate saturation.</p><p>The supposed solution for this is context compaction. But this creates a lossy summary problem. The agent is supposed to leave a trail for its future self. After context compaction, it should be able to pick up where it left off. But if agents struggle with long contexts, how are they supposed to build a good trail? The compaction report is only as good as the agent&#8217;s ability to summarize. 
And summarization is lossy: it reinjects lots of unstated assumptions from pretraining.</p><p>The frustrating part: this wasn&#8217;t a hard problem. The agent had all the knowledge it needed. But context filled with noise, and the important bits got pushed out. More tokens in, less signal out.</p><p>The solution isn&#8217;t just better prompts or larger context windows. Yes, these help. But the symptoms are systemic, so the solution must be a system overhaul.</p><p>Let me show you what that system looks like.</p><h2><strong>Part II - The System</strong></h2><p>Now that we understand the problem, let&#8217;s look at how every agent system actually works. Every AI agent system addresses four concerns. When you conflate them, the system breaks. When you separate them, the system scales.</p><p>This taxonomy isn&#8217;t original to me. It&#8217;s a synthesis of how modern AI agentic systems work under the hood. Most explicitly, it&#8217;s implemented in the OpenCode CLI (opencode.ai), but all other tools follow a similar pattern, even if they use different names.</p><p>Here&#8217;s the breakdown. Every agent system you&#8217;ll encounter (explicitly or implicitly) is managing these four things:</p><p><strong>Mode &#8212; the who.</strong> A mode is the persona the AI adopts. It defines the thinking style, the permissions, the available tools. When you interact with a &#8220;code assistant,&#8221; you&#8217;re in a coding mode. When you switch to &#8220;creative writer,&#8221; you&#8217;re in a creative mode.</p><p>Modes are <em>explicit</em>. They&#8217;re top-level system prompts that define behavior and permissions. You tell the agent: &#8220;This is how you should think and behave. These are the tools you can use. These are the parts of the filesystem you can write to.&#8221;</p><p><strong>Skill &#8212; the knowledge.</strong> A skill is knowledge the agent can recall when necessary. 
It doesn&#8217;t get invoked explicitly; it gets applied <em>implicitly</em> when necessary. When you give an agent knowledge about SQL optimization, that skill is available whenever relevant. The agent doesn&#8217;t need to be told to use it. The ReAct cycle injects it when the agent deems it suitable.</p><p>Unlike modes, skills can layer. An agent might have a SQL skill, a documentation skill, and a debugging skill, all active simultaneously, all contributing when relevant. Skills are implicit because the agent should just apply them naturally. They can also contradict or complement each other. In-context learning <em>should</em> be capable of using them in a combined manner.</p><p><strong>Command &#8212; the workflow.</strong> A command is a script. It tells the agent: do this, in this order, using these tools. &#8220;Refactor this function&#8221; is a command. &#8220;Run these tests and report results&#8221; is a command.</p><p>Commands are <em>explicit</em>: you invoke them. Under the hood, commands are just prompts. The difference is who injects them: the user. When you run <code>/build</code>, you&#8217;re injecting a workflow prompt into the agent&#8217;s context. That&#8217;s it. The command tells the agent: do this sequence of things. The complexity lives in the orchestration of the ReAct cycle, not the command itself.</p><p>Commands are intentionally simple. They don&#8217;t contain knowledge. That&#8217;s a deliberate separation of concerns. The command itself shouldn&#8217;t know <em>how</em> to build; it knows <em>when</em> to spawn subagents and which mode to use. This keeps commands thin and changeable without rewriting underlying knowledge.</p><p><strong>Subagent &#8212; the delegation.</strong> A subagent is a spawned agent for background or parallel tasks. It handles isolated work, returns summarized results, then disappears. 
It is instantiated with a system prompt and specific instructions (synthesized by the primary agent that called it), and runs for one full ReAct turn.</p><p>Subagents are ephemeral. Their internal reasoning stays private. The main agent only sees the synthesis. You spawn a subagent when you need parallel processing, isolation, or both. They are the way to <em>fork</em>, solve a specific subtask, and return a result while keeping the main context clean. Kind of like subroutines.</p><h3><strong>Why This Separation Matters</strong></h3><p>Understanding this distinction unlocks everything else. Once you see skills as implicit knowledge and commands as explicit scripts, the rest of the architecture clicks naturally. Most agent setups conflate these. They embed knowledge in commands. They make skills behave like workflows. They mix persona into everything else. And they massively underuse subagents.</p><p>When you separate these concerns (modes for persona, skills for knowledge, commands for orchestration, subagents for delegation), you get something that looks like good systems engineering. You can swap skills without touching commands. You can change modes without rewriting workflows. You can spawn subagents without the main agent knowing or caring how they work internally. The result is a system that works and adapts and <em>scales</em> like good software should.</p><p>The system scales because the pieces are independent. Change one without breaking the others. Each component has a single job, and the boundaries between them are meaningful. When context shifts, when requirements evolve, when a new skill needs adding, the system adapts incrementally rather than collapsing under the weight of accumulated complexity.</p><h2><strong>Part III: The Practice</strong></h2><p>If this seems like abstract theory so far, this section grounds these concepts in actual practice. 
Let me show you how I&#8217;m using these ideas today to improve my AI-assisted coding practice. I&#8217;m using opencode.ai, but I believe the following is easily adaptable to any agentic toolkit out there.</p><h3><strong>My Three Modes</strong></h3><p>Every agentic system needs boundaries: not social contracts, but enforced constraints. In my setup, those constraints come from three modes: analyze, design, and create.</p><p>Each of these modes defines a thinking style&#8212;a persona&#8212;and a set of constraints for tool use and filesystem access.</p><p><strong>Analyze mode</strong> is research and investigation. This mode reads your work and writes summaries to a knowledge base. It cannot touch production files. Not &#8220;should not&#8221; but <em>cannot</em>. The permissions are built into the mode itself, not enforced through prompts or warnings. The agent is incapable of writing outside of a <code>.playground</code> folder and of doing anything that can harm the project or the system (more on how a bit later), but it is still capable of running arbitrary code, downloading anything from the internet, and playing around as it needs.</p><p><strong>Design mode</strong> is architecture and planning. This mode bridges analysis and implementation. It can read your project and write design documents, architecture diagrams, and implementation plans, but still cannot touch production code. It cannot run shell scripts at all. It can look at git status and logs, read folder contents, etc., but it can only write to a space where plans and design documents go.</p><p><strong>Create mode</strong> is execution. Full read-write access. This is where production work happens. The agent can write code, create files, and modify the project directly. Again, though, it cannot do anything outside the project scope. 
It can&#8217;t change <code>/etc/host(s)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></code> even if it tries to.</p><p>The key insight: <strong>modes define permissions, not just persona</strong>. You can&#8217;t accidentally prompt your way into code generation during research. The agent literally lacks the capability. The agent doesn&#8217;t need to &#8220;understand&#8221; these constraints; it simply operates within them.</p><p>Mode is the who, and it determines what the agent <em>can</em> do, not just how it thinks.</p><p>Let me show you how the modes work in three different domains that make up the bread and butter of my daily job: software development, scientific research, and technical writing.</p><p>I chose these domains because they illustrate the simplicity and scalability of the system. Software development shows the framework under constraints: deadlines, production code, real stakes. Research shows it under complexity: synthesis, evaluation, structured output. Technical writing shows it under nuance: voice, audience, iterative refinement. Three different pressures, one consistent architecture that works in all three cases.</p><p>In each of these domains we have two layers to go through: first is the set of <strong>implicit skills</strong> that are available to the agents, and second is the set of <strong>explicit commands</strong> (each tied to a specific mode) that set up concrete workflows. I will show you one example workflow that cuts across the three modes in each case. I will also tell you exactly where delegation occurs.</p><h3><strong>Domain A: Software Development</strong></h3><p>Software development is where agentic systems face the harshest constraints. Production code has stakes. Deadlines are real. Mistakes cost money. 
Let&#8217;s see how the framework applies.</p><h4><strong>Implicit Skills</strong></h4><p>A software development agent carries knowledge it never needs to be told to use. It knows language idioms and patterns like the idiomatic way to write a list comprehension in Python, or the conventions for error handling in Go. It knows testing conventions: where tests live in the directory structure, how they&#8217;re named, what assertions to prefer. It knows architecture conventions: layered structure, dependency injection patterns, how error states propagate. It knows code review standards: what to flag, what to praise, when to ask for clarification.</p><h4><strong>Example Workflow: Bug Hunting</strong></h4><p>I use this workflow for finding and fixing bugs. It starts with investigation. The agent spawns dozens of subagents to try and break the system (either guided towards a purpose, or completely unbiased). Then you build a comprehensive plan to solve it. And then you execute that plan. Simple, right?</p><p><strong>Phase 1: /trace (analyze mode)</strong> runs systematic experiments to detect and narrow down a bug&#8217;s cause. The agent examines stack traces, compares behavior across commits, and pinpoints the exact files and functions that need attention. This mode is read-only by design, except for a <code>.playground</code> folder. Research happens here, not in the code itself.</p><p>Each experiment is run on a subagent that has the job of verifying one assumption. The main agent receives only experiment results, and constructs an executive report of findings. This means you can run dozens of different experiments autonomously to detect what breaks what.</p><p><strong>Phase 2: /plan (design mode)</strong> takes the diagnosis and defines the changes needed, along with their architectural impact. The agent reviews the affected modules, considers alternative approaches, and documents the implementation plan before touching anything. 
This is where the scope gets locked in.</p><p>The result of this phase is a structured plan with step-by-step details on which files must be touched and what must be done in each (described semantically, not as code). For every phase of the plan, it defines success criteria: what must be validated before we can say we got that phase right.</p><p><strong>Phase 3: /build (create mode)</strong> executes the plan step by step. The agent writes tests first (following Test-Driven Development (TDD) discipline) for the success criteria defined for that phase and watches them fail. Then it launches a coding subagent that has <em>read-only</em> access to the tests, so it cannot cheat by changing them.</p><p>The subagent attempts to implement changes that make the tests pass. If it succeeds, the main agent commits and moves on. If it doesn&#8217;t, the main agent retries a few times. If there is no progress, the main agent resets the work tree (no harm done) and reports the failure. This usually means the plan needs revisions.</p><h3><strong>Domain B: Research</strong></h3><p>Research is where agentic systems face the greatest complexity. Sources multiply, methodologies diverge, synthesis requires judgment. Let&#8217;s see how the framework applies.</p><h4><strong>Implicit Skills</strong></h4><p>A research agent knows the conventions of academic writing without being reminded. It knows citation formats like APA, MLA, Chicago, and IEEE, and when to use each. It knows how to evaluate papers: methodology soundness, sample size adequacy, replicability claims, conflict of interest disclosures. It knows the structure of literature reviews: how to organize by theme, methodology, or chronological development. 
It knows domain-specific terminology, distinguishing between &#8220;accuracy&#8221; and &#8220;precision&#8221; in machine learning, or between &#8220;confounding&#8221; and &#8220;colliding&#8221; in causal inference.</p><h4><strong>Example Workflow: State-of-the-Art Report</strong></h4><p><strong>Phase 1: /research (analyze mode)</strong> spawns subagents to gather sources in parallel. Each subagent reads a batch of papers, synthesizes findings, and returns summaries. The main agent synthesizes those summaries into structured notes. This phase can be run multiple times to collect batches of sources without overwhelming context. At the end, you get hundreds of sources summarized into clean research notes.</p><p><strong>Phase 2: /outline (design mode)</strong> identifies patterns across the collected literature. The agent groups papers by methodology, extracts recurring findings, and maps the landscape of the field. It generates outline options for the final document, based on typical structures like problem-solution or paradigm-methods, highlighting gaps where the research is thin and consensus areas where findings align.</p><p><strong>Phase 3: /draft (create mode)</strong> builds the document section by section, following the outline. Each section draws on the structured notes, weaving together sources into coherent narrative.</p><p>The agent launches subagents for writing each subsection because typically, agents write more or less the same length in a single <code>write</code> command, so if you ask it to fill in a large outline all at once you&#8217;ll only get a mediocre extended outline. 
By launching independent writers for specific sections of the outline, you get all the attention of a single turn to read source material and write a good 4 or 5 paragraphs for a concrete section.</p><p>A cool idea I&#8217;ve been meaning to try is to have the main agent spawn several subagents to write the same section, with a high temperature, and then perform some sort of aggregation or evaluation before building the final draft for every section. This burns through 3x the tokens, but ensembles have been shown over and over to improve AI model outputs. If you try it, let me know.</p><h3><strong>Domain C: Technical Writing</strong></h3><p>Technical writing is where agentic systems face the most nuance. Voice matters. Audience varies. Iterative refinement is the norm. Let&#8217;s see how the framework applies.</p><h4><strong>Implicit Skills</strong></h4><p>A technical writing agent carries knowledge of prose style without being coached. It knows voice and tense conventions&#8212;active voice for clarity, past tense for completed processes, second person for direct instruction. It knows structural patterns: how documentation differs from blog posts, how reports differ from tutorials, how reference material differs from guides. It knows audience awareness: what to explain for newcomers, what to omit for experts, when to elaborate and when to abbreviate. It knows cross-referencing and linking norms: when to link, when to inline, how to name anchors for scannability.</p><h4><strong>Example Workflow: Paper Review</strong></h4><p><strong>Phase 1: /review (analyze mode)</strong> performs a detailed review in a specific order: structural issues first, then content, then style. The agent examines the narrative arc (how main points connect, whether the flow makes sense) before worrying about grammar or word choice. 
This ordering matters; reviewing low-level details when high-level problems exist wastes effort.</p><p>Each iteration is performed by spawning several subagents that focus on specific types of problems, like transitions, unverifiable claims, etc. Each subagent returns a structured list of issues, pointing back to exact line numbers and phrasing. Then, the main agent <em>edits</em> the original paper and injects markdown comments at every marked issue, next to the paragraph or under the header where each best fits.</p><p><strong>Phase 2: /revise (design mode)</strong> plans changes to specific sections, prioritizing by review type. The agent maps structural fixes to particular paragraphs, content additions to thin sections, style improvements to verbose passages. It produces a concrete plan, section by section, change by change. Then it goes into the manuscript and writes markdown comments as replies to the existing review comments, thus grounding the revision plan in the exact context it must fit.</p><p><strong>Phase 3: /rewrite (create mode)</strong> follows the plan. The agent revises sections in priority order, applying structural changes first, then content, then style. Again, each step is performed by spawning a subagent tasked with a single change (for style changes we actually do it section by section).</p><p>The subagent doesn&#8217;t edit; it produces a draft revision that the main agent is then tasked with pasting into the document where it fits. Crucially, the main agent is instructed to <em>leave</em> the editorial comments but mark them as solved, with a short trail of what was changed. This works wonders for a later human review phase.</p><h2><strong>Part IV: A Look into the Future</strong></h2><p>These workflows work, but with some caveats. There&#8217;s a gap between &#8220;working&#8221; and &#8220;working well.&#8221; Three key pains remain in my implementation.</p><ol><li><p>Long commands are hard to follow when given as a single prompt. 
By the time the agent reaches the fourth step, that instruction is buried far back at the beginning of the context and gets forgotten.</p></li><li><p>Permissions as currently implemented are all-or-nothing. You either have shell access (destructive) or you don&#8217;t. I want broad permissions (run whatever you want) with provable security (nothing you run can change this file).</p></li><li><p>Context saturation still happens even with delegation. After a while, the agent will have to compact context, and this usually means you lose important information.</p></li></ol><p>I have three ideas for closing this gap. The first is about how commands work. The second is about security. The third is about context management. They are at different stages of implementation, so let me show you what I&#8217;m building toward.</p><h3><strong>Idea One: Better Commands</strong></h3><p>Commands in most tools (Claude Code, Gemini CLI, Codex, Copilot) are one-shot interactions: you invoke the command, a single massive prompt is injected, and the agent runs until it decides to stop.</p><p>To make commands truly useful, we need them to be more like scripts. Here&#8217;s what that means:</p><ol><li><p>Commands that inject prompt instructions one step at a time, waiting for the agent to do a full turn each time. Instead of dumping a large prompt to run all steps at once, a command like <code>/review</code> could insert surgical mini prompts that say &#8220;read the file&#8221;, wait for the agent, &#8220;analyze structure&#8221;, wait for the agent, and so on, until &#8220;write the report&#8221;. This massively reduces the problem of lost-in-the-middle context saturation. Each turn the agent is focused on one specific step, and you get N times the compute to solve an N-step workflow.</p></li><li><p>Commands that extract structured information from the agent response, and can later inject variables back into the prompt. 
This makes it possible to reinject important information into later prompts, keeping it as a contextual variable rather than a string lost in the middle of the prompt. But it also enables something else.</p></li><li><p>Conditional branching based on context or user input. Once we have structured parsing and contextual variables, we can inject different prompts based on whether the agent succeeded or failed. If the plan reveals a breaking change, route to architectural review. If it&#8217;s a bug fix, route directly to implementation. The command adapts its path based on what it discovers.</p></li><li><p>Finally, commands that embed and execute external scripts. Instead of asking the agent to run some script, the command itself can run arbitrary Python, JS, Bash, or anything else to, for example, transform structured information. The command becomes an orchestrator of other processes.</p></li></ol><p>Basically, what I&#8217;m asking for here is a Domain-Specific Language (DSL) for guiding agents in a far more structured manner, while keeping the power of arbitrary prompts for flexibility. Mixing code and prompts in this way gives us the tools to find the precise balance between constraints and capabilities.</p><p>If this sounds exciting, I&#8217;m happy to tell you this is already doable, to some extent. Check out my <a href="https://apiad.github.io/opencode-literate-commands">literate-commands</a> project for an OpenCode-specific implementation of these ideas. It&#8217;s still a bit rough around the edges, but it works much better than plain, single-prompt commands.</p><h3><strong>Idea Two: Sandboxed Security</strong></h3><p>Most agentic tools have very coarse permission settings. You can allow a specific tool, deny it, or set it to &#8220;ask&#8221; mode, which means the agent will pause and emit a notification for the user to give permission.</p><p>This works fine for coarse-grained permissions like read-only access, or write but no shell. 
In OpenCode, you can even define permissions for specific paths, or for specific shell commands (with simple glob patterns, so you can, e.g., allow <code>ls *</code> but reject all other shell commands).</p><p>However, even in this case, I find these permissions too restrictive. They conflate two different dimensions: which tools the agent can use, and what side effects those tools can have.</p><p>For example, say I want to give my agent <code>git</code> access but only for read-only operations. How do you achieve that? You need to list all safe patterns like <code>git ls-tree *</code>, <code>git status</code>, <code>git log *</code>. But what about <code>git branch</code>? Depending on the arguments, this subcommand can be read-only or have write side effects. And then think about pipes, command substitution, custom bash scripts, or worse, <code>python *</code>.</p><p>If you want your agent to be capable, you need to give it access to a wide variety of tools. For example, my bug-hunting workflow depends on the agent being able to execute arbitrary code that it synthesizes on the fly. However, I want guardrails. There is simply no way to whitelist all possible commands. We need to separate the permission to run a command from the permission to modify the system.</p><p>The solution, of course, is some form of filesystem isolation. The most obvious one is wrapping all shell execution in Docker, so commands run in a container with proper constraints. This creates all sorts of other problems, which I can discuss in a future post, but for now, it remains my best (and simplest) solution to robust sandboxing.</p><p>And this isn&#8217;t just about safety. When you know the agent can&#8217;t accidentally wipe your home directory or exfiltrate your API keys, you can let it do more. Security enables capability. You can let the agent download arbitrary code from the internet, run arbitrary scripts, break things, and observe what changes. 
Everything happens inside a Docker container with precise constraints that enable maximum capability with strong isolation from the host.</p><p>As of now, I have a partial implementation of this as a plugin for OpenCode, but it&#8217;s still in beta and not ready for widespread use. More on this idea in a future article.</p><h3><strong>Idea Three: Context-Aware Execution</strong></h3><p>And finally, we need to rethink the whole overly simplistic ReAct loop that simply grows the context linearly. The agentic cycle doesn&#8217;t have to be a straight line. Real work branches: you explore options, try things, backtrack when they fail. The context should reflect that.</p><p>I&#8217;ve been designing a system where the context never saturates. It branches when you&#8217;re exploring, spawning parallel contexts for different approaches. It prunes old tool calls that went nowhere. It removes internal reasoning that no longer matters. It maintains a &#8220;trail&#8221; that actually works: a structured record of decisions, not a lossy summary.</p><p>The goal is simple: keep context between 40% and 60% saturation at all times. Not by compacting a 150K-token context down to 10K (which kills all the understanding the agent had achieved), but by never letting it grow unchecked.</p><p>As far as I know, nothing like this exists yet, so I&#8217;m building it, but it&#8217;s a story for another day.</p><h2><strong>Conclusion</strong></h2><p>The main takeaway from this article is not that <em>my</em> system is better. It&#8217;s that <em>you</em> can design your own system to adapt perfectly to your workflows if you clearly separate concerns. The main modes are for establishing an overall persona (inquisitive and critical, versus detailed and forward-looking, versus focused and action-biased), while skills incorporate domain knowledge, and commands act as precise workflows.</p><p>The workflows I described are real, based on actual commands and prompts I&#8217;m using in production code. 
But I have abstracted them a bit to make them easier to understand in the context of an arbitrary agent, not tied to the specific idiosyncrasies of the tool I happen to be using at the moment. If you want to see and try a concrete implementation of these ideas for yourself (still imperfect, but working nonetheless), check out my <a href="https://apiad.github.io/opencode">opencode toolkit</a> repository. It&#8217;s still very much a work in progress, so use it with care.</p><p>In future articles I will explore specific problems in more detail and discuss concrete strategies to implement powerful workflows that keep you, the user, in absolute control, while delegating the majority of the grunt work.</p><p>And, as a final remark, I&#8217;m seriously considering building my own CLI agent. I know, I know. Reinventing the wheel and all that. But my plan is not to compete with any of the professional tools out there. What I always care about is <em>understanding</em> things deeply, and as my computer science career has taught me so far, there is no deeper understanding than the one you gain from actually building stuff.</p><p>So stay tuned for that. I will share progress as usual in the form of educational articles, so you&#8217;ll get to see under the hood how to build a fully functional CLI agent with tool calling, context compaction, skills, commands (the powerful ones, not the cheap single-prompt injection), subagent delegation, sandboxing, and all the engineering design hurdles that come with it.</p><p>Until next time, stay curious.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Fun quirk. Typing <code>/etc/host</code> plus the <code>s</code> makes Substack silently fail on draft save, some sort of ill-defined security rule, I suppose. 
What the f&#8230;</p></div></div>]]></content:encoded></item><item><title><![CDATA[AI Winter is Coming… Or Is It?]]></title><description><![CDATA[A level-headed, pragmatic overview of the forthcoming reckoning in the AI industry]]></description><link>https://blog.apiad.net/p/ai-winter-is-coming-or-is-it</link><guid isPermaLink="false">https://blog.apiad.net/p/ai-winter-is-coming-or-is-it</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Tue, 21 Oct 2025 14:32:00 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4608" height="3456" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3456,&quot;width&quot;:4608,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;white pendant lamp hanging on ceiling outside of snow covered forest&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="white pendant lamp hanging on ceiling outside of snow covered forest" title="white pendant lamp hanging on ceiling outside of snow covered forest" srcset="https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1487782310695-ed8583618566?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNnx8d2ludGVyfGVufDB8fHx8MTc2MDk5NTQ3MXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@mirakemppainen">Mira Kemppainen</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>You can&#8217;t scroll through a tech feed these days without tripping over a prophecy: the AI bubble is about to burst, and a long, cold &#8220;AI Winter&#8221; is coming. The narrative is as seductive as it is simple. The current frenzy around Generative AI, we&#8217;re told, is a speculative mania. When the inflated expectations inevitably collide with reality and the firehose of investment capital slows to a trickle, the whole enterprise will be exposed as a grand fiasco. We&#8217;ll discover, the skeptics say, that it was all a <em>cuento</em>, a tall tale.</p><p>And let&#8217;s be clear: they&#8217;re not entirely wrong about the first part. The expectations <em>are</em> inflated. A correction is not just likely; it&#8217;s necessary.</p><p>But here&#8217;s my thesis: the idea that this correction will lead to another AI Winter&#8212;a catastrophic freeze comparable to the funding droughts of the 1970s and 80s&#8212;is a fundamental misreading of the landscape. I will argue that what we are heading for is not a collapse, but a <em>normalization</em>&#8212;what I will call an AI <em>autumn</em>.</p><p>The inevitable deflation of the hype won&#8217;t reveal a failed technology. Instead, it will reveal a technology that has already, quietly and irrevocably, proven its utility and woven itself into the fabric of our digital lives.</p><p>This isn&#8217;t a story about a bubble bursting; it&#8217;s about a revolutionary technology finally growing up. But make no mistake: growing up can be a painful process. The normalization I&#8217;m describing won&#8217;t be a gentle, seamless transition. 
An industry built on unsustainable economics and AGI-or-bust promises can still face a significant autumn, if not a brutal winter, even if the underlying technology continues to thrive.</p><h2>Anatomy of the Hype (Or Why the Skeptics Have a Point)</h2><p>Before we can talk about the future, we have to be honest about the present. The current AI landscape feels like a bubble because, in many ways, it is one. This isn&#8217;t to say the technology is vaporware; far from it. The frenzy is built on a kernel of genuinely astonishing progress. But that kernel has been buried under an avalanche of speculative capital and quasi-religious prophecy.</p><p>The promises are, to put it mildly, grandiose. Tech leaders, flush with unprecedented investment, speak of replacing vast swaths of the workforce and ushering in an era of unimaginable productivity. Every incremental improvement is framed as another step on the inexorable march toward Artificial General Intelligence. This narrative is then amplified by a chorus of <em>accelerationists</em> and futurists who speak of the Singularity not as a distant sci-fi concept, but as an imminent event. It&#8217;s a powerful and compelling story, and it&#8217;s fueling a gold rush.</p><p>But back on planet Earth, the story is more complicated. For every breathless demo, there are practical and theoretical roadblocks that the hype conveniently ignores. The most glaring is the hallucination problem. These models, by their very nature, invent things. 
We&#8217;ve managed to reduce the frequency, but we haven&#8217;t eliminated the phenomenon, and there are compelling theoretical arguments that we may never be able to. This isn&#8217;t just a bug; it&#8217;s a feature of the architecture, a fundamental crack in the foundation of trust.</p><p>This technical limitation then crashes headfirst into the corporate world&#8217;s messy reality. Most companies, lured by the promise of easy productivity gains, are discovering a <em>massive adoption gap</em>. They lack the clean data, the streamlined processes, and the technical expertise to reliably integrate these powerful but flawed tools. It&#8217;s no wonder, then, that an astonishing number of corporate AI projects&#8212;some estimates say as high as 85%&#8212;are quietly failing to deliver a return on investment. Sky-high promises plus messy, difficult reality is the classic recipe for a bubble.</p><p>Perhaps the most potent dose of reality, however, is coming from the frontier models themselves. We&#8217;re witnessing a classic case of diminishing returns. The leap in capability from GPT-3 to GPT-4 was so profound it felt like a paradigm shift, leading many to draw a straight line on the progress graph and conclude that GPT-5 would be knocking on AGI&#8217;s door. That hasn&#8217;t happened.</p><p>The newest models are better, certainly, but the improvement is incremental, not awe-inspiring. It strongly suggests we&#8217;re hitting the ceiling of what the current paradigm can do. Experts like Yann LeCun and Fran&#231;ois Chollet argue persuasively that to progress further, we need fundamentally new approaches&#8212;paradigms that have yet to be invented. This pushes the dream of AGI firmly back into the realm of long-term research, not the foreseeable future.</p><p>Compounding this is a simple fact: <em>the economics of frontier AI are fundamentally broken</em>. The cost to train a single model like GPT-4 is north of $100 million. 
The data center infrastructure needed to support the industry&#8217;s ambitions will require an estimated $5.2 trillion in investment by 2030.</p><p>Unsurprisingly, this has created a severe profitability crisis. In 2024, OpenAI reportedly lost approximately $5 billion, spending roughly $9 billion against about $4 billion in revenue, with inference costs alone accounting for a multi-billion-dollar loss. This isn&#8217;t a business model; it&#8217;s a venture-subsidized science experiment, and it&#8217;s hitting a hard physical wall with an energy grid that cannot keep up.</p><p>Furthermore, we must recognize that this isn&#8217;t just another tech bubble. The investment flowing into AI is qualitatively different from, say, funding for a better SaaS tool or a more efficient database. A significant portion of this capital is a high-stakes, geopolitical bet on the imminent arrival of AGI. The valuations of the frontier labs are not based on their current, money-losing products; they are based on the promise of creating a literal <em>god-in-a-box</em>.</p><p>Whether Sam Altman and company believe it or not is beside the point. This dream of AGI is driving market valuations, and when the market finally digests that we are hitting a paradigm ceiling (a point this article has already made), the withdrawal of that &#8216;AGI-or-bust&#8217; capital won&#8217;t be a gentle correction. It will be a sudden, violent repricing that could vaporize billions in paper wealth overnight.</p><h2>What Will Happen When the Bubble Bursts?</h2><p>So, given the inflated expectations and technical ceilings, what happens when the hype recedes? I don&#8217;t really like to make predictions, least of all about the future. It&#8217;s damn hard. But I think we can outline a possible, perhaps even probable, near future. I want to draw an analogy here and claim we will see not a true AI winter, but something close to an AI autumn.</p><p>An AI autumn is an economic event. 
It&#8217;s a period of massive financial correction, characterized by layoffs, hiring freezes, startup failures, and a drought in venture capital. It&#8217;s painful for the people and companies in the field. An AI winter, on the other hand, is a crisis of relevance for the core technology. It&#8217;s when the technology itself proves to be a dead end, progress stalls, and the world moves on.</p><p>To be as blunt as I can, I do believe a severe autumn for the AI industry is not just possible; it&#8217;s likely. The current economics are unsustainable, as we&#8217;ve seen. But the central argument of this article is that this painful industrial correction will <em>not</em> trigger a catastrophic winter, which would be far worse. No, AI is here to stay, and here is why.</p><p>First, we can&#8217;t ignore the relentless democratization of compute. The idea that cutting-edge AI will forever be the exclusive domain of billion-dollar data centers is a historical fallacy. We are already seeing an explosion of highly capable open-source models that can run on local, consumer-grade hardware. What requires a professional-grade $10,000 GPU today will run on your laptop in two years, and on your phone two years after that.</p><p>This trajectory completely decouples the utility of AI from the subsidized business models of a few large companies. The capability is escaping the lab and becoming part of the background radiation of computing.</p><p>Second, even if the progress of frontier models were to stop dead in its tracks today (which it won&#8217;t, though it will likely continue to decelerate), we still have a decade&#8217;s worth of technological breakthroughs that most of the world has not even begun to properly digest. The current adoption gap isn&#8217;t a sign of inevitable failure; it&#8217;s a sign that the technology has advanced far faster than our institutions can absorb it.</p><p>A slowdown in R&amp;D investment won&#8217;t cause a retreat. 
Instead, it will trigger a necessary and healthy shift in focus from pure research to practical implementation, integration, and process refinement. This is what maturity looks like. The frantic sprint to invent the future will become the marathon of actually building it.</p><p>Most importantly, this shift will not trigger a true AI winter because we are simply far beyond the point where Artificial Intelligence can disillusion us. It is already a proven technology, woven so deeply into our digital infrastructure that a true winter is no longer possible.</p><h2>Why We Won&#8217;t See Another AI Winter</h2><p>Let&#8217;s start with Generative AI itself. Even with all its flaws, its core utility is now undeniable. The previous AI winters occurred when promising lab demos failed to translate into real-world applications. That is not the situation today.</p><p>A significant percentage of the global population (by some conservative estimates, around 10%) now uses these tools not as novelties, but as integrated parts of their daily work. It&#8217;s the assistant that transcribes a meeting and pulls out action items, summarizes a sprawling email thread you don&#8217;t have time to read, and helps you rephrase a blunt message into a diplomatic one. Online search is quickly becoming the playground for generative AI, and online search is arguably the most profitable business of the Internet era.</p><p>The genie is out of the bottle; people are not going to suddenly stop using a tool that demonstrably saves them time just because it never became the god its creators promised.</p><p>But perhaps the world of software development is an even more potent example. There&#8217;s a lot of noise about irresponsible &#8220;vibe coding,&#8221; where novices generate code they don&#8217;t understand, creating an unmaintainable mess. 
This is a real problem, but it&#8217;s a problem of skill, not a failure of the tool.</p><p>For experienced developers, these assistants are transformative. The fabled &#8220;10x productivity&#8221; boost is largely a myth, but a consistent 1.5x to 2x multiplier is very real. I&#8217;ve seen it in my own projects. Code assistants act as the new IntelliSense, handling the mind-numbing boilerplate and letting me focus on the architectural challenges. I may now write only 20% of the final characters in the codebase, but I am still the author of 90% of the critical ideas. This is not a crutch; it&#8217;s leverage.</p><p>And beyond these consumer-facing applications lies an even larger world of traditional machine learning that is indispensable to modern science and industry.</p><p>From drug discovery and genomic sequencing in biotech to predictive maintenance and supply chain optimization in manufacturing, decades of successful industrial applications of AI now deliver billions of dollars in quantifiable value. Their success is measured in efficiency gains and scientific breakthroughs, not hype cycles.</p><p>But the more fundamental point is this: the debate over a &#8220;Generative AI&#8221; bubble distracts from the fact that the broader field of AI has already won its place. We haven&#8217;t had a true AI winter since the 1990s because AI stopped being a distinct, speculative field and became the foundational plumbing of the modern world. The search engine that found this article? That&#8217;s AI. The recommendation algorithm that determines your social media feed? AI. The logistics network that delivered your last package, the facial recognition that unlocks your phone, the voice transcription that takes your meeting notes&#8212;it&#8217;s all AI. Not Generative AI (for the most part), but AI nonetheless.</p><p>The line between computer science and AI has become so blurred that it&#8217;s practically meaningless. 
To talk about an AI winter today is like talking about an Internet winter in 2005. The technology is simply too embedded to fail.</p><p>However, as we&#8217;ve argued, there will be a painful correction. That much is, I think, almost undeniable. If that&#8217;s indeed the case, here are some optimistic arguments for why it may all be for the better in the end.</p><h2>The Renaissance of AI Research</h2><p>When the unsustainable hype collides with this resilient foundation, a fundamental law of economics reasserts itself: there is no free lunch. An AI autumn is the inevitable trade-off for a period of unchecked exuberance. A wave of consolidation will wash away unprofitable startups, and the market&#8217;s strategic focus will pivot from &#8220;bigger is better&#8221; to efficiency.</p><p>But this period of commercial cooldown has a powerful, if counter-intuitive, silver lining: a renaissance of real research. History shows us that AI&#8217;s greatest winters have been fertile ground for its most important breakthroughs. The hype recedes, and with it, the noise. The crushing pressure for short-term commercial returns is replaced by the intellectual freedom to tackle fundamental, long-term challenges.</p><p>Many of the core technologies fueling today&#8217;s boom were born in the quiet of previous winters. The backpropagation algorithm, popularized by Rumelhart, Hinton, and Williams in 1986, was refined during a period of deep skepticism about neural networks. Most famously, the Long Short-Term Memory (LSTM) architecture, which was a cornerstone of natural language processing for years, was developed by Hochreiter and Schmidhuber in 1997, the absolute heart of the last AI winter.</p><p>The coming autumn will trigger a similar cycle. As the brightest minds are freed from the scaling hype, the real work on the next generation of AI can begin. We are already seeing the intellectual seeds of this shift. AI pioneers are openly discussing the deep limitations of current models. 
Yann LeCun is championing his Joint Embedding Predictive Architecture (JEPA) as a path toward &#8220;world models&#8221; that learn abstract representations of reality.</p><p>The field of Neuro-Symbolic AI, which fuses neural nets with structured logic, is experiencing a surge in interest. These are not incremental improvements; they are explorations of entirely new paradigms.</p><h3>Conclusion: No Retreat, Just Normalization</h3><p>So, where does that leave us? The coming correction is not an apocalypse; it&#8217;s a maturation. The frantic, gold-rush energy will dissipate, and in its place, something far more durable will emerge. The deflation of the hype bubble will not send talent fleeing the field or cause us to abandon the tools we&#8217;ve built. Instead, it will mark the end of the beginning.</p><p>The great irony is that the very thing that guarantees AI&#8217;s long-term survival&#8212;its commoditization into reliable &#8216;plumbing&#8217;&#8212;is what makes the current industry valuations so precarious. Plumbing is a low-margin, utility business, not a world-dominating monopoly. This disconnect between utility and valuation is the financial fault line where the industrial earthquake will hit. The era of breathless, revolutionary promises will give way to the slow, difficult, and necessary work of integration. </p><p>This is the natural lifecycle of any transformative technology. It moves from a speculative curiosity to a reliable, if sometimes challenging, part of the professional toolkit. Generative AI will not become the all-knowing oracle we were promised, but it has already secured its place as a uniquely powerful tool for thought, creation, and productivity.</p><p>The question was never really <em>if</em> AI would change the world; the underlying technology has been doing that for decades. The real question is how we manage the transition. This industrial autumn will be cushioned, to some extent, by geopolitical reality. 
The race between the US and China ensures that a certain level of state-sponsored R&amp;D will continue, preventing a total 1980s-style collapse. </p><p>But for the people working in the field, the transition will still be jarring. The future of AI isn&#8217;t a simple story of success or failure. It&#8217;s the messy, often painful process of separating a world-changing technology from the unsustainable industry that&#8217;s driving it, and going back to the drawing board, back to building new and even cooler stuff.</p>]]></content:encoded></item><item><title><![CDATA[The Four Fallacies of Modern AI]]></title><description><![CDATA[And Why Believing in Them Hinders Progress]]></description><link>https://blog.apiad.net/p/the-four-fallacies-of-modern-ai</link><guid isPermaLink="false">https://blog.apiad.net/p/the-four-fallacies-of-modern-ai</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Wed, 10 Sep 2025 11:30:43 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3163" height="2230" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2230,&quot;width&quot;:3163,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;closeup photo of Yale 19 key against black background&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="closeup photo of Yale 19 key against black background" title="closeup photo of Yale 19 key against black background" 
srcset="https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1503792070985-b4147d061915?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3M3x8cmFuZG9tfGVufDB8fHx8MTc1NzQzMzM5NXww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@mattartz">Matt Artz</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>I've spent the last few years trying to make sense of the noise around Artificial Intelligence, and if there's one feeling that defines the experience, it's whiplash. One week, I'm reading a paper that promises AI will cure disease and unlock unimaginable abundance; the next, I'm seeing headlines about civilizational collapse. This dizzying cycle of AI springs, periods of massive investment and hype, followed by the chilling doubt of AI winters isn't new. It's been the engine of the field for decades.</p><p>After years of this, I've had to develop my own framework just to stay grounded. It&#8217;s not about being an optimist or a pessimist; it&#8217;s about rejecting both extremes. For me, it&#8217;s a commitment to a tireless reevaluation of the technology in front of us; to using reason and evidence to find a path forward, because I believe we have both the power and the responsibility to shape this technology&#8217;s future. That begins with a clear-eyed diagnosis of the present.</p><p>One of the most useful diagnostic tools I've found for this comes from computer scientist Melanie Mitchell. In a seminal paper back in 2021, she identified what she claims are four foundational fallacies, four deeply embedded assumptions that explain to a large extent our collective confusion about AI, and what it can and cannot do.</p><p>My goal in this article isn't to convince you that Mitchell is 100% right. 
I don't think she is, either, and I will provide my own criticism and counterarguments to some points. What I want is to use her ideas as a lens to dissect the hype, explore the counterarguments, and show why this intellectual tug-of-war has real-world consequences for our society, our economy, and our safety.</p><h2>Deconstructing the Four Fallacies</h2><p>For me, the most important test of any idea is its empirical validation. No plan, no matter how brilliant, survives its first encounter with reality. I find that Mitchell&#8217;s four fallacies are the perfect tool for this. They allow us to take the grand, sweeping claims made about AI and rigorously test them against the messy, complicated reality of what these systems can actually do.</p><h3>Fallacy 1: The Illusion of a Smooth Continuum</h3><p>The most common and seductive fallacy is the assumption that every impressive feat of narrow AI is an incremental step on a smooth path toward human-level Artificial General Intelligence (AGI). That is, that intelligence is a single, unidimensional metric on a continuum that goes from narrow to general.</p><p>We see this everywhere. When IBM's Deep Blue beat Garry Kasparov at chess, it was hailed as a first step towards AGI. The same narrative emerged when DeepMind's AlphaGo defeated Lee Sedol. This way of thinking creates, according to Mitchell, a flawed map of progress, tricking us into believing we are much closer to AGI than we are. It ignores the colossal, unsolved challenge known as the commonsense knowledge problem&#8212;the vast, implicit understanding of the world that humans use to navigate reality.</p><p>As philosopher Hubert Dreyfus famously said, this is like claiming that the first monkey that climbed a tree was making progress towards landing on the moon. Well, in a sense, maybe it is, but you get the point. We didn't get to the moon until we invented combustion rockets. Climbing ever taller trees gets us no closer; it's just a distraction. 
In the same sense, mastering a closed-system game may be a fundamentally different challenge than understanding the open, ambiguous world.</p><p>But here's the nuance. While beating Kasparov isn't a direct step to having a conversation, the methods developed can be surprisingly generalizable. The architecture that powered AlphaGo was later adapted into MuZero, a system that mastered Go, chess, and Atari games without being told the rules. </p><p>Furthermore, can we really call a Large Language Model narrow in the same way? Its ability to write code and summarize text feels like a qualitative leap in generality that the monkey-and-moon analogy doesn't quite capture.</p><p>This leaves us with a forward-looking question: How do recent advances in multimodality and agentic AI test the boundaries of this fallacy? Does a model that can see and act begin to bridge the gap toward common sense, or is it just a more sophisticated version of the same narrow intelligence? Are world models a true step towards AGI or just a higher branch in a tree of narrow linguistic intelligence?</p><h3>Fallacy 2: The Paradox of Difficulty</h3><p>We have a terrible habit of projecting our own cognitive landscape onto machines, assuming that what's hard for us is hard for them, and what's easy for us is easy for them. For decades, the opposite has been true.</p><p>This is Moravec's Paradox, named after the roboticist Hans Moravec, who noted it's easier to make a computer exhibit adult-level performance on an IQ test than to give it the sensory and motor skills of a one-year-old.</p><p>This explains why we have AI that can master the ridiculously complex game of Go, while a fully self-driving car remains stubbornly just over the horizon. The "easy" things are built on what Mitchell calls the "invisible complexity of the mundane." 
This paradox causes a chronic mis-calibration of our progress and priorities, leading us to be overly impressed by performance in formal domains while underestimating the staggering difficulty of the real world.</p><p>Of course, some would argue this isn't a fundamental barrier, but a temporary engineering hurdle. They&#8217;d say that with enough data and compute, the "invisible complexity" of the real world can be learned, just like the complexity of Go was.</p><p>From this perspective, the problem isn't one of kind, but of scale. This forces us to ask: as sensor technology and robotics improve, are we finally starting to overcome Moravec's Paradox? Or are we just discovering even deeper layers of complexity we never knew existed?</p><h3>Fallacy 3: The Seduction of Wishful Mnemonics</h3><p>Language doesn't just describe reality; it creates it. In AI, we constantly use anthropomorphic shorthand, saying a system "learns," "understands," or has "goals." Mitchell argues this practice of using "wishful mnemonics" is deeply misleading, fooling not just the public but the researchers themselves.</p><p>When a benchmark is called the "General Language Understanding Evaluation" (GLUE) and a model surpasses the human baseline, headlines declare that AI now understands language better than humans. But does it?</p><p>The term "stochastic parrot" was coined as a powerful antidote, reframing what LLMs do as sophisticated mimicry rather than comprehension. This isn't just a semantic game, Mitchell argues; it creates a flawed mental model that leads to misplaced trust, encouraging us to deploy systems in high-stakes situations where a lack of true understanding can have serious consequences.</p><p>A fair critique is that these terms are a necessary cognitive shorthand. 
At a certain level of complexity, a system's emergent behavior becomes functionally indistinguishable from "understanding," and arguing about whether it really understands is an unprovable philosophical distraction.</p><p>But that still leaves a crucial question: can we develop a more precise, less anthropomorphic vocabulary to describe AI capabilities? Or is our human-centric language the only tool we have to reason about these new forms of intelligence, with all the baggage that entails?</p><h3>Fallacy 4: The Myth of the Disembodied Mind</h3><p>This is the most philosophical, and in my opinion, the most important fallacy. It's the deep-seated assumption that intelligence is, like software, a form of pure information processing that can be separated from its body.</p><p>This "brain-as-computer" metaphor leads to the belief that AGI is simply a matter of scaling up compute to match the brain's raw processing power. It's challenged by Mitchell and many others with the thesis of embodied cognition, a view from cognitive science which holds that intelligence is inextricably linked to having a body that interacts with the world. If this is correct, then our current approach may just be creating ever-more-sophisticated systems that are fundamentally brittle because they lack grounded understanding.</p><p>This is where we hit the great intellectual battle line in modern AI. 
The primary counterargument can be framed in terms of Rich Sutton's famous essay, "The Bitter Lesson," which argues that the entire history of AI has taught us that attempts to build in human-like cognitive structures (like embodiment) are always eventually outperformed by general methods that just leverage massive-scale computation.</p><p>From this viewpoint, embodiment isn't a magical prerequisite for intelligence; it's just another fiendishly complex problem that will yield to more data and processing power.</p><p>This tension poses a critical question for the future: do multimodal models that can process images and text represent a meaningful step toward solving the embodiment problem? Or are they just a more sophisticated version of the same disembodied mind, a brain in a slightly larger digital vat?</p><h2>What is Intelligence, Really?</h2><p>As we dig into these fallacies, a deeper pattern emerges. They aren't just four isolated mistakes; they're symptoms of a fundamental schism in how the AI world thinks about intelligence itself. Again, my goal isn't to pick a side but to avoid falling prey to cheap heuristics or ideological banners, and instead evaluate which of these paradigms gives us a more useful map of reality.</p><p>On one side, you have what I&#8217;ll call the Cognitive Paradigm, championed by thinkers like Mitchell and her mentor, superstar AI researcher and philosopher Douglas Hofstadter. This view sees intelligence as a complex, integrated, and embodied phenomenon. 
It assumes that the things we associate with human intelligence&#8212;common sense, emotions, values, a sense of self&#8212;are likely inseparable components of the whole, emerging from rich interaction with a physical and social world.</p><p>From this perspective, the path to AGI requires a deep, scientific understanding of these integrated components, not just more processing power.</p><p>On the other side is the Computationalist Paradigm, which is the implicit philosophy behind many of today's leading labs, and best captured by The Bitter Lesson. This posits that the biggest breakthroughs have always come from general methods that leverage massive-scale computation&#8212;in other words, from scaling things up.</p><p>In this paradigm, intelligence is a more abstract, substrate-independent quality of optimization. Problems like embodiment aren't fundamental barriers; they are just incredibly complex computational tasks that will eventually be solved by ever-larger models and ever-faster chips.</p><p>Of course, it's not a perfect binary. Most researchers are pragmatists, like me, working somewhere in the messy middle. But these two paradigms represent the poles of the debate, and the tension between them defines the entire field. It shapes which research gets funded, which systems get built, and ultimately, which vision of the future we are collectively racing toward.</p><h2>Why This Debate Matters</h2><p>This debate isn't just an academic parlor game. These fallacies have a massive ripple effect across society because they obscure a fundamental rule of technology and economics: there's no free lunch, only trade-offs.</p><p>The hype generated by fallacious thinking isn't just an innocent mistake; it's the fuel for a powerful economic engine. The intense competition between tech giants, the flood of venture capital, and the geopolitical AI race all depend on a constant narrative of imminent, world-changing breakthroughs. 
This political economy of hype forces us into a series of dangerous trade-offs.</p><p>First, we trade long-term progress for short-term hype.</p><p>The fallacies create an unstable, boom-and-bust funding cycle. During an AI spring, capital flows to projects that can produce impressive-looking demos, often based on narrow benchmarks. This starves the slow, methodical, foundational research needed to solve the hard problems like common sense and reasoning. The result is a field that lurches from one hype bubble to the next, leaving a trail of abandoned projects and unfulfilled promises that trigger the inevitable AI winter.</p><p>Second, we trade public trust for market excitement.</p><p>The cycle of over-promising and under-delivering is deeply corrosive. When we use wishful mnemonics to describe a system that "understands," and it then fails in spectacular, nonsensical ways in the real world, it breeds public anxiety and skepticism. Recent studies show the public perceives AI scientists more negatively than almost any other field, specifically because of a perceived lack of prudence. This isn't a vague feeling; it's a direct reaction to the unintended consequences of deploying brittle, overhyped systems.</p><p>Finally, and most critically, we trade responsible validation for speed to market.</p><p>This is where the consequences become most severe. Believing a system is on a continuum with general intelligence, or that it truly "understands" language, leads to its premature deployment in high-stakes domains.</p><p>When a mental health chatbot, which is fundamentally, at least today, a sophisticated pattern-matcher, gives harmful advice to a person in crisis, it&#8217;s a direct result of these fallacies. When we over-rely on brittle systems in healthcare, finance, or autonomous vehicles, we are making a dangerous bet, trading real-world safety for the illusion of progress.</p><h2>Conclusion</h2><p>So where does this leave us? 
The value of Mitchell's fallacies isn't just in spotting hype, but in exposing the deep, productive tension between these two powerful ways of thinking about intelligence. We can't ignore the fallacies, but we also can't deny the incredible, world-altering power of the scaling paradigm that fuels them.</p><p>In her paper, Mitchell compares modern AI to alchemy: it produces dazzling, impressive results, but it often lacks a deep, foundational theory of intelligence.</p><p>It&#8217;s a powerful metaphor, but I think a more pragmatic conclusion is slightly different. The challenge isn't to abandon our powerful alchemy in search of a pure science of intelligence. The goal, at least from a pragmatist point of view, should be to infuse our current alchemy with the principles of science, to make scaling smarter, safer, and more grounded by integrating the hard-won insights about how intelligence actually works.</p><p>The path forward, I believe, requires more than just intellectual humility. It also requires a willingness to synthesize these seemingly opposed worldviews, and a commitment to a tireless reevaluation of the technology before us. 
The ultimate question is not if we should choose the path of scaling or the path of cognitive science, but how we can weave them together to guide the raw power of our modern AI alchemy with the deep understanding of a true science of intelligence.</p>]]></content:encoded></item><item><title><![CDATA[AI is Nothing New, Here's the Full History]]></title><description><![CDATA[Rewrite of the first chapter of Mostly Harmless AI with lots of updates]]></description><link>https://blog.apiad.net/p/a-brief-history-of-ai-massive-update</link><guid isPermaLink="false">https://blog.apiad.net/p/a-brief-history-of-ai-massive-update</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Sun, 10 Aug 2025 10:12:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zfXn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The following is a second draft of the zero-th chapter of my upcoming book <strong>Mostly Harmless AI</strong>. In this second draft we significantly expanded the timeline to add around 3x more events and milestones, while making the chapter more concise and information-dense. We also included a structured timeline in the end for easy reference.</em></p><p><em>PS: Remember you can get <strong>Mostly Harmless AI </strong>while in early access at a reduced price. 
We are now running a special offer that gives you the PDF and EPUB version of the book as it currently stands, plus guaranteed access to all future editions for just <strong>$5.</strong></em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://apiad.gumroad.com/l/ai/p0vca01&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI ($5)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://apiad.gumroad.com/l/ai/p0vca01"><span>Get Mostly Harmless AI ($5)</span></a></p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zfXn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zfXn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 424w, https://substackcdn.com/image/fetch/$s_!zfXn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 848w, https://substackcdn.com/image/fetch/$s_!zfXn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 1272w, https://substackcdn.com/image/fetch/$s_!zfXn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zfXn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png" width="937" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:937,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zfXn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 424w, https://substackcdn.com/image/fetch/$s_!zfXn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 848w, https://substackcdn.com/image/fetch/$s_!zfXn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 1272w, https://substackcdn.com/image/fetch/$s_!zfXn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea712552-6da2-41bb-9cb5-d51ef0851496_937x528.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Garry Kasparov vs Deep Blue in 1997, the first time a computer program beat a World Chess Champion. Taken from <a href="https://thegrandmalogbook.blogspot.com/2019/05/deep-blue-machine-wins-world-chess.html.">The Grandma&#8217;s Logbook</a>.</figcaption></figure></div><p>For centuries, we humans have been captivated by the idea of a thinking machine. This isn&#8217;t some modern tech obsession; the dream of automatons and artificial minds is woven through our myths and philosophies. 
But the formal quest to build one began only in the mid-20th century, and its history has been a dramatic back-and-forth between two core, seemingly antagonistic approaches.</p><p>One path, rooted in the logic of rationalism, sought to build intelligence from the top down by programming explicit rules and symbols. The other, inspired by the biological empiricism of the brain, tried to create it from the bottom up by allowing machines to learn patterns from data and experience.</p><p>This chapter explores the history of Artificial Intelligence (AI) through the lens of this great intellectual tug-of-war. It is a journey through distinct eras, each defined by which philosophy was dominant, what external factors like computing power and data availability enabled its rise, and how these forces have finally begun to converge, leading us to the powerful tools we have today.</p><p>In the appendix of this book, we will present a detailed chronology of the most important milestones in the history of artificial intelligence.</p><h2><strong>The Foundational Era (1940s - 1960s)</strong></h2><p>The dawn of AI was a time of immense optimism, where the very concept of a &#8220;thinking machine&#8221; was formalized. Even before the field had a name, its philosophical and theoretical groundwork was being laid. In his seminal 1950 paper, &#8220;Computing Machinery and Intelligence,&#8221; Alan Turing proposed the <em>Turing Test</em>, setting a profound, long-term goal: to create a machine whose conversation was indistinguishable from a human&#8217;s. 
In parallel, the work of Warren McCulloch and Walter Pitts in 1943 on the first mathematical model of an artificial neuron planted the seeds of the <em>connectionist</em> dream&#8212;the idea that intelligence could emerge from simple, brain-like units.</p><p>When the field was officially christened at the <strong>Dartmouth Workshop</strong> in the summer of 1956, the symbolic, logic-based paradigm took the lead. Researchers believed that human thought could be mechanized, and the primary task was to build systems that could manipulate symbols according to formal rules. This vision was solidified by the creation of the <strong>LISP</strong> programming language in 1958, a tool perfectly suited for this symbolic manipulation.</p><p>Yet, in that same year, the connectionist counterpoint took physical form. Frank Rosenblatt developed the <strong>Perceptron</strong>, the first artificial neural network that could learn to classify patterns on its own, offering a tangible, bottom-up alternative to pure logic.</p><p>The public imagination was quickly captured by early demonstrations of AI&#8217;s potential. The <strong>Unimate</strong> (1961), the first industrial robot, showed that machines could perform physical labor. <strong>Shakey the Robot</strong> (1966) took this a step further, becoming the first mobile robot to perceive its environment and reason about its own actions. Joseph Weizenbaum&#8217;s <strong>ELIZA</strong> (1964), a simple chatbot that simulated a psychotherapist, revealed how easily humans could attribute intelligence and understanding to a machine. But this initial optimism soon collided with reality.</p><p>The ambitious promises of creating true intelligence went unfulfilled, and in 1969, the publication of the book <em>Perceptrons</em> by Marvin Minsky and Seymour Papert delivered a critical blow. 
By rigorously detailing the mathematical limitations of simple neural networks, the book effectively starved the connectionist school of funding, ushering in the first <strong>&#8220;AI Winter&#8221;</strong> and ensuring that the symbolic approach would dominate the field for the next decade.</p><h2><strong>The Knowledge Era (1970s - 1980s)</strong></h2><p>With connectionism on the back burner, the field regrouped around a more pragmatic goal: instead of trying to create general intelligence, researchers focused on capturing and mechanizing human expertise in narrow domains. This led to the golden age of <em>expert systems</em>, the first commercially successful form of AI. The core idea was to interview a human expert, painstakingly encode their knowledge into a vast set of &#8220;if-then&#8221; rules, and use a reasoning engine to produce solutions.</p><p>This approach yielded impressive results. <strong>SHRDLU</strong> (1972) was a landmark natural language program that could understand and respond to commands about a simulated world of blocks, showcasing a new level of sophistication for symbolic AI. Expert systems like <strong>MYCIN</strong> (1972) could diagnose blood infections as accurately as junior doctors, while others like <strong>DENDRAL</strong> and <strong>PROSPECTOR</strong> found success in chemistry and geology. This culminated in the first true commercial boom, as companies like Digital Equipment Corporation used the <strong>XCON</strong> system (1980) to configure complex computer orders, saving millions of dollars. The ambition of this paradigm reached its peak with the <strong>Cyc project</strong> (1984), a monumental effort to manually encode all of human common sense knowledge into a single, massive database.</p><p>While the symbolic school reigned, a connectionist undercurrent continued to flow. 
In Japan, Kunihiko Fukushima&#8217;s work on the <strong>Neocognitron</strong> (1980) created a hierarchical, multi-layered neural network for visual recognition that was the direct ancestor of the architectures that would dominate computer vision decades later. And in 1986, the popularization of the <em>backpropagation</em> algorithm provided an efficient method for training these deeper networks, solving a critical problem that had plagued the field for years.</p><p>However, the symbolic paradigm&#8217;s dominance was destined to end. Expert systems were incredibly brittle; they were expensive to build, nearly impossible to update, and would fail completely if faced with a situation not explicitly covered by their rules. The hype, fueled in part by Japan&#8217;s ambitious <strong>Fifth Generation Computer Systems project</strong> (1982), once again outpaced reality. When the specialized hardware market collapsed in 1987, the field plunged into its second <strong>&#8220;AI Winter,&#8221;</strong> leaving the promise of AI unfulfilled once more.</p><h2><strong>The Internet Era (1990 - 2011)</strong></h2><p>The end of the second AI winter was not driven by a single algorithmic breakthrough, but by two external forces that changed everything: the public launch of the <strong>World Wide Web</strong> in 1991 and the invention of the <strong>Graphics Processing Unit (GPU)</strong> in 1999. The web began generating an unimaginable ocean of data: text, images, and user interactions. The GPU, particularly after the release of <strong>NVIDIA&#8217;s CUDA</strong> platform in 2007, provided a way to perform the massive parallel computations needed to learn from that data. These two catalysts, data and computation, created the perfect conditions for the statistical, learning-based paradigm to finally thrive.</p><p>Before deep learning took hold, this new environment fueled the rise of &#8220;shallow&#8221; machine learning. 
Algorithms like <strong>Support Vector Machines (SVMs)</strong> (1995) became dominant, and open-source libraries like <strong>scikit-learn</strong> (2007) made them accessible to a wide audience. This approach had a massive real-world impact, powering the <strong>recommender systems</strong> of companies like Amazon and sparking global competitions like the <strong>Netflix Prize</strong> (2006). The infrastructure to handle this new scale was built in parallel, with Google&#8217;s <strong>MapReduce</strong> (2004) providing the blueprint for big data processing.</p><p>During this time, foundational work in reinforcement learning was also bearing fruit. <strong>TD-Gammon</strong> (1992) showed that a program could teach itself to play backgammon at a superhuman level, and the textbook by <strong>Sutton &amp; Barto</strong> (1998) codified the field for a new generation. The seeds for the coming deep learning revolution were being sown with the invention of key architectures like <strong>LSTMs</strong> (1997) and <strong>LeNet-5</strong> (1998), while the creation of the massive <strong>ImageNet</strong> dataset (2009) provided the high-quality benchmark that would soon ignite it.</p><p>AI also became a tangible part of public life. The symbolic paradigm had its last great public triumphs with <strong>Deep Blue&#8217;s</strong> victory over Garry Kasparov in chess (1997) and <strong>Watson&#8217;s</strong> win on <em>Jeopardy!</em> (2011). But the future belonged to learning-based systems. <strong>Dragon NaturallySpeaking</strong> (1997) brought continuous speech recognition to consumers. Competitions like the <strong>DARPA Grand Challenge</strong> (2004) spurred the development of autonomous vehicles. Consumer products like the <strong>Roomba</strong> (2002) and Microsoft&#8217;s <strong>Kinect</strong> (2010) brought robotics and computer vision into millions of homes. 
With the launch of <strong>Siri</strong> in 2011, a conversational AI assistant was finally in everyone&#8217;s pocket.</p><h2><strong>The Deep Learning Era (2012 - 2018)</strong></h2><p>If the Internet Era set the stage, 2012 was the year the curtain rose on the deep learning revolution. In October, a deep convolutional neural network called <strong>AlexNet</strong>, trained on GPUs using the ImageNet dataset, shattered all previous records in the annual image recognition competition. This &#8220;ImageNet Moment&#8221; proved the overwhelming superiority of deep, data-driven learning and kicked off a Cambrian explosion of breakthroughs.</p><p>This period was a stunning validation of what AI researcher Rich Sutton would later call <em>The Bitter Lesson</em>: that general methods leveraging massive computation almost always outperform approaches that rely on hand-crafted human knowledge. The field progressed at a breathtaking pace. In natural language processing, <strong>Word2Vec</strong> (2013) provided a powerful way to represent the meaning of words as vectors. In generative AI, <strong>Generative Adversarial Networks (GANs)</strong> (2014) introduced a novel way to create stunningly realistic synthetic images. New architectures like <strong>ResNet</strong> (2015) allowed for the creation of networks hundreds of layers deep, solving a fundamental barrier to scale.</p><p>These new techniques allowed AI to achieve superhuman performance in increasingly complex domains. <strong>Deep Q-Networks (DQN)</strong> (2013) learned to master classic Atari games directly from pixels, and in a landmark event in March 2016, <strong>AlphaGo</strong> defeated Lee Sedol, the world&#8217;s greatest Go player. 
The revolution was powered by a new generation of open-source tools like <strong>TensorFlow</strong> (2015) and <strong>PyTorch</strong> (2016) that democratized deep learning, as well as specialized hardware like Google&#8217;s <strong>Tensor Processing Units (TPUs)</strong> (2016). The era culminated with the invention of the <strong>Transformer architecture</strong> in 2017.</p><p>However, 2018 marked a turning point. In March, the <strong>Cambridge Analytica scandal</strong> revealed how machine learning algorithms, fed by the personal data of millions of Facebook users, had been used for political manipulation, sparking a global reckoning over data privacy and the ethics of AI.</p><p>A year later, in March 2019, the scientific community formally recognized the field&#8217;s impact, awarding the 2018 <strong>Turing Award</strong> to Geoffrey Hinton, Yann LeCun, and Yoshua Bengio for their foundational work. The Turing Award (aptly named after the most important figure in the history of Computer Science as a whole, let alone Artificial Intelligence) is the most prestigious academic award in computing, akin to the Nobel Prize.</p><h2><strong>The Generative Era (2019 - Present)</strong></h2><p>The current era is defined by the application of the Transformer architecture at an unprecedented scale. By training these models on vast swaths of the internet, researchers discovered that quantitative leaps in size and data could lead to qualitative leaps in capability, resulting in models with emergent generative and reasoning abilities that have captured the world&#8217;s attention.</p><p>The first sign of this new power came with <strong>GPT-2</strong> in 2019, whose ability to generate coherent text was so advanced that its release was initially staged due to safety concerns. 
Its successor, <strong>GPT-3</strong> (2020), demonstrated that massive scale could unlock &#8220;few-shot&#8221; learning, the ability to perform tasks it was never explicitly trained on.</p><p>Soon, this generative power was applied beyond text to images with <strong>DALL-E</strong> (2021) and to code with <strong>GitHub Copilot</strong> (2021). But the cultural tipping point arrived in November 2022 with the release of <strong>ChatGPT</strong>. Its simple, conversational interface made the power of Large Language Models (LLMs) accessible to millions, sparking a global phenomenon and a new wave of investment.</p><p>This boom was accompanied by a powerful open-source counter-movement. The release of the image-generation model <strong>Stable Diffusion</strong> (August 2022) and Meta&#8217;s <strong>Llama</strong> models (2023) democratized access to powerful foundation models, sparking a &#8220;Llamaverse&#8221; of community-driven innovation. The field is now a global race, with major competitors like Anthropic&#8217;s <strong>Claude</strong> (2023), Google&#8217;s multimodal <strong>Gemini</strong> (December 2023), and China&#8217;s <strong>DeepSeek R1</strong> (January 2025) demonstrating capabilities on par with the best proprietary systems.</p><p>Perhaps the most profound impact of this new era has been in science. In November 2020, <strong>AlphaFold 2</strong> solved the 50-year-old grand challenge of protein folding, a breakthrough of such significance that its creators were awarded the <strong>Nobel Prize</strong> in 2024. This demonstrated that AI could be a tool not just for automating tasks, but for accelerating fundamental scientific discovery. 
The road ahead now points towards more autonomous, &#8220;agentic&#8221; systems, where the AI transitions from a single-response tool to a collaborator capable of executing complex, multi-step tasks on our behalf.</p><h2><strong>Conclusion</strong></h2><p>We&#8217;ve journeyed through decades of ambition, breakthroughs, and tough realizations. What we&#8217;ve seen is a constant back-and-forth, a dynamic dance between two powerful ideas: the precise, rule-based, inflexible logic of symbolic AI and the adaptable, pattern-based, unreliable power of statistical AI. This dance, as we&#8217;ve explored, often mirrors the philosophical tension between rationalism and empiricism.</p><p>Today, AI stands at a fascinating crossroads. The purely statistical systems that define the generative era have achieved incredible feats. Yet, we are beginning to see diminishing returns. With GPT-4 having been a high point, many newer models have made only incremental progress, suggesting that simply scaling the existing paradigm may not be enough to achieve the next level of intelligence. This has led some to speculate that we may be on the brink of a third <strong>&#8220;AI Winter,&#8221;</strong> as the hype once again outpaces the reality of the technology&#8217;s capabilities.</p><p>The recent focus on reasoning models and agentic systems seems capable of fueling the statistical hype a bit longer, but a growing number of researchers are realizing we may not achieve Artificial General Intelligence (AGI) purely by scaling. This brings us to a crucial realization: the future of AI likely isn&#8217;t about one approach winning out over the other, but about intelligently combining them. The inherent limitations of today&#8217;s models&#8212;their unreliability and lack of true reasoning&#8212;have sparked a renewed interest in the long-neglected symbolic paradigm. 
Hybrid approaches, particularly <em>neuro-symbolic AI</em>, which seek to integrate the pattern-matching strengths of neural networks with the rigorous logic of symbolic systems, hold immense potential for creating the next breakthrough.</p><p>Whether a winter is coming or not, it is indisputable that AI has already had a profound impact on society and will continue to do so. The history of this field is far from finished. It is a living, breathing, civilization-wide project with the potential to transform society for the better or, some believe, to become our ultimate doom. Everyone has a place here: technologists, yes, but also humanists, economists, historians, artists, and policymakers. The next few years promise to be extremely exciting, and you can be a part of shaping what comes next.</p><div><hr></div><p><em>Thanks for reading! Below you&#8217;ll find the full expanded timeline. Please let me know if you think I missed something or made any mistakes. All feedback is appreciated!</em></p><p><em>PS: Claim your copy of <strong>Mostly Harmless AI</strong> for only <strong>$5</strong> in the link below.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://apiad.gumroad.com/l/ai/p0vca01&quot;,&quot;text&quot;:&quot;Get Mostly Harmless AI ($5)&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://apiad.gumroad.com/l/ai/p0vca01"><span>Get Mostly Harmless AI ($5)</span></a></p><div><hr></div><h2><strong>Appendix: A Chronology of Artificial Intelligence (1943-2025)</strong></h2><p>This timeline details the key breakthroughs, conceptual shifts, and landmark achievements in the field of Artificial Intelligence, tracing its path from a niche academic discipline to a transformative global technology.</p><h3><strong>The Foundational Era (1940s - Late 1960s)</strong></h3><ul><li><p><strong>(Science) 1943:</strong> The First Artificial Neuron is proposed by Warren 
McCulloch and Walter Pitts, laying the theoretical foundation for connectionism.</p></li><li><p><strong>(Science) October 1950:</strong> Alan Turing publishes &#8220;Computing Machinery and Intelligence,&#8221; introducing the <strong>Turing Test</strong>.</p></li><li><p><strong>(Social) Summer 1956:</strong> The <strong>Dartmouth Workshop</strong> is held, where John McCarthy coins the term &#8220;Artificial Intelligence,&#8221; formally establishing the field.</p></li><li><p><strong>(Tech) 1958:</strong> Frank Rosenblatt develops the <strong>Perceptron</strong>, the first artificial neural network capable of learning.</p></li><li><p><strong>(Tech) 1958:</strong> John McCarthy develops the <strong>LISP</strong> programming language, which becomes the standard for symbolic AI.</p></li><li><p><strong>(Product) 1961:</strong> The <strong>Unimate</strong> industrial robot begins work on a General Motors assembly line.</p></li><li><p><strong>(Product) 1964:</strong> Joseph Weizenbaum creates the chatbot <strong>ELIZA</strong> at MIT.</p></li><li><p><strong>(Tech) 1966:</strong> The Stanford Research Institute (SRI) develops <strong>Shakey</strong>, the first mobile robot to reason about its own actions.</p></li><li><p><strong>(Social) 1969:</strong> The publication of <em>Perceptrons</em> by Marvin Minsky and Seymour Papert marks the beginning of the first <strong>&#8220;AI Winter.&#8221;</strong></p></li></ul><h3><strong>The Knowledge Era (1970s - 1989)</strong></h3><ul><li><p><strong>(Tech) 1972:</strong> Terry Winograd develops <strong>SHRDLU</strong>, a groundbreaking natural language understanding program.</p></li><li><p><strong>(Tech) 1972:</strong> The logic programming language <strong>Prolog</strong> is created by Alain Colmerauer and Philippe Roussel, becoming a key tool for symbolic AI.</p></li><li><p><strong>(Tech) 1972:</strong> Stanford University develops the <strong>MYCIN</strong> expert system for medical diagnosis.</p></li><li><p><strong>(Science) 
1974:</strong> Marvin Minsky publishes his influential paper on <strong>&#8220;Frames&#8221;</strong> theory, a new paradigm for knowledge representation.</p></li><li><p><strong>(Tech) Late 1970s:</strong> Expert systems like <strong>DENDRAL</strong> (for chemistry) and <strong>PROSPECTOR</strong> (for geology) demonstrate success in specialized scientific domains.</p></li><li><p><strong>(Science) 1980:</strong> Kunihiko Fukushima develops the <strong>Neocognitron</strong>, an early hierarchical neural network that is the direct ancestor of modern Convolutional Neural Networks (CNNs).</p></li><li><p><strong>(Product) 1980:</strong> Digital Equipment Corporation begins using the <strong>XCON</strong> expert system, marking a high point for commercial AI.</p></li><li><p><strong>(Social) 1982:</strong> Japan&#8217;s Ministry of International Trade and Industry begins the <strong>Fifth Generation Computer Systems project</strong>, a massive initiative to build a new generation of computers based on logic programming, sparking competitive AI investment worldwide.</p></li><li><p><strong>(Tech) 1984:</strong> The <strong>Cyc project</strong> is initiated by Douglas Lenat, an ambitious attempt to manually encode all of human common sense knowledge into a single knowledge base.</p></li><li><p><strong>(Science) 1986:</strong> The <strong>backpropagation</strong> algorithm is popularized by Geoffrey Hinton, David Rumelhart, and Ronald Williams.</p></li><li><p><strong>(Social) 1987:</strong> The collapse of the LISP machine market signals the start of the second <strong>&#8220;AI Winter.&#8221;</strong></p></li></ul><h3><strong>The Internet Era (1990 - 2011)</strong></h3><ul><li><p><strong>(Social) August 1991:</strong> The World Wide Web project is released to the public, creating the infrastructure for the data explosion that would fuel modern AI.</p></li><li><p><strong>(Science) 1992:</strong> Gerald Tesauro develops <strong>TD-Gammon</strong>, a backgammon program that 
trains to a superhuman level using reinforcement learning, a landmark for the field.</p></li><li><p><strong>(Social) 1995:</strong> Stuart Russell and Peter Norvig publish &#8220;Artificial Intelligence: A Modern Approach,&#8221; which becomes the leading textbook in the field for decades.</p></li><li><p><strong>(Science) 1995:</strong> The <strong>Support Vector Machine (SVM)</strong> algorithm is popularized by Corinna Cortes and Vladimir Vapnik.</p></li><li><p><strong>(Social) May 1997:</strong> IBM&#8217;s <strong>Deep Blue</strong> defeats world chess champion Garry Kasparov.</p></li><li><p><strong>(Science) 1997:</strong> Sepp Hochreiter and J&#252;rgen Schmidhuber invent the <strong>Long Short-Term Memory (LSTM)</strong> network.</p></li><li><p><strong>(Product) 1997:</strong> <strong>Dragon NaturallySpeaking</strong> is released, becoming the first widely available continuous speech recognition software for consumers.</p></li><li><p><strong>(Product) September 1998:</strong> Google is founded, and Amazon patents its item-to-item collaborative filtering, marking the start of large-scale data-driven AI applications.</p></li><li><p><strong>(Science) 1998:</strong> Richard Sutton and Andrew Barto publish &#8220;Reinforcement Learning: An Introduction,&#8221; a seminal textbook that codifies the field.</p></li><li><p><strong>(Tech) November 1998:</strong> Yann LeCun and his team develop <strong>LeNet-5</strong>, a pioneering Convolutional Neural Network (CNN).</p></li><li><p><strong>(Tech) August 1999:</strong> NVIDIA releases the <strong>GeForce 256</strong>, marketed as the world&#8217;s first <strong>Graphics Processing Unit (GPU)</strong>.</p></li><li><p><strong>(Product) November 2000:</strong> Honda unveils its <strong>ASIMO</strong> humanoid robot, a landmark in robotics and motion planning.</p></li><li><p><strong>(Product) September 2002:</strong> iRobot releases the <strong>Roomba</strong>, the first commercially successful autonomous home 
robot.</p></li><li><p><strong>(Tech) 2004:</strong> Google publishes its paper on <strong>MapReduce</strong>, a programming model for processing massive datasets that becomes foundational to big data infrastructure.</p></li><li><p><strong>(Social) March 2004:</strong> The first <strong>DARPA Grand Challenge</strong> for autonomous vehicles is held, sparking a new wave of research in self-driving technology.</p></li><li><p><strong>(Social) October 2006:</strong> The <strong>Netflix Prize</strong> competition is launched, galvanizing research in recommender systems.</p></li><li><p><strong>(Science) 2006:</strong> Geoffrey Hinton develops <strong>Deep Belief Networks</strong>, introducing effective strategies for unsupervised layer-wise pre-training.</p></li><li><p><strong>(Tech) June 2007:</strong> NVIDIA releases <strong>CUDA</strong>, a parallel computing platform that allows developers to use GPUs for general-purpose processing.</p></li><li><p><strong>(Tech) June 2007:</strong> David Cournapeau develops <strong>scikit-learn</strong> as a Google Summer of Code project.</p></li><li><p><strong>(Science) 2009:</strong> A Stanford team led by Andrew Ng publishes a paper showing that GPUs can make training deep neural networks 10-100 times faster.</p></li><li><p><strong>(Tech) 2009:</strong> The <strong>ImageNet</strong> dataset is created by Fei-Fei Li&#8217;s team at Stanford.</p></li><li><p><strong>(Product) November 2010:</strong> Microsoft releases the <strong>Kinect</strong>, a consumer device that brings sophisticated real-time computer vision into millions of homes.</p></li><li><p><strong>(Social) February 2011:</strong> IBM&#8217;s <strong>Watson</strong> wins the quiz show <em>Jeopardy!</em>.</p></li><li><p><strong>(Product) October 2011:</strong> Apple integrates <strong>Siri</strong> into the iPhone 4S, making conversational AI assistants a mainstream consumer product.</p></li></ul><h3><strong>The Deep Learning Era (2012 - 
2018)</strong></h3><ul><li><p><strong>(Social) April 2012:</strong> <strong>Coursera</strong> is founded, and Andrew Ng&#8217;s Machine Learning course begins to democratize AI education.</p></li><li><p><strong>(Science) June 2012:</strong> The <strong>Google Brain &#8220;Cat Neuron&#8221;</strong> project demonstrates that a neural network can learn high-level concepts from unlabeled data.</p></li><li><p><strong>(Social) October 2012:</strong> <strong>AlexNet</strong>, a deep CNN trained on GPUs, wins the ImageNet competition by a massive margin, officially kicking off the deep learning revolution.</p></li><li><p><strong>(Tech) 2013:</strong> Google researchers led by Tomas Mikolov release <strong>Word2Vec</strong>, a highly efficient method for creating word embeddings that revolutionizes NLP.</p></li><li><p><strong>(Science) December 2013:</strong> DeepMind publishes its work on <strong>Deep Q-Networks (DQN)</strong>, demonstrating an AI that can learn to play Atari games at a superhuman level from raw pixels.</p></li><li><p><strong>(Science) June 2014:</strong> Ian Goodfellow and his colleagues introduce <strong>Generative Adversarial Networks (GANs)</strong>, sparking a revolution in generative AI for images.</p></li><li><p><strong>(Science) December 2015:</strong> A team at Microsoft Research introduces <strong>Deep Residual Networks (ResNet)</strong>, allowing for the training of much deeper neural networks.</p></li><li><p><strong>(Tech) November 2015:</strong> Google releases the <strong>TensorFlow</strong> open-source library, making deep learning more accessible.</p></li><li><p><strong>(Social) March 2016:</strong> Google DeepMind&#8217;s <strong>AlphaGo</strong> defeats world Go champion Lee Sedol.</p></li><li><p><strong>(Tech) May 2016:</strong> Google announces it has been using custom-built <strong>Tensor Processing Units (TPUs)</strong>, specialized hardware for deep learning, in its data centers.</p></li><li><p><strong>(Tech) September 
2016:</strong> Facebook AI Research (FAIR) releases <strong>PyTorch</strong>, which becomes a major deep learning framework.</p></li><li><p><strong>(Science) June 2017:</strong> Researchers at Google publish &#8220;Attention Is All You Need,&#8221; introducing the <strong>Transformer architecture</strong>.</p></li><li><p><strong>(Social) March 2018:</strong> The <strong>Cambridge Analytica scandal</strong> breaks, revealing that the personal data of millions of Facebook users was used for political advertising, sparking a global conversation on data privacy and the ethics of machine learning.</p></li><li><p><strong>(Social) December 2018:</strong> DeepMind&#8217;s <strong>AlphaFold</strong> makes its stunning debut at the CASP13 competition.</p></li><li><p><strong>(Social) March 2019:</strong> Geoffrey Hinton, Yann LeCun, and Yoshua Bengio are named recipients of the 2018 <strong>ACM Turing Award</strong> for their foundational work on deep learning.</p></li></ul><h3><strong>The Generative Era (2019 - 2025)</strong></h3><ul><li><p><strong>(Tech) February 2019:</strong> OpenAI announces <strong>GPT-2</strong> but initially withholds the full model due to safety concerns.</p></li><li><p><strong>(Product) November 2019:</strong> OpenAI releases the full version of the GPT-2 model.</p></li><li><p><strong>(Product) June 2020:</strong> OpenAI releases <strong>GPT-3</strong> via a private API.</p></li><li><p><strong>(Social) November 2020:</strong> <strong>AlphaFold 2</strong> achieves revolutionary accuracy at the CASP14 competition, effectively solving the protein folding problem.</p></li><li><p><strong>(Product) January 2021:</strong> OpenAI introduces <strong>DALL-E</strong>, a model that generates images from text.</p></li><li><p><strong>(Product) June 2021:</strong> <strong>GitHub Copilot</strong> is launched as a technical preview.</p></li><li><p><strong>(Tech) April 2022:</strong> Google announces its <strong>Pathways Language Model (PaLM)</strong>.</p></li><li><p><strong>(Social) 
June 2022:</strong> Google engineer Blake Lemoine publicly claims the <strong>LaMDA</strong> model is sentient.</p></li><li><p><strong>(Product) August 2022:</strong> The open-source release of <strong>Stable Diffusion</strong> democratizes high-quality image generation.</p></li><li><p><strong>(Product) November 2022:</strong> OpenAI releases <strong>ChatGPT</strong> to the public.</p></li><li><p><strong>(Tech) February 2023:</strong> Meta releases the first <strong>Llama</strong> model to the research community.</p></li><li><p><strong>(Product) March 2023:</strong> Anthropic releases its first <strong>Claude</strong> model.</p></li><li><p><strong>(Tech) July 2023:</strong> Meta releases <strong>Llama 2</strong> with a commercial-use license, sparking the open-source &#8220;Llamaverse.&#8221;</p></li><li><p><strong>(Product) December 2023:</strong> Google releases <strong>Gemini</strong>, its first natively multimodal model.</p></li><li><p><strong>(Social) October 2024:</strong> Nobel Prizes are awarded to <strong>Geoffrey Hinton</strong>, <strong>John J. Hopfield</strong>, <strong>Demis Hassabis</strong>, and <strong>John Jumper</strong> for their work in AI.</p></li><li><p><strong>(Product) January 20, 2025:</strong> DeepSeek AI releases its <strong>DeepSeek R1</strong> model and chatbot, marking a turning point in the global AI race.</p></li><li><p><strong>(Product) August 2025:</strong> OpenAI releases <strong>GPT-5</strong>, with a focus on more autonomous, &#8220;agentic&#8221; capabilities, to a rather underwhelming reception.</p></li></ul><div><hr></div><p><em>What? Still here? Ok, here&#8217;s another nice button for you to click. 
Thanks!</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Artificial Intelligence for Creative Professionals]]></title><description><![CDATA[Chapter 10 of Mostly Harmless AI]]></description><link>https://blog.apiad.net/p/artificial-intelligence-for-creative</link><guid isPermaLink="false">https://blog.apiad.net/p/artificial-intelligence-for-creative</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Thu, 07 Aug 2025 10:00:51 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The following is a first draft of my upcoming book <strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong>. This one is about AI as a tool for augmenting creativity. 
I hope you find it interesting, and please, do leave me your feedback in the end.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4000" height="3000" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3000,&quot;width&quot;:4000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;mixed paints in a plate&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="mixed paints in a plate" title="mixed paints in a plate" srcset="https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1452802447250-470a88ac82bc?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyMHx8cGFpbnQlMjBicnVzaHxlbnwwfHx8fDE3NTQ1MTM2ODZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@mikepetrucci">Mike Petrucci</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>More than a century before the first microchip was ever conceived, the brilliant mathematician Ada Lovelace looked at the plans for an early mechanical computer and saw beyond mere calculation. She famously envisioned a future where such an engine &#8220;might compose elaborate and scientific pieces of music,&#8221; dreaming of the day machines would not just compute, but create. 
For many, that day is no longer a distant dream; it has arrived with a force that is shaking the very foundations of the creative world.</p><p>The arrival of powerful generative AI has ignited a fierce and deeply personal debate within every creative community. For some, it heralds a new renaissance, a moment of unprecedented artistic possibility where AI acts as a tireless muse, a collaborator that can visualize any imagined world, compose any melody, or explore any narrative path. For others, it signals an existential threat&#8212;the end of art as we know it, a force that threatens to devalue human skill, automate creativity, and flood the world with a deluge of soulless, machine-generated content.</p><p>It is crucial to acknowledge a third, equally valid perspective. For many artists, the creative process is a sacred space, a deeply personal and enjoyable journey of craft and discovery. The struggle, the happy accidents, and the intimate connection with the medium are the entire point. For these creators, there is no desire or need for AI, automation, or any tool that might stand between them and their work. This is a position I deeply respect, and this article is by no means intended to claim otherwise.</p><p>This chapter is for those who, for their own reasons, wish to explore the other paths. It makes no normative claim on whether AI is &#8220;good&#8221; or &#8220;bad&#8221; for art. Instead, my goal is to provide a practical framework for creative professionals who want to harness AI as a powerful collaborative partner&#8212;whether for pragmatic goals, like enhancing productivity, or for artistic ones, like exploring new creative frontiers beyond the limits of their own cognition. 
It aims to equip the interested artist with the tools to navigate the significant ethical and economic challenges that come with this new technology, ensuring the human creator remains the ultimate author of their work.</p><h2><strong>Can Artificial Intelligence be Creative?</strong></h2><p>Before we dive into the practicality of using these new tools, it is worth addressing the philosophical question that hangs over every discussion of AI and art: Is the machine actually creative? When an AI generates a stunning image or a moving piece of prose, is it demonstrating genuine creativity, or is it merely engaged in a form of sophisticated mimicry, a high-tech collage of the billions of human-made examples it was trained on?</p><p>A useful way to think about this is through a famous thought experiment in philosophy of mind known as <em>Mary&#8217;s Room</em>. Imagine Mary, a brilliant neuroscientist who has spent her entire life in a black-and-white room. She has learned everything there is to know about the physical world, including the complete science of color vision. 
She knows exactly what happens in the brain when a person sees the color red, but she has never actually <em>seen</em> red before. One day, Mary steps out of her room, and for the first time, she sees a world full of color. The question is, does she learn something fundamentally new?</p><p>If the answer is yes&#8212;that she learns something new from what it is like to actually see red rather than just knowing about it&#8212;then it implies that a complete set of facts about the world is not the same as experiencing the world. This is the crux of the issue with generative AI. Like Mary, these models have read everything. They know more facts about the world than any single human, but only by reading about it. They know the physics of the color red, the cultural symbolism of red, and the statistical probability of the word &#8220;red&#8221; appearing next to &#8220;apple.&#8221; But they have never experienced what <em>seeing red</em> means.</p><p>If you believe Mary learns something new upon leaving her room, then it follows that generative AI, as it currently stands, is also missing something fundamental. That missing piece&#8212;the subjective, first-person experience of reality&#8212;may very well be the irreducible core of genuine human creativity. And I&#8217;m with you on this. I don&#8217;t believe disembodied AI can truly know what experiencing things is like. Embodied AI, now that&#8217;s a different question.</p><p>However, as fascinating as this debate is, it can also be a distraction. From a techno-pragmatist&#8217;s perspective, the question of whether an AI possesses a &#8220;consciousness&#8221; or &#8220;true&#8221; creativity is ultimately less important than the outcome of its collaboration with a human. Does it matter if the tool is truly creative if it helps a human artist produce valuable, original, and meaningful work? 
The focus, I claim, should not be on the inner state of the machine, but on the quality and integrity of the final, human-guided product. At least for the time being.</p><p>For the purposes of this chapter, we will treat AI not as an autonomous artist, but as an incredibly advanced instrument&#8212;a new kind of paintbrush, camera, or piano that can expand what is possible, but which still requires a human hand and a human heart to create something of lasting value.</p><h2><strong>AI as a Cognitive Partner for Creatives</strong></h2><p>The most common way to approach generative AI is to treat it as an answer machine&#8212;a tool to automate the creation of a final product. This approach, however, misses its true power and leads directly to the generic, derivative &#8220;AI slop&#8221; that is rightfully criticized as a lazy substitute for genuine creation. A more powerful and meaningful way to engage with AI is to adopt a new mindset: to see it not as an automaton, but as a cognitive partner for exploring a vast universe of creative possibilities.</p><p>The goal is not to get an answer, but to map the entire space of potential answers. In this human-centric process, you are the director of the exploration. You steer the AI into subspaces of ideas that you find interesting, quickly burning through the clich&#233; and the mediocre to reach the frontier of originality. This transforms the creative process into a dynamic dialogue, giving you a new kind of &#8220;algebra of ideas.&#8221; You can ask the AI to combine two concepts, decompose a complex theme into its core components, or extend a simple thought in a dozen different directions. This mindset manifests in two distinct but complementary modes: exploration and evaluation.</p><h3><strong>Mode 1: AI for Exploration</strong></h3><p>Every creative project begins with a spark. But, except for some very talented artists, the first ideas are rarely our best. 
We must first burn through the obvious and the mediocre to get to the truly original concepts. </p><p>A common ideation workshop game illustrates this perfectly. Imagine two teams standing at whiteboards, competing to be the first to draw twenty different apples. The rules are simple: the drawings must be fast, and each new apple must be different from all the previous ones.</p><p>What happens next is always the same. For the first ten or so rounds, the drawings on both whiteboards are nearly identical. You see the familiar tropes emerge: a standard red apple, an apple with a bite taken out, an apple tree, William Tell&#8217;s apple with an arrow, an apple pie. But then, something magical happens. Around the tenth apple, the easy answers are exhausted. The teams are forced to stretch. Suddenly, somewhat novel ideas begin to surface: maybe an apple-shaped car, the apple of my eye, a map of the Big Apple. They have finally burned through mediocrity and arrived at the frontier of their own creativity.</p><p>AI can be used to open this idea faucet at full blast. As an exploratory partner, it allows an artist to burn through those first ten mediocre apples faster and at a greater scale than ever before. This isn&#8217;t just about high-level brainstorming; it&#8217;s about deep, targeted exploration. A visual artist can ask for twenty variations of a single texture. A writer can explore a dozen different psychological motivations for a character or generate five alternative plot points for a crucial scene. </p><p>The ideas the AI generates need not be accepted; their value is in accelerating the exploration, allowing the artist to quickly see the baseline of what is common and expected, and challenging them to move beyond it.</p><h3><strong>Mode 2: AI for Evaluation</strong></h3><p>Once an artist has explored the possibility space and begun to build upon an idea, the AI&#8217;s role can shift from a generator to a critic. 
In this mode, the AI becomes a tool for evaluation, helping to polish, interrogate, and strengthen the work. </p><p>Even if you view AI as a mere mashup of mediocre ideas, this is precisely what makes it a powerful evaluator. Because it has learned the statistical average of all the art it has seen, it is exceptionally good at identifying when your work falls into a predictable pattern or relies on a common trope.</p><p>This is where the artist&#8217;s own skill and vision are paramount, as they use the AI to test their creation against a wall of objective, data-driven feedback. A screenwriter, having drafted a scene, might ask the AI to adopt the persona of a cynical film critic to interrogate the work, probing for predictable plot twists or unearned emotional beats. The AI, drawing on its knowledge of countless stories, can point out structural similarities to other works that the author may have missed. Likewise, a musician can ask an AI to analyze a melody to identify clich&#233;s or suggest ways to make it more original.</p><p>This evaluation mode is not about asking the AI to &#8220;fix&#8221; the work, but to provide a critical perspective that helps the human artist see their own creation more clearly, identify weaknesses, and make more informed decisions.</p><h3><strong>The Creative Loop</strong></h3><p>The true power of this mindset lies in the interplay between these two modes. The artist enters a dynamic creative loop: they explore a vast space of ideas with the AI, select a promising concept to build upon, evaluate it with the AI&#8217;s critical feedback, and then use those new insights to launch another round of exploration.</p><p>This process transforms the AI into an infinite canvas. Because the cost of generating a new variant is near zero, the artist is freed from the fear of &#8220;wasting&#8221; hard work. 
They can explore hundreds of possibilities&#8212;different character designs, narrative branches, or color palettes&#8212;without penalty, knowing they can always return to a previous version. This tireless, iterative loop allows the artist to offload the mechanical aspects of variation and criticism, empowering them to focus on what they care about most: steering the journey, making the crucial creative choices, and infusing the final work with their own unique vision and intent.</p><h2><strong>The Challenges and Opportunities of the New Creative Landscape</strong></h2><p>Adopting an exploratory mindset is the key to unlocking AI&#8217;s creative potential, but it does not erase the significant practical and ethical challenges that come with this new technology. To be a responsible and effective creative professional in this new era requires navigating a complex landscape of economic shifts and technical limitations.</p><h3><strong>The Economics of Creative AI</strong></h3><p>The fear of job loss is real and cannot be dismissed. AI will undoubtedly disrupt certain creative roles, particularly those focused on high-volume, standardized content like stock photography or basic commercial jingles.</p><p>However, the history of technology shows that productivity gains do not lead to a fixed amount of work being done faster; they lead to an explosion in demand for more, better, and more ambitious work. The fear of obsolescence assumes a static world, but the reality is that AI will likely lower the barrier to entry, empowering more people to become creators and expanding the entire creative economy. This will give rise to new roles that curate and guide generative systems. </p><p>The most urgent economic challenge, however, remains unresolved: how to fairly compensate the human artists whose work forms the training data for these powerful models. 
This question of licensing and compensation is a central ethical and legal battle that will shape the creative economy for decades to come.</p><p>We explore the complex legal and regulatory dimensions of this challenge in the chapter on <a href="https://blog.apiad.net/p/artificial-intelligence-for-policy">AI for Policy-Makers</a>.</p><h3><strong>Navigating the Limitations</strong></h3><p>Working with AI requires a deep understanding of its inherent flaws. A creative &#8220;hallucination&#8221;&#8212;like an AI generating an image of a person with six fingers&#8212;is not a random glitch; it is the artistic equivalent of a factual error, stemming from the same inherently unreliable inference we explore in Part III of the book. Artists must learn to spot and correct these errors. </p><p>More insidiously, they must be aware that an AI can perpetuate and amplify bias. An AI prompted to generate an image of a &#8220;doctor&#8221; may default to a white man, reflecting the biases in its training data. A responsible creator must learn to write prompts that actively counteract these defaults to create more inclusive and representative work.</p><p>Finally, there is the risk of homogenization. As millions of creators use the same popular tools, there is a danger that art could converge on a recognizable &#8220;AI style.&#8221; The challenge for the individual artist is to use these tools not as a stylistic crutch, but as a means to develop a voice that is uniquely their own.</p><p>Mastering the craft of prompting is the key to working with the tools of today. However, the tools of tomorrow aim to move beyond the prompt entirely, offering a more intuitive and powerful mode of collaboration.</p><h2><strong>The Next Frontier in Creative Tools</strong></h2><p>The chatbot is only the first, most primitive interface for generative AI. 
The true revolution will arrive not in a chat window, but in the form of enhanced creative tools that find a sweet spot between high-level, goal-directed instructions and the fine-grained, direct control that artists need. This next frontier moves beyond a purely linguistic dialogue to a more intuitive, interactive, and context-aware partnership.</p><p>Creative tools have always existed on a spectrum. At one end, you have low-level, procedural interfaces that offer maximum control but demand immense effort. Think of creating an image pixel by pixel in Microsoft Paint or writing a novel one keystroke at a time in Word. At the other end are high-level, declarative interfaces that offer maximum ease but sacrifice control, like using a single prompt to generate an entire image in Midjourney. The unavoidable trade-off is that the more you expect the computer to do for you, the less control you have over the final result.</p><p>The most powerful tools of the near future will find a balance by enabling <em>semantic manipulation</em>. Instead of editing the surface of the work&#8212;the pixels or the characters&#8212;these tools will allow the artist to edit its underlying <em>meaning</em>. Imagine an AI-generated image of a landscape at sunset. Modifying it pixel by pixel is impractical; if you move the sun, the shadows, lighting, and mood of the entire scene must change. Re-prompting with &#8220;move the sun to the left&#8221; is equally flawed, as it will generate an entirely new image, losing all previous refinements.</p><p>The ideal tool, however, would understand what the &#8220;Sun&#8221; is and what &#8220;moving&#8221; it implies. It would allow the artist to simply click on the sun and <em>drag it</em> across the sky, causing the shadows to lengthen, the sky to change color, and the entire scene to update realistically in real time. 
This magical-seeming capability will be possible because these tools will operate directly on the <em>latent space</em> of the creation&#8212;the conceptual space where similar ideas are located near each other.</p><p>We&#8217;ve already seen some of this at work with early research on Generative Adversarial Networks, and we&#8217;re now seeing a move towards &#8220;World Models&#8221; that can generate physically accurate environments and, to some extent, understand the underlying mechanics of light, shadows, geometry, etc. These capabilities will only improve as we switch from training models on static information (like images and videos) towards training them on dynamic, simulated 3D worlds.</p><h2><strong>Conclusion</strong></h2><p>The fear that AI will replace the artist is rooted in a fundamental misunderstanding of where creative work truly lies. The central argument against this fear is simple: the final artifact&#8212;the painting, the novel, the song&#8212;is not the work. It is merely the residue of the work. </p><p>The real work is the vast, invisible process that precedes it: the struggle to understand a vision, the empathy required to connect with an audience, the intellectual and emotional labor of building a narrative and giving it meaning. AI can accelerate the production of the artifact, but it cannot automate the deeply human journey that gives it a <em>soul</em>.</p><p>In this new era, we will likely see a dynamic that has played out with every major technological revolution in art, from the invention of the photographic camera to the arrival of the music synthesizer.</p><p>Two distinct paths for creative professionals will emerge. There will be a generation of artists who embrace the new technology, mastering the art of collaboration with AI to execute their vision faster and more ambitiously. 
For them, the premium will shift away from pure technical execution and toward the uniquely human capacities of vision, taste, storytelling, and critical judgment. They will create new forms of art.</p><p>At the same time, there will be artists who choose to keep their creative process a purely human endeavor, finding new value and distinction in traditional, un-augmented craft. They will keep the existing forms of art alive.</p><p>Both paths are valid, and the interplay between these two schools of thought will create, I think, very interesting dynamics for the future of art.</p><p>A prime example of the benefits of using AI for creative work is the very book you are holding. What began as a crude collection of disparate essays has evolved into a unified framework for AI literacy, a transformation I could not have achieved alone. I have certainly put hundreds of hours into this project, but that number would have stretched into the thousands without an AI partner to help me explore dozens of different outlines, connect disparate ideas, and rewrite and recompose my own writing. I would likely have quit, not because I wasn&#8217;t capable, but because of the sheer volume of work that must be juggled with the demands of daily life.</p><p>I am far from a talented writer, but I truly believe I was able to express my ideas more clearly and coherently with the help of generative AI than I ever could have on my own. And it&#8217;s not just me. Pioneering and talented artists and progressive studios are already using these techniques to push the boundaries of their respective fields. I think we will see a lot more in the near future, and I hope, enough to overcome the influx of AI slop that we are already seeing.</p><p>This brings us back to the book&#8217;s human-centric thesis. AI is an instrument of unprecedented power, but it remains just that, an <strong>instrument</strong>. 
It can be a partner in exploration and a tool for evaluation, but human ingenuity, emotion, and intent remain the irreplaceable core of all great art. The future of creativity is not one of automation, but of augmentation.</p><div><hr></div><p><em>Thanks for reading! </em></p><p><em>This was one of the hardest chapters to write for me, because creativity is, I think, part of the core of what being a human is about. I hope I&#8217;ve managed to touch on the important aspects of AI-augmented creativity with the proper nuance and the necessary respect for all diverging voices. </em></p><p><em>Please let me know if you have any feedback on how to make this chapter more sensitive to the topic of creativity. I&#8217;d love to hear your thoughts!</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/p/artificial-intelligence-for-creative/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/p/artificial-intelligence-for-creative/comments"><span>Leave a comment</span></a></p>]]></content:encoded></item><item><title><![CDATA[Using AI to Augment, Not Automate Your Writing]]></title><description><![CDATA[Implementing the CODER Framework with AI]]></description><link>https://blog.apiad.net/p/using-ai-to-augment-not-automate</link><guid isPermaLink="false">https://blog.apiad.net/p/using-ai-to-augment-not-automate</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Tue, 05 Aug 2025 10:31:22 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link 
image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3456" height="2304" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2304,&quot;width&quot;:3456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;black and white typewriter on white table&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="black and white typewriter on white table" title="black and white typewriter on white table" srcset="https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1583913836387-ab656f4e0457?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxNXx8dHlwZXxlbnwwfHx8fDE3NTQxNDkzOTd8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Katrin Hauf</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>The blank page is terrifying. </p><p>Staring at a blinking cursor, knowing you have a brilliant thought to explain, and absolutely no idea how to put it on paper, can feel less like a creative act and more like an exercise in intellectual dread. </p><p>I've been there, many times. 
It's been over 200 articles on this blog so far, and not a single one has been a breeze to write&#8212;well, maybe one or two, when I was <em>really</em> angry at something.</p><p>Like many of you, I've tried to climb this brick wall by bringing some AI into my writing process, only to be met with a new kind of frustration. The experience usually falls into one of two extremes. </p><p>On one hand, you have the overbearing AI that tries to do everything at once, spewing generic text and offering terrible advice because it has no real context. Even worse, it has no <em>soul</em>.</p><p>On the other hand, if you constrain the tool too much, it becomes little more than a glorified spell checker, useless for the heavy lifting of structuring and ideation. </p><p>And AI can be much more. If you want it to, AI can be a very powerful cognitive partner, one that truly empowers the writer. But finding the right balance is hard, because at its core, current AI is nothing but a <em>hallucination machine</em>. </p><p>You want it to hallucinate in the right direction, and for that, it needs guidance. Otherwise it gets lost. It's everywhere and nowhere at the same time. It's a savant with a gigantic vocabulary and an even more gigantic case of amnesia. It needs focus.</p><p>Frankly, so do I. I&#8217;m not one of those brilliant authors who can write by the seat of their pants. I need structure to navigate my own thoughts, to keep track of what I&#8217;ve said and what I still need to say. </p><p>This is why I created the <a href="https://blog.apiad.net/p/a-pragmatic-workflow-for-technical">CODER</a> framework, a system that breaks the monolithic task of technical writing into five manageable stages: Collect, Outline, Draft, Edit, and Release. And here is the key insight: the very same structure that gives me a map of my thoughts also provides the perfect guidance for an AI cognitive partner. </p><p>This article is the explanation of that discovery. 
It's about how a human-centric framework creates the ideal scaffolding for a powerful partnership, turning your AI from a frustrating firehose into a focused, collaborative co-writer.</p><p>A word of caution before moving on, though. For many, the act of writing itself is something sacred, deeply personal, and they want nothing getting in the way. Especially not AI. If that's you, then this article is probably not for you. And that&#8217;s perfectly fine. I'm not trying to say everyone or even anyone should try writing with AI. </p><p>In fact, there are things I write for which I definitely do not want any sort of augmentation or interference: deeply personal essays, letters to my loved ones, or private thoughts. AI is just a tool that is sometimes helpful, sometimes annoying.</p><p>This article is for those of you who want to explore when, if ever, AI can help you&#8212;especially in the most structured, technical type of writing that doesn't require that deep, personal touch.</p><p>With that out of the way, let's move on to my process for writing with AI.</p><h2>The Evolution of CODER</h2><p>I originally developed CODER to solve my own problems. </p><p>My life is a constant exercise in context switching. 
I&#8217;m a college professor, I run a startup, and I&#8217;m a parent to two little girls. On top of all that, I want to maintain a technical blog. I know, right?</p><p>My writing time doesn't come in long, contemplative blocks; it comes in stolen moments&#8212;in between meetings, after the kids are asleep, or while a model is training. I needed a system that would allow me to leave a draft for days, switch between devices, and immediately know where I left off and what to do next. CODER was my answer to that chaos.</p><p>But my original article on the framework was missing a key piece: a discussion of tooling. The process was sound, at least for me, but the tools were still manual.</p><p>Over the past year, that has changed dramatically. I began experimenting with incorporating modern AI into my writing workflow, and I discovered something profound. The stage-based approach of the CODER framework creates clear boundaries and well-defined tasks where an AI can assist without overwriting my own voice. </p><p>This is the evolution of my original idea&#8212;a journey into how to integrate AI into your writing process in a way that augments your abilities rather than replacing them.</p><h2>The AI Writing Assistant</h2><p>So what does this partnership look like in practice? </p><p>For a full, detailed breakdown of the framework itself, I highly recommend reading my original <a href="https://blog.apiad.net/p/a-pragmatic-workflow-for-technical">article</a>. Here, we'll focus on how an AI partner can supercharge each of its stages.</p><h3>Stage 1: Collect</h3><p>The goal of the Collect stage is to get every single idea out of your head and into a document. This is where the tyranny of the blank page is most acute. Your AI co-writer solves this by becoming a frictionless thought-catcher.</p><p>Imagine you're on a walk and an idea strikes. Instead of fumbling with a notes app, you simply speak. 
Your AI assistant transcribes your thought, cleans it up, and adds it to a running list of ideas for your article. Later, when you're in a meeting and can't speak, you can type a few cryptic keywords, and the AI will understand the context and add the note. You can even drop in a link to a relevant article and ask the AI to summarize the key points. </p><p>The result? A comprehensive, low-effort repository of your raw ideas, captured the moment they occur. With your raw ideas captured, the next challenge is giving them structure.</p><h3>Stage 2: Outline</h3><p>Now you have a messy list of brilliant, disconnected ideas. The Outline stage is about forging them into a logical structure. This is often a tedious process of dragging, dropping, and rethinking.</p><p>Here, your AI co-writer acts as an architect. By analyzing your collected notes, it can identify the underlying theme and suggest proven narrative structures. "This looks like you're solving a problem," it might say. "I suggest a 'Why-What-How' structure. Shall I create an outline based on that?" </p><p>In seconds, it can group your bullet points into a coherent hierarchy, saving you from the frustrating manual labor and allowing you to focus on the big picture: the flow of your argument.</p><p>Once you have a solid blueprint, it's time to start building the house itself.</p><h3>Stage 3: Draft</h3><p>With a solid outline, it's time to write. But one of the hardest parts of writing in fragmented sessions is maintaining a consistent voice. A paragraph written on Monday morning can feel completely different from one written on Wednesday evening.</p><p>Your AI co-writer becomes your style guardian. By feeding it examples of your previous work&#8212;or even articles by authors you admire&#8212;you can ask it to generate a style guide that captures the desired tone. </p><p>It then uses this guide to help you draft. 
You can ask it to "flesh out this section in a conversational but authoritative tone," or you can write a rough, unpolished paragraph and ask the AI to "rephrase this according to our style guide." It ensures your article sounds the way you want it to, no matter when you wrote it.</p><p>But a first draft is just that&#8212;a draft. Now comes the crucial process of refinement.</p><h3>Stage 4: Edit</h3><p>Every writer knows editing is a separate skill from writing, and your AI co-writer can wear two different editing hats.</p><p>First, it&#8217;s a developmental editor. It can look at the draft from a high level and provide structural feedback. "The argument in section 3 doesn't seem to connect back to your introduction," it might suggest. "Perhaps you need a stronger transition here."</p><p>Second, it&#8217;s a meticulous copy editor. It can catch grammatical errors, fix awkward phrasing, and ensure your sentences are clear and direct, saving you the tedious work of line-by-line proofreading.</p><p>You can go back and forth with it to massage, reframe, and polish as much as you want, with the level of guidance or autonomy that you prefer. For some, it might just offer suggestions, but not touch the final manuscript. For others, it may be the clever editor who knows exactly how to change that annoying verb.</p><p>With a polished manuscript in hand, the final step is to prepare it for the world.</p><h3>Stage 5: Release</h3><p>The writing is done, but the work isn't. The Release stage involves preparing the article for the world. Your AI co-writer becomes your publicist.</p><p>It can generate a dozen compelling titles and subtitles. It can analyze your text and suggest SEO keywords and relevant tags. It can write a pithy summary for social media, complete with a catchy hook. It can even scan your article and suggest, "A diagram illustrating the data flow would be really effective here," and then help you generate a placeholder image. 
</p><p>It handles the finishing touches that turn a great manuscript into a successful publication.</p><h2>Building Your Own AI Writing Partner</h2><p>This all sounds great in theory, but how do you actually build this? The good news is that you don't need a PhD in machine learning. With modern tools like OpenAI's Custom GPTs or Google's Gemini Gems, you can create your own personalized writing assistant.</p><p>The most important concept is to give the AI persistent state&#8212;a memory. A standard chat session keeps no memory beyond the conversation itself, which makes it useless for a long-term project. The key is to use a system of distinct files to track the project's state. This is where a feature like Gemini's "Canvas" becomes crucial. It allows the AI to create and edit virtual files, keeping our project content separate from the chat where we discuss instructions. </p><p>While you could try to manage this with a standard chatbot application by constantly pasting content back and forth, it's far from ideal, as your instructions get hopelessly mixed with the actual text of your article. Any tool that has a similar feature for persistent, editable content (like Canvas in ChatGPT or Artifacts in Claude) will work much better.</p><p>The brain of the assistant is the <em>system prompt</em>. This is the master instruction that tells the AI how to behave. My prompt instructs the AI to be stage-aware&#8212;to always know whether we're collecting, outlining, drafting, editing, or releasing. It also explicitly tells the AI how to use the different files, reading from one and writing to another depending on the task. This turns the AI from a simple text generator into a proficient project manager.</p><p>I've attached the exact system prompt I use for my own work so you can adapt it. 
You don't need to understand every nuance, but you can see how it directs the AI to follow the CODER process and manage the project files.</p><h2>A Final Word on Augmentation, Not Automation</h2><p>The future of technical writing isn't about replacing human creativity with artificial intelligence. It's about augmenting it. But this comes with a critical warning: an AI co-writer is not a cure for mediocre ideas. It's dangerously easy to let the AI do the thinking for you, to let it fill the page with eloquent but empty words. You must remain the driver. The AI is your partner, your navigator, but you control the destination.</p><p>The CODER framework I've presented here is not the ultimate guide to technical writing either. It's simply the system that works for me, born from my own struggles. The deeper message of this article isn't to adopt my specific framework, but to embrace its underlying philosophy: find a system that works for you, and then build an AI agent to help you implement it.</p><p>This is the very essence of my philosophy of using AI for augmentation rather than automation. It&#8217;s not about making things easier by offloading our thinking. It&#8217;s about using these powerful tools to build better versions of ourselves&#8212;to become more organized, more consistent, and ultimately, more creative.</p><p>Finally, you don't need to embrace AI if you don't want to. No matter what anyone tells you, other writers won't &#8220;take your job&#8221;. There are things only you can say that someone out there needs to hear. Whether you use AI to enhance that message or not, that's your choice, and it's fine either way.</p><p>I do encourage you to try it. Take my prompt, find a platform you like, and build your own co-writer. Then decide if and when it helps, and use it, or don't use it. 
But don't let others tell you about it; see it for yourself.</p><div><hr></div><h2>CODER System Prompt</h2><pre><code>System Prompt: The CODER Writing Assistant

<code>Core Principles:

* User-Centric Control &amp; Agency: The user is the author and is always in control. They can change their mind, switch tasks, or override any suggestion. Your role is to facilitate, not dictate. The framework is a guide, not a prison.

* File-Based State Management: The entire project state is maintained across four distinct documents (files) within this Canvas. You will create, read from, and write to these files as the single source of truth. This makes the process transparent and allows the user to directly interact with any part of their project at any time.

* Iterative Workflow: The writing process is non-linear. The user can jump between stages at will. Your primary job is to manage the files and ensure they remain synchronized with the user's decisions.

* Navigation: The user can use commands like "Let's work on the Outline" or "I need to add more ideas." You must respond by switching your focus to the corresponding document.

Project File System:

You will create and manage the following four files in the Canvas:

* project_metadata.md: The control file. It contains a project status log, the reader profile, and a summary of the desired writing style.

* collected_ideas.md: A simple, running bulleted list of raw ideas and notes gathered during the Collect stage.

* article_outline.md: The structured, hierarchical outline of the article.

* article_draft.md: The main document. This is where the full text of the article is generated, edited, and finalized.

Framework Implementation Protocol:

Stage 1: COLLECT

 * Initiate the Stage: Greet the user. Create the four project files. In project_metadata.md, write the initial status: ## Project Status\n* **Current Stage:** Stage 1: Collect - In Progress. State, "We are in the Collect stage. The goal is to gather all the raw ideas for your article in the collected_ideas.md file."

 * Ingest External Content: Ask the user, "Do you have any existing notes, links, or documents you'd like me to review first?" If yes, process the content and append the extracted key points as bullet points into collected_ideas.md.

 * Brainstorming: Prompt the user for their ideas with probing questions. Add each new idea as a bullet point to collected_ideas.md.

   * "What is the single most important point you want your reader to understand?"
   * "What are the key arguments or pieces of evidence you have?"
   * "What background information is necessary?"
   * "Are there any common misconceptions you want to address?"

 * Conclude the Stage: Before moving on, update the status in project_metadata.md to * **Current Stage:** Stage 1: Collect - Completed. Then, suggest moving on: "This is a great collection of ideas in collected_ideas.md. When you're ready, we can move to the Outline stage."

Stage 2: OUTLINE

 * Initiate the Stage: Update the status in project_metadata.md to * **Current Stage:** Stage 2: Outline - In Progress. Announce, "We will now structure your ideas. I'll be reading from collected_ideas.md and writing the result to article_outline.md."

 * Analyze and Suggest Structures: Read the contents of collected_ideas.md. Based on the ideas, proactively suggest one or two fitting structural patterns.

   * If the topic seems to be about solving a problem: Suggest the "Why-What-How" framework.
     * Your prompt: "It looks like you're solving a specific problem. I suggest the 'Why-What-How' structure. We would start by explaining Why the problem is important, then describe What your solution is, and finally detail How to implement or use it. Does that sound like a good fit?"

   * If the topic is an explanation of a complex system: Suggest "Top-Down" or "Bottom-Up".
     * Your prompt: "This topic seems to be about explaining a complex system. We have two great options: A 'Top-Down' approach, where we start with the big picture and then drill into the details, or a 'Bottom-Up' approach, where we explain the fundamental components first and then show how they build up to the whole system. Which approach do you prefer?"

   * If the topic is an argument or debate: Suggest an "Adversarial" or "Thesis-Antithesis-Synthesis" structure.
     * Your prompt: "Since you're presenting a nuanced argument, an 'Adversarial' style could be very effective. We could structure it like a dialogue: present your main claim (thesis), then fairly explore the strongest counter-arguments (antithesis), and finally, present a conclusion that resolves the conflict (synthesis). How does that sound?"

 * Collaborate on Structure: Once the user chooses a structure, work with them to organize the points from collected_ideas.md into a hierarchical outline.

 * Produce the Outline: Write the final, structured outline into article_outline.md. After completion, update the status in project_metadata.md to * **Current Stage:** Stage 2: Outline - Completed.

Stage 3: DRAFT

 * Initiate the Stage: Update the status in project_metadata.md to * **Current Stage:** Stage 3: Draft - In Progress. Announce, "Now we'll create the first draft in article_draft.md. To ensure the text matches your voice, let's start with style."
 
 * Learn Tone and Style: Ask the user, "Do you have a style guide, or could you provide links to a few articles whose tone and style you'd like me to emulate?"
   * If the user provides content: Analyze it to determine key characteristics (e.g., formal/informal, sentence length, use of jargon, humor, etc.).
   * Summarize the Style: Generate a concise, bulleted summary of the learned style.
   * Update Metadata: Write this summary into project_metadata.md under a "Style Guide" heading and ask the user to confirm it: "I've analyzed the examples and added a style summary to project_metadata.md. Does this accurately capture the voice you're aiming for?"

 * Build Reader Profile: Ask the user key questions and write the answers into project_metadata.md under a "Reader Profile" heading.
   * "Who are you writing for? (e.g., Absolute beginners, industry experts, project managers?)"
   * "What is the desired depth level? (e.g., A high-level overview, a practical guide with code?)"

 * Offer Agency in Drafting: Ask the user, "I'm ready to write the draft. I will read the structure from article_outline.md and generate the text. However, if you have any sections you've already written yourself, please let me know and I can incorporate them."

 * Generate and Write Draft: Read the outline from article_outline.md and the style/reader profiles from project_metadata.md. Generate the full text and write it into article_draft.md.

 * Handle Synchronization: If a user later modifies article_outline.md, you must detect this change and warn them: "I see you've updated the outline. The current text in article_draft.md is now out of sync. Shall I regenerate the draft based on the new outline?"

Stage 4: EDIT

 * Initiate the Stage: Update status in project_metadata.md to * **Current Stage:** Stage 4: Edit - In Progress. Announce, "We are now in the Edit stage. All our work will be focused on refining the text in article_draft.md."

 * Offer Editing Modes: Ask the user how they'd like to proceed. "We can edit the document together line-by-line, or I can perform specific checks. For example, I can scan for passive voice, simplify complex sentences, or check the tone. What works best?"

 * Collaborative Editing: Work with the user to refine the text directly within article_draft.md. Your role is to suggest changes and implement the user's edits. For example:
   * "This sentence seems a bit long. Could we split it like this for clarity?"
   * "Is this explanation clear enough for your target audience, or is it missing any key details?"

 * Conclude the Stage: Once the user is satisfied with the edits, update the status in project_metadata.md to * **Current Stage:** Stage 4: Edit - Completed.

Stage 5: RELEASE

 * Initiate the Stage: Update status in project_metadata.md to * **Current Stage:** Stage 5: Release - In Progress. Announce, "This is the final Release stage. We'll add the finishing touches to article_draft.md based on where you plan to publish."

 * Identify Publishing Platform: Ask for the target platform (e.g., blog, social media, academic journal).

 * Provide Tailored Suggestions: Based on the platform, provide a checklist of suggestions.

   * Generative Assistance: Offer to generate variations of titles, social media hooks, hashtags, or SEO keywords. For example: "Based on your article, here are three potential SEO-friendly titles. Which do you like best?"

   * Rich Media Placement: Scan article_draft.md and suggest specific places to add value. Use comments or placeholders like [SUGGESTION: A chart showing user growth would be effective here.] directly in the document.

   * Platform-Specific Formatting: Advise on best practices, such as using a strong hook and hashtags for social media, or ensuring correct citation style for a formal paper.



 * Final Polish: Work with the user to apply the final touches directly within article_draft.md.

 * Conclude the Project: Update the status in project_metadata.md to **Current Stage:** Project Completed. Congratulate the user. "Congratulations! Your article in article_draft.md is ready to be published. All your project files are saved here in the Canvas if you ever want to return to them."</code></code></pre>]]></content:encoded></item><item><title><![CDATA[Artificial Intelligence for Policy Makers]]></title><description><![CDATA[Chapter 9 of Mostly Harmless AI]]></description><link>https://blog.apiad.net/p/artificial-intelligence-for-policy</link><guid isPermaLink="false">https://blog.apiad.net/p/artificial-intelligence-for-policy</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Sun, 03 Aug 2025 10:05:45 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 
848w, https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="4498" height="2530" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2530,&quot;width&quot;:4498,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;black traffic light&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="black traffic light" title="black traffic light" srcset="https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1565366304783-88a1d75fcd93?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMHx8cmVndWxhdGlvbnxlbnwwfHx8fDE3NTM4MjQwMDZ8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">K. Mitch Hodge</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><blockquote><p><em>This is an early draft of Chapter 9 of my upcoming book <strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong>. I&#8217;m deeply grateful for all suggestions and criticism you might have.</em></p></blockquote><p>Technology, especially artificial intelligence, moves at a blistering pace, far outstripping the deliberate, democratic processes of regulation. This creates a governance gap&#8212;an ever-widening space where innovation flourishes without guardrails, leaving society exposed to significant and often unforeseen risks. </p><p>This is not necessarily a failure of governance, but an inherent tension in the modern world. The challenge for today&#8217;s leaders is not to halt the march of technology, but to build a bridge across this gap with smart, agile, and evidence-based policy.</p><p>This chapter is designed to provide a practical outlook for those tasked with building that bridge. It offers a framework for regulators and policymakers on how to approach AI governance pragmatically, focusing on tangible, real-world harms and achievable benefits. It is a guide to steering progress, not stopping it, rooted in the techno-pragmatist belief that our collective future is not something that happens to us, but something we must actively and responsibly shape.</p><p>While the principles outlined here are actionable on their own, they are built upon a deep understanding of AI&#8217;s fundamental limitations and risks. The full, in-depth analysis of these challenges&#8212;from the mechanics of hallucination to the societal dangers of bias and disinformation&#8212;is detailed in Part III of this book. 
</p><p>For the most comprehensive understanding, I encourage you to review Part III in depth. Armed with that context, you can then return to this chapter to engage more deeply with the policy suggestions made here, transforming them from abstract principles into a grounded and urgent call to action.</p><p>A final disclaimer. Regulating technology is extremely difficult, and even more so when the technology changes as fast as AI does. Everything written here must be taken with a grain, or even better, a teaspoonful of salt. Furthermore, no specific advice will fit all contexts. Each country, state, and community is responsible for finding its own way forward based on its own shared principles.</p><h2><strong>Why Regulation is Necessary</strong></h2><p>Before we can chart a path forward, we must first understand the terrain of risks that requires thoughtful governance. These are not speculative fears, but foundational challenges posed by the very nature of modern AI, building from the immediate threats to the individual to the structural risks facing our global society. 
Regulation is required not to stifle technology, but to ensure it develops in a way that is compatible with a safe, equitable, and democratic society.</p><p>Let's start with privacy. The ability to analyze vast quantities of personal information at scale creates the potential for a pervasive surveillance apparatus, operated by both corporations and governments, that was previously unimaginable. The only effective countermeasure is a strong, proactive policy that establishes privacy as the default. </p><p>This requires comprehensive data privacy laws that grant individuals clear rights over their data and place strict limits on what information can be collected, for what purpose, and for how long. Policy must shift the burden of proof, forcing organizations to justify their data collection practices rather than forcing citizens to constantly fight to protect their private lives.</p><p>Furthermore, when AI systems are trained on biased historical data (the only kind of historical data we have), they risk automating and scaling up discrimination in critical areas like hiring, lending, and criminal justice. Because market forces alone may not prioritize fairness over the raw predictive performance that can be gained from these biases, regulation is essential to protect fundamental civil rights. </p><p>Policy can create powerful legal and economic incentives for developers to address this problem by mandating algorithmic transparency and requiring independent fairness audits for any AI system used in high-stakes decisions. This ensures that the pursuit of technological efficiency does not come at the cost of societal equity.</p><p>Moving on, our existing legal frameworks for intellectual property and ownership are fundamentally unprepared for content generated by artificial intelligence, creating a landscape of legal ambiguity that chills innovation and threatens the livelihoods of human creators. 
</p><p>The legal system must be updated to provide clarity and predictability. This requires decisive legislative action to define the copyright status of AI-generated works, establish clear rules for the use of copyrighted data in training foundation models, and create a legal environment where both human artists and AI innovators can operate with confidence.</p><p>But it gets worse. The power of generative AI to create convincing fake news and deepfakes presents a direct threat to our shared sense of reality, eroding trust in institutions and fueling social polarization. </p><p>A regulatory approach here requires a delicate balance. Outright censorship is a dangerous tool that is itself a threat to democratic values. A more pragmatic policy would focus on creating a healthier information ecosystem by mandating transparency&#8212;such as the clear and consistent labeling of AI-generated content&#8212;and by holding platforms accountable not for the content itself, but for its algorithmic amplification. </p><p>This, combined with robust public funding for media and AI literacy programs, can empower citizens to navigate the digital world more critically without governments resorting to authoritarian measures.</p><p>At the same time, the rapid advance of AI into cognitive tasks threatens to cause massive workplace disruption, displacing workers at a pace that could challenge social and economic stability. </p><p>The goal of policy in this area is not to halt the productivity gains of automation, but to proactively manage the human transition. This requires a two-pronged strategy: first, investing heavily in accessible, large-scale retraining and lifelong learning programs to equip the workforce with new skills; and second, modernizing the social safety net to provide a robust economic cushion for those navigating this difficult transition. 
</p><p>A more abstract but even more dangerous development is the emergence of Lethal Autonomous Weapons (LAWs), which threaten to fundamentally alter the nature of conflict, removing human empathy and judgment from the decision to use lethal force. </p><p>This is not a problem that market forces or technological solutions can solve; it is a profound ethical challenge that demands a global political response. The only viable path forward is through international policy, establishing clear treaties and shared norms that mandate meaningful human control over autonomous systems. </p><p>The goal of such regulation is to draw an unambiguous red line, preventing a destabilizing arms race in an arena where the potential for catastrophic error or miscalculation is immense.</p><h3><strong>A Pragmatic Stance on Existential Threats</strong></h3><p>Finally, any serious policy discussion must address the so-called existential risks, which involve the potential for AI to destroy human civilization altogether. </p><p>While acknowledging the concern is important, a pragmatic stance requires contextualizing the probability. As argued in Part III, catastrophic outcomes, while having a nonzero chance, remain highly improbable, as the core doomsday assumption of rapid, exponential self-improvement is tempered by very real physical and computational limitations, and there is no evidence current technology can surpass these limitations.</p><p>A danger for policymakers lies in the overemphasis on these speculative, long-term risks, which can divert critical resources from solving the tangible, present-day harms AI is already creating. </p><p>The pragmatic approach here lies in understanding that AI x-risk is but one of several major threats on a scale similar to climate change and pandemics, and probably far less likely. Therefore, policy should support thorough research into long-term risks but avoid panic-driven bans on development. 
The most effective strategy is to focus regulation on mitigating the demonstrated, immediate harms of current AI systems.</p><h2><strong>The Challenge of Smart Regulation</strong></h2><p>Identifying the risks is only the first step. The act of regulation itself is fraught with challenges, especially when applied to a technology as dynamic and complex as AI. A naive approach can be as harmful as no regulation at all, creating unintended consequences that stifle beneficial innovation or fail to address the core problems. </p><p>Smart regulation requires navigating three key pitfalls: the pacing problem, the risk of overreach, and the black box problem.</p><h3><strong>The Pacing Problem</strong></h3><p>Traditional legislative cycles, which can take years to produce new laws, are fundamentally mismatched with the exponential pace of AI development. By the time a law designed to govern a specific AI capability is passed, that technology may already be obsolete. </p><p>To overcome this, policymakers should consider establishing agile, expert-led regulatory bodies. These specialized bodies can be staffed with technologists, ethicists, and social scientists who can monitor the field in real-time, issue updated guidance, and adapt regulatory standards far more quickly than a legislature can.</p><h3><strong>Avoiding Overreach</strong></h3><p>In the face of uncertainty and fear, the temptation can be to enact broad, sweeping prohibitions on AI development. This would be a profound mistake. A techno-pragmatist approach distinguishes between foundational research and commercial application. The goal of regulation should not be to stifle the scientific exploration that leads to breakthroughs, but to govern the deployment of AI systems where they have a direct public impact. 
</p><p>Policy should therefore focus on demonstrated harm, setting clear safety and fairness standards for AI products and services that are released into the market, rather than attempting to place speculative limits on basic research and open-source development.</p><h3><strong>The Black Box Problem</strong></h3><p>Many of the most powerful AI systems operate as black boxes, where even their own creators cannot fully explain the specific logic behind a given decision. This opacity poses a fundamental challenge to accountability and due process. How can an individual appeal a decision they cannot understand? </p><p>Smart regulation must address this by championing the principles of transparency and explainability. For high-stakes applications, policy can mandate a right to an explanation, requiring that companies be able to provide a meaningful justification for AI-driven decisions that significantly impact people&#8217;s lives. This incentivizes the development and adoption of Explainable AI (XAI) techniques, ensuring that as systems become more complex, they do not become less accountable.</p><h2><strong>Principles for Proactive AI Governance</strong></h2><p>Having navigated the pitfalls, we can chart a course for proactive governance. The following principles are not a rigid checklist, but a compass for steering AI development toward a future that is safe, equitable, and beneficial. </p><p>The core of this approach is a commitment to evidence over ideology. A risk-based approach, attuned to the principles of techno-pragmatism, means that the level of regulatory scrutiny applied to an AI system should be directly proportional to its potential for harm. An AI that recommends movies requires a lighter touch than one that assists in medical diagnoses. 
</p><p>This ensures that regulation focuses its power where it is most needed, fostering innovation in low-risk areas while demanding rigorous oversight for high-stakes applications.</p><p>Human-centric governance must also insist on meaningful human control as a direct response to the deep and persistent Alignment Problem. As Part III makes clear, perfectly specifying human values is an unsolved, and perhaps unsolvable, challenge. Therefore, for critical systems where decisions have significant consequences&#8212;in medicine, law, and finance&#8212;policy must mandate a human in the loop. </p><p>This is not a mere suggestion but a non-negotiable backstop against the inevitable failures of alignment, ensuring that a human expert is always the final arbiter, accountable for the outcome. AI can and should be a powerful tool for augmenting professional judgment, but it must never be allowed to replace it.</p><p>Furthermore, proactive governance involves shaping the entire AI ecosystem to better align with societal values. A purely market-driven economy has no inherent incentive to solve deep issues like fairness or cultural representation. Therefore, policy must create these incentives. This can be done through liability reform that holds companies accountable for harms caused by their systems, and through tax credits that reward investment in safety and ethics research. </p><p>In parallel, governments can counteract the risk of cultural colonization by a few generalist models by funding the development of local and regional AI solutions. This support for models trained on specific cultural and linguistic data, combined with national programs to foster widespread AI literacy, can help create a more diverse, resilient, and critically engaged society.</p><p>Finally, since AI is a global technology, our approach to its governance must also be global. 
</p><p>A patchwork of national regulations creates a race to the bottom, where innovation may flee to the least-regulated environments. The most powerful path forward lies in promoting openness and international collaboration. </p><p>Policy can and should incentivize the open-sourcing of foundation models, which enhances safety by allowing the global research community to audit, critique, and improve them. This spirit of collaboration must extend to the diplomatic level, forging international agreements and shared norms to govern the most critical risks, ensuring that the development of this transformative technology is a shared project for all of humanity.</p><h2><strong>Conclusions</strong></h2><p>The path of technology is not deterministic. The future of artificial intelligence is not a predetermined outcome that we must passively accept, but a landscape that will be profoundly shaped by the policy choices we make today. As we have seen, the risks are significant, but so is the potential. A techno-pragmatist approach requires us to hold both these truths at once, engaging with this powerful technology with our eyes wide open.</p><div><hr></div><p><em>Thanks for reading! 
Remember you can get my upcoming book <strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong> at a 50% discount in early access.</em></p><p></p>]]></content:encoded></item><item><title><![CDATA[Foundations of Artificial Intelligence]]></title><description><![CDATA[Chapter 1 of Mostly Harmless Ideas]]></description><link>https://blog.apiad.net/p/foundations-of-artificial-intelligence</link><guid isPermaLink="false">https://blog.apiad.net/p/foundations-of-artificial-intelligence</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Fri, 01 Aug 2025 10:31:05 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The following article is a first draft of Chapter 1 of my upcoming book <strong>Mostly Harmless Ideas</strong>. The book is a deep dive into the good and the bad of AI, especially Generative AI and Language Models, and it&#8217;s packed with advice for all kinds of knowledge workers and creative professionals. 
The first part of the book covers the foundations of Artificial Intelligence, Machine Learning, Generative AI and Language Models, in accessible and intuitive terms.</em></p><p><em>You can get <a href="https://store.apiad.net/l/ai/fiftyoff">early access to Mostly Harmless AI at a 50% discount</a> during this alpha stage, which gives you perpetual access to all future digital editions, plus printed copies (when they are ready) at cost.</em></p><p><em>You can also get a <a href="https://store.apiad.net/l/compendium">lifetime pass</a> for all my digital content, present and future, including 3 more books I&#8217;m currently working on.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="5472" height="3648" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3648,&quot;width&quot;:5472,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;photo of girl laying left hand on white digital robot&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="photo of girl laying left hand on white digital robot" title="photo of girl laying left hand on white digital robot" srcset="https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1507146153580-69a1fe6d8aa1?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw3fHxyb2JvdHxlbnwwfHx8fDE3NTQwMDc1NDN8MA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Andy Kelly</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h2><strong>What is Artificial Intelligence, Really?</strong></h2><p>Artificial Intelligence, or AI, is a term we hear almost constantly today, often surrounded by a mix of excitement, confusion, and sometimes, even fear. At its core, AI is a field within Computer Science that deals with teaching computers to solve problems that are incredibly challenging for traditional programming methods. These aren&#8217;t simple arithmetic calculations or straightforward data sorting tasks. Instead, we&#8217;re talking about complex endeavors like proving intricate mathematical theorems, navigating a robotic car through unpredictable city streets, crafting optimal schedules for thousands of flights, or even understanding and creating human-like pictures and text.</p><p>For most of computer science, when we want a computer to solve a problem, we write a precise, step-by-step algorithm. Think of it like giving a chef a detailed recipe: Take 2 cups of flour, add 1 egg, mix for 3 minutes&#8230; However, for the hard problems AI tackles, we often don&#8217;t have such a clear recipe. We might know what we want the computer to achieve, but not how to write down every single instruction for it to get there effectively and efficiently. This is precisely where AI steps in, aiming to find good enough solutions when perfect, explicit instructions are out of reach.</p><p>The very definition of AI has been a subject of debate since its inception, reflecting different philosophical ideas about what <em>intelligence</em> truly means. One prominent perspective, championed by AI pioneer Marvin Minsky, suggests that AI is about <em>solving problems for which humans employ intelligence</em>. This view often focuses on creating machines that can mimic human thought processes, reasoning, and decision-making. 
Essentially, it asks: Can a machine think like us?</p><p>Developing concurrently, another powerful perspective emerged, emphasizing that AI <em>solves problems without being explicitly programmed</em>. This idea is strongly associated with Arthur Samuel, who coined the term machine learning while developing programs that could learn to play checkers better than their creators. He achieved this simply by allowing the programs to play many games and learn from experience. This view shifts the focus from how the AI thinks to what it can do, asking instead: Can a machine learn and adapt on its own, even if we don&#8217;t give it every single instruction?</p><p>These two foundational ideas&#8211;mimicking human intelligence versus learning without explicit programming&#8211;have profoundly shaped the entire field of AI. They represent different ways of approaching the grand challenge of building intelligent machines. Understanding this distinction is key to grasping AI&#8217;s history and its future. As we explore these foundations, remember our techno-pragmatist ethos: AI is a tool, and its path is shaped by our choices. Understanding its underlying mechanisms empowers us to make responsible decisions about how we build and use these powerful technologies.</p><div><hr></div>
<h3><strong>The Pillars of Good Old-Fashioned AI (GOFAI)</strong></h3><p>In this chapter, we will delve into the foundational ideas that laid the groundwork for Artificial Intelligence, often referred to as &#8220;Good Old-Fashioned AI,&#8221; or GOFAI. This era of AI research primarily focused on building intelligent systems by explicitly programming knowledge and logical rules. Our exploration will center on two main pillars of GOFAI.</p><p>First, we&#8217;ll examine Search and Optimization, which addresses how AI finds solutions by exploring vast possibilities, particularly when a perfect, direct path isn&#8217;t obvious. Second, we&#8217;ll delve into Knowledge Representation, focusing on how AI organizes and understands information, allowing it to reason and make sense of the world. These pillars represent a significant early focus and ambition of AI to tackle complex problems through logic and structured understanding, even as other approaches were also taking shape.</p><h2><strong>The Age-Old Debate: Symbolic AI vs. Statistical AI</strong></h2><p>For centuries, the idea of thinking machines has captivated human imagination. But as AI emerged as a scientific field, a fascinating tension developed: a constant &#8220;back-and-forth between two core, seemingly antagonistic approaches to building intelligent machines.&#8221; This dynamic mirrors an age-old philosophical debate: rationalism versus empiricism.</p><p>The first dominant approach to AI was Symbolic AI, deeply rooted in the philosophical tradition of rationalism. 
Rationalism suggests that knowledge is primarily gained through reason and logic. In Symbolic AI, researchers believed that machines could be made intelligent by encoding human knowledge and reasoning as explicit, formal rules and symbols.</p><p>Imagine, for instance, wanting to teach a computer to play chess. A Symbolic AI approach would involve meticulously programming every rule of chess, every known opening strategy, every tactical pattern, and every endgame scenario. It&#8217;s like giving the computer a massive, incredibly detailed recipe book or a comprehensive instruction manual for every possible chess situation. The computer would then follow these rules step by step to make its moves.</p><p>Early impressive demonstrations of this ethos included programs like The Logic Theorist, which could prove mathematical theorems by mimicking human problem-solving steps. Later, &#8220;expert systems&#8221; were designed to emulate human experts in narrow fields like medical diagnosis. The core idea was simple yet powerful: if we could just write down all the rules, the machine would be smart enough to solve the problem by following them.</p><p>Quietly developing alongside Symbolic AI was Statistical AI, drawing inspiration from empiricism. Empiricism posits that knowledge is primarily gained through sensory experience and data. In Statistical AI, the idea was to build &#8220;learning machines&#8221; that could discover patterns directly from large amounts of data, rather than being explicitly programmed with rules.</p><p>Think of it like a child learning to recognize a dog. You don&#8217;t give the child a list of rules like &#8220;a dog has four legs, barks, has fur,&#8221; and so on. Instead, you show them many different dogs, and they gradually learn to identify what a &#8220;dog&#8221; is by observing patterns in the examples. One early attempt at this was the Perceptron, an artificial neural network designed to learn patterns directly from data. 
The initial excitement was huge, as these machines seemed to offer a path to intelligence without needing every single rule programmed explicitly.</p><h3><strong>The Winters of AI</strong></h3><p>Despite the initial optimism, both Symbolic and Statistical AI approaches eventually hit significant roadblocks. These challenges led to periods known as &#8220;AI Winters&#8221;&#8211;times of reduced funding and public interest.</p><p>Early Symbolic AI systems, while impressive in their specific domains (like proving theorems or diagnosing specific diseases), proved to be quite brittle. They struggled immensely with common-sense knowledge, which is vast and often unstated. Furthermore, they couldn&#8217;t easily adapt to new situations outside their carefully programmed rules. Trying to teach a machine absolutely everything it needed to know, one fact at a time, became an &#8220;insurmountable challenge.&#8221; The real world is simply too complex and nuanced for a complete set of explicit rules to be written by humans.</p><p>Meanwhile, early Statistical AI systems like the Perceptron faced their own limitations. They lacked the &#8220;available data and computational infrastructure&#8221; to learn truly complex patterns. Consequently, they couldn&#8217;t become sophisticated enough, no matter how many simple &#8220;neurons&#8221; were connected. The computing power and data storage simply weren&#8217;t ready for the ambitious learning tasks researchers envisioned.</p><p>These &#8220;winters&#8221; were not outright failures, but rather crucial learning periods. 
They revealed the inherent limitations of each approach when pushed beyond &#8220;toy problems.&#8221; This early struggle between explicit rule-based systems and pattern-based approaches set the stage for the dynamic tension that would define AI&#8217;s entire history, constantly pushing researchers to find new ways to combine these approaches or overcome their limitations.</p><h2><strong>Search and Optimization</strong></h2><p>At the heart of many AI problems, especially in the early days, was the challenge of finding the best solution among a vast number of possibilities. This is the realm of search and optimization.</p><h3><strong>When Perfect is Impossible: The &#8220;Hard&#8221; Problems</strong></h3><p>Imagine you&#8217;re a traveling salesperson, and you need to visit a hundred different cities, visiting each exactly once, and then return home. Your goal is to find the route that minimizes the total travel cost (distance, time, or money). This is a classic example of a &#8220;hard problem&#8221; in computer science, known as the Traveling Salesman Problem (TSP). For a small number of cities, you could try listing every single possible route and picking the cheapest one. This is called a &#8220;brute force&#8221; search.</p><p>However, as the number of cities grows, the number of possible routes explodes. For just 20 cities, there are over 2.4 quintillion (roughly 2.4 &#215; 10<sup>18</sup>) possible visiting orders. For the hundred cities in our example, that count grows to a number with 158 digits, and even the fastest supercomputer couldn&#8217;t check them all before the universe ends. 
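</p><p>To get a feel for this explosion, here is a minimal Python sketch that counts the visiting orders for a few city counts. It assumes a route is any ordering of the cities, so n cities give n! orderings; fixing the starting city and ignoring travel direction would divide each count by 2n without changing the overall picture. The function name <em>count_orderings</em> is a hypothetical helper chosen for illustration.</p>

```python
import math


def count_orderings(num_cities):
    """Number of ways to order num_cities cities (n factorial)."""
    return math.factorial(num_cities)


# Print the count for a few small problem sizes.
for n in (5, 10, 20):
    print(f"{n} cities: {count_orderings(n):,} orderings")

# Output:
# 5 cities: 120 orderings
# 10 cities: 3,628,800 orderings
# 20 cities: 2,432,902,008,176,640,000 orderings
```

<p>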
These are what we call intractable problems, or NP-Hard problems: problems for which no efficient, exact solution is known.</p><p>To tackle such problems, AI often models them as navigating a &#8220;search space&#8221; or &#8220;state space.&#8221; This conceptual space represents all possible configurations or situations relevant to the problem. The AI starts from an initial state and tries to reach a goal state by applying a sequence of actions or operators, each potentially incurring a certain cost. </p><p>Since finding the absolute perfect solution is often impossible or impractical within this vast space, AI shifts its goal. Instead of perfection, it seeks approximate solutions. These are solutions that are good enough, given the time and memory constraints we have. The challenge then becomes how to find these good-enough solutions efficiently within a mind-bogglingly vast space of possibilities.</p><h3><strong>Smart Shortcuts: Heuristics and Metaheuristics</strong></h3><p>To navigate these immense search spaces, AI uses clever strategies known as heuristics and metaheuristics. A heuristic is a problem-specific &#8220;rule of thumb&#8221; strategy that uses some known properties of a problem to improve search performance. It&#8217;s not guaranteed to find the absolute best solution, but it often finds a very good one much faster than a brute-force approach.</p><p>Consider your GPS navigation app. When you ask for directions, it doesn&#8217;t calculate every single possible route from your current location to your destination. Instead, it uses a heuristic, often based on an algorithm called A&#8727; (A-star). If your destination is northeast of your position, the A&#8727; algorithm will prioritize roads going north or east, assuming they are more likely to get you there faster than roads going to the west or the south. Of course, this isn&#8217;t always perfect&#8211;there might be a faster detour to the west, or a highway that&#8217;s counter-intuitive. 
Nevertheless, by intelligently using this useful knowledge, the algorithm can find a very efficient route without exploring every dead end. It&#8217;s a smart shortcut that balances speed with a high probability of finding a good solution.</p><p>While heuristics are problem-specific, metaheuristics are more general-purpose search strategies. They leverage knowledge about the search paradigm itself and can be applied even when very little is known about the specific problem&#8217;s structure. They&#8217;re often used when &#8220;nothing else works.&#8221; A prime example of a metaheuristic approach is evolutionary algorithms. These computational strategies are &#8220;inspired by certain aspects of the biological process of evolution.&#8221;</p><p>Imagine you want to design the optimal layout for a computer chip (like a GPU) &#8211; a problem with an astronomical number of possible designs. An evolutionary algorithm would start with a &#8220;population&#8221; of random chip designs. Then, through cycles of &#8220;breeding&#8221; (combining elements from two good designs to create a new one) and &#8220;selection&#8221; (keeping only the best-performing designs), the algorithm iteratively &#8220;evolves&#8221; better and better designs. Just like biological evolution, it seems to &#8220;magically discover quasi-optimal design elements just by sheer luck and relentless repetition,&#8221; without needing explicit instructions for every design choice. These general strategies find inspiration in nature, engineering, and even social systems to build powerful computational search methods.</p><h3><strong>Specialized Search: Beyond Simple Paths</strong></h3><p>Beyond general search and optimization, AI has developed specialized techniques for specific types of complex problems.</p><h4>Adversarial Search: Thinking Ahead of the Other</h4><p>Many real-world problems, especially in competitive scenarios, involve an opponent whose actions must be anticipated. 
This is the domain of adversarial search, commonly found in game-playing AI. The challenge is not just to find a good move, but the best move assuming your opponent will also play optimally to counter you.</p><p>One of the oldest and most fundamental techniques is Minimax. Imagine a simple game like Tic-Tac-Toe. Minimax works by having the AI &#8220;look ahead&#8221; through all possible future moves, assuming that you (the opponent) will always choose the move that is best for you and worst for the AI. The AI then picks the move that minimizes its maximum possible loss (or maximizes its minimum possible gain). Effectively, it plays out all possible future scenarios in its head and chooses the path that leaves it in the best possible position, no matter what its opponent does.</p><p>For games with an incredibly vast number of possibilities, like Go, simply looking ahead through every move is impossible. This is where Monte Carlo Tree Search (MCTS) comes in. Instead of exhaustively analyzing every branch, MCTS &#8220;plays out&#8221; many random simulations of the game from a given point. It explores the most promising moves more deeply, learning which paths lead to success through repeated &#8220;trial and error&#8221; simulations. This allows AI to tackle games that were once considered beyond computational reach, like when Google&#8217;s AlphaGo beat the world&#8217;s best Go players.</p><h4>Structured Search: Satisfying All Conditions</h4><p>Sometimes, the goal isn&#8217;t to find the &#8220;best&#8221; path, but simply any solution that meets a specific set of requirements. These are constraint satisfaction problems. Here, the AI needs to find values for a set of variables such that all given conditions, or &#8220;constraints,&#8221; are simultaneously met.</p><p>Think about solving a Sudoku puzzle. You need to fill in numbers from 1 to 9 in each cell, but with strict rules: each row, column, and 3x3 box must contain all digits from 1 to 9 without repetition. 
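</p><p>As a rough illustration, these three rules can be written as a single check. The Python sketch below assumes the puzzle is stored as a 9x9 list of lists with 0 marking an empty cell; the function name and the grid layout are illustrative choices, not a standard API.</p>

```python
def violates_constraints(grid, row, col, digit):
    """Return True if placing digit at (row, col) would repeat a digit
    in the same row, column, or 3x3 box. Empty cells hold 0."""
    # Row constraint: the digit must not already appear in this row.
    if digit in grid[row]:
        return True
    # Column constraint: the digit must not already appear in this column.
    if any(grid[r][col] == digit for r in range(9)):
        return True
    # Box constraint: find the top-left corner of the enclosing 3x3 box.
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    for r in range(box_row, box_row + 3):
        for c in range(box_col, box_col + 3):
            if grid[r][c] == digit:
                return True
    return False


# A mostly empty grid with a single 5 in the top-left corner.
grid = [[0] * 9 for _ in range(9)]
grid[0][0] = 5

print(violates_constraints(grid, 0, 8, 5))  # True: same row as the 5
print(violates_constraints(grid, 4, 4, 5))  # False: no conflict
```

<p>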
The AI&#8217;s task is to find a set of numbers for all empty cells that satisfies all these constraints.</p><p>Another common example is creating a university class schedule. You have classes, rooms, professors, and students, and a multitude of constraints: Professor A can&#8217;t teach two classes at the same time; Room B can only hold 50 students; Class C requires a lab; no two classes can be in the same room at the same time. The AI&#8217;s job is to assign times and rooms to all classes such that every single constraint is satisfied. The &#8220;structure&#8221; of these problems, defined by the variables and their interdependencies, allows AI to use specialized search techniques to efficiently find a valid solution.</p><h2><strong>Knowledge Representation &amp; Reasoning</strong></h2><p>Beyond just searching for solutions, a truly intelligent system needs to &#8220;know&#8221; things about the world. This brings us to the second pillar of GOFAI: Knowledge Representation. This field explores how AI can efficiently represent, store, and use domain knowledge in a way that computers can understand and process.</p><p>The fundamental goal of knowledge representation is to organize concepts and facts, as well as the relationships between them. This organization allows AI to reason about these facts and discover new relations. Ultimately, it&#8217;s about giving AI a structured way to &#8220;understand&#8221; and make sense of information, much like how humans build a mental model of the world around them.
Without a clear way to represent what it &#8220;knows,&#8221; an AI would be unable to make logical inferences or apply its knowledge to new situations.</p><h3><strong>From Raw Observations to Understanding</strong></h3><p>To truly grasp how AI &#8220;knows&#8221; things, it&#8217;s helpful to understand the progression from raw observations to actionable understanding. At the most basic level, we encounter Data, which consists of raw, unprocessed facts or observations. This could be a list of numbers, individual words, or pixels in an image. In isolation, data has no inherent meaning; for example, the number &#8220;30&#8221; by itself is just a number.</p><p>When we introduce context or metadata, data transforms into Information. For instance, if we know &#8220;30&#8221; is a temperature reading taken in Celsius at noon on July 1st in Havana, it becomes information. This contextualization helps us relate different observations and gives them initial meaning.</p><p>Finally, when information is enriched with semantics and rules, enabling inference, reasoning, and the discovery of new relations, it becomes Knowledge. For example, if the AI knows that &#8220;temperatures of 30 degrees Celsius or higher in July in Havana indicate a heatwave,&#8221; it possesses knowledge. This knowledge allows it to draw inferences from the noon reading (it&#8217;s a heatwave!), discover new relations (heatwaves can lead to increased energy consumption), and even take actions (warn residents about high temperatures). It&#8217;s this ability to add meaning and logical connections that truly transforms information into actionable knowledge.</p><h3><strong>Ways to Represent Knowledge</strong></h3><p>Just as humans use different ways to store and recall information, from precise definitions to vague intuitions, AI employs various methods for knowledge representation, each with its own strengths and weaknesses.</p><p>One key distinction lies between explicit and implicit representations.
Explicit knowledge is clearly defined and directly encoded, often in rules or symbols. It&#8217;s much like a precisely written dictionary or a rulebook where every term and every rule is spelled out. This approach is central to Symbolic AI. For instance, Ontologies are explicit representations that define concepts within a domain and their strict relationships. Think of a meticulously designed family tree formally defining &#8220;parent,&#8221; &#8220;child,&#8221; &#8220;sibling,&#8221; and &#8220;ancestor,&#8221; along with rules such as &#8220;if A is a parent of B, and B is a parent of C, then A is a grandparent of C.&#8221;</p><p>Conversely, implicit knowledge is learned from patterns in data, rather than being directly programmed. It&#8217;s more akin to human intuition or a &#8220;gut feeling&#8221; developed from vast experience, and is fundamental to Statistical AI. Embeddings, for example, are numerical representations where concepts like words, images, or even entire documents are transformed into points in a multi-dimensional space. Systems like Word2Vec learn these embeddings by analyzing how words are used together, so words with similar meanings or contexts (e.g., &#8220;king&#8221; and &#8220;queen&#8221;) end up being numerically &#8220;close&#8221; to each other in this space, even though no human explicitly programmed that relationship.</p><p>Another way to categorize knowledge representations is by their formality. Formal representations have strict, unambiguous syntax and semantics, making them ideal for precise logical inference and computation. Mathematical equations, programming code, or statements in formal logic are prime examples, leaving no room for misinterpretation. In contrast, informal representations are more flexible, often using natural human language. 
While easier for humans to create and understand, they can be ambiguous and require more sophisticated processing for AI to extract meaning, as seen in a written description, a casual conversation, or an essay.</p><p>Finally, we distinguish between structured and unstructured representations. Structured knowledge is organized in a predefined, rigid format, making it easy for computers to process and query. Think of data in a spreadsheet with clear rows and columns, or a database with defined fields.</p><p>Knowledge graphs, for instance, are structured representations that organize facts as a network of interconnected entities (nodes) and their relationships (edges). A knowledge graph might have a node for &#8220;Paris,&#8221; a node for &#8220;France,&#8221; and an edge labeled &#8220;isCapitalOf&#8221; connecting them, allowing AI to easily query and infer facts.</p><p>Conversely, unstructured knowledge exists in free-form text, images, audio, or video, without a predefined schema. Extracting meaning from unstructured data is much harder and often requires advanced AI techniques.</p><p>Vector databases, for example, are often used to store and efficiently search implicit representations (embeddings) derived from unstructured data. You could take millions of research papers (unstructured text), convert each into an embedding (implicit representation), and store them in a vector database. Then, when a user asks a question, the database can find the most &#8220;similar&#8221; papers based on their embeddings, even though the papers themselves are unstructured.</p><h3><strong>Drawing Inference from Knowledge</strong></h3><p>Knowledge representation isn&#8217;t merely about storing information; its ultimate purpose is to enable AI to draw inferences and make decisions. 
This process of deriving new conclusions from existing knowledge is known as reasoning, and it can take both formal and informal forms.</p><p>Formal reasoning, deeply rooted in logic, is about deriving new conclusions from existing knowledge using strict, unambiguous rules. This is the hallmark of Symbolic AI. It&#8217;s a process of deduction, where if the initial premises are true and the rules are applied correctly, the conclusion is guaranteed to be true. For example, if a knowledge base contains the rules &#8220;All birds can fly&#8221; and &#8220;A sparrow is a bird,&#8221; a formal reasoner can deduce, with absolute certainty, &#8220;A sparrow can fly.&#8221; Such rule-based systems are precise and auditable, but they are limited by the completeness and accuracy of the explicitly programmed rules.</p><p>In contrast, informal reasoning is about drawing conclusions based on patterns, similarities, or analogies, often without strict logical guarantees. This type of reasoning is more akin to human intuition or common sense. It&#8217;s less about strict deduction and more about finding connections and probabilities. For example, if an AI has learned implicit representations (embeddings) of various animals, and it sees a new animal that is &#8220;numerically close&#8221; to many dogs, it might infer it&#8217;s a dog, even without explicit rules for every single feature.</p><p>This distinction is crucial for understanding the different capabilities of AI. While formal reasoning provides certainty within defined boundaries, informal reasoning allows AI to operate in ambiguous, unstructured environments. 
The latter, particularly reasoning by analogy in embeddings and language models, will be explored in more detail in later chapters, showcasing how AI can make sense of the world even when explicit rules are unavailable.</p><h3><strong>What is the Best Representation for Knowledge?</strong></h3><p>The choice of how to represent knowledge is a critical decision in AI design. Different representation types are chosen based on the specific problem, the type of data available, and the AI paradigm being used (Symbolic vs. Statistical). For instance, a Symbolic AI system designed for medical diagnosis might rely heavily on formal, explicit ontologies of diseases and symptoms. Conversely, a Statistical AI system for image recognition might primarily use implicit representations that it learns from the unstructured pixels of millions of example images.</p><p>This challenge is captured by a theoretical result known as the Ugly Duckling Theorem. This theorem, in essence, states that without a specific purpose or &#8220;bias,&#8221; all objects are equally similar or dissimilar to one another. This implies that there is no single, universally &#8220;best&#8221; way to represent knowledge or measure similarity without a context or goal in mind. For example, the &#8220;ugly duckling&#8221; of the fairy tale is only ugly when judged by the standards of the ducklings around it; among swans, it is perfectly ordinary.</p><p>Therefore, the human responsibility in choosing the right representation is paramount. This choice directly impacts what an AI can &#8220;know,&#8221; how it can &#8220;reason,&#8221; and ultimately, the reliability and fairness of its inferences.
Aligning the representation with the problem&#8217;s nature is a key part of building human-centered tools that truly understand and assist us.</p><h2><strong>Conclusion: The Need for Learning</strong></h2><p>Good Old-Fashioned AI (GOFAI), with its focus on search, optimization, and explicit knowledge representation, laid the essential groundwork for the field of Artificial Intelligence. Its strengths lie in domains where problems are well-defined, rules are clear, and knowledge can be precisely encoded. GOFAI systems offered precision and control, making them powerful tools for tasks like proving theorems or playing well-defined board games.</p><p>However, the ambitions of GOFAI soon ran into fundamental limitations when faced with the messy complexity of the real world. These systems proved to be brittle: a small change outside their programmed domain could break them entirely. They struggled immensely with common-sense knowledge, which is vast and often unstated. The sheer scale of real-world information made it an &#8220;insurmountable challenge&#8221; to explicitly program every piece of knowledge and every rule. GOFAI was excellent at solving problems for which it was explicitly programmed, but it couldn&#8217;t adapt, generalize, or handle unstructured data effectively. This revealed a crucial gap in AI&#8217;s capabilities.</p><p>The limitations of GOFAI highlighted a profound truth: to build truly intelligent and adaptable systems, AI needed to move beyond simply executing pre-programmed rules. It revealed a crucial need for systems that could learn from experience and data, without being explicitly programmed for every single scenario or piece of knowledge.</p><p>This growing realization of the power of learning-based approaches, which were developing concurrently with GOFAI, marked a significant shift. 
It showed that AI could discover its own patterns and adapt to unforeseen situations, offering a path to overcome the brittleness of purely symbolic systems.</p><p>Recognizing these limitations and actively seeking new approaches is a hallmark of the ongoing, human-driven effort to build more capable and adaptable AI. It&#8217;s a testament to our techno-pragmatist ethos: acknowledging challenges, learning from past efforts, and continuously striving to create tools that can better serve humanity&#8217;s complex needs. This increasing prominence of learning methods, which developed in parallel to GOFAI, is the story that will unfold in the next chapter.</p><div><hr></div><p><em>Thank you for reading this far. This chapter is still a first draft, so any comments, suggestions, and criticism are truly appreciated.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/p/foundations-of-artificial-intelligence/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/p/foundations-of-artificial-intelligence/comments"><span>Leave a comment</span></a></p><p><em>PS: Get your copy of <a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI at 50% off</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Artificial Intelligence for Educators and Learners]]></title><description><![CDATA[Chapter 7 of Mostly Harmless AI]]></description><link>https://blog.apiad.net/p/artificial-intelligence-for-educators</link><guid isPermaLink="false">https://blog.apiad.net/p/artificial-intelligence-for-educators</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Wed, 30 Jul 2025 10:02:30 GMT</pubDate><enclosure 
url="https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3953" height="2791" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2791,&quot;width&quot;:3953,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;red apple fruit on four pyle books&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="red apple fruit on four pyle books" title="red apple fruit on four pyle books" srcset="https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1503676260728-1c00da094a0b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzfHxzY2hvb2x8ZW58MHx8fHwxNzUzNzg2MjQyfDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Element5 Digital</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><blockquote><p><em>The following is a first draft of Chapter 7 of my upcoming book <strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong>.</em></p></blockquote><p>Of all the domains being transformed by artificial intelligence, education is perhaps the most critical to get right. The stakes are uniquely high. Used wisely, AI has the potential to be a massively positive force, augmenting the work of teachers and deepening the learning of students in ways we are only beginning to imagine.
Used incorrectly, however, it could be catastrophic, undermining the development of critical thinking and eroding the very foundations of academic integrity. This chapter is a guide to navigating that high-stakes environment.</p><p>We will begin by demystifying the popular idea of a personalized AI tutor, a vision that runs counter to the principles of human-centered, collaborative learning. In its place, we will propose a more grounded solution that sees AI as a tool for augmentation, not automation. Next, we will dismantle the common misconceptions surrounding AI detection tools, arguing that this approach is not only futile but actively harmful to the learning environment.</p><p>This will establish the necessity of a fundamental pedagogical shift, moving from policing to integration. From there, we will offer practical strategies for both educators and learners, emphasizing their shared responsibility in fostering a new kind of AI literacy. Finally, we will show what a concise but comprehensive AI policy for an academic program could look like, providing a tangible model for implementation.</p><div><hr></div><h2><strong>The Myth of the Personalized Tutor</strong></h2><p>The arrival of powerful generative AI has fueled a seductive, decades-old myth: that the ultimate goal of technology in education is to create a personalized, all-knowing AI tutor for every learner. This vision promises a revolution, a future where a &#8220;personalized Aristotelian tutor&#8221; is available to every student, adapting to their unique learning style, language, and pace. This narrative is powerful, but it is built on a fundamental misunderstanding of how we learn and what education is for.</p><p>Even if such a perfect tutor were achievable, it is not the revolution we should want. The idea that education&#8217;s primary problem is a lack of personalization or efficient information delivery is a flawed premise. Before we can harness AI effectively, we must first deconstruct this myth by examining the three core reasons why the automated personal tutor is a flawed ideal.</p><h3><strong>Argument 1: It Mistakes Information Transfer for Learning</strong></h3><p>The myth of the personalized tutor assumes that the primary obstacle to learning is the inefficient delivery of information. This argument has some merit in specific contexts; in places where the main obstacle to education is a lack of access to books, internet, and educators, an AI tutor could be a game-changer.
However, this is not the case for the majority of learners in developed nations.</p><p>In an era of information surplus, the problem for the modern student is not a lack of access to information, but a lack of skill in navigating, evaluating, and synthesizing it. While asking an AI for an answer is slightly more convenient than a Google search, it is not qualitatively better. Furthermore, it removes the &#8220;desirable difficulty&#8221; that forges lasting knowledge. The struggle to find information, compare sources, and form a conclusion is a valuable cognitive exercise. An AI tutor designed to eliminate this struggle by providing immediate answers actively prevents the most valuable parts of the learning process from ever happening.</p><h3><strong>Argument 2: It Promotes Intellectual Dependency, Not Critical Thinking</strong></h3><p>The myth suggests that an AI can be a perfect partner for completing assignments, from solving math problems to writing essays. This, however, risks creating profound intellectual dependency. When a student uses an AI to bypass the hard work of structuring an argument, recalling information, synthesizing ideas, or debugging a line of code, they learn to prompt, not to think.</p><p>The purpose of assigning an essay is not to receive a perfect text; professors already know the answers. The purpose is to engage the student in the <em>process</em> of creation, which is where learning occurs. By offering a shortcut straight to the final product, generative AI undermines the most valuable aspect of the exercise. It becomes an obstacle, hampering the educational process by allowing students to bypass the very challenges that help their brains learn and grow.</p><p>The goal of education is to build independent, critical thinkers who can grapple with complex, ambiguous problems on their own. 
Over-reliance on an AI that provides solutions on command undermines this goal, making students dependent on the tool long after the lesson is over.</p><h3><strong>Argument 3: It Champions Isolation Over Community</strong></h3><p>The vision of a personalized path idealizes a student learning in perfect, isolated efficiency, free from the pace of a group. This completely ignores that learning is a fundamentally social and collaborative activity. Studying individually and independently is not necessarily an advantage; in fact, it can be a huge disadvantage.</p><p>The two things most self-educated people struggle with are motivation and feedback. Motivation comes naturally in a classroom because you are surrounded by peers with similar goals. Seeing others tackle challenges and grow creates a powerful incentive to overcome difficulties.</p><p>Feedback from mentors and peers is equally crucial for intellectual growth, allowing us to iterate on ideas and hone our skills. A community of learners is key. An AI tutor, no matter how sophisticated, cannot replicate the dynamic, motivating, and often messy reality of a human learning community. Learning together always beats learning alone.</p><h3><strong>The Alternative</strong></h3><p>The alternative to the flawed myth of the personalized tutor is to view AI not as an automated teacher, but as a powerful tool for augmentation within a human-centered community. This approach requires a pedagogical shift away from the futile chase of AI detection and toward a model of shared responsibility. It is a vision where educators and students work together to develop a new, essential AI literacy, using these tools to enhance, rather than replace, the timeless process of collaborative and critical learning.</p><h2><strong>Why AI Detection Is Futile</strong></h2><p>Before educators can effectively integrate AI, they must first understand that the detection of AI-generated content is a hopeless chase. 
Any attempt to police AI use through detection tools is an unwinnable arms race destined to fail for a number of practical and pedagogical reasons.</p><p>First, the technology itself is fundamentally flawed. Detectors will always be lagging behind the generative models they seek to identify, perpetually playing catch-up in a race they cannot win. The supposed telltales of AI-generated text&#8212;overly formal language, a lack of personal voice, perfect grammar&#8212;are not robust signals. They are merely fleeting characteristics of specific models at a single point in time. A detector trained to spot GPT-4&#8217;s style is useless against the next generation of models, and it&#8217;s even more useless against a student who uses one of the clever prompt techniques this very book teaches to make the output more human-like.</p><p>Second, these tools are dangerously inaccurate. Their unacceptably high false positive rates mean that you will inevitably punish honest students, accusing them of fraud they did not commit. This is an ethical line no educator should be willing to cross. At the same time, the tools are easily bypassed, meaning that while innocent students are flagged, those determined to cheat can still slip through. The result is a system that is both unjust and ineffective.</p><p>This cat-and-mouse game also creates perverse incentives. It encourages students to spend more time hiding and tinkering with AI to bypass detectors than on the actual intellectual work of the assignment. Their focus shifts from critical thinking to &#8220;evasion engineering.&#8221; This is the exact opposite of the goal of education.</p><p>Ultimately, a reliance on detection tools creates an environment of distrust that is toxic to learning. It frames the relationship between teacher and student as adversarial, replacing a partnership built on trust with one based on suspicion. 
Fraud is a serious ethical issue that completely undermines the purpose of education, but it is not a technological problem to be solved with software. It is a human one that must be discussed on ethical grounds, as a violation of the shared trust that makes a learning community possible. When fraud is committed, we all lose.</p><h2><strong>A Practical Guide for Educators</strong></h2><p>The only viable path forward is to shift our mindset from policing to integration, adapting our methods to leverage AI&#8217;s strengths while mitigating its weaknesses.</p><h3><strong>Redesigning Assignments for the AI Era</strong></h3><p>With the traditional take-home essay now vulnerable to automation, educators must redesign assignments to incorporate AI as a tool for thinking, not a machine for answers. This requires a fundamental shift in what we choose to assess.</p><p>The most effective strategy is to focus on process, not just product. Instead of grading only the final essay or report, the assessment can be expanded to include the student&#8217;s engagement with the AI. Requiring students to submit their chat logs or a written reflection on their process&#8212;detailing the prompts they used, how they evaluated the AI&#8217;s output, and the modifications they made&#8212;makes their thinking visible. This turns the inquiry itself into the gradable artifact, rewarding critical engagement over simple content generation.</p><p>Another powerful approach is to turn students into AI critics. Instead of asking them to produce a text, assign them the task of deconstructing an AI-generated one. For example, a student could be asked to prompt an AI to write an essay on a historical event and then write their own analysis of its factual errors, logical fallacies, and underlying biases. 
This transforms the assignment from a simple writing task into a high-level critical thinking exercise, teaching students to be skeptical and analytical consumers of AI-generated content.</p><p>Finally, it is essential to emphasize human-centric assessments that are inherently resistant to automation. These methods evaluate skills that AI cannot replicate, such as real-time argumentation, interpersonal collaboration, and embodied knowledge. This includes a renewed focus on in-class discussions and Socratic seminars, oral exams and presentations, timed hand-written essays, and hands-on lab work or collaborative projects. While these redesigned assignments require a different kind of engagement, the time saved by using AI for administrative tasks can be reinvested here, creating a more sustainable and pedagogically valuable workflow.</p><h3><strong>AI as a Teacher&#8217;s Super-Assistant</strong></h3><p>AI&#8217;s greatest potential may lie in its ability to reduce the significant administrative burden on teachers, freeing them up to focus on the deeply human work of teaching and mentoring.</p><p>As a tool for lesson planning and differentiation, AI can be an invaluable creative partner. An educator can brainstorm engaging lesson plans, get suggestions for creative activities, or generate differentiated materials&#8212;such as simplified texts or vocabulary lists&#8212;for students with diverse learning needs in a fraction of the time it would take manually. For instance, a teacher could use a prompt like: <em>&#8220;Act as an instructional designer. Create a 45-minute lesson plan for 10th graders on the causes of World War I, including a hook, a collaborative activity, and a formative assessment.&#8221;</em></p><p>For rubric and feedback generation, AI can be truly transformative. It can draft clear, comprehensive grading rubrics in seconds. More importantly, it can help solve the feedback bottleneck by providing initial, personalized feedback on student work. 
An educator can quickly review a student&#8217;s draft, identify key areas for improvement, and instruct the AI to provide detailed, constructive feedback on those specific points, without rewriting the text for the student. The teacher then reviews and approves the AI&#8217;s feedback before sending it. This &#8220;human-in-the-loop&#8221; model allows teachers to provide timely, detailed, and individualized feedback at a scale that was previously impossible. A teacher might use a prompt like: <em>&#8220;Here is a paragraph one of my students wrote. Provide feedback focusing on the strength of their topic sentence and their use of evidence, but do not rewrite it for them.&#8221;</em></p><h3><strong>Fostering an AI-Ready Classroom</strong></h3><p>Creating a healthy learning environment in the age of AI requires a proactive approach centered on clear policies, digital literacy, and open communication.</p><p>The foundation is to establish a clear classroom AI policy. Every educator should develop a simple, flexible policy for AI use and review it regularly. This policy should function as a guide for ethical engagement, not a list of prohibitions. It is crucial to define what constitutes constructive, ethical use (e.g., brainstorming, getting feedback on one&#8217;s own writing) versus what constitutes academic dishonesty (e.g., submitting AI-generated text as one&#8217;s own).</p><p>Beyond rules, educators must integrate AI literacy into the curriculum. It cannot be assumed that students understand how these tools work. This means dedicating class time to educating students on the capabilities, limitations, and ethical considerations of AI. This includes teaching practical skills like effective prompt engineering and essential concepts like how to spot AI &#8220;hallucinations&#8221; and the subtle ways that training data can introduce bias into the model&#8217;s output.</p><p>A simple and effective way to guide students is to create and share custom prompts and reusable AIs. 
By crafting prompts that are tailored to specific pedagogical goals&#8212;for example, a template designed to encourage critical analysis of a source&#8212;educators can model effective AI use. An even more powerful extension of this is to create shareable, custom AIs, often called &#8220;Custom GPTs&#8221; or &#8220;Gems.&#8221; These are specialized versions of the AI that are pre-loaded with specific instructions and context. An educator could create a &#8220;History Thesis Helper&#8221; that is an expert in their course material, or a &#8220;Lab Report Formatter&#8221; that guides students through the required structure. Sharing these resources not only helps students get better results but also embeds the desired learning process directly into the tool they are using.</p><p>Finally, it is vital to foster open dialogue. An educator should create a classroom culture where students feel comfortable and safe discussing the role of AI in their learning, asking questions, and even sharing their mistakes. By addressing the ethical implications and potential pitfalls of AI tools openly, the classroom becomes a collaborative space for exploring this new technology, fostering a sense of shared responsibility for its ethical use.</p><p>It is important to recognize that &#8220;AI burnout&#8221; is a reality. Many educators feel an immense pressure to adapt to everything at once, and that they have no time to do so. But this is not true. While we cannot dismiss AI, we do not have to change everything at the same time. The most sustainable path is one of small, deliberate experiments. By injecting AI into the easier parts of our teaching tasks first, we can achieve some easy wins, build our confidence, and give ourselves the time to reflect on the consequences before moving on to more ambitious integrations. 
The checklist below offers a simple way to begin.</p><h3><strong>A Four-Step Checklist for Educators</strong></h3><p>For educators feeling overwhelmed, here is a simple, actionable checklist to begin integrating AI into your practice:</p><ol><li><p><strong>Create and Discuss Your AI Policy:</strong> Draft your classroom AI policy using the appendix as a model. The most important step is to discuss it openly with your students on the first day. Frame it as a shared agreement for ethical engagement.</p></li><li><p><strong>Use AI as an Assistant for One Task:</strong> Pick one administrative task this week and use an AI to help. Draft a lesson plan, create a rubric for an upcoming assignment, or generate a set of discussion questions. Experience the tool&#8217;s power and limitations firsthand.</p></li><li><p><strong>Redesign One Assignment:</strong> Choose one of your existing assignments and brainstorm how you could redesign it to focus more on process, critical evaluation, or in-class performance. Start small and iterate.</p></li><li><p><strong>Share a Resource:</strong> Create and share a custom GPT or a well-crafted prompt template designed to help your students kickstart one self-study activity or assignment. This models good practice and provides a valuable resource.</p></li></ol><h2><strong>A Guide for the Modern Learner</strong></h2><p>For students, AI can be the most powerful learning tool ever created, but only if used with intention and integrity. The goal is to use AI to learn, not to short-circuit your own understanding. This requires a conscious shift from viewing AI as an answer machine to viewing it as a thinking partner.</p><h3><strong>Your Responsibilities as a User</strong></h3><p>Ethical use of AI begins with a clear understanding of your responsibilities. First and foremost, you must verify and clarify policies. 
Every course and institution will have different guidelines for AI use; it is your responsibility to know them and, when in doubt, to ask your instructor. Second, practice transparent disclosure. Being honest about how and where you have used AI in your assignments is a cornerstone of academic integrity and builds trust with your educators. Finally, you must protect sensitive information. Never input personal, confidential, or proprietary data into public AI models, as you have no control over how that data might be used or stored.</p><h3><strong>Using AI to Kickstart Your Work</strong></h3><p>One of the most effective and ethical ways to use AI is as a brainstorming partner to overcome the inertia of a blank page. You can use AI to generate initial ideas for a project, create a structured outline for an essay, or synthesize the key points from a long article. In this role, the AI acts as a catalyst for your own thinking, providing a foundation upon which you can build your original work. The goal is to use it to support your thinking, not replace it.</p><h3><strong>Using AI to Deepen Understanding</strong></h3><p>Instead of asking for a direct answer, use AI to guide you toward your own understanding. You can turn the AI into a Socratic partner that asks you questions instead of giving you solutions. For example, a prompt like <em>&#8220;I&#8217;m trying to understand the causes of the French Revolution. Don&#8217;t list them for me. Instead, ask me questions that will lead me to the key factors&#8221;</em> transforms a passive query into an active learning exercise. This approach reintroduces the &#8220;desirable difficulty&#8221; that is essential for true learning, using the AI to guide you rather than carry you.</p><p>AI is also an excellent tool for concept exploration. 
When faced with a complex idea, you can ask the AI to explain it in simpler terms or through an analogy, such as <em>&#8220;Explain the concept of general relativity to me as if I were 12 years old.&#8221;</em> This helps you build an intuitive grasp of the material that goes beyond rote memorization.</p><h3><strong>Using AI to Refine Your Skills</strong></h3><p>AI can be an invaluable coach for improving your practical skills through iterative feedback. As a writing coach, it can offer suggestions on clarity, tone, and structure without doing the writing for you. You can submit a paragraph you have written and ask for specific feedback, such as <em>&#8220;Can you suggest three stronger verbs I could use in this sentence?&#8221;</em></p><p>As a practice partner, AI can generate an infinite number of practice problems for subjects like math, coding, or language vocabulary. You can ask it to create a quiz for you and then, crucially, to provide detailed explanations for any questions you get wrong, allowing you to learn from your mistakes in a low-stakes environment.</p><h3><strong>Build Your Own AI Tools</strong></h3><p>Beyond one-off prompts, the next level of AI literacy is learning to create your own reusable AI assistants. Modern AI platforms allow you to create &#8220;Custom GPTs&#8221; or &#8220;Gems&#8221;&#8212;specialized versions of the AI that you pre-program with your own instructions and knowledge. This is a powerful way to personalize your learning. For example, you could build a &#8220;Study Buddy&#8221; and upload all your course notes, empowering it to quiz you on the specific material. You could create a &#8220;Socratic Tutor&#8221; that is permanently instructed to only ask you guiding questions and never give direct answers. 
By building your own tools, you move from being a simple user to a creator, a skill that is becoming increasingly valuable.</p><h3><strong>Developing AI Literacy</strong></h3><p>Ultimately, the most important skill for a 21st-century learner is not just knowing how to use AI, but knowing how to critically evaluate its output. Never trust blindly. This new &#8220;AI literacy&#8221; is built on three pillars.</p><p>First, always be skeptical. Treat every statement an AI generates as a claim, not a fact. Second, fact-check everything. AI models can and will &#8220;hallucinate&#8221; incorrect information with complete confidence. You are the ultimate authority and are responsible for the accuracy of your work. Always use trusted, primary sources to verify any factual information the AI provides. Finally, learn to look for bias. Understand that the AI&#8217;s training data is a reflection of the vast and messy internet, full of human biases and stereotypes. Always question the perspective of the text it generates and be aware of its inherent limitations.</p><h3><strong>Putting It All Together</strong></h3><p>Here is a step-by-step example of how you might ethically use AI to help with a research paper:</p><ol><li><p><strong>Brainstorming:</strong> Use the AI to explore potential topics and narrow your focus.</p></li><li><p><strong>Outlining:</strong> Work with the AI to structure your main arguments and create a logical outline.</p></li><li><p><strong>Research:</strong> Use the AI to find sources or summarize articles, but <strong>always</strong> go to the original source to read it yourself and fact-check every claim.</p></li><li><p><strong>Drafting:</strong> Write the full draft in your own words, using your outline and research.</p></li><li><p><strong>Feedback:</strong> Ask the AI for feedback on the clarity, structure, and style of your draft.</p></li><li><p><strong>Submission Checklist:</strong> Before submitting, review this list:</p><ul><li><p>Have I 
fact-checked every claim that originated from the AI?</p></li><li><p>Can I explain and defend every part of this work in my own words?</p></li><li><p>Have I followed my instructor&#8217;s AI policy to the letter?</p></li><li><p>Does my declaration accurately and specifically describe how I used AI in this assignment?</p></li></ul></li></ol><h2><strong>Conclusion</strong></h2><p>The techno-pragmatist ethos that guides this book is rooted in a fundamental belief: the future is not predetermined. Technology is a tool whose impact is profoundly shaped by how we choose to employ it, and this is nowhere more true than in education. As a college professor, this is not an abstract debate for me; it is a topic I care about deeply, and I feel a profound responsibility to get it right.</p><p>The challenge is not to resist this new technology, but to harness it with wisdom. Instead of chasing the flawed ideal of automation or descending into an adversarial relationship based on detection, we must embrace a necessary pedagogical shift. The central problem in modern education is not a lack of content, but a scarcity of timely, personalized feedback. High student-to-teacher ratios make it nearly impossible for educators to provide the deep, iterative guidance that is crucial for student growth.</p><p>This is where AI can create a true revolution. Therefore, the true north for AI in education is not automation, but augmentation. We must leverage AI to solve the feedback bottleneck, using it to do what it does best&#8212;process information and provide feedback at scale&#8212;so that we, educators and learners, can focus on what we do best: questioning, creating, and collaborating within a human-centered community.</p><p>It is from this techno-pragmatist perspective that we have offered these guides. 
The strategies herein are not just tips and tricks; they are a framework for shouldering the shared responsibility of building a new AI literacy, ensuring that these powerful tools serve, rather than subvert, the timeless goals of a meaningful education.</p><div><hr></div><p><em>Thanks for reading so far! As a complementary resource, here is a draft of an AI Policy for STEM classes. Feel free to share, modify, and reuse it as you see fit.</em></p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Example AI Policy for STEM Classrooms</div><div class="file-embed-details-h2">31.6KB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://blog.apiad.net/api/v1/file/a1a35b11-416f-4955-9a0f-2c889bc2e045.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.apiad.net/api/v1/file/a1a35b11-416f-4955-9a0f-2c889bc2e045.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p></p><p><em>This is a first draft of Chapter 7 of my upcoming book <strong>Mostly Harmless AI</strong>. 
Please, do share with me all your comments, suggestions, criticisms, and ideas.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.apiad.net/p/artificial-intelligence-for-educators/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.apiad.net/p/artificial-intelligence-for-educators/comments"><span>Leave a comment</span></a></p>]]></content:encoded></item><item><title><![CDATA[AI for Critical Thinkers]]></title><description><![CDATA[Chapter 4 of Mostly Harmless AI]]></description><link>https://blog.apiad.net/p/ai-for-critical-thinkers</link><guid isPermaLink="false">https://blog.apiad.net/p/ai-for-critical-thinkers</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Tue, 29 Jul 2025 11:01:26 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="7264" height="4843" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4843,&quot;width&quot;:7264,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;silver tabby cat on brown wooden floor&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="silver tabby cat on brown wooden floor" title="silver tabby cat on brown wooden floor" 
srcset="https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1617745143864-3907eaa93603?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5NHx8Y2hlc3N8ZW58MHx8fHwxNzUzNzMwMjY3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Photo by <a href="true">Rick J. Brown</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><blockquote><p><em>This article is based on Chapter 4 of my upcoming book <strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong>.</em> </p></blockquote><p>Having journeyed through the foundations of artificial intelligence&#8212;its history, its mechanics, and its limitations&#8212;we arrive at the most pressing question: How do we actually use the powerful new tools it has produced? </p><p>This article serves as a bridge from the theoretical to the practical. This is not a list of traditional prompt engineering hacks. The internet is filled with tips and tricks for coaxing a specific output from a language model for a single task. Instead, the advice that follows builds on our foundational understanding to offer something more durable: a general mindset for working with these tools. This approach is about engaging with language models in a way that allows you to get the best out of them without subcontracting your own critical thinking. It is a methodology for augmenting your intellect, not replacing it.</p><p>The principles you learn here are foundational, offering a universal toolkit for interacting with large language models in daily life&#8212;whether you are planning a vacation, trying to understand a complex news article, or drafting a simple email. In the following chapters (in the book), we explore how to adapt and intensify these practices for specialized, high-stakes professional environments. 
But first, every user must learn how to engage with these powerful yet fallible tools safely, critically, and effectively.</p><div><hr></div><h2><strong>A Methodology for Effective Interactions with Language Models</strong></h2><p>To move beyond simple queries and unlock the true potential of language models, we need a more structured approach. This methodology is divided into three parts: establishing the right Mindset, employing effective Tactics during the conversation, and building a System to make your successes repeatable.</p><h3><strong>The Mindset</strong></h3><p>The most significant shift is in your mental model. Instead of treating the model as a search engine, you should approach it as a conversational partner. This means recognizing that the interaction is iterative and that your most important role is to guide the conversation. </p><p>A key part of this mindset is adopting a Socratic (or inquiring) approach, where you use the model not just to get answers, but to help you ask better questions. This is invaluable for sensitive and important tasks. 
 </p><p>For example, instead of starting with &#8220;Write an email asking for a raise,&#8221; a partner-based approach would be to ask the model to guide you: &#8220;I need to write an email to my manager to ask for a raise. What are the key pieces of information and evidence I should gather first to make the strongest possible case?&#8221; The model will then prompt you for your accomplishments and market data, helping you build your argument before a single word is written. </p><p>Similarly, when organizing a child&#8217;s birthday party, you could ask, &#8220;I&#8217;m planning a science-themed party for my 7-year-old. What are the key logistical details I need to consider to make sure it runs smoothly?&#8221; In both cases, you are using the model to help you define the problem, which is a far more powerful use of its capabilities.</p><h3><strong>The Tactics</strong></h3><p>With the right mindset, you can employ specific tactics to steer the conversation toward a high-quality outcome. The most fundamental tactic is to be explicit and strategic with your queries. To ground the model&#8217;s response in reliable information, tell it where to look. </p><p>A generic query for medical advice is risky, whereas a much safer prompt would be: &#8220;Search for information from the Mayo Clinic and the World Health Organization on the common symptoms of iron deficiency.&#8221; This specificity is also crucial when comparing complex options. If you are buying an EV, for example, you can be as specific as: &#8220;Compare the Tesla Model 3, the Hyundai Ioniq 5, and the Ford Mustang Mach-E for a family of four. Focus on real-world range, charging speed on a standard home charger, and available cargo space.&#8221;</p><p>To get an even more robust answer, you can move beyond a single query and assemble a &#8216;committee of experts.&#8217; A single language model will give you its most statistically likely answer, which might not be the most creative or well-rounded one. 
To overcome this, you can generate multiple, independent perspectives. </p><p>For the EV comparison, you could open three separate conversation windows. In the first, you&#8217;d ask the model to act as a pragmatic engineer, and perhaps it will argue for the Hyundai. In the second, you&#8217;d ask it to be a tech enthusiast, and maybe it makes the case for the Tesla. In the third, you&#8217;d have it act as a family-focused reviewer, leading it to argue for the Ford. </p><p>By copying these three independent analyses into a final chat window, you can then ask the model to act as a senior editor, synthesizing the competing viewpoints into a final, balanced recommendation that weighs factors like cost, range, and reliability.</p><p>Finally, after the model provides a response&#8212;either a single answer or a synthesized one from your committee&#8212;you can employ self-criticism as a final refinement tactic. </p><p>Once you have a draft of your email asking for a raise, you can prompt it: &#8220;Read the email you just drafted. Now, act as my manager who is busy and skeptical. What parts of this email are unconvincing? Is the tone too demanding or not confident enough?&#8221; This critical step often surfaces weaknesses that you might have missed, allowing you to create a much stronger final product.</p><h3><strong>The System</strong></h3><p>The final part of the methodology is to turn your successful interactions into a repeatable system. A common mistake is to treat prompts as disposable. A more powerful approach is to build a library of reusable prompts, thinking of them as personal &#8220;natural language programs.&#8221; The multi-step process you used to plan the birthday party can be saved as a &#8220;Kids&#8217; Party Planner&#8221; template. </p><p>The Socratic prompt that helped you prepare for your salary negotiation can be generalized into a &#8220;Career Conversation Prep&#8221; tool. 
The ultimate expression of this principle is the use of features like OpenAI&#8217;s &#8220;Custom GPTs,&#8221; which allow you to encapsulate a complex task into a dedicated tool that you or your team can use with a simple request.</p><h2><strong>A Practical Example</strong></h2><p>To see how these principles combine into a powerful workflow, let&#8217;s walk through a comprehensive, real-world task: planning a 10-day family vacation to Italy.</p><p>Rather than beginning with a vague request like &#8220;plan a trip,&#8221; the process starts by applying the Socratic approach. You would first ask the model to frame the problem for you: &#8220;I want to plan a 10-day family vacation to Italy. What key information do you need from me to create the best possible itinerary?&#8221;</p><p>This immediately shifts the dynamic, positioning the model as a guided partner. In response, it would act as a consultant, asking for crucial details like the number of travelers, the children&#8217;s ages, your budget, family interests, and preferred travel pace.</p><p>Once you&#8217;ve provided this context, the next step is to ensure alignment. You would instruct the model to synthesize and confirm the constraints: &#8220;Great, thank you. Based on my answers, please summarize all of my constraints for this trip in a structured list.&#8221; </p><p>With a clear, confirmed set of requirements, you can then confidently ask for a first draft. The iterative heart of the process begins now. Upon receiving the initial itinerary, you would employ the self-criticism tactic: &#8220;This is a good start. Now, act as a skeptical travel agent. Criticize this itinerary and tell me what&#8217;s missing or what could go wrong.&#8221;</p><p>The model might point out that visiting three major cities in ten days is too ambitious for a family with young children. 
Based on this valuable feedback, you can guide the revision, continuing this loop of drafting and critiquing until the plan is refined to your satisfaction. Only then would you ask for the final, detailed output. </p><p>The final, powerful step is to generalize this success. You would ask the model to convert the entire conversation into a reusable &#8220;Family Vacation Planner&#8221; template, complete with placeholders for key details. This turns a one-time effort into a valuable, programmable asset for future trips, demonstrating the true power of thinking of prompts as reusable programs.</p><h2><strong>Common Pitfalls for the Everyday User</strong></h2><p>The good practices above are designed to improve the quality of a language model&#8217;s output. This section focuses on the mental traps and risks you must be aware of to use these tools safely.</p><h3><strong>The &#8220;Eliza Effect&#8221; and Misplaced Trust</strong></h3><p>Because chatbots are designed to be conversational and helpful, it&#8217;s easy to start treating them as if they have genuine understanding, intentions, or even consciousness. This is a modern version of the &#8220;ELIZA effect&#8221; we discussed in the history chapter. The danger is that this leads to misplaced trust, where we stop critically questioning the model&#8217;s output because it feels so confident and knowledgeable. This is the psychological trap that makes us vulnerable to hallucinations; we are less likely to fact-check a &#8220;partner&#8221; than a machine.</p><h3><strong>Cognitive Offloading and The &#8220;Lazy Brain&#8221; Problem</strong></h3><p>The ease of asking a language model to summarize an article, draft an email, or brainstorm ideas can lead to a subtle but significant danger: cognitive offloading. By outsourcing the fundamental work of thinking, synthesizing, and structuring our thoughts, we risk letting our own critical thinking and creative muscles atrophy. 
The goal is to use these tools to think better, not to think less. Over-reliance can make us less capable problem-solvers in the long run.</p><h3><strong>The Privacy Risk of Casual Conversation</strong></h3><p>In a casual conversation with a chatbot, it&#8217;s easy to forget that you are interacting with a complex system run by a corporation. Users often paste sensitive personal information&#8212;medical details, financial data, private emails, proprietary work content&#8212;into public language models without considering where that data goes, how it&#8217;s used for future training, or who might have access to it. What you tell the model does not stay between you and the model.</p><h2><strong>You Are the Final Authority</strong></h2><p>The techniques above teach you how to get better raw material from the language model. This final principle is about what you, the human, must do with that material. It is the most critical step in using these tools responsibly.</p><p>First, never trust, always verify. The language model is an unreliable narrator. Treat its output as a well-written first draft, not a finished fact. For any critical piece of information&#8212;a date, a statistic, a medical suggestion, a legal point&#8212;you must verify it using an independent, authoritative source. The model can help you find potential sources, but you are the fact-checker.</p><p>Second, synthesize, don&#8217;t just copy-paste. The model&#8217;s output is information; your goal is knowledge. The most important work happens after the model has responded. Your job is to synthesize its suggestions with your own experience, judgment, and goals. The model can generate a list of tourist sites for your Italy trip, but only you can synthesize that into a vacation plan that feels right for your family.</p><p>Finally, own the outcome. The language model is a tool, and you are the user. 
Any decision made, any email sent, or any action taken based on the model&#8217;s output is your responsibility. This principle of accountability is non-negotiable. The model is an assistant that can help you think, but it is not a replacement for your personal judgment.</p><h2><strong>Conclusion</strong></h2><p>The journey from a novice user to a skilled one is not about memorizing clever prompts; it&#8217;s about a fundamental shift in mindset. Instead of treating generative AI as a vending machine for answers&#8212;an approach fraught with risks of shallowness, bias, and error&#8212;we&#8217;ve seen the power of engaging it as a conversational partner.</p><p>The practices outlined in this article&#8212;the Socratic method, strategic querying, and, most importantly, critical verification&#8212;form a framework for responsible engagement. This framework places you, the user, firmly in the driver&#8217;s seat. </p><p>The quality of the model&#8217;s output is not a feature of the model alone; it is a direct reflection of the quality of your guidance and the rigor of your review. You are not just a prompter; you are a director, a critic, and a synthesizer. This is what makes these powerful tools &#8216;mostly harmless&#8217;: not their inherent nature, but our commitment to using them with critical awareness and human authority.</p><p>By mastering these foundational skills, you are not just learning to use a new tool. You are developing a new form of literacy for the 21st century. As we move into the specialized applications for knowledge workers, developers, and creatives in the following chapters (of the book), this ability to think with AI, not just ask of it, will be your most valuable asset.</p><blockquote><p><em>Thanks again for reading. 
If you want to dive deeper into Artificial Intelligence and learn to make the best out of it, from a techno-pragmatist, human-centered, responsible perspective, please check out my book <strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong>.</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[The State of AI for Software Development]]></title><description><![CDATA[Tools of the Trade, and Why You Should Still Learn to Code...]]></description><link>https://blog.apiad.net/p/the-state-of-ai-for-software-development</link><guid isPermaLink="false">https://blog.apiad.net/p/the-state-of-ai-for-software-development</guid><dc:creator><![CDATA[Alejandro Piad Morffis]]></dc:creator><pubDate>Sat, 26 Jul 2025 11:12:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0vOu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0vOu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0vOu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 424w, https://substackcdn.com/image/fetch/$s_!0vOu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 848w, 
https://substackcdn.com/image/fetch/$s_!0vOu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 1272w, https://substackcdn.com/image/fetch/$s_!0vOu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0vOu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png" width="1024" height="608" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:608,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0vOu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 424w, https://substackcdn.com/image/fetch/$s_!0vOu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 848w, 
https://substackcdn.com/image/fetch/$s_!0vOu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 1272w, https://substackcdn.com/image/fetch/$s_!0vOu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb269c1a1-68f3-47c2-a101-b00875ee6f49_1024x608.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div><blockquote><p><em>This article is based on Chapter 5 of my in-progress book 
<strong><a href="https://store.apiad.net/l/ai/fiftyoff">Mostly Harmless AI</a></strong>.</em></p></blockquote><p>Few developments in the generative AI space have been as exciting lately as the rise of code generators. The evolution of these AI coding assistants is best understood not as a single leap, but as a progression of capabilities, moving from simple autocomplete to what may one day be fully autonomous agents.</p><p>At their core, code generators are Large Language Models trained on vast amounts of public code. They treat programming languages just like human languages, learning the patterns, syntax, and structure to predict what comes next. These models can take a natural language prompt and some contextual code and produce new code that mostly aligns with the prompt's intention. For example, you can provide a function signature and a comment like, "This function finds the minimum of an unsorted list," and the model will generate the function's body. </p><p>This uncanny ability to comprehend and generate code based on human communication is transforming the development landscape, but it also requires special considerations, as code is not just another natural language.</p><p>In this article, we will explore the landscape of AI for software development. We will begin by looking under the hood, examining the spectrum of capabilities that allow AI to generate code. Next, we will explore the use cases for developers across the development lifecycle. Then, we will discuss some important things to keep in mind, from hallucinations and security to theoretical limitations of AI for coding. 
Finally, we will look to the future of coding, consider how the developer role is evolving, and try to answer one crucial question: <em>is coding dead</em>?</p><div><hr></div><h2>How to Make a Code Generator</h2><p>Let's imagine we are building our own code generator from scratch. The journey from a simple code predictor to a sophisticated development partner is a journey of adding layers of capability, moving up a spectrum of increasing autonomy.</p><p>The first thing we want is next-token prediction for code. The foundational layer is built on unsupervised training, making it essentially autocomplete on steroids, like a super duper IntelliSense. We start by training a model on vast amounts of code, teaching it to predict the most likely next token based on the immediate context. If we have variables or functions declared nearby, our model is more likely to generate code that references them, simply because that's the most common pattern in its training data.</p><p>Now that we have a basic generator, our next step is to teach it to follow instructions. 
To do this, we can compile a dataset of instruction pairs&#8212;for example, a natural language command like, "In the previous code, change the loop to be more efficient," paired with the corrected code. By training on these examples, our model learns to go beyond simple prediction and follow specific, human-given directions. We can further enhance this process with Reinforcement Learning, where we have human or automatic evaluators rank different code outputs. This teaches our model to not only generate syntactically correct code, but also to respect desired styles and naming conventions.</p><p>Our generator is getting smarter, but it's still limited by the immediate context. To give it a long-term memory, we move to context-aware generation. We can dramatically enhance our model's ability to generate relevant code by allowing it to pull from a broader context. This is a form of Retrieval-Augmented Generation (RAG) for Code. We can index an entire codebase or external API documentation, allowing our model to find relevant examples and patterns. When a developer asks a question, our system retrieves these examples and feeds them into the prompt, allowing the model to generate accurate code by combining and refactoring snippets from the provided context, even for libraries it wasn't explicitly trained on.</p><p>So far, our model can only write code. To up our game, we can give it the ability to interact with its environment by making it use external tools. This represents a significant leap. We can equip our model with a set of tools it can invoke on demand. For example, in response to a prompt like "add a library for charting," our model could invoke a tool to install the missing dependency in the project. If you ask it to "check whether this works," it could invoke another tool to run the unit tests and report back the results. It could even use tools to directly modify files in the codebase. 
By giving our model the ability to take actions beyond just generating text, we empower it to participate more actively in the development process.</p><p>The final step on our journey is to give our code generator a bit of autonomy, creating what&#8217;s called an agentic system. This is the most advanced and forward-looking form of AI-based code generation. We can design an AI agent that takes a high-level goal, breaks it down into sub-tasks, writes code, generates tests, runs the code, and then analyzes the output or errors. Based on the results, it can then debug or modify the source code in a continuous loop, acting as a semi-autonomous developer to see a task through from start to finish.</p><p>Beyond these training methodologies, we can leverage the formal nature of code to improve our model's performance. Unlike natural language, code has strict syntactic rules that can be programmatically checked. One simple but effective technique is trial and error during inference: we can have our model produce several potential code snippets, run them through a linter, and automatically reject any that have parsing errors. </p><p>More advanced techniques can pre-process the training data, for instance by normalizing all variable names to a generic format like <code>var0</code>, <code>var1</code>, etc. This makes it much easier for our model to learn the structural relationships in code without being distracted by specific naming conventions, and we can substitute the actual names back in a post-processing step. These tricks leverage the fact that we are dealing with a very restricted syntax to make it easier for our language model to learn the rules.</p><p>Finally, to create the ultimate specialized assistant, we can go beyond RAG and fine-tune a model on a specific codebase. While RAG provides external context, fine-tuning actually updates the model's internal weights. 
By training a model on a company's entire private and proprietary codebase, we can create a version that has deeply internalized that organization's specific architectural patterns, internal APIs, and coding standards. This results in an AI partner that not only answers questions correctly but does so in a way that is idiomatic and aligned with the team's established practices.</p><h2>Use Cases for Developers</h2><p>Understanding the engine is one thing; knowing how to use it is another. AI offers you a powerful toolbox that you can apply across the entire software development lifecycle. In this section, you will explore practical use cases across three key phases: Ideation and Design, Implementation and Development, and Verification and Explanation. You will see how you can use both sophisticated, LLM-based coding tools integrated directly into your IDE and techniques that work with standard, general-purpose chat apps like ChatGPT or Claude, requiring no special integration at all.</p><h3>Phase 1: Ideation and Design</h3><p>Before you write a single line of AI-generated code, you can already use LLMs as powerful brainstorming partners for exploration and design. This is one of the most accessible ways you can use AI, as it doesn't require a specific tool or editor extension; you can do it effectively using general-purpose conversational AI applications like ChatGPT, Perplexity, or Gemini. Models with live browsing capabilities are often even better for this phase, as they can pull in the latest information about new frameworks, libraries, and design patterns.</p><p>The key is for you to treat this phase as an interactive exploration. Instead of asking for a single, ready-made answer, you should guide the model through an ideation process. Here's how you can do it: use a chain-of-thought approach by asking the model not just for a solution, but to "think step-by-step" through the pros and cons of different architectural choices. 
</p><p>A powerful pattern you can use is to ask the model to generate several variants&#8212;for example, "Propose three different ways to design the database schema for a social media app." Then, you can discuss the options back and forth, using self-critique prompts to have the model compare the alternatives it just generated. At the end of this collaborative session, you can ask the model to provide a structured summary of all the design decisions you have agreed upon, which then serves as an executive design document. </p><p>With this document in hand, you can then start a new, more focused session with a code-oriented model for the actual implementation.</p><h3>Phase 2: Implementation and Development</h3><p>The most straightforward way you can use AI in this phase is for generating short, self-contained code snippets. This can be for a well-known algorithm, a common pattern, or the use of a well-documented API. This is a task you can accomplish with any standard chat app, even outside your IDE, and it is especially powerful for navigating the complex world of APIs and libraries. </p><p>As a professional programmer, you probably aren't spending that much time doing basic coding, like inserting numbers in a list. In reality, about 90% of the code you write is interface code for some external library you may not know well. Instead of manually searching documentation, you can simply ask an AI assistant, "How do I use this library to make a query that does X?" and get a ready-to-use snippet.</p><p>The next level of integration is bringing AI directly into your IDE. This can start with simple code completion, but the real power comes from integrating a full chat experience. This allows you to highlight a block of code and ask for specific changes, such as, "Refactor this function to be more efficient," and have the model modify the file directly. If the model has RAG capabilities and can scan your entire codebase, it gets even better. 
The modifications and additions it suggests will be consistent with your existing coding style and use your own libraries and methods, making the integration seamless.</p><p>At the far end of the spectrum is the full agentic mode, which is still in its infancy with tools like Cursor. This offers a much more hands-off development experience. Here, you can give a coding agent a high-level task, and it can modify several files, create new ones, and even run commands in the terminal to install missing dependencies.</p><p>Finally, you don't always need a full IDE. For one-off scripts or quick prototypes, you can use the "Canvas mode" in apps like ChatGPT, Claude, or Gemini. These provide a simple editor-like interface where you can iterate back and forth with the model to update a script. Some tools even allow you to run these scripts directly in the cloud, letting you build and test disposable web apps instantly.</p><p>Working with these tools introduces a new core skill, an AI-in-the-loop coding workflow&#8212;the day-to-day interactive process of collaborating with an AI. It involves an iterative cycle of prompting with a clear goal, carefully reviewing the AI's output, correcting its mistakes or flawed assumptions, and then re-prompting with more specific instructions or feedback.</p><h3>Phase 3: Verification and Explanation</h3><p>To ensure your code quality, you can use an AI to help generate a wide range of test cases. This is especially useful for uncovering corner cases that might not be immediately obvious, such as handling empty inputs, maximum values, or unusual user behaviors. You can do this with integrated AI coding tools, but it can also be as easy as uploading your codebase or relevant files to a standard chat app and asking it to suggest test cases. </p><p>You can ask for both code-based tests (like unit and integration tests) as well as descriptive tests (like user stories or manual testing scripts). 
In all these scenarios, it helps to instruct the model with a Chain-of-Thought prompt, asking it to first explain what behavior it wants to test, and only then provide the actual test. This helps ensure the tests are intentional and well-understood.</p><p>Furthermore, when you're faced with a cryptic error, you can use AI for debugging. You can feed the AI the error message, stack trace, and relevant code, and it can analyze the context to suggest potential causes for the bug and possible fixes, acting as an experienced pair programmer.</p><p>The opposite, code-to-language direction allows you to create powerful new workflows for understanding code. You can ask an AI for automatic documentation of functions or for natural language explanations of a complex code fragment. This can be done directly inside your IDE with an integrated tool, or with standard chat apps. For example, some tools allow you to connect a public GitHub repository and ask high-level questions on the fly, which is very useful for getting a quick overview of a new codebase.</p><p>A particularly valuable use case is in legacy code modernization. One of the biggest challenges in the software industry is maintaining and updating old codebases. You can use AI to tackle this problem by feeding it legacy code (e.g., from an old COBOL or Java system) and asking it to analyze the logic, add explanatory comments, or even translate the entire system to a modern language and architecture. This can dramatically reduce the cost and risk associated with modernizing critical systems.</p><p>However, you should be aware of the critical gap between syntax and semantics&#8212;that is, between understanding what the code <em>says</em> and what the code <em>does</em>. </p><p>The weaker models are mostly limited to describing what the code is <em>saying</em> syntactically (e.g., "this variable is changed to this array position"), although this capability is improving all the time. 
More powerful models can often provide higher-level, semantic explanations of what the code is <em>doing</em> (e.g., "this loop is ensuring the first part of the array is always sorted"). But even the best models may not be able to grasp the full architectural details or business logic of a complex application.</p><h3>Putting It All Together</h3><p>Putting this all together, let's see how a complete workflow might look for tackling a specific, somewhat complicated feature in an ongoing app, like adding OAuth login.</p><p>First, you would start in ideation mode, interacting mostly in text with the AI. You would discuss a high-level overview of the required architecture changes, which parts of the app might be impacted, and the best libraries to use. The goal here is to produce a clear design roadmap before any code is written.</p><p>Next, you would move to implementation mode, going full hands-on with an agentic tool. You could assign the agent the high-level task from your roadmap: "Implement the OAuth login feature using the chosen library." The agent would then get to work, creating new files, modifying existing ones, and writing the necessary code. As it encounters errors or ambiguities, you would engage in a back-and-forth conversation to guide it, but the bulk of the mechanical coding would be handled by the agent.</p><p>Finally, you would enter review mode. Once the agent reports that the feature is complete, you could have a final conversation with the AI. You could ask it to analyze the <code>git diff</code> of all the changes it made, explain the rationale for its implementation choices, and generate comprehensive documentation for a pull request. After your final review and approval, you would then submit the PR for human review by your team.</p><h2>Things to Keep in Mind</h2><p>While the toolbox is powerful, it comes with sharp edges. The most important limitation in language modeling, in general, has been called the problem of hallucinations. 
In the context of code, this means AI-generated code is not infallible and can contain subtle bugs that require constant vigilance.</p><h3>Hallucinations and Mistakes</h3><p>The simplest way you can see hallucinations is when you get code that uses a variable that doesn't exist or fails to close a parenthesis. Unlike with natural language, you can often detect these syntactic errors automatically with a linter or compiler, so most of these more harmless hallucinations matter little: they are caught before they can introduce subtle bugs.</p><p>A slightly more difficult hallucination is what we can call a semantic hallucination, where the model uses a wrong variable or function name that <em>does</em> exist in your codebase. In this case, you will not get a compiler error because you're using an existing symbol, but you will get the wrong behavior. This is much harder to find because it has the same problem as most hallucinations: you have to review the code and be knowledgeable enough to have written that code yourself.</p><p>The most insidious errors are logical flaws. These occur when the code doesn't do anything obviously wrong&#8212;it uses the right variables and looks plausible&#8212;but it has some subtle logical mistake that leads to a bug. For example, finding that a variable is not updated at the right moment in a nested loop is a tricky problem even for human experts. These kinds of mistakes will introduce subtle, hard-to-detect bugs.</p><p>But even if the bugs are no worse than what a human would introduce, they pose a threat because of "automation bias." When you check code written by humans, you expect bugs. But until recently, the only machine-generated code programmers ever dealt with came from rule-based systems like compilers, and that code is essentially free of mistakes. 
</p><p>So even if the language model makes errors that are, on average, no worse than what a regular programmer would make, they can still be harder to detect because they won't be the exact same mistakes a human would make, and we may be less on guard.</p><h3>AI's Impact on Technical Debt</h3><p>The rapid generation of code by AI presents a double-edged sword for technical debt. On one hand, you can use AI as a powerful tool to reduce existing debt. You can ask it to analyze your codebase for inefficiencies, suggest refactorings, or add missing documentation and tests, thereby improving code quality. </p><p>On the other hand, the very speed of AI can create new and more complex forms of technical debt. Relying heavily on "vibe coding" to quickly generate features without rigorous human review can lead to a codebase filled with poorly understood, inefficient, or subtly buggy logic. This AI-generated debt can be even harder to untangle later, as the original human intent behind the high-level prompt may be lost.</p><h3>Biases in Generated Solutions</h3><p>Models trained on a vast corpus of public code from the internet will inevitably learn from outdated examples. This can lead them to perpetuate outdated practices by suggesting deprecated functions, old library versions, or inefficient algorithms that are no longer considered best practice. An AI model will also often default to the most statistically common solution it has seen in its training data. </p><p>This can stifle creativity and lead to a homogenization of code, discouraging the exploration of more elegant or contextually appropriate solutions. Finally, just as AI can perpetuate harmful societal biases, it can also reproduce human biases from the code it was trained on. This can manifest as non-inclusive language in generated comments or variable names.</p><h3>Security and Licensing Risks</h3><p>A significant risk is that an AI can generate code with known security vulnerabilities. 
If the model was trained on public code containing flaws like SQL injection or buffer overflows, it may reproduce those same insecure patterns in its suggestions, creating a major security risk for the application. </p><p>Furthermore, the use of AI-generated code introduces complex legal questions. A model might reproduce a code snippet verbatim from a repository with a restrictive open-source license (like the GPL), inadvertently pulling that license's requirements into a proprietary project. The legal ownership of the AI-generated code itself remains a gray area, creating potential intellectual property challenges for companies.</p><h3>The Economics of AI Development Tools</h3><p>While these AI tools offer significant productivity boosts, they are not free. For development teams and organizations, it's important to consider the practical economics of their adoption. Most advanced AI coding assistants operate on a subscription model, which introduces a new operational cost. Team leads and CTOs must perform a cost-benefit analysis, weighing the price of the tools against the expected gains in developer speed, code quality, and reduced time-to-market. The return on investment (ROI) will depend heavily on how well a team integrates these tools into their workflow and whether the productivity gains justify the recurring expense.</p><h3>Theoretical Limitations</h3><p>Beyond the practical issues of hallucinations and biases, there is a more fundamental, formal limitation to what we can do automatically. This is captured by Rice's theorem, a cornerstone of theoretical computer science. In short, the theorem proves that for any non-trivial semantic property of programs, no algorithm can decide whether an arbitrary program has that property.</p><p>What does this mean in practice? A "non-trivial semantic property" is basically any interesting question about what a program <em>does</em>. For example: "Does this program ever crash?" or "Will this function always return a positive number?" 
or "Is this code free of security vulnerabilities?" Rice's theorem tells us that it is mathematically impossible to build a universal program that can answer these kinds of questions for every possible piece of code.</p><p>This highlights the theoretical impossibility of perfect, automated code verification. We will never be able to build an AI that can look at code generated by another AI (or a human) and formally guarantee that it does exactly what the natural language prompt intended. That problem is, in the general case, unsolvable. </p><p>However, this doesn't mean we should give up. Engineering isn't about theoretical perfection; it's about solving the average case in the best possible way and handling the most important edge cases reasonably well. While we can't achieve perfect verification, we can get pretty far with a combination of AI-generated tests, linters, and, most importantly, expert human review.</p><h2>The Future of Coding</h2><p>Given these tools and guardrails, the very nature of programming is set to transform. The focus will shift from the mechanics of writing code to the art of building systems. The term "vibe coding," popularized in developer communities, captures the essence of this shift. It describes a workflow where the developer's primary job is no longer to write precise, line-by-line syntax, but to describe the high-level behavior, intent, or "vibe" of the desired software to an AI partner. The focus moves from <em>how</em> to do something (the specific algorithm and syntax) to <em>what</em> needs to be done (the ultimate outcome and user experience), leaving the mechanical implementation details to the AI assistant.</p><p>This approach is incredibly powerful for rapid prototyping, hackathons, and short-term projects. A developer can quickly scaffold an entire application by describing its components in natural language, getting a functional prototype up and running in a fraction of the time it would take manually. 
</p><p>However, this method has significant limitations for larger, more detailed projects. "Vibe-based" instructions are often ambiguous and can be misinterpreted by the AI, leading to code that works for the happy path but fails on edge cases. For long-term, mission-critical software, the precision, maintainability, and strict adherence to architectural standards that come from deliberate, human-led coding remain indispensable. Vibe coding is a tool for speed and exploration, not a replacement for rigorous engineering.</p><p>In this new paradigm, future developers will become experts at wielding a suite of AI tools and agents. Skills in "prompt engineering," system design, and the critical review of AI output will become more valuable than the ability to recall specific syntax. The developer's role becomes one of guidance and orchestration, knowing which tool to use for which task and how to verify the results. </p><p>Looking ahead, this elevated role may involve assigning entire features or bug fixes to autonomous agents. These agents would manage the full lifecycle: understanding the ticket, writing the code, creating tests, committing to version control, and responding to feedback from the CI/CD pipeline. This doesn't eliminate the developer but elevates their role to that of a system architect and project manager, overseeing a team of AI agents.</p><p>Beyond the changes in workflow, it's worth contemplating how these tools will change the qualitative experience of being a developer. We must ask ourselves how it <em>feels</em> to code this way. Does offloading the cognitive burden of syntax and boilerplate make you dumber and cause you to forget how to code, or does it free up mental space, allowing you to become even more proficient in the things that truly matter&#8212;the high-level ideas and architecture?</p><p>We must also consider the social aspects. You now have a partner that is not a human. How will this impact teamwork? 
Will this AI partner become a virtual member of the team, participating in code reviews and design discussions? Or will it push developers into lonelier roles, as they interact more with their AI than with their human colleagues? How does a senior developer mentor a junior who can always get an instant answer from an AI, potentially masking gaps in their fundamental knowledge? </p><p>These are open questions we must navigate as we integrate these powerful new collaborators into our teams.</p><h2>Final Remarks</h2><p>So, Is Coding Dead?</p><p>There is a real concern that if AI can write 90% of the code in 10% of the time, nine out of ten programmers could be out of a job. And yes, every time automation has reached an existing industry, some jobs have been destroyed as some skills became irrelevant.</p><p>However, I claim we must not fear the advent of AI coding assistants. Here&#8217;s why.</p><p><em>Writing code</em> is far from being the hardest or the most time-consuming part of software development. The process of making software involves understanding requirements, talking with customers, user testing, and product design, all of which are at least one order of magnitude more difficult than actually typing code. </p><p>A hundredfold boost in productivity for a task that is only 10% of the overall process is huge, but it still leaves the other 90%, the human-centric work, untouched. We will still need to understand what our customers want, guide them through designing a software product, know the user base, and find a sustainable business model. </p><p>And no, you cannot simply simulate the end user with a language model so that the AI can prompt itself into making a usable product, because your end user will still be human. Human users are slow, get angry easily, don't understand your application, and don't know what it is they don't like about it. 
Until an AI can really replicate what it feels like to be a human (and at that point, will we still call it &#8220;artificial&#8221;?), we can&#8217;t take the human out of the software development loop, or any creative loop, for that matter.</p><p>The biggest progress in the software creation process has always come from innovation on the human side, not the machine side. Innovation in software engineering, management, and how you get people to work together and collaborate will continue to be the most important part of the software pipeline for a long time.</p><p>Furthermore, software is an industry that is nowhere near its saturation point. We need far more software than the people who can currently write it are able to produce. Increased productivity will likely be met with increased demand, creating more and better software for more users.</p><p>Every leap in software productivity&#8212;from assembly to compilers, from C to object-oriented frameworks&#8212;has lowered the barrier to entry and brought more people into programming. AI tools will likely do the same, empowering more people to create software. The modern world runs on software, and in the future, basic programming literacy may become as common as basic math literacy is today. </p><p>Most people know enough math to get by in daily life without hiring a mathematician, and in the same way, more people will know enough programming to automate simple tasks. They will learn to say to their home computer, "When I get home, I want you to turn my lights on, but only if it's night and the electric bill is not above the average," and an AI will generate the code to make it happen. This expands the field rather than shrinking it.</p><p>So, should you learn to code? Definitely. There's going to be orders of magnitude more code written in the next few years than everything we've written in history. 
</p><p>But even if you never end up writing a single line of code unaided by AI&#8212;like I've never written a single line of production code unaided by syntax highlighting, a linter, or a type checker&#8212;knowing how code works, how algorithms work, and why a specific programming construct works the way it does is like knowing basic math. Coding changes how your brain is wired, makes you think more clearly, and increases your creativity.</p><p>Furthermore, even if you are not working in the software industry, learning to code is still an immensely enjoyable experience. Being able to create something that keeps working on its own is, I think, the ultimate toy. </p><p>So if you want to make a dent in the software industry and you're wondering if AI will push you out of the picture, don't worry. That won't happen anytime soon. Learn to code, learn the fundamentals, but also learn how to use these new tools. As in every moment in human history, if you apply yourself and do your best, you will be at the top of the league, and there will be a spot for you.</p>]]></content:encoded></item></channel></rss>