How to Train your Chatbot - Chapter Zero
Announcing a new series on building practical AI applications
TL;DR: In this upcoming series, we'll build an autonomous LLM-based agent from scratch, focusing on the fundamentals. We'll go from zero to fully autonomous deep research, one feature at a time, staying as close to the metal as possible. Subscribe to receive all posts in your inbox for free.
Have you looked at the latest AI breakthroughs—agents that can plan, code, and research—and thought, "That's cool and all but… how the hell does one actually build something like that?"
If you're just getting started with Python development, or even if you’re a seasoned expert, you've probably felt this gap too. Maybe you've played with the demos, called an API or two, and seen just enough of the magic to get excited.
But moving from just prompting an LLM to building intelligent, autonomous agents is a different beast entirely. Things start to break pretty soon as you stack up more and more complexity. And it seems every tutorial out there is oblivious to this. Anyone can show you how to code a simple chatbot, maybe with some RAG, but very few resources take you from those humble beginnings to something that actually looks like a modern multi-agent system that works autonomously.
This series is my answer to that problem.
Over the next 10 or so posts—the next couple of months—we are going to build a comprehensive, autonomous AI agent. And we're going to do it incrementally.
We'll start with the simplest possible application: the basic chatbot loop that stores conversation history. Why? Well, because even if everyone has already seen this, we need proper foundations to build on top. We need to see the full flow—how messages are stored, how a system prompt works, how to interact with the LLM, and what's really happening under the hood.
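That basic loop is simpler than it sounds. Here's a minimal sketch of the idea, assuming the OpenAI-style message format (a list of role/content dicts); the `llm` function is a placeholder for a real API call:

```python
# Minimal sketch of the Level 0 chatbot loop: a system prompt plus a
# growing message history. `llm` is a stand-in; in the series it would
# call an actual LLM endpoint with the full history.

SYSTEM_PROMPT = "You are a helpful assistant."

def llm(messages):
    # Placeholder: echo the last user message. A real implementation
    # would send `messages` to an LLM API and return its reply.
    return f"You said: {messages[-1]['content']}"

def make_history():
    # Every conversation starts with the system prompt as message zero.
    return [{"role": "system", "content": SYSTEM_PROMPT}]

def chat(history, user_input):
    # Append the user turn, ask the model, append the assistant turn.
    history.append({"role": "user", "content": user_input})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = make_history()
chat(history, "Hello!")
# history now holds three messages: system, user, assistant.
```

The key intuition is that the LLM itself is stateless: "memory" is nothing more than replaying the whole list on every call.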
But from there we'll level up pretty fast, adding new capabilities piece by piece until we have a personal agent capable of doing deep research and generating long, coherent reports all on its own.
Tools of the Trade
Now, you might be thinking, why build from scratch and not just use LangChain, or LlamaIndex, or some other well known framework?
Here's my thesis. When you start with those massive, professional frameworks, you learn more about the framework than you do about the fundamentals. Those tools are powerful, but they are also black boxes. They hide the logic and complexity behind layers of abstraction. You end up learning their way of doing things, their design patterns, and their API. And frameworks change, but fundamentals remain.
I believe it's far more valuable to learn how to do things as close to the metal as possible. Our goal is to write the least amount of boilerplate code possible, but—and this is the key—without hiding any of the core logic. We will build this entire thing ourselves, understanding every single piece of the puzzle.
We'll use a tech stack that gets out of our way. Only three tools:
For the UI we’ll use Streamlit. For those who don’t know it, Streamlit is a radically simple application framework that turns any Python script into a fully fledged web app. The key here is what it doesn't make you do: you write zero boilerplate, zero layout, zero presentation, and zero state management code. You just write core logic, and a good enough web UI appears.
For the actual chatbot we'll use the ARGO framework. I like to call it FastAPI for AI agents because it has that same lightweight, decorator-based feel. It is the simplest, most Pythonic agent framework you’ve seen, and I promise you’ll fall in love with it. Despite being super simple, it forces you to build the logic using a very clean and lightweight skills-based pattern for modularizing our agent's intelligence.
And for data we'll use BeaverDB. This is a lightweight wrapper on top of SQLite that gives you everything you could ever want from a modern database. But it is not a modern database server. It's just a single, embedded file on your disk. There's no Docker, no authentication, no connection strings, no schemas, no boilerplate. Yet, on top of this simplicity, BeaverDB gives us a very Pythonic, very comfortable API for a document database with vector and full-text search, and cool features like persistent dictionaries, lists, priority queues, and even pub/sub. It makes building something like a RAG pipeline extremely simple without any complex setup.
That's it. With just these three basic libraries, we will build a fully autonomous agent.
The Journey Ahead
Here's the roadmap I've sketched out so far. We'll add a new level of intelligence in each post.
Level 0: The Conversationalist. The foundation. A simple Streamlit UI with a stateful chat loop that understands conversation history and system prompts.
Level 1: The Assistant. We'll give our agent long-term memory. We'll build a full Retrieval-Augmented Generation (RAG) pipeline from scratch using BeaverDB so it can answer questions based on a private knowledge base.
Level 2: The Researcher. We'll break our agent out of its box and give it a tool to search the web, allowing it to access up-to-the-minute information.
Level 3: The Analyst. This is where it gets really fun. We'll give the agent the ability to write and execute its own Python code in a safe sandbox to solve problems, analyze data, and even debug its own mistakes.
Level 4: The Editor. We'll upgrade our UI from a simple chat to a persistent, shared canvas, where you and the agent can work side-by-side to co-write documents and refine plans.
Level 5: The Scientist. The finale. We'll integrate all previous capabilities, giving the agent a master prompt that allows it to take a high-level goal, create a multi-step plan, and execute it independently from start to finish.
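As a taste of Level 1, the core retrieve-then-generate idea behind RAG can be sketched in a few lines. This toy version scores documents by naive keyword overlap purely for illustration; in the series, BeaverDB's vector and full-text search will do the retrieval properly:

```python
# Toy illustration of the RAG pattern: retrieve relevant documents,
# then stuff them into the prompt as context for the LLM.

DOCS = [
    "Streamlit turns Python scripts into web apps.",
    "SQLite stores the whole database in a single file.",
    "RAG grounds an LLM's answers in retrieved documents.",
]

def retrieve(query, docs, k=1):
    # Rank documents by how many words they share with the query.
    # A real pipeline would use embeddings or full-text search instead.
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    # Prepend the retrieved context so the LLM can answer from it.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What does Streamlit do?", DOCS)
```

However crude the ranking, the shape of the pipeline—retrieve, assemble context, generate—is exactly what we'll build for real.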
So here is my promise. It may take us more than 5 posts, but we are going to learn how to build a chatbot by building a chatbot, step by step. No unnecessary theory, just working code and clear intuitions.
I'll be providing a public GitHub repository with all the code, and each post will be a direct, straightforward Python script that you can type in half an hour.
When you're done with this series, it won't matter if you want to keep using ARGO and BeaverDB—which I hope you will—or move on to more “professional” tools. You'll be ready either way, because these tools are so thin that what you'll have learned isn't how to use a specific library. You will have learned how the logic works underneath.
You'll understand how to orchestrate the different components involved in a typical chatbot, how to modularize the agent's brain, and more generally, how to think about building capable, intelligent, LLM-based applications. That's the real takeaway.
In the next post, we'll jump straight into Level 0: The Conversationalist. So hit that subscribe button and you'll get all posts in your inbox, 100% free.
Ready to build?