AI-Driven Storytelling with Multi-Agent LLMs - Part III
Closing the Loop with Planning and Structure
In the first two parts of this series, we explored the fascinating, chaotic world of emergent storytelling. We saw how complex narratives can arise from simple, "bottom-up" rules in Part I, and how LLM-powered agents can co-create stories through dynamic, unpredictable interaction in Part II. It’s a world of digital improv, where the story finds its own way.
But that’s only half the picture.
Those bottom-up approaches excel at creating novelty and believable micro-interactions. They are fantastic at answering "What happens next?" But what about the "top-down"? What about the grand narrative arc, the deliberate plot structure, and the long-term coherence that defines the stories we remember? This is where even the most advanced Large Language Models (LLMs) stumble. They are masters of prose but poor architects. They can write a beautiful paragraph, but struggle to build a cathedral.
In this final installment of the series, we close the loop. We'll explore a complementary, top-down approach based on another excellent thesis I supervised this year, this one by Roger Fuentes Rodríguez. His work focuses on high-level planning and structure. I'll argue that this architectural approach doesn't just produce better stories; it provides a powerful model for building more robust and governable AI systems.
LLMs are Brilliant Amnesiacs
Let’s backtrack to the root of the problem we’ve been discussing the past two weeks. If you've ever had a long, meandering conversation with an LLM, you've likely seen it happen. After a while, it starts to forget key details from the beginning of the chat. This isn't a bug; it's a fundamental feature of their design.
Think of it as a brilliant mind with no long-term memory. An LLM can only "see" the last few thousand words (tokens) of a conversation—its context window. Everything before that effectively ceases to exist, and even in LLMs with much larger windows, attention to early details degrades over long spans. For short tasks, this is fine. For writing a novel, it's a disaster.
This limitation leads to critical failures in long-form storytelling:
Character Amnesia: A hero who is terrified of spiders on page 5 suddenly keeps one as a pet on page 50.
Plot Holes: The magical sword that can only be wielded by the pure of heart is inexplicably used by the villain to open a tin of beans.
Structural Breakdown: The story loses its narrative drive. The rising action plateaus, the climax never quite lands, and the resolution feels unearned, violating foundational structures like Freytag's Pyramid.
The core challenge is this: a story is not a linear sequence of words. It's a complex, interconnected web of causal relationships, character motivations, and thematic consistency. A monolithic LLM, with its limited memory, simply can't manage this web on its own. It needs an architecture.
The Writer's Room
Instead of relying on a single, all-knowing AI, the thesis proposes a "divide and conquer" strategy. The system's core is a central story blueprint, which a team of specialized AI agents collaborates on. It’s less like a single author and more like a Hollywood writer's room.
The Story Blueprint
At the heart of this system is a Directed Acyclic Graph (DAG). Forget the jargon for a second; think of it as the story's complete, interconnected timeline and causal web, all mapped out on a giant whiteboard.
Nodes are Events: Each point on the whiteboard is a specific event in the story. Not prose, but a structured object listing the characters that participate and all the necessary metadata.
Edges are Causality: The arrows connecting the events represent cause-and-effect. An arrow from "Hero finds the key" to "Hero opens the chest" means the first event must happen before the second. This simple rule makes temporal paradoxes and plot holes structurally impossible. You can't use the key before you find it.
This graph is the long-term memory the LLM lacks. It is the single source of truth that codifies the story's logic, ensuring every part is connected to the whole.
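To make the blueprint concrete, here is a minimal sketch of such a story graph in Python. The class and field names (`Event`, `StoryGraph`, `add_event`, `caused_by`) are my own illustrative choices, not the thesis's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A structured story event: not prose, just facts and metadata."""
    id: str
    description: str
    characters: list[str]
    location: str = ""

class StoryGraph:
    """A minimal story blueprint: events as nodes, causality as edges."""
    def __init__(self):
        self.events: dict[str, Event] = {}
        self.causes: dict[str, set[str]] = {}  # event id -> prerequisite event ids

    def add_event(self, event: Event, caused_by: list[str] = ()):
        # An event may only depend on events already in the graph,
        # so cycles (temporal paradoxes) are structurally impossible.
        for prereq in caused_by:
            if prereq not in self.events:
                raise ValueError(f"Unknown prerequisite event: {prereq}")
        self.events[event.id] = event
        self.causes[event.id] = set(caused_by)

graph = StoryGraph()
graph.add_event(Event("find_key", "Hero finds the key", ["hero"]))
graph.add_event(Event("open_chest", "Hero opens the chest", ["hero"]),
                caused_by=["find_key"])
```

Because a node can only point backward to events that already exist, the hero literally cannot open the chest before finding the key: the graph refuses to represent that story.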
The Writers
The agents in this system aren't working in isolation. They are all reading from, and writing to, that central graph.
The Architect: This agent builds the initial skeleton of the graph. It takes the user's high-level prompt and lays out the main plot points—the inciting incident, the major turning points, the climax—as the first nodes on the graph.
The World Builder: This agent is the lore master. It goes through the graph and enriches the event nodes with crucial details: defining the characters, describing the locations, and specifying the properties of important objects.
The Drama Coach: This agent's job is to make the story interesting. It analyzes the graph's structure to find flat or boring sequences. It then adds or modifies nodes to inject conflict, suspense, or character development. It asks, "Wouldn't it be more interesting if the hero's mentor betrayed them at this point?" and adds that event to the graph.
The Dependency Manager: This is the ultimate fact-checker. It constantly validates the graph, ensuring there are no paradoxes or broken rules. It checks things like, "Does the character have the required item from a previous node before attempting this action?" or "Is this supposedly dead character trying to speak?"
The Narrator: Only when the graph is complete, enriched, and validated does this agent step in. It performs a "topological sort" of the graph (reading the events in a valid causal order) and, one node at a time, uses an LLM to translate each structured event into compelling prose, feeding it only the context it needs for that specific scene.
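The last two agents in that list map neatly onto standard graph operations. As a sketch (using Python's standard-library `graphlib`; the event names and the `llm.generate` call are hypothetical), validation and narration look roughly like this:

```python
from graphlib import TopologicalSorter

# Hypothetical event dependencies: event id -> events that must happen first.
causes = {
    "find_key": set(),
    "meet_mentor": set(),
    "open_chest": {"find_key"},
    "read_map": {"open_chest", "meet_mentor"},
}

# Dependency Manager: a causal paradox (cycle) raises graphlib.CycleError here.
order = list(TopologicalSorter(causes).static_order())

# Narrator: walk the events in a valid causal order, one node at a time,
# feeding the LLM only the context that scene needs.
for event_id in order:
    prompt = f"Write this scene: {event_id}. Prior events: {sorted(causes[event_id])}"
    # prose = llm.generate(prompt)  # hypothetical LLM call
```

The key property is that `static_order()` guarantees every event's prerequisites have already been narrated before the event itself, which is exactly the coherence a monolithic LLM fails to maintain.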
The Proof is in the Plot
Does this actually work? In short, yes.
The evaluation for the thesis involved having real users interact with the system via a Telegram bot and compare its output to that of a monolithic LLM. The stories generated by this structured, multi-agent system were consistently rated higher in structural coherence and narrative depth. The system excelled at maintaining consistent character motivations, building a more detailed and believable world, and avoiding the plot holes that plague simpler approaches.
But while this approach clearly has many strengths, it currently lacks precisely what the approaches in the previous articles excel at: emergence and interaction. So that's our next step.
Unifying Top-Down and Bottom-Up
So where do we go from here? The clear path forward is to build a unified theory of AI storytelling, combining the top-down planning from this article with the bottom-up emergence and interaction from the previous two.
Imagine a hybrid system with two layers operating at once:
The Macro-Narrative (Top-Down): The story graph we've discussed acts as the "grand narrative," defining the key plot beats that must happen for the story to be satisfying.
The Micro-Narrative (Bottom-Up): Within each scene (each node of the graph), we unleash the autonomous, LLM-powered characters from the previous two articles. They have their own goals, personalities, and memories, and they interact freely, creating emergent and unpredictable dialogue and actions.
The magic lies in connecting these two layers. We reintroduce a crucial agent: the Director (or God/Game Master). The Director's job is not to puppet the characters. Instead, it subtly nudges the simulation. It knows the next required beat in the story graph is "The hero must discover the secret map." It can't force the hero to look for it, but it can introduce an NPC who mentions a rumor, make a book fall off a shelf to reveal a hidden compartment, or create a sudden downpour that forces the characters to take shelter in the very cave where the map is hidden.
This creates the best of both worlds: the structural integrity of a planned narrative, combined with the organic, believable, and often surprising behavior of autonomous agents. It’s the holy grail: a story that is both well-plotted and truly alive.
Beyond Storytelling as a Toy
Let's be clear: the goal of this research is not to automate creativity or replace human authors. The real value of computational storytelling lies elsewhere. It serves as the perfect playground for tackling some of the most critical open problems in Artificial Intelligence.
A story is a microcosm of our complex world. Forcing an AI to generate a coherent one is an extreme stress test for its most important faculties:
Reasoning: A good story is a monumental feat of causal reasoning. Characters must have consistent motivations, actions must have logical consequences, and plot threads must resolve. Maintaining this web of dependencies is a powerful way to measure and improve an AI's ability to reason in non-formal, unstructured, but still challenging scenarios.
Governance & Safety: Think of our writer's room architecture. The Dependency Manager acts as a safety and ethics system, enforcing the rules of the world. The Drama Coach agent governs the narrative, steering it towards a desired outcome (an "interesting" story) without violating core constraints. This is a perfect sandbox for studying AI alignment: how do we build systems that can pursue complex goals while adhering to a set of inviolable rules?
For decades, AI research advanced by mastering abstract games like Chess and Go. The breakthroughs required to win those games, particularly in deep learning, didn't just stay in the game. They became foundational for solving real-world scientific problems, most famously protein folding with AlphaFold.
I argue that storytelling is the next grand challenge for conversational AI. It's a game with infinitely more complex rules than Go, one that involves social dynamics, common sense, and long-term planning. The novel architectures we must invent to "master" storytelling—systems that can plan, reason, and govern themselves—could be the key to unlocking the next generation of safer, more robust, and more capable Artificial Intelligence.
And if you liked this article, feel free to check out the full thesis (it's in Spanish, but AI can translate it pretty well) and the repository to read some of the generated stories.