Introducing Aegis: the programmable multi-agent meta-harness
Or -- I did a thing and I want to show it to you
On May 31 we launch Mostly Harmless AI v2. This arc — how models learn, how agents work, where they break, and what it takes to build something real on top of them — is now a book, updated for May 2026. Newsletter subscribers get 50% off.
For the better part of the last two months I’ve been drilling you with a bunch of interconnected ideas, all gravitating around the notion of agentic reliability. We started with how these models learn, argued about seventy years of AI history, traced the strange logic of prediction, dissected what an agent actually is, then spent two posts on the edges: how to write tools agents can really use, and what happens when you push them to their limits.
This is the last post of that arc, and it embodies my vision of where this whole Agentic AI thing is going. And to show I’m putting my money where my mouth is, this post is about the tools I’m building to bring forth that vision.
But first, why should you care?
Agents aren’t just for coders
Here’s something the AI tool industry quietly gets wrong: every major agent harness — Claude Code, Gemini CLI, Cursor, OpenCode — is pitched as a developer tool. And technically, yes, they drive code.
But code is how agents solve problems, not what the problems are. When you ask an agent to research a topic, draft a document, reorganize a folder, schedule your week, or synthesize a dozen sources into one coherent brief — none of that is fundamentally a programming task. The coding is incidental. A historical accident of the fact that the people who built these tools happened to be programmers solving programmer problems.
So if you’re a researcher, a writer, a manager, a student, a scientist who occasionally touches a terminal — this piece is for you too.
What’s wrong with existing tools
I use Claude Code every day. The harnesses — Claude Code, Gemini CLI, Cursor, OpenCode — are genuinely good. The agentic loop is robust. Tool design is sharp. Context management is solved for practical purposes. I’ve opened ten-thousand-line projects and the agent knows exactly where it left off.
What’s missing is not single-agent capability. It’s coordination.
Claude Code runs on Claude. Gemini CLI runs on Gemini. OpenCode allows any provider, but you cannot use your existing subscriptions (which are heavily subsidized); you have to pay API rates.
If you’re running multiple agents with different models from different providers, you end up with four windows open: one per harness, one per model, one per subscription you’re already paying for. Each tool has its own annoying quirks and its own way of naming and invoking the same things.
But you can still make them collaborate, just not easily.
Most harnesses support subagents, which work like subroutines: the main agent spawns a new subagent for a specific task, the subagent runs, it returns its result, and the main agent continues.
What they don’t support is a mid-work handoff. Imagine you’re two hours into a brainstorm with Claude, a question has emerged that you cannot answer without trying some code, and now you want a second agent (different model, fresher context, perhaps Kimi from OpenCode Zen) to take over for a brief coding session, and then hand back to Claude what it found so brainstorming can continue.
That is somewhat achievable with subagents, except that they run autonomously and then die; in most tools you cannot interact with them or steer them in new directions. What you cannot do is have Kimi return to Claude and stay up, waiting with its full context alive, for a follow-up question that continues the exploration.
That transfer doesn’t exist today. You have to ask Claude to produce a handoff manually and paste it into Kimi, explaining the situation from scratch. Then you paste Kimi’s response back to Claude, and so on. You have become a secretary between two AI agents.
Agents are isolated in all these tools. Two agents working on the same document will overwrite each other. There’s no locking, no merge protocol, no queue where one drops a task and another picks it up. No broadcast so all agents know when the plan changes. Agents don’t share a world — they each have their own private window into one.
I looked around before building. The closest I found were Conductor, which orchestrates multi-agent workflows but only for Claude Code, and T3.codes, which drives any harness and is closer in spirit to what I wanted. Neither has cracked the coordination layer as I envision it.
Introducing Aegis
So, of course, I had to go and make my own. (Quick digression: if you’ve been reading this blog for a while, you know I love to reinvent wheels if only for the learning experience, but this is a case where I genuinely couldn’t find something good enough.)
Here’s what makes Aegis different from anything out there, embodied in its slogan: the programmable multi-agent meta-harness. Let’s build it from the back.
Meta-harness. You’re already paying for Claude via your Anthropic subscription. Gemini via your Google account. If a new tool wants to drive both, it has two options. It can re-authenticate you through its own layer: API keys, rate limits, and lost subscription benefits. Or it can call the native tools, which already have your credentials.
Aegis takes the second path. It drives Claude Code over its stream-json protocol, Gemini CLI and OpenCode over the Agent Client Protocol — calling the binaries on your machine, which already have your auth. It doesn’t touch your subscriptions. You stop worrying about which model wins this month’s benchmark.
And because Aegis calls the native harnesses rather than reimplementing them, it inherits everything they’ve spent months polishing: the agentic loop, the context management, the permission model. The harness keeps owning tool use, sandboxing, model selection. Aegis owns the layer above — tabs, routing, delegation — the things a single-conversation CLI was never built to do.
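To make the "call the native binary" idea concrete, here is a minimal sketch of driving a command-line tool that prints one JSON event per line on stdout. This is not Aegis’s actual code: stream_events is a name I made up for illustration, the event shapes are invented, and the demo talks to a stand-in Python child process instead of a real harness so that it runs anywhere.

```python
import json
import subprocess
import sys
from typing import Iterator


def stream_events(argv: list[str]) -> Iterator[dict]:
    """Run a command and yield each non-empty stdout line parsed as JSON.

    Hypothetical helper: a real harness would be launched the same way,
    with its own binary name and flags in place of `argv`.
    """
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            line = line.strip()
            if line:  # skip blank lines between events
                yield json.loads(line)


# Stand-in for a harness binary: a child Python process that prints two
# newline-delimited JSON events. The event fields are invented for the demo.
FAKE_HARNESS = [
    sys.executable,
    "-c",
    "import json\n"
    "print(json.dumps({'type': 'assistant', 'text': 'hello'}))\n"
    "print(json.dumps({'type': 'result', 'ok': True}))\n",
]

events = list(stream_events(FAKE_HARNESS))
for event in events:
    print(event["type"])
```

The wrapper never sees a credential. Whatever login the child binary already has is the one that gets used, which is the whole point of the second path.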
Multi-agent. Aegis provides six inter-agent synchronization primitives, each built on top of the previous ones, to give you increasingly powerful multi-agent capabilities.
The first primitive is a per-agent inbox. Any agent (including you) can send another agent a message, which gets enqueued until the end of the current turn. This alone solves the handoff problem from the previous section.
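Here is a toy model of the inbox idea, assuming "enqueued until the end of the current turn" means a busy agent only sees new messages once its turn finishes. The Agent class and its method names are hypothetical, not Aegis’s API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent with an inbox that is drained between turns."""

    name: str
    inbox: deque = field(default_factory=deque)
    busy: bool = False
    delivered: list = field(default_factory=list)

    def send(self, sender: str, text: str) -> None:
        """Enqueue a message; deliver immediately only if the agent is idle."""
        self.inbox.append((sender, text))
        if not self.busy:
            self._drain()

    def start_turn(self) -> None:
        self.busy = True

    def end_turn(self) -> None:
        """Finish the current turn, then deliver whatever queued up meanwhile."""
        self.busy = False
        self._drain()

    def _drain(self) -> None:
        while self.inbox:
            sender, text = self.inbox.popleft()
            self.delivered.append(f"{sender}: {text}")


claude = Agent("claude")
claude.start_turn()                           # Claude is mid-brainstorm
claude.send("kimi", "benchmark done: 2x faster")
seen_during_turn = list(claude.delivered)     # nothing delivered yet
claude.end_turn()
seen_after_turn = list(claude.delivered)      # message arrives after the turn
```

Because Kimi is just another agent with its own inbox, Claude can write back the same way, and neither side has to die in between.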
Then, on top of that, there are canvases: markdown files shared across agents, with per-section locks and callbacks that wake an agent when another finishes writing a section.
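A rough sketch of what per-section locking with wake-up callbacks could look like. Everything here (the Canvas class, acquire, on_finish, finish) is invented for illustration and keeps the document in memory rather than in a markdown file.

```python
from collections import defaultdict


class Canvas:
    """Hypothetical shared document with one lock per section."""

    def __init__(self) -> None:
        self.sections = {}                  # section title -> text
        self.locks = {}                     # section title -> agent holding it
        self.watchers = defaultdict(list)   # section title -> callbacks

    def acquire(self, section: str, agent: str) -> bool:
        """Take the section lock; return False if another agent holds it."""
        holder = self.locks.setdefault(section, agent)
        return holder == agent

    def on_finish(self, section: str, callback) -> None:
        """Register a callback to run when the section is written."""
        self.watchers[section].append(callback)

    def finish(self, section: str, agent: str, text: str) -> None:
        """Write the section, release the lock, and wake the watchers."""
        if self.locks.get(section) != agent:
            raise PermissionError(f"{agent} does not hold {section!r}")
        self.sections[section] = text
        del self.locks[section]
        for callback in self.watchers[section]:
            callback(section, text)


canvas = Canvas()
woken = []
canvas.on_finish("Findings", lambda section, text: woken.append(("claude", section)))

kimi_got_lock = canvas.acquire("Findings", "kimi")       # True
gemini_got_lock = canvas.acquire("Findings", "gemini")   # False: kimi holds it
canvas.finish("Findings", "kimi", "The cache is the bottleneck.")
gemini_retry = canvas.acquire("Findings", "gemini")      # True: lock released
```

Two agents can now work on the same document without overwriting each other, as long as they stay in different sections.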
Then there are real terminals. Not an agent calling bash in a subprocess and blocking on its result, but a real, shared, fully interactive terminal session that multiple agents can scan, tail, and write to. And you can too. One runs a command; another sees the output in real time and reacts. Or you run the backend and ask your agent to look at the logs when that heisenbug happens.
So far this allows you to spawn several agents and have them collaborate. But you can also have queues. Any agent can drop a task (a prompt) and the queue auto-spawns an ephemeral agent to take it, potentially calling back the emitter once done. Queues have a maximum cap on parallelism, as well as arbitrary rolling budgets on tokens and dollars so you keep control of how much work is allowed to happen without your supervision.
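The two limits on a queue, a parallelism cap and a rolling budget, can be sketched in a few lines. This is my simplification, not how Aegis implements it: BudgetedQueue is a hypothetical name, it only tracks tokens (dollars would work the same way), and time is passed in explicitly so the example is deterministic.

```python
from collections import deque


class BudgetedQueue:
    """Hypothetical task queue with a parallelism cap and a rolling token budget."""

    def __init__(self, max_parallel: int, token_budget: int, window_s: float) -> None:
        self.max_parallel = max_parallel
        self.token_budget = token_budget
        self.window_s = window_s
        self.pending = deque()   # prompts waiting for an ephemeral agent
        self.running = set()     # prompts currently being worked on
        self.spend = deque()     # (timestamp, tokens) of finished tasks

    def submit(self, prompt: str) -> None:
        self.pending.append(prompt)

    def spent(self, now: float) -> int:
        """Tokens spent inside the rolling window ending at `now`."""
        while self.spend and self.spend[0][0] <= now - self.window_s:
            self.spend.popleft()  # this entry has aged out of the window
        return sum(tokens for _, tokens in self.spend)

    def dispatch(self, now: float) -> list:
        """Start as many pending tasks as the cap and the budget allow."""
        started = []
        while (
            self.pending
            and len(self.running) < self.max_parallel
            and self.spent(now) < self.token_budget
        ):
            prompt = self.pending.popleft()
            self.running.add(prompt)
            started.append(prompt)
        return started

    def complete(self, prompt: str, tokens: int, now: float) -> None:
        self.running.remove(prompt)
        self.spend.append((now, tokens))


queue = BudgetedQueue(max_parallel=2, token_budget=1000, window_s=3600)
for task in ("lint", "tests", "docs"):
    queue.submit(task)

first_batch = queue.dispatch(now=0)       # cap of 2 holds "docs" back
queue.complete("lint", tokens=1200, now=10)
while_over_budget = queue.dispatch(now=20)   # budget exhausted, nothing starts
after_window = queue.dispatch(now=3700)      # old spend aged out, "docs" starts
```

The interesting property is that nothing is lost when the budget runs out. Tasks simply wait until the window rolls forward.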
Agents can also be added into groups, dynamically created and destroyed on demand, by you or any other agent. Groups have a shared inbox and you can subscribe to them and get notified when the first, any, or all the agents in the group finish. This allows committee-like flows where different agents analyze a problem in parallel.
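The first, any, and all notifications can be modeled with plain callbacks. Again, a sketch under my own assumptions: the Group class and its subscribe and finish methods are illustrative names, and I am reading "any" as one notification every time a member finishes.

```python
class Group:
    """Hypothetical agent group that notifies subscribers as members finish."""

    def __init__(self, members) -> None:
        self.members = set(members)
        self.finished = []   # agent names, in completion order
        self.subscribers = {"first": [], "any": [], "all": []}

    def subscribe(self, mode: str, callback) -> None:
        self.subscribers[mode].append(callback)

    def finish(self, agent: str, result: str) -> None:
        """Record that `agent` finished and fire the matching notifications."""
        if agent not in self.members or agent in self.finished:
            raise ValueError(f"unexpected finish from {agent!r}")
        self.finished.append(agent)
        modes = ["any"]                                # fires on every finish
        if len(self.finished) == 1:
            modes.append("first")                      # only on the first one
        if len(self.finished) == len(self.members):
            modes.append("all")                        # only on the last one
        for mode in modes:
            for callback in self.subscribers[mode]:
                callback(mode, agent, result)


events = []
committee = Group(["claude", "gemini", "kimi"])
for mode in ("first", "any", "all"):
    committee.subscribe(mode, lambda m, agent, result: events.append((m, agent)))

committee.finish("gemini", "looks fine to me")
committee.finish("kimi", "found a race condition")
committee.finish("claude", "agree with kimi")
```

A committee flow is then just a subscriber on "all" that collects the three verdicts and hands them to whoever makes the final call.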
Programmable. And finally, you get workflows: deterministic Python code that drives agent calls in sequence, with branching, conditionals, and loops. Think skills, but instead of a hopeful blob of markdown that an agent can choose to interpret however it wants, these are composable routines that drive the entire substrate with exactly the level of control you desire.
When you write a complex workflow in natural language, you’re hoping the agent follows through. It might decide step two is better done differently, skip the commit because something caught its eye, or forget step three entirely. A Python workflow doesn’t forget: it runs step 1, then step 2, then step 3, and commits. You wrote commit() in the code; the code runs. You get the agent’s creativity at each step; the program guarantees the steps happen.
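As an illustration of the shape such a workflow takes, here is a small review loop written as ordinary Python. The function and its parameters (review_workflow, ask, commit) are hypothetical stand-ins, not the Aegis workflow API, and the "agent" is a scripted fake so the example runs without any model behind it.

```python
from typing import Callable


def review_workflow(
    ask: Callable[[str], str],
    commit: Callable[[str], None],
    max_rounds: int = 3,
) -> int:
    """Draft, review, revise until approved or out of rounds, then commit.

    `ask` stands in for "send this prompt to an agent and return its reply".
    Returns the number of review rounds that ran.
    """
    draft = ask("Implement the change")
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        verdict = ask(f"Review this:\n{draft}")
        if verdict.startswith("LGTM"):
            break
        draft = ask(f"Revise to address:\n{verdict}\n\n{draft}")
    commit(draft)  # always runs: the agent cannot decide to skip it
    return rounds


# Scripted fake agent: replies in a fixed order regardless of the prompt.
replies = iter(["v1", "needs tests", "v2", "LGTM"])
commits = []

rounds_used = review_workflow(lambda prompt: next(replies), commits.append)
```

The agent is free to be as creative as it likes inside each ask call. Whether the review happens, how many times, and whether the commit follows are decided by the for loop and nothing else.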
Workflows can be scheduled: declare a cron entry in .aegis.yaml and the substrate fires it while you sleep. They also run across machines: one agent on your laptop can enqueue a task to a remote Aegis instance on a VPS and get the result back in its inbox.
Quick aside: yesterday, as I was polishing this post, Anthropic announced Dynamic Workflows, a way to orchestrate long-lived agents over dozens of hours of work. I haven’t tested it yet, but it seems geared toward the same problem I’m trying to solve.
The difference is in the philosophy. Anthropic’s principle is to give agents as much agency as possible: trust the model, let it decide how to get there. Don’t get in the way, you stupid human. Tokens go brr. It’s the reason why all their solutions to problems are humongous one-shot prompts.
My philosophy runs the other way. Leave agent creativity where it can do the most good: in the actual work, not in deciding whether step three happens after step four. The deterministic spine isn’t a constraint on the agent. It’s what makes it work despite agent idiosyncrasies, and why it works across all agents and harnesses, regardless of their intrinsic capabilities.
I’m using Aegis now for about 50-60% of my coding. It’s still rough around the edges, but it’s way more fun to use than any single CLI. There is a lot more in the box, like remote sessions, a built-in file browser, lots of metrics... but this post is already way too long. You’ll have to check it out on your own. Links at the end.
What I intentionally left out
Aegis has no native concept of skills. No AGENTS.md injected automatically. No memory system.
Those things are conventions, and conventions change. What belongs in an AGENTS.md today looks different from what it’ll look like in six months. Memory systems have a dozen competing designs and no consensus. If I’d baked any of that in, you’d be stuck with my choices the moment the community moved on.
Aegis has a very powerful plugin system instead (I told you, it is programmable). You write a pure Python function, drop it in a folder, and it gets called at whatever point in the agent lifecycle you choose.
Want skills that activate on context? Write a plugin. Want a memory system? Write a plugin. Want to inject per-repository knowledge before every session? Write a plugin. The conventions you need are yours to build, and when they change, you change them, not me.
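To give a feel for the idea, here is a generic lifecycle-hook registry in a dozen lines. It is not the real Aegis plugin interface: the hook decorator, the fire function, and the "before_session" event name are all made up for this sketch, and a real plugin would be discovered from a folder rather than registered in the same file.

```python
from collections import defaultdict

# Hypothetical registry: lifecycle event name -> list of plugin functions.
HOOKS = defaultdict(list)


def hook(event: str):
    """Decorator that registers a pure function for a lifecycle event."""
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register


def fire(event: str, payload: str) -> str:
    """Pass the payload through every plugin registered for the event."""
    for fn in HOOKS[event]:
        payload = fn(payload)
    return payload


@hook("before_session")
def inject_repo_notes(prompt: str) -> str:
    """Example plugin: prepend per-repository knowledge to the first prompt."""
    return "Repo notes: tests live in tests/.\n\n" + prompt


first_prompt = fire("before_session", "Fix the flaky login test")
untouched = fire("after_turn", "no plugins registered here")
```

Skills, memory, and per-repository context all reduce to the same move: a function that takes what is about to happen and returns a slightly different version of it.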
Coda
Claude Code and Gemini CLI are applications. You open them, use them, close them. Aegis is more than that. It’s a framework. You build on top of it — applications that spin up the agentic substrate automatically, pull in whichever harness fits the task, and run without anyone at the keyboard.
Picture a self-hosted Git forge where pushing a branch triggers agents: one reviews the code, one hunts bugs, one picks up open issues and starts implementing. Everyone on their own worktrees, independent, parallel, coding while you sleep. Push code; agents work.
That’s Sindri, and I’m building it too, but that’s a story for another Friday.
Aegis is open source: github.com/apiad/aegis. pip install aegis-harness to start.
May closes the agentic AI arc. If you want the full conceptual foundation — model internals, agent architecture, tool design, failure modes — it’s in Mostly Harmless AI v2, launching May 31. Subscribers get 50% off.
Also check the Compendium — one fixed price, every educational project I’ve built or will build, yours in perpetuity. Buy once, get everything.
In June: algorithms. The other half of the computational story — sorting, searching, graphs, optimization, the classical toolkit that AI didn’t replace and won’t. All month, one idea at a time. See you there.
Until next time, stay curious.


