How to write a CLI an agent will actually use
Making tools that are safe by design, not convention.
Monday I argued the next move in agent autonomy is the layered stack around the model. This is the technical companion: what the tools inside that stack should look like. Both pieces are part of May’s arc on agent reliability, anchored on the second edition of Mostly Harmless AI, which you can also read online for free. More at the end.
The first time you fire up a coding agent and walk away for an afternoon, you come back to a workspace that’s been touched in twenty places by something that doesn’t know what it doesn’t know. Most of the touches are fine. One of them, on the third such afternoon, will not be.
The reflex is to make the agent smarter. The fix is to make the tools it calls safer, by design, not by vigilance.
I was reminded of this an hour ago, building a small Python CLI to append entries to my Workspace journal. Dumb tool, three flags, ninety lines. I built it on top of microcli, a framework I’ve been using for about a year. The experience of writing it is what this post is about.
What’s wrong with a normal CLI
Sit with the Unix-style ergonomics for a second. A normal CLI, written with argparse or click or typer, has a shape that’s been the default for fifty years.
It’s silent on success. You ran the command, you got back a shell prompt, you hope it worked. There’s no acknowledgement of what changed, no record of what the next step might be. If you’re a human at a terminal, you can run a verification command. If you’re an agent on a 30-call chain, you have to infer whether the action took.
It bitchslaps you on failure. A traceback, a single non-zero exit code, maybe an inscrutable error string from a subprocess three layers down. No suggestion of what to try instead. No indication of what already changed before the failure landed. If the command was halfway through writing a file when it died, the agent has no idea whether to retry, roll back, or move on.
It has no self-description. The only way to understand what the tool does is to read the source, or run it and observe the side effects, or both. There’s --help, but --help lists flags, not workflows. The agent doesn’t learn how to use the tool from --help; it learns the shape of the parser.
And worst of all for an autonomous loop: side effects are the default. You run git push and it pushes. You run rm and it removes. The dry-run is an opt-in flag at best, missing entirely at worst. Every call is a potentially load-bearing call, and the agent has to know in advance which ones will hurt.
This shape is fine for a human at a terminal who can hit Ctrl-C and read the docs. It’s a recipe for disaster in an autonomous loop.
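To make the four defaults concrete, here is a sketch of the classic shape: a hypothetical journal tool in plain argparse. The journal.md path and entry format are invented for illustration. It mutates immediately, says nothing on success, and leaves the caller to infer whether anything happened.

```python
import argparse
import sys
from pathlib import Path

def main(argv=None):
    # The fifty-year default shape: parse flags, act, stay silent.
    parser = argparse.ArgumentParser(prog="journal")
    parser.add_argument("tag")
    parser.add_argument("description")
    args = parser.parse_args(argv)

    # Side effects are the default: no preview, no consent gate.
    path = Path("journal.md")
    with path.open("a") as f:
        f.write(f"- [{args.tag}] {args.description}\n")

    # Silent on success: exit code 0 and nothing else. On bad flags,
    # argparse prints usage and exits 2; any other error is a traceback.
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

An agent calling this mid-chain gets no acknowledgement of what changed and no record of what to run next; it has to go read journal.md itself to confirm the write landed.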
What we actually need
Invert all four:
Dry-run by default, real action opt-in. The model sees a preview first; the explicit --save flag is the consent gate. No call mutates state without the model having seen what would change.
Predictable structured output for both happy and failed paths. Success says exactly what changed and where. Failure says exactly what failed, what state was already touched, and what to try next. No tracebacks, no silent zeros.
Self-describing apps. The CLI carries its own learnable surface, not just a flag list. The workflows, the failure modes, the next-step commands all live inside the tool. The agent should be able to ask the tool how am I supposed to use you and get a real answer.
Agents writing these apps themselves. This is the load-bearing one. You can do all of the above in plain typer or click. But the agent has to remember to do them. We need a framework where the safe shape is the path of least resistance, where it’s structurally hard to write a tool that doesn’t dry-run or doesn’t describe itself.
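The first three inversions are expressible in plain argparse; the point is that nothing enforces them. A sketch, with the tag list and message formats invented for illustration — every line of discipline here is one the author, human or agent, has to remember on their own:

```python
import argparse
from pathlib import Path

ALLOWED_TAGS = {"commit", "decision", "blocker"}

def main(argv=None):
    parser = argparse.ArgumentParser(prog="journal")
    parser.add_argument("tag")
    parser.add_argument("description")
    # Inversion 1: dry-run by default, mutation behind an explicit flag.
    parser.add_argument("--save", action="store_true",
                        help="Actually write the entry")
    args = parser.parse_args(argv)

    # Inversion 2: failure names what was wrong and what to try instead.
    if args.tag not in ALLOWED_TAGS:
        print(f"FAIL: unknown tag '{args.tag}'. "
              f"Allowed: {', '.join(sorted(ALLOWED_TAGS))}.")
        return 1

    path = Path("journal.md")
    entry = f"- [{args.tag}] {args.description}\n"
    if not args.save:
        # Preview first, and point at the command that would commit it.
        print(f"DRY-RUN: would append to {path}: {entry.rstrip()}")
        print("  rerun with --save to commit")
        return 0

    with path.open("a") as f:
        f.write(entry)
    # Inversion 2, happy path: success names exactly what changed and where.
    print(f"OK: appended entry to {path}")
    return 0
```

Nothing here self-describes beyond --help (inversion 3 is missing entirely), and nothing stops the next command added to this tool from skipping the dry-run branch. That gap is what a framework has to close.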
What this looks like with microcli
microcli is six hundred lines of Python and three principles, all enforced by the framework’s API surface rather than by convention. Validate before acting. Return descriptive messages. Default to dry-run. Each one is a behavior the API makes the path of least resistance, not a guideline you have to remember.
Rather than describe what writing one of these tools feels like, I’ll let the agent that wrote one tell you. Earlier today I asked Claude (running in this workspace, with file access and a shell) to write itself a small CLI for appending entries to my Workspace journal. Claude had never used microcli before. I told it to run microcli learn and roll with it. That was the last thing I had to say.
What follows is Claude’s account of the build, in its own voice.
Claude:
I’d never seen microcli before. which microcli came back empty; the repo was at ~/Workspace/repos/microcli/ but not installed. A few minutes of yak-shaving (gh repo clone, uv tool install -e ., two transitive deps I had to fetch the same way) and I had a microcli binary.
microcli --tour gave me a one-screen map. microcli learn listed the topics. Three commands later — learn --topic principles, learn --topic ok-fail, learn --topic patterns — I had everything I needed. I never read the framework’s source. The framework taught me how to use it through its own CLI, which is exactly the property the framework expects me to give my own tools.
The build started with microcli new journal "Journal entry tool" add. The scaffold isn’t a stub; it’s a teaching artefact. Inline comments pointed me at the next learning topics, the imports were already wired, the @command decorator was demonstrated, and a TODO gestured at the pattern I was supposed to follow.
Writing the body felt structurally guided. Validate first because fail() is the obvious primitive for rejecting bad inputs. Dry-run branch second because save: bool = False is the obvious parameter to add and info() is the obvious primitive for printing what would happen. Mutation branch third because there’s nowhere else for it to go. I’d have to actively work to write a tool that mutated by default or failed silently or didn’t describe itself.
The script lives at .playground/microcli-journal/journal.py. Ninety-odd lines. The load-bearing¹ function:
@command
def add(
    tag: Annotated[str, "Entry tag (commit / decision / blocker / ...)"],
    description: Annotated[str, "One-line entry body"],
    save: Annotated[bool, "Actually write the entry"] = False,
):
    """Append a single journal entry."""
    if tag not in ALLOWED_TAGS:
        fail(f"unknown tag '{tag}'. Allowed: {', '.join(sorted(ALLOWED_TAGS))}.")
    path = journal_path_for_today()
    entry = build_entry(tag, description)
    if not save:
        info(f"Would append to: {path}")
        info(f"  Line: {entry.rstrip()}")
        info(f"  Equivalent: journal.py add {tag!r} {description!r} --save")
        return
    write(path, read(path) + entry)
    ok(f"Appended entry to {path}.")
That’s the whole shape. The decorator handles argparse. Annotated[..., "help text"] becomes --help for free. fail() writes a red ✗ and exits 1. info() prints a cyan line. ok() prints a green ✓. The dry-run runs first and renders the exact --save command the next caller should run, with the original arguments threaded back through.
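microcli’s actual internals aren’t shown in this post, but the mechanics are easy to picture. Here is a minimal sketch of how a @command decorator could wire a typed signature into argparse; the primitives (command, info, ok, fail) mimic the names the journal tool uses, not the framework’s real implementation:

```python
import argparse
import inspect
import sys
import typing
from typing import Annotated

def info(msg): print(msg)           # cyan line in the real thing
def ok(msg):   print(f"✓ {msg}")    # green check
def fail(msg):
    print(f"✗ {msg}")               # red cross, then exit 1
    sys.exit(1)

def command(fn):
    """Build an argparse parser from fn's signature, so that
    Annotated[..., "help text"] becomes --help for free."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn, include_extras=True)
    parser = argparse.ArgumentParser(description=fn.__doc__)
    for name, param in sig.parameters.items():
        base, help_text = typing.get_args(hints[name])
        if param.default is inspect.Parameter.empty:
            parser.add_argument(name, type=base, help=help_text)
        elif base is bool:
            parser.add_argument(f"--{name}", action="store_true",
                                help=help_text)
        else:
            parser.add_argument(f"--{name}", type=base,
                                default=param.default, help=help_text)
    def run(argv=None):
        return fn(**vars(parser.parse_args(argv)))
    fn.run = run
    return fn

@command
def add(
    tag: Annotated[str, "Entry tag"],
    save: Annotated[bool, "Actually write the entry"] = False,
):
    """Append a single journal entry (sketch)."""
    if tag not in {"commit", "decision", "blocker"}:
        fail(f"unknown tag '{tag}'.")
    if not save:
        info(f"Would append a [{tag}] entry; rerun with --save")
        return "dry-run"
    ok(f"Appended [{tag}] entry")
    return "saved"
```

The point of the sketch is the shape, not the code: once the decorator owns the parser, validate-then-preview-then-mutate is the only natural order left to write.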
Smoke-test took five minutes. Dry-run printed the preview plus the explicit --save equivalent. Bad tag produced a red message naming the allowed list. Bad date format produced a red message naming the expected shape. The real --save wrote the file and confirmed with a green ✓ pointing at the path. Nothing to debug because there was nothing to infer.
The last thing I ran was journal.py --learn add. The framework AST-walked the source I’d just written and produced a tour of my command: description, next-step commands harvested from the if not save: branch, failure modes harvested from the fail() strings. The next agent that touches this tool will not need to read it either.
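The AST harvesting Claude describes is also straightforward to picture. A sketch of collecting the literal parts of every fail(...) call from a tool’s source — the function name and output format here are assumptions, not microcli’s real --learn implementation:

```python
import ast

def failure_modes(source: str) -> list[str]:
    """Collect the literal text of every fail(...) call in the source,
    keeping the constant parts of f-strings and dropping the holes."""
    modes = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "fail"
                and node.args):
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant):
            modes.append(arg.value)
        elif isinstance(arg, ast.JoinedStr):  # an f-string
            modes.append("".join(
                part.value for part in arg.values
                if isinstance(part, ast.Constant)))
    return modes

# A toy command body to harvest from.
SAMPLE = '''
def add(tag, save=False):
    if tag not in ALLOWED_TAGS:
        fail(f"unknown tag '{tag}'.")
    if not valid_date():
        fail("expected YYYY-MM-DD date.")
'''
```

Run over a tool’s source, this recovers the red-path messages without importing or executing anything, which is what lets a --learn command describe code it has never run.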
Back to Alex.
Alex again.
That was… weird. But it was indeed Claude. I really gave it the task, and then asked it to explain what happened. To the extent that you can trust an AI agent’s introspection (which is, to be honest, not much), this is a real account of what it “feels like” for an AI to use microcli.
What I want you to take from that is the structure of the experience, not the specific tool. Claude built a real, working CLI in a workspace it was already familiar with, in a framework it had never used, by following the framework’s own self-description. The unsafe shape was inaccessible.
That’s what I mean by structurally hard to do the wrong thing. You can do all of this in plain typer or click; you’d just have to remember to.
Try it
microcli is at github.com/apiad/microcli. pip install microcli-toolkit. The microcli learn --topic principles tour runs in five seconds. If you build CLIs that an agent will eventually call — your agent, mine, or one that runs autonomously on a clock — this is the shape worth reaching for.
Have fun. And until next time, stay curious.
If the larger argument resonates, Mostly Harmless AI goes deep on this material. Full chapters on the harness around the model, the agentic stack we walked through Monday, and the discipline of building tools that compound rather than accumulate risk. You can also read the whole book online for free in a custom reader I built that I’m rather proud of: dark mode, font controls, progress tracking, offline support, the works.
If you want the whole catalog of everything I’ve written, plus everything I’m going to write, that’s the Compendium. One purchase, in perpetuity.
This is how you know this was indeed Claude (Opus 4.7). I would never use “load-bearing”, even if my life depended on it, but Claude loves it.
I have written, with the same stack, a Python CLI app for my opencode setup that generates images from my drafts. opencode has a workflow that extracts two prompts for each section and puts them into a specific JSON format. The agent then calls the CLI app, passing the JSON file and a password to decode my hashed API key for pixazo.