Drafting Runbooks From Resolved Incidents With AI

Every incident leaves behind a perfect runbook draft and almost nobody writes it down. You just spent two hours figuring out that the order service wedges when the Redis connection pool saturates, that the symptom is a specific latency pattern, and that the fix is a precise sequence of commands. That knowledge is at its absolute freshest the moment you resolve the incident — and within 48 hours it’s already fading. Six months later, when it happens again to someone else, the whole investigation gets repeated from zero.

The fix is obvious and almost universally skipped: write the runbook while the incident is fresh. The reason it’s skipped is just as obvious: you’re exhausted, and the last thing you want to do is write documentation. This is where AI earns its place, by producing the first draft so the tired human only has to correct it.

Why “write it later” never happens

Runbooks are a classic victim of the energy curve. At the moment you have the most knowledge, you have the least energy. By the time you’ve recovered, the details have gone fuzzy and the urgency has evaporated. So the runbook either never gets written or gets written badly from a half-remembered version of events.

AI breaks the energy barrier. If a model can produce a structured first draft from the incident’s own artifacts — the chat log, the commands you ran, the resolution — then your job drops from “write a runbook from scratch while exhausted” to “review and correct a draft,” which is a vastly smaller ask. Smaller asks actually get done.

Feeding the resolution into a draft

Right after resolution, while the channel is still warm, I gather the inputs: the incident channel transcript, the actual commands run (terminal history is gold here), and a couple of sentences of my own about what the real cause was. Then I give the model this prompt: “From these incident artifacts, draft a runbook with these sections: symptoms/how you’ll recognize this, quick diagnosis steps, the mitigation procedure with exact commands, how to verify it’s resolved, and when to escalate. Use only commands and steps that actually appear in the artifacts. Mark anything you’re unsure about with a TODO for human confirmation.”
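To make that repeatable, the prompt can be assembled from the three inputs by a small helper. This is a minimal sketch, not a fixed recipe: the function name `build_runbook_prompt`, the labels on each artifact block, and the example commands are all assumptions made for illustration. Only the section list and the two instructions come from the prompt above.

```python
# Section names mirror the prompt quoted above.
SECTIONS = [
    "Symptoms / how you'll recognize this",
    "Quick diagnosis steps",
    "Mitigation procedure with exact commands",
    "How to verify it's resolved",
    "When to escalate",
]


def build_runbook_prompt(transcript: str, commands: list[str], cause_note: str) -> str:
    """Combine the incident artifacts into one drafting prompt (hypothetical helper)."""
    section_list = "\n".join(f"- {name}" for name in SECTIONS)
    command_list = "\n".join(commands)
    return (
        "From these incident artifacts, draft a runbook with these sections:\n"
        f"{section_list}\n\n"
        "Use only commands and steps that actually appear in the artifacts.\n"
        "Mark anything you're unsure about with a TODO for human confirmation.\n\n"
        # The labels below are an assumed layout, not a required format.
        f"ROOT CAUSE (responder's note):\n{cause_note.strip()}\n\n"
        f"COMMANDS RUN:\n{command_list}\n\n"
        f"CHANNEL TRANSCRIPT:\n{transcript.strip()}\n"
    )


if __name__ == "__main__":
    # Illustrative inputs only; real ones come from the channel export and shell history.
    print(
        build_runbook_prompt(
            transcript="alice: order service latency is climbing",
            commands=["redis-cli info clients"],
            cause_note="Redis connection pool saturated.",
        )
    )
```

Putting the responder’s own note first is a deliberate choice in this sketch: it is the one input the model cannot reconstruct from the transcript.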

The output is a structured skeleton populated with the real commands from the real incident. It’s not publishable, but it’s 80% of the tedious work, and the remaining 20% is judgment I’m happy to supply.

Pro Tip: Tell the model to mark every command it’s unsure about with an explicit “TODO: verify” rather than presenting it confidently. A runbook’s whole value is that a tired person trusts it at 3am, so an unverified command in a runbook is worse than no runbook at all. Surfacing the uncertainty is what keeps the document trustworthy.

The verification pass is the whole point

A runbook is a trust document. The next engineer will run its commands half-awake, under pressure, without questioning them — that’s literally what a runbook is for. Which means an AI-drafted runbook that ships unverified is a loaded gun. Every command has to be checked by a human who understands what it does and what it could do to production.

So my review is unglamorous and essential. I read every step. I confirm each command is correct and safe in context. I delete the model’s well-meaning additions that didn’t actually happen (it loves to add a “restart the service” step that nobody ran). I add the warnings only a human would think to add — “do NOT run this during peak hours,” “this assumes the replica is healthy; if it isn’t, escalate.” The model drafts; I own. That ownership is non-negotiable precisely because runbooks are designed to be trusted blindly.
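One part of that review can be mechanized: spotting commands in the draft that never appear in the captured terminal history, which is how an invented “restart the service” step usually shows up. The sketch below assumes the draft writes each command on its own line starting with `$ ` (a hypothetical convention, as are the function names). It only narrows where to look. A command that does appear in the history can still be wrong or unsafe, so reading every step stays mandatory.

```python
def normalize(command: str) -> str:
    """Collapse runs of whitespace so spacing differences don't cause false flags."""
    return " ".join(command.split())


def flag_unseen_commands(
    draft: str, history: list[str], prefix: str = "$ "
) -> tuple[str, list[str]]:
    """Annotate draft commands that are absent from the captured history.

    Returns the annotated draft and the list of flagged commands.
    Assumes command lines in the draft start with `prefix`.
    """
    seen = {normalize(c) for c in history}
    flagged: list[str] = []
    annotated: list[str] = []
    for line in draft.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            command = normalize(stripped[len(prefix):])
            if command not in seen:
                flagged.append(command)
                line = f"{line}  # TODO: verify (not in captured history)"
        annotated.append(line)
    return "\n".join(annotated), flagged
```

Run against a draft that includes a restart step nobody actually ran, the second return value lists that command, and the annotated draft carries the TODO marker on the same line.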

What AI is genuinely better at here

Two things. First, completeness of structure: tired humans tend to jot “restart redis, done” and skip the verification and escalation sections. The model reliably produces the full skeleton, prompting you to fill in the parts you’d have lazily omitted. Second, clarity of language: it turns your terse incident-speak into clear, ordered, imperative steps that someone unfamiliar with the system can follow. That readability gap is what separates a runbook that helps from one that confuses.
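The structural half of that is easy to check mechanically before review starts. A minimal sketch, assuming each section heading sits on its own line and using short hypothetical heading names; adjust `REQUIRED` to whatever your template actually uses.

```python
# Hypothetical short headings standing in for the five sections of the template.
REQUIRED = ["Symptoms", "Diagnosis", "Mitigation", "Verification", "Escalation"]


def missing_sections(draft: str, required: list[str] = REQUIRED) -> list[str]:
    """Return the required headings that do not appear as a line of their own."""
    # Strip whitespace plus any leading '#' or trailing ':' so "Mitigation:" still counts.
    headings = {line.strip().strip("#: ").lower() for line in draft.splitlines()}
    return [name for name in required if name.lower() not in headings]
```

A “restart redis, done” style draft with only Symptoms and Mitigation headings would come back missing Diagnosis, Verification, and Escalation, which are exactly the sections a tired author skips.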

This is the same theme as building runbooks engineers actually trust, covered in depth across the incident-response category: trust comes from accuracy and clarity, and AI helps with clarity if you guard the accuracy.

Keeping runbooks alive

A runbook written once and never touched rots as the system changes. The lightweight discipline I like: every time an incident touches a system that already has a runbook, the resolving engineer does a quick AI-assisted diff — “here’s the existing runbook and here’s what actually happened this time; what’s now out of date?” That keeps the corpus current with far less effort than periodic manual audits, which never happen.
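Before asking the model what is out of date, a plain set comparison of commands gives it something concrete to work from. This sketch reuses the assumption that runbook commands sit on lines starting with `$ `; the function names and the dictionary keys are hypothetical.

```python
def normalize(command: str) -> str:
    """Collapse runs of whitespace so spacing differences don't register as drift."""
    return " ".join(command.split())


def extract_commands(runbook: str, prefix: str = "$ ") -> list[str]:
    """Pull out command lines, assuming each starts with `prefix`."""
    return [
        normalize(line.strip()[len(prefix):])
        for line in runbook.splitlines()
        if line.strip().startswith(prefix)
    ]


def command_drift(runbook: str, history: list[str]) -> dict[str, list[str]]:
    """Compare documented commands with what was actually run this time."""
    documented = extract_commands(runbook)
    ran = [normalize(c) for c in history]
    return {
        # Documented steps nobody needed this time: possibly stale.
        "in_runbook_not_run": [c for c in documented if c not in ran],
        # Steps the responder had to improvise: possibly missing from the runbook.
        "run_not_in_runbook": [c for c in ran if c not in documented],
    }
```

Neither list proves anything on its own, since a documented step may simply not have applied to this incident. Both are useful context to paste alongside the existing runbook when asking what is now out of date.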

Tooling

Because runbooks are mostly code and commands, in-editor assistants like Cursor or GitHub Copilot fit well if your runbooks live in a repo. A terminal tool like Warp is handy for capturing the exact command history that becomes the draft’s backbone. For the drafting itself, Claude handles the structure-and-clarity work nicely, and the free AI Incident Response Assistant is built for this post-incident capture moment. Keep a standard runbook-drafting prompt in your prompt workspace so every responder produces the same shape.

The bottom line

The runbook you write tonight, while the incident is fresh, is worth ten you write from memory next month — and worth infinitely more than the one you never write at all. AI removes the exhaustion excuse by handling the first draft. Your job is the verification, and the verification is the part that makes the document trustworthy. AI for the tedious draft, humans for the trust. That’s the deal, and it turns “we should write that up” into something that actually happens.