Building Rollback Decision Criteria With AI Before the Page

The most expensive ten minutes of any deploy-related incident are the ones spent arguing about whether to roll back. Half the bridge wants to revert immediately; the other half worries that the rollback itself is risky because of a schema migration. Meanwhile the error rate climbs and nobody is actually deciding. I have lived through this several times, and the pattern is always the same: the team is making a high-stakes call under stress with no agreed criteria. The fix is to define those criteria before the incident, and AI is a useful partner for drafting them.

Why the rollback decision is so hard

Rollback feels like it should be simple: something broke, so undo it. In reality it is tangled. A rollback might require reversing a database migration that is not cleanly reversible. It might reintroduce the bug the deploy was meant to fix. It might be slower than rolling forward with a hotfix. And under incident pressure, nobody has the calm or the bandwidth to reason through all of that.

The result is paralysis or, worse, a panicked rollback that deepens the outage. The antidote is pre-agreed decision criteria: a clear set of conditions under which you roll back, roll forward, or hold. AI helps you draft those conditions thoughtfully, while you are calm, so the in-the-moment call is mostly lookup rather than debate.

Drafting criteria for a service ahead of time

For each critical service, I sit down with a tool like Claude or ChatGPT and describe the deploy mechanism, whether migrations are involved, how reversible they are, and the typical failure modes. Then I ask it to draft a rollback decision matrix: under what conditions is rollback the right move, when is roll-forward safer, and what signals distinguish the two.

The model is good at surfacing considerations I would forget. It reminds me to account for in-flight transactions during the rollback window, to check whether feature flags offer a faster path than a full revert, and to define a hard “if error rate exceeds X for Y minutes, roll back regardless” tripwire. The output is a draft. My team reviews and owns it.
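To make the idea of a decision matrix concrete, here is a minimal sketch of what a reviewed draft might reduce to. Everything in it is an assumption for illustration: the field names, the ordering of the checks, and the extra "escalate" outcome are hypothetical, and a real matrix would be written per service by the team that owns it. The function only reports what the documented criteria point to; it does not perform any action.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HOLD = "hold and keep watching"
    DISABLE_FLAG = "turn off the feature flag"
    ROLL_BACK = "roll back"
    ROLL_FORWARD = "roll forward with a hotfix"
    ESCALATE = "escalate to the incident commander"


@dataclass(frozen=True)
class DeployState:
    errors_elevated: bool        # the agreed error-rate threshold has been crossed
    behind_feature_flag: bool    # the change can be switched off without a revert
    migration_reversible: bool   # True if there is no migration or it reverses cleanly
    hotfix_ready: bool           # a reviewed fix is ready to ship


def recommend(state: DeployState) -> Action:
    """Look up what the pre-agreed matrix says. A human still makes the call."""
    if not state.errors_elevated:
        return Action.HOLD
    if state.behind_feature_flag:
        return Action.DISABLE_FLAG      # fastest path, no revert needed
    if state.migration_reversible:
        return Action.ROLL_BACK         # the revert itself is low risk
    if state.hotfix_ready:
        return Action.ROLL_FORWARD      # the revert is risky, a fix is available
    return Action.ESCALATE              # no safe documented path


print(recommend(DeployState(True, False, True, False)).value)
```

The ordering encodes one possible preference (flag first, then revert, then hotfix). Your team may reasonably order these differently, which is exactly the kind of disagreement worth having before the incident.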

Pro Tip: Ask the model to write the criteria as if-then tripwires with explicit thresholds, not vague guidance. “Roll back if checkout error rate exceeds 5 percent for 3 minutes” is actionable at 3am. “Consider rolling back if things look bad” is not, and you will end up arguing anyway.
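A tripwire like the checkout example can be written down precisely enough that there is nothing left to interpret. The sketch below is one way to express it, assuming you have one error-rate sample per minute, oldest first; the Tripwire class, the metric name, and the sampling interval are all hypothetical, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tripwire:
    metric: str
    threshold: float          # as a fraction, e.g. 0.05 for 5 percent
    sustained_minutes: int    # assumed to be at least 1


def breached(tripwire: Tripwire, per_minute_rates: list[float]) -> bool:
    """True if every one of the most recent `sustained_minutes` samples exceeds the threshold."""
    window = per_minute_rates[-tripwire.sustained_minutes:]
    if len(window) < tripwire.sustained_minutes:
        return False  # not enough data yet to say the breach was sustained
    return all(rate > tripwire.threshold for rate in window)


# "Roll back if checkout error rate exceeds 5 percent for 3 minutes"
checkout = Tripwire("checkout_error_rate", threshold=0.05, sustained_minutes=3)

print(breached(checkout, [0.01, 0.06, 0.07, 0.09]))  # True: last three samples all above 5 percent
print(breached(checkout, [0.06, 0.07, 0.02, 0.09]))  # False: a dip to 2 percent breaks the run
```

Writing the rule this way forces the questions that vague guidance hides: is the threshold strict, does a single good minute reset the clock, and what happens when there is not enough data yet.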

The two-way-door test

One framing AI helps me apply consistently is the reversibility test. I ask it to classify each rollback as a two-way door (cleanly reversible, low risk to attempt) or a one-way door (involves irreversible data changes, high stakes). Two-way-door rollbacks should have a low trigger threshold — when in doubt, revert. One-way-door rollbacks need much more deliberate criteria because the rollback itself is dangerous.

Having the model tag each service’s rollback this way, ahead of time, means that during the incident the commander already knows whether they are dealing with a quick safe revert or a careful, high-stakes maneuver. That knowledge alone removes most of the panic.
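The tagging can be as simple as a small, reviewed table. The sketch below shows one possible shape, assuming three yes/no facts are known about each deploy; the classification rule, the service names, and the guidance strings are hypothetical examples, not a standard.

```python
from enum import Enum


class Door(Enum):
    TWO_WAY = "two-way"
    ONE_WAY = "one-way"


def classify(has_migration: bool, migration_reversible: bool, destroys_data: bool) -> Door:
    """Tag a rollback by reversibility. The rule here is an illustrative assumption."""
    if destroys_data:
        return Door.ONE_WAY
    if has_migration and not migration_reversible:
        return Door.ONE_WAY
    return Door.TWO_WAY


GUIDANCE = {
    Door.TWO_WAY: "Low trigger threshold: when in doubt, revert.",
    Door.ONE_WAY: "Deliberate criteria: the rollback itself is dangerous.",
}

# Hypothetical services, tagged ahead of time and reviewed by their owners.
services = {
    "checkout-api": classify(has_migration=False, migration_reversible=True, destroys_data=False),
    "orders-db": classify(has_migration=True, migration_reversible=False, destroys_data=False),
}

for name, door in services.items():
    print(f"{name}: {door.value} door. {GUIDANCE[door]}")
```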

Where this lives: in the runbook

Rollback criteria are useless if they live in someone’s head. I bake the AI-drafted, human-reviewed criteria directly into each service’s runbook, right next to the actual rollback procedure. When the page fires, the on-call opens the runbook and finds both the decision tripwires and the steps. The decision becomes “check the tripwire, follow the procedure” instead of “convene a debate.”

For drafting and maintaining these, I keep templates in my prompt workspace so every service’s criteria follow the same structure. The prompts library has runbook-authoring prompts that work well as a starting point.

AI drafts criteria, humans pull the trigger

This is the boundary I will not move. AI drafts the decision criteria; humans make the rollback decision and execute it. The model helps you think clearly about thresholds ahead of time. It does not get to monitor error rates and trigger an automatic rollback on its own. That is an action with real consequences — a rollback at the wrong moment, or one that reverses a migration mid-flight, can turn a contained incident into a data-integrity disaster.

I am specifically wary of “AI-driven auto-rollback” pitches. Automated rollback on a deterministic, well-tested rule is fine — that is just engineering. But an LLM deciding to roll back production based on its own judgment is exactly the kind of autonomous action that should never exist. The model’s role ends at helping a human define and understand the criteria. The free AI Incident Response Assistant respects that line.

Rehearsing the decision

Criteria you have never exercised will fail under pressure. I use AI to generate rollback scenarios for game-day exercises — “deploy X is live, error rate is climbing on path Y, the migration is half-applied; what is your call?” — and the team walks through the decision using the documented criteria. The drills expose gaps in the criteria, which we then revise. AI generates the scenarios; humans make the practice calls.
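One lightweight way to capture what a drill exposes is to record, for each scenario, what the documented criteria said and what the team actually decided, then list the mismatches. This is only a sketch of that bookkeeping; the DrillResult fields and the sample scenarios are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrillResult:
    scenario: str
    documented_action: Optional[str]  # what the runbook criteria say; None if they are silent
    team_call: str                    # what the humans decided during the drill


def find_gaps(results: list[DrillResult]) -> list[str]:
    """Return one note for every scenario where the criteria were missing or overridden."""
    gaps = []
    for r in results:
        if r.documented_action is None:
            gaps.append(f"{r.scenario}: criteria do not cover this case")
        elif r.documented_action != r.team_call:
            gaps.append(
                f"{r.scenario}: team chose '{r.team_call}', criteria say '{r.documented_action}'"
            )
    return gaps


results = [
    DrillResult("error rate climbing, no migration", "roll back", "roll back"),
    DrillResult("migration half-applied", None, "hold"),
    DrillResult("hotfix ready, revert is slow", "roll back", "roll forward"),
]

for note in find_gaps(results):
    print(note)
```

Each note is a prompt for revision: either the criteria need a new case, or the team needs to understand why the documented answer differs from its instinct.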

This rehearsal is where the criteria earn their trust. The first time you apply a rollback tripwire should not be during a real Sev1.

Conclusion

The rollback decision goes wrong because a high-stakes call is being made under stress with no agreed rules. Define the rules first. Use AI to draft per-service decision criteria as explicit tripwires, classify each rollback by reversibility, and bake the human-reviewed result into your runbooks. Rehearse the decisions in game days. Then, when the page fires, keep the call and the execution firmly in human hands: let the model help you prepare, and never let it pull the trigger. More deploy-incident tactics are in the incident-response category, and our prompt packs include runbook templates to build from.