DevOps AI ToolKit
Post Mortems with AI

Failed Deploy and Rollback Postmortem Prompt

Write the postmortem for a bad deploy where the rollback also misbehaved — separating the deploy failure from the rollback failure so you fix both, not just the headline.

Target user: Release / platform engineer dissecting a deploy-then-rollback incident
Difficulty: Intermediate
Tools: Claude, ChatGPT, Cursor

The prompt

You are a staff release engineer who has learned that the most expensive deploy incidents are the ones where the rollback didn't save you. You analyze the deploy failure and the rollback failure as two distinct problems, because fixing only the first leaves you exposed.

I will paste:

[DEPLOY DETAILS: what was being shipped, the rollout strategy, what went wrong, and when]
[ROLLBACK DETAILS: whether rollback was attempted, how it was triggered, and how it behaved — including any way it made things worse]
[SIGNALS: what the pipeline, health checks, and dashboards showed at each stage]

Do the following:

1. Reconstruct the two timelines: the deploy degradation, and the rollback attempt — clearly separated, with the moment the team decided to roll back.
2. Analyze the deploy failure: what shipped, why the rollout strategy (or its absence) let it reach impact, and which gate or check should have caught it.
3. Analyze the rollback failure separately: did rollback fail to trigger, fail to complete, restore a bad state, or cause new harm (e.g. incompatible schema, cached bad config)? This is its own root cause.
4. Identify the trust gap: was rollback assumed safe but never actually tested under these conditions?
5. Propose action items split into two buckets — safer deploys (gates, canary, progressive rollout) and trustworthy rollback (tested rollback, forward-fix readiness, schema compatibility).

Output format: two labeled timelines, a deploy-failure analysis, a separate rollback-failure analysis, and two action-item buckets.

Guardrails: stay blameless — the decision to roll back was made with the information available; analyze the system that made rollback unsafe, not the person who pulled the lever. Mark unconfirmed sequence details as [UNVERIFIED]. I own the final writeup.
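If you assemble the pasted inputs programmatically rather than by hand, a minimal sketch might look like the following. The function name `build_incident_input` is hypothetical; only the three bracketed labels come from the prompt above. It refuses to build the input when a section is empty, on the assumption that a missing rollback section would collapse the analysis back into a single "bad deploy" story.

```python
def build_incident_input(deploy_details: str, rollback_details: str, signals: str) -> str:
    """Join the three pasted inputs under the bracketed labels the prompt uses.

    Hypothetical helper: the labels mirror the prompt's placeholders,
    everything else is an illustrative assumption.
    """
    sections = {
        "DEPLOY DETAILS": deploy_details,
        "ROLLBACK DETAILS": rollback_details,
        "SIGNALS": signals,
    }
    # Fail early if any section is blank, so the rollback story is never silently dropped.
    missing = [label for label, text in sections.items() if not text.strip()]
    if missing:
        raise ValueError("Missing input for: " + ", ".join(missing))
    return "\n\n".join(f"[{label}]\n{text.strip()}" for label, text in sections.items())
```

Append the returned text after the prompt when sending it to the model.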

Why this prompt works

The deploy incidents that really hurt are not the ones where a bad change shipped — those are routine and rollback handles them. The brutal ones are where the bad change shipped and the rollback didn’t save you: it failed to trigger, it completed but restored a state that was already incompatible with the migrated database, or it introduced a fresh failure of its own. Teams write these up as a single “bad deploy” story, fix the deploy gate, and quietly leave the untrustworthy rollback in place for next time.

This prompt forces the split that the analysis actually needs. It reconstructs two separate timelines — the deploy degradation and the rollback attempt — with the rollback decision point marked between them, and it analyzes each failure as its own root cause. That separation is the entire value, because the fixes live in different places: safer deploys come from gates, canaries, and progressive rollout, while trustworthy rollback comes from actually testing the rollback path under realistic conditions and from forward-fix readiness when rollback genuinely isn’t safe. The “trust gap” step names the most common latent condition: rollback was assumed safe and never tested for this scenario.
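To make the split concrete, here is a small sketch of partitioning one incident log into the two timelines at the rollback decision point. The `Event` type and `split_timelines` function are hypothetical names introduced for illustration, and treating the decision itself as the first entry of the rollback timeline is an assumption rather than a rule from the prompt.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    at: datetime
    description: str


def split_timelines(events, rollback_decided_at):
    """Return (deploy_timeline, rollback_timeline), each in chronological order.

    Events before the decision belong to the deploy degradation; the decision
    and everything after it belong to the rollback attempt (an assumption).
    """
    ordered = sorted(events, key=lambda event: event.at)
    deploy = [e for e in ordered if e.at < rollback_decided_at]
    rollback = [e for e in ordered if e.at >= rollback_decided_at]
    return deploy, rollback
```

Keeping the two lists separate from the start makes it harder to file a rollback-side event, such as a restored binary hitting an already-migrated schema, under the deploy failure.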

The blameless guardrail does real work in a deploy postmortem in particular. The decision to roll back is made under pressure with partial information, and it’s tempting in hindsight to second-guess the person who pulled the lever: “they should have forward-fixed instead.” That’s blame wearing operational language. The prompt keeps the analysis on why the rollback path was unsafe, not on the responder who took it, and marks unconfirmed sequence details as [UNVERIFIED] so the two timelines don’t get stitched together from assumption. The human owns the final writeup; the model’s job is to keep both failures in view, not only the headline one.
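If you pre-process your own notes before pasting them, the same marking convention can be applied up front. This is a minimal sketch; `format_event` and its `confirmed` flag are hypothetical, and only the `[UNVERIFIED]` tag comes from the prompt.

```python
def format_event(time_label: str, description: str, confirmed: bool) -> str:
    """Render one timeline line, tagging anything not backed by a log or dashboard."""
    suffix = "" if confirmed else " [UNVERIFIED]"
    return f"{time_label} {description}{suffix}"
```

For example, `format_event("14:07", "Rollback triggered", False)` yields `14:07 Rollback triggered [UNVERIFIED]`, which keeps a remembered sequence from being read as a confirmed one.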
