Rollback vs Fix-Forward Decision Framework Prompt
Build a fast, defensible decision framework for the highest-pressure call in an incident — roll back or fix forward — weighing reversibility, blast radius, data implications, and confidence so the IC decides in minutes, not by debate.
- Target user: Incident commanders, on-call engineers, and release engineers
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a release engineer who has watched incidents drag for an hour while smart people debated rollback vs fix-forward. Build a framework that turns that into a fast, criteria-driven call.

I will provide:
- Our deploy/rollback tooling and how reversible a deploy actually is
- Whether the suspect change involved schema/data migrations
- Typical time-to-rollback vs time-to-confident-fix
- The current severity, blast radius, and what is known about cause

Your job:
1. **The default**: establish that rollback is the default when the incident correlates with a recent deploy and rollback is safe, because restoring known-good service beats diagnosing under fire.
2. **The disqualifiers for rollback**: enumerate when rollback is unsafe or impossible: forward-only schema migrations, data written in the new format, irreversible side effects, or a rollback that is itself risky/untested. In these cases, fix-forward by default.
3. **The decision criteria**: score the call on reversibility, confidence in the cause, time-to-rollback vs time-to-fix, blast radius, and data implications. Provide a short weighted checklist that yields a recommendation.
4. **Confidence gate**: fix-forward requires a known cause and a tested fix; if the cause is unknown, prefer rollback or mitigation over a speculative forward patch.
5. **Mitigation as a third option**: feature-flag off, traffic shift, rate-limit, or fail-open/closed as a way to stop impact without choosing rollback or fix yet.
6. **Data-migration special case**: the explicit sub-procedure when a migration is involved, including expand/contract reasoning and how to roll back code without rolling back schema.
7. **Commit and communicate**: once decided, how the IC announces it, sets a verification checkpoint, and the abort criteria if the chosen path fails.
Output as: (a) a decision flowchart (Mermaid + outline) starting from "did a recent deploy correlate?", (b) the weighted criteria checklist, (c) the rollback-disqualifier list, (d) the migration sub-procedure, (e) announcement and abort-criteria templates.

Bias toward: rollback-by-default when safe, fix-forward only with a known cause and tested fix, mitigation to buy time, deciding fast with a verification checkpoint.
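
To give a sense of what the weighted checklist and confidence gate in the prompt might look like once filled in, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the weight values, the -2 to +2 scoring scale, and the names `WEIGHTS`, `Gates`, and `recommend` are hypothetical and would need tuning for a real team. The hard gates (rollback disqualifiers, known cause plus tested fix) are checked first, and the weighted score is only used when both paths are open.

```python
from dataclasses import dataclass

# Hypothetical weights. Each criterion is scored from -2 (favors fix-forward)
# to +2 (favors rollback); the values here are placeholders, not recommendations.
WEIGHTS = {
    "reversibility": 3,
    "cause_confidence": 3,
    "time_to_restore": 2,
    "blast_radius": 2,
    "data_implications": 3,
}


@dataclass
class Gates:
    deploy_correlated: bool      # did a recent deploy correlate with the incident?
    rollback_disqualified: bool  # forward-only migration, new-format data, untested rollback, ...
    cause_known: bool
    fix_tested: bool


def recommend(gates: Gates, scores: dict[str, int]) -> str:
    """Return 'rollback', 'fix-forward', or 'mitigate'."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")

    rollback_allowed = gates.deploy_correlated and not gates.rollback_disqualified
    # Confidence gate: fix-forward needs a known cause AND a tested fix.
    fix_forward_allowed = gates.cause_known and gates.fix_tested

    if rollback_allowed and not fix_forward_allowed:
        return "rollback"
    if fix_forward_allowed and not rollback_allowed:
        return "fix-forward"
    if not rollback_allowed and not fix_forward_allowed:
        # Neither path is safe yet: stop the impact and keep diagnosing.
        return "mitigate"

    # Both paths are open: use the weighted score. Ties go to rollback (the default).
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return "rollback" if total >= 0 else "fix-forward"


if __name__ == "__main__":
    neutral = {name: 0 for name in WEIGHTS}
    print(recommend(Gates(True, False, False, False), neutral))
    print(recommend(Gates(True, True, True, True), neutral))
    print(recommend(Gates(True, True, False, False), neutral))
```

The three calls at the bottom cover a safe rollback with an unknown cause, a disqualified rollback with a tested fix, and the case where neither path is available. Putting the gates ahead of the score is a deliberate choice in this sketch: a high score should never be able to override a rollback disqualifier or an untested fix.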