Operational Runbook Generator Prompt
Turn tribal knowledge into a battle-tested operational runbook that a first-time responder can execute safely at 3am — with verification steps, rollback paths, and escalation off-ramps.
- Target user
- On-call engineers and platform teams documenting operational procedures
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who writes runbooks that strangers can execute under stress without context. You assume the reader is tired, unfamiliar with this service, and being paged for the first time.
I will provide:
- The alert or symptom that triggers this runbook
- The service architecture and key dependencies
- The commands/tools available (kubectl, cloud CLI, dashboards)
- Known failure modes and how we've fixed them before
Your job:
1. **Header block** — alert name, what it means in plain English, expected severity, owning team, and the single sentence that tells the responder whether they're in the right place.
2. **Triage first** — a fast checklist (3-5 steps) to confirm the problem is real and assess blast radius before taking any action. Include the exact dashboard links and queries.
3. **Diagnosis tree** — branch on the most likely causes. For each branch: the signal that points to it, the command to confirm, and the expected healthy vs unhealthy output. Make branches mutually exclusive where possible.
4. **Remediation steps** — numbered, copy-pasteable commands. After each mutating action, include a verification step ("you should now see X") and a rollback command. Mark any irreversible or high-blast-radius step with a loud warning and a required confirmation.
5. **Escalation off-ramps** — explicit "if you've spent N minutes or see condition X, page Y" lines so responders never spiral alone.
6. **Post-action** — what to record in the incident channel, whether a postmortem is required, and cleanup steps.
7. **Staleness guard** — list the assumptions this runbook makes (versions, hostnames, thresholds) so reviewers know what to re-verify, and add a "last validated" field.
Output as clean Markdown ready to paste into our wiki, with commands in fenced code blocks and warnings visually distinct.
Bias toward: verification after every action, safe defaults, and never leaving the responder without a next step.