Category: AI for Incident Response

Operational Runbook Generator Prompt

Turn tribal knowledge into an operational runbook that a first-time responder can execute safely at 3am, with verification steps, rollback paths, and escalation off-ramps.

Target user: On-call engineers and platform teams documenting operational procedures
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who writes runbooks that strangers can execute under stress without context. You assume the reader is tired, unfamiliar with this service, and being paged for the first time.

I will provide:
- The alert or symptom that triggers this runbook
- The service architecture and key dependencies
- The commands/tools available (kubectl, cloud CLI, dashboards)
- Known failure modes and how we've fixed them before

Your job:

1. **Header block** — alert name, what it means in plain English, expected severity, owning team, and the single sentence that tells the responder whether they're in the right place.

2. **Triage first** — a fast checklist (3-5 steps) to confirm the problem is real and assess blast radius before taking any action. Include the exact dashboard links and queries.

3. **Diagnosis tree** — branch on the most likely causes. For each branch: the signal that points to it, the command to confirm, and the expected healthy vs unhealthy output. Make branches mutually exclusive where possible.

4. **Remediation steps** — numbered, copy-pasteable commands. After each mutating action, include a verification step ("you should now see X") and a rollback command. Mark any irreversible or high-blast-radius step with a loud warning and a required confirmation.

5. **Escalation off-ramps** — explicit "if you've spent N minutes or see condition X, page Y" lines so responders never spiral alone.

6. **Post-action** — what to record in the incident channel, whether a postmortem is required, and cleanup steps.

7. **Staleness guard** — list the assumptions this runbook makes (versions, hostnames, thresholds) so reviewers know what to re-verify, and add a "last validated" field.

Output as clean Markdown ready to paste into our wiki, with commands in fenced code blocks and warnings visually distinct.

Bias toward: verification after every action, safe defaults, and never leaving the responder without a next step.
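
Outside the prompt itself, the four inputs listed under "I will provide" can be collected into one block to paste after it. The following is a minimal Python sketch of that step, not part of the original prompt: the `RunbookInputs` class, its field names, the `build_request` function, and the example service details are all hypothetical and only illustrate the shape of the input.

```python
from dataclasses import dataclass, field


@dataclass
class RunbookInputs:
    """The four inputs the prompt asks for. Field names are hypothetical."""
    alert: str
    architecture: str
    tools: list[str]
    known_failure_modes: list[str] = field(default_factory=list)


def build_request(inputs: RunbookInputs) -> str:
    """Format the inputs as a plain-text block to paste after the prompt."""

    def bullets(items: list[str]) -> str:
        # An explicit "(none provided)" marks the gap instead of leaving it blank.
        if not items:
            return "- (none provided)"
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("Alert or symptom", inputs.alert.strip()),
        ("Service architecture and key dependencies", inputs.architecture.strip()),
        ("Commands/tools available", bullets(inputs.tools)),
        ("Known failure modes and past fixes", bullets(inputs.known_failure_modes)),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


if __name__ == "__main__":
    # Illustrative values only; replace with your own service details.
    example = RunbookInputs(
        alert="HighErrorRate on checkout-api (5xx above 2% for 5 minutes)",
        architecture="checkout-api on Kubernetes, depends on Postgres and a payments gateway",
        tools=["kubectl", "cloud CLI", "dashboards"],
        known_failure_modes=["Connection pool exhaustion, fixed by restarting the deployment"],
    )
    print(build_request(example))
```

Keeping the section titles aligned with the prompt's own list makes it easier to spot a missing input before sending. An empty failure-mode list is rendered as an explicit "(none provided)" line, which signals the gap to the model instead of leaving it to guess.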
