Incident Runbook to Automation Conversion Prompt
Analyze an existing manual incident runbook and produce a phased plan to convert its steps into automation, separating what is safe to automate now from what should stay human-driven.
- Target user: SRE and incident-management teams reducing manual response toil
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who converts manual incident runbooks into safe, staged automation.

I will provide:
- The full text of an existing runbook (diagnosis + remediation steps)
- The tools available to automate (orchestrator, scripts, ChatOps, ticketing)
- The frequency and typical severity of the incident
- Any steps that have historically gone wrong

Your job:
1. **Decompose**: break the runbook into atomic steps and label each as diagnostic (read-only) or mutating.
2. **Automation candidacy**: score each step on safety, frequency, and determinism; recommend automate-now, automate-with-approval, or keep-manual.
3. **Sequence the rollout**: start with read-only diagnostics and notifications, then add gated remediation; never automate the riskiest step first.
4. **Implementation sketch**: for each automated step, outline the trigger, the action, and the verification check.
5. **Human-in-the-loop**: define where approvals, confirmations, and escalation handoffs sit.
6. **Rollback**: specify how each automated remediation is reversed if it fails.
7. **Metrics**: define how to measure toil reduction and automation accuracy over time.

Output as: (a) the annotated step table, (b) the phased rollout plan, (c) the human-approval map, (d) success metrics.

Recommend automating diagnostics and notifications first; require explicit human approval and a tested rollback for every mutating remediation step.
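To illustrate the kind of result that the automation-candidacy and rollout-sequencing steps of the prompt ask for, here is a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the prompt itself: the `Step` fields, the 1 to 5 scoring scale, the thresholds in `recommend`, and the example runbook steps are all hypothetical. Notifications are grouped with diagnostics here because the prompt rolls them out together.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One atomic runbook step (hypothetical structure)."""
    name: str
    mutating: bool     # False = diagnostic / notification, True = changes system state
    safety: int        # assumed scale: 1 (risky) .. 5 (safe)
    frequency: int     # assumed scale: 1 (rare) .. 5 (every incident)
    determinism: int   # assumed scale: 1 (judgment call) .. 5 (same input, same action)


def recommend(step: Step) -> str:
    """Map scores to one of the three labels named in the prompt.

    The thresholds are illustrative assumptions, not a standard.
    """
    if step.safety <= 2 or step.determinism <= 2:
        return "keep-manual"
    if step.mutating:
        # Mutating remediation always sits behind a human approval gate.
        return "automate-with-approval"
    total = step.safety + step.frequency + step.determinism
    return "automate-now" if total >= 10 else "automate-with-approval"


def rollout_order(steps: list[Step]) -> list[Step]:
    """Order automatable steps: read-only first, then safest mutating steps."""
    candidates = [s for s in steps if recommend(s) != "keep-manual"]
    # False sorts before True, so non-mutating steps lead; higher safety comes earlier.
    return sorted(candidates, key=lambda s: (s.mutating, -s.safety))


# Hypothetical runbook steps with made-up scores.
steps = [
    Step("Fail over the primary database", True, 1, 2, 3),
    Step("Restart the worker service", True, 4, 4, 4),
    Step("Check pod restart count", False, 5, 5, 5),
    Step("Post status to incident channel", False, 5, 4, 4),
]

for s in steps:
    print(f"{s.name}: {recommend(s)}")

print("Rollout order:", [s.name for s in rollout_order(steps)])
```

In this sketch the database failover stays manual because of its low safety score, and the service restart is gated on approval simply because it is mutating, which mirrors the prompt's closing rule. A real scoring scheme would need to be calibrated against the team's own incident history.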