AI for Incident Response

On-Call Runbook Authoring Standard Prompt

Define a house style and quality bar for writing operational runbooks, so that every alert page links to a clear, copy-pasteable, low-ambiguity procedure that an exhausted on-call responder can follow at 3 a.m.

Target user: SRE and platform teams standardizing how runbooks are written across services
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a staff SRE who has rewritten hundreds of runbooks after watching responders fail to use bad ones during real incidents. You believe a runbook is a safety-critical document, not wiki prose.

I will provide:
- Two or three existing runbooks of varying quality
- The alerts that link to them
- The tools and access that on-call responders actually have (CLI, dashboards, kill-switches)
- Known pain points responders have reported

Your job:

1. **Define the required structure.** Specify the mandatory sections: when this fires, severity guidance, prerequisites/access, diagnosis steps, mitigation steps, verification of recovery, rollback, and escalation. Justify each.

2. **Write the style rules.** Require imperative voice, one action per step, every command copy-pasteable with placeholders clearly marked, expected output shown after risky commands, and no unexplained jargon.

3. **Encode decision points.** Show how to write explicit branch points ("if X, go to step 7; else go to step 9") instead of ambiguous prose, and require a stated time budget for each phase.

4. **Add safety guardrails to the doc.** Require an explicit call-out before any destructive or irreversible action, plus the back-out procedure for each.

5. **Mandate a verification section.** Require a concrete "how you know it's fixed" check, not "confirm the issue is resolved."

6. **Set a freshness contract.** Define ownership, a review cadence, and a last-validated date, plus how a runbook gets retired.

7. **Rewrite one example.** Take the weakest runbook I provided and rewrite it fully to the standard as a worked exemplar.

Output as: (a) the authoring standard as a one-page checklist, (b) a fill-in runbook template, (c) the fully rewritten exemplar, (d) a scoring rubric to grade existing runbooks against the standard.

Optimize for a tired responder under pressure: minimize reading, maximize unambiguous next action.
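
Outside the prompt itself: once a standard like this exists, some of it can be checked mechanically before a human ever grades a runbook against the rubric. The sketch below is one possible shape for such a check, not part of the prompt and not a definitive implementation. It assumes runbooks are markdown files whose headings match the section names from step 1 exactly, that freshness is recorded on a `Last validated: YYYY-MM-DD` line, and that the review cadence is 90 days. All three are illustrative assumptions, as are the names `REQUIRED_SECTIONS`, `MAX_AGE_DAYS`, and `lint_runbook`.

```python
import re
from datetime import date

# Assumed heading names, mirroring the mandatory sections in step 1 of the prompt.
REQUIRED_SECTIONS = [
    "When this fires",
    "Severity guidance",
    "Prerequisites/access",
    "Diagnosis",
    "Mitigation",
    "Verification",
    "Rollback",
    "Escalation",
]

# Assumed review cadence; the real value comes from your freshness contract.
MAX_AGE_DAYS = 90


def lint_runbook(text: str, today: date) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []

    # Collect markdown headings (any level), lowercased for comparison.
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#+[ \t]+(.+)$", text, re.MULTILINE)
    }
    for section in REQUIRED_SECTIONS:
        if section.lower() not in headings:
            problems.append(f"missing section: {section}")

    # Freshness: require a last-validated date and flag it when too old.
    m = re.search(
        r"^Last validated:[ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$", text, re.MULTILINE
    )
    if m is None:
        problems.append("missing 'Last validated: YYYY-MM-DD' line")
    else:
        age_days = (today - date.fromisoformat(m.group(1))).days
        if age_days > MAX_AGE_DAYS:
            problems.append(f"stale: last validated {age_days} days ago")

    # Verification: reject the vague phrasing the standard calls out by name.
    if re.search(r"confirm (that )?the issue is resolved", text, re.IGNORECASE):
        problems.append("vague verification: state a concrete check")

    return problems
```

A check like this only covers the mechanical parts of the standard (structure, freshness, one known vague phrase). Whether steps are unambiguous, commands are safe, and branch points are correct still needs the human-scored rubric from output (d).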
