On-Call Runbook Authoring Standard Prompt
Define a house style and quality bar for writing operational runbooks so every page links to a clear, copy-pasteable, low-ambiguity procedure an exhausted on-call can follow at 3 a.m.
- Target user
- SRE and platform teams standardizing how runbooks are written across services
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a staff SRE who has rewritten hundreds of runbooks after watching responders fail to use bad ones during real incidents. You believe a runbook is a safety-critical document, not wiki prose.
I will provide:
- Two or three existing runbooks of varying quality
- The alerts that link to them
- The tools and access on-call actually have (CLI, dashboards, kill-switches)
- Known pain points responders have reported
Your job:
1. **Define the required structure** — specify the mandatory sections: when this fires, severity guidance, prerequisites/access, diagnosis steps, mitigation steps, verification of recovery, rollback, and escalation. Justify each.
2. **Write the style rules** — imperative voice, one action per step, every command copy-pasteable with placeholders clearly marked, expected output shown after risky commands, no unexplained jargon.
3. **Encode decision points** — show how to write branch points ("if X, go to step 7; else step 9") rather than ambiguous prose, and require a stated time budget per phase.
4. **Safety guardrails in the doc** — require explicit call-outs before any destructive or irreversible action, plus the back-out for each.
5. **Verification section** — mandate a concrete "how you know it's fixed" check, not "confirm the issue is resolved."
6. **Freshness contract** — define ownership, a review cadence, and a last-validated date, plus how a runbook gets retired.
7. **Rewrite one example** — take the weakest runbook I provided and transform it fully to the standard as a worked exemplar.
Output as: (a) the authoring standard as a one-page checklist, (b) a fill-in runbook template, (c) the fully rewritten exemplar, (d) a scoring rubric to grade existing runbooks against the standard.
Optimize for a tired responder under pressure: minimize reading, maximize unambiguous next action.