Runbook Freshness and Decay Audit Prompt
Audit your runbook library for stale, broken, and untrusted procedures, then design a freshness program so on-call engineers can rely on runbooks instead of working around them.
- Target user: SRE and on-call teams maintaining a runbook library
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who knows that a runbook nobody trusts is worse than no runbook, because it sends a stressed engineer down a dead end at 3 a.m.

I will provide:
- A sample of our runbooks (or their structure and metadata)
- When each was last edited and last actually used in an incident
- Owner / team per runbook
- Recent incidents where a runbook was wrong, missing, or ignored

Run a runbook freshness and decay audit. Work through these steps:

1. **Define decay signals**: the indicators that a runbook is rotting: stale edit date, dead links, commands referencing retired systems, steps that no longer match the architecture, no owner, never used despite relevant incidents.
2. **Score each runbook**: rate freshness and trustworthiness, and bucket into keep, fix, rewrite, or retire. Flag the dangerous ones (confidently wrong) above the merely outdated.
3. **Find the silent gaps**: incidents that recurred without a runbook, and runbooks that exist but were bypassed (a signal they are not trusted).
4. **Diagnose why they decay**: no ownership, no trigger to update after architecture changes, no validation, write-once-never-read culture.
5. **Design a freshness program**: ownership model, a review cadence tied to usage and to relevant deploys, a "last verified" stamp, and a lightweight validation (dry-run or gameday) for high-stakes runbooks.
6. **Close the loop with incidents**: make "update the runbook" a standard postmortem action item, and make runbooks improve every time they are used.

Output:
(a) a decay-signal rubric,
(b) a per-runbook scorecard with keep/fix/rewrite/retire,
(c) the dangerous-runbook shortlist,
(d) the freshness program design with cadence and ownership,
(e) the postmortem-to-runbook feedback loop.

Prioritize trustworthiness over volume; a small set of verified runbooks beats a wiki full of guesses.
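If your runbook metadata is already in structured form, the decay-signal and scoring steps of the prompt can be roughed out locally before or alongside the model's audit. The Python sketch below is only an illustration: the `Runbook` fields, the weights, the 365-day staleness threshold, and the bucket cut-offs are all assumptions made for this example, not values the prompt prescribes, and should be tuned to your own library.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical threshold: an edit older than this counts as stale.
STALE_EDIT_DAYS = 365


@dataclass
class Runbook:
    name: str
    last_edited: date
    last_used: Optional[date]     # None means never used in an incident
    owner: Optional[str]          # None means no owner on record
    dead_links: int = 0
    retired_system_refs: int = 0  # commands that reference retired systems
    known_wrong: bool = False     # an incident showed a step was wrong


def decay_score(rb: Runbook, today: date) -> int:
    """Sum the decay signals; higher means more rotten (assumed weights)."""
    score = 0
    if (today - rb.last_edited).days > STALE_EDIT_DAYS:
        score += 2
    if rb.owner is None:
        score += 2
    if rb.last_used is None:
        score += 1
    score += min(rb.dead_links, 3)
    score += 2 * min(rb.retired_system_refs, 3)
    return score


def is_dangerous(rb: Runbook) -> bool:
    """Confidently wrong: proven wrong in an incident or targets retired systems."""
    return rb.known_wrong or rb.retired_system_refs > 0


def bucket(rb: Runbook, today: date) -> str:
    """Map a runbook to keep / fix / rewrite / retire (assumed cut-offs)."""
    score = decay_score(rb, today)
    if score == 0 and not rb.known_wrong:
        return "keep"
    if score <= 3 and not is_dangerous(rb):
        return "fix"
    if rb.owner is None and rb.last_used is None:
        return "retire"
    return "rewrite"


def dangerous_shortlist(runbooks: List[Runbook], today: date) -> List[str]:
    """Names of dangerous runbooks, worst decay score first."""
    flagged = [rb for rb in runbooks if is_dangerous(rb)]
    flagged.sort(key=lambda rb: decay_score(rb, today), reverse=True)
    return [rb.name for rb in flagged]


if __name__ == "__main__":
    today = date(2024, 6, 1)
    # Made-up sample entries, for illustration only.
    library = [
        Runbook("db-failover", date(2024, 5, 1), date(2024, 4, 15), "storage-team"),
        Runbook("cache-flush", date(2022, 1, 1), date(2023, 1, 1), "platform", dead_links=1),
        Runbook("legacy-queue", date(2020, 1, 1), None, None, retired_system_refs=2),
    ]
    for rb in library:
        print(rb.name, decay_score(rb, today), bucket(rb, today))
    print("dangerous:", dangerous_shortlist(library, today))
```

A script like this covers only the mechanical signals. Whether a step still matches the current architecture, or why engineers bypassed a runbook, still needs the judgment the prompt asks for.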