AI for Incident Response

Runbook Freshness and Decay Audit Prompt

Audit your runbook library for stale, broken, and untrusted procedures, then design a freshness program so on-call engineers can rely on runbooks instead of working around them.

Target user: SRE and on-call teams maintaining a runbook library
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who knows that a runbook nobody trusts is worse than no runbook, because it sends a stressed engineer down a dead end at 3 a.m.

I will provide:
- A sample of our runbooks (or their structure and metadata)
- When each was last edited and last actually used in an incident
- Owner / team per runbook
- Recent incidents where a runbook was wrong, missing, or ignored

Run a runbook freshness and decay audit. Work through these steps:

1. **Define decay signals** — the indicators that a runbook is rotting: stale edit date, dead links, commands referencing retired systems, steps that no longer match the architecture, no owner, never used despite relevant incidents.

2. **Score each runbook** — rate freshness and trustworthiness, and bucket each one into keep, fix, rewrite, or retire. Flag the dangerous ones (confidently wrong) and rank them above the merely outdated.

3. **Find the silent gaps** — incidents that recurred without a runbook, and runbooks that exist but were bypassed (a signal they are not trusted).

4. **Diagnose why they decay** — no ownership, no trigger to update after architecture changes, no validation, write-once-never-read culture.

5. **Design a freshness program** — ownership model, a review cadence tied to usage and to relevant deploys, a "last verified" stamp, and a lightweight validation (dry-run or gameday) for high-stakes runbooks.

6. **Close the loop with incidents** — make "update the runbook" a standard postmortem action item, and make runbooks improve every time they are used.

Output: (a) a decay-signal rubric, (b) a per-runbook scorecard with keep/fix/rewrite/retire, (c) the dangerous-runbook shortlist, (d) the freshness program design with cadence and ownership, (e) the postmortem-to-runbook feedback loop.

Prioritize trustworthiness over volume; a small set of verified runbooks beats a wiki full of guesses.
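If you would rather hand the model a pre-scored starting point than raw metadata, a first pass over steps 1, 2, and 5 can be computed mechanically. The sketch below is a minimal illustration, assuming your runbook metadata can be exported into records like the hypothetical `Runbook` dataclass. The field names, the 180/90/30-day thresholds, and the bucketing heuristics are all assumptions to tune for your team; they are not part of the prompt.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical thresholds; tune them to your own release and incident cadence.
STALE_EDIT = timedelta(days=180)
REVIEW_HIGH_STAKES = timedelta(days=30)
REVIEW_DEFAULT = timedelta(days=90)


@dataclass
class Runbook:
    """Hypothetical metadata record for one runbook."""
    name: str
    owner: Optional[str]
    last_edited: date
    last_used: Optional[date]       # last use in a real incident
    last_verified: Optional[date]   # last dry-run, gameday, or manual check
    dead_links: int = 0
    retired_refs: int = 0           # commands or hosts that point at retired systems
    bypassed_incidents: int = 0     # relevant incidents where responders ignored it
    high_stakes: bool = False


def decay_signals(rb: Runbook, today: date) -> List[str]:
    """Return the decay signals (step 1) that metadata alone can detect."""
    signals = []
    if rb.owner is None:
        signals.append("no owner")
    if today - rb.last_edited > STALE_EDIT:
        signals.append("stale edit")
    if rb.dead_links:
        signals.append("dead links")
    if rb.retired_refs:
        signals.append("references retired systems")
    if rb.bypassed_incidents:
        signals.append("bypassed in incidents")
    if rb.high_stakes and (
        rb.last_verified is None or today - rb.last_verified > REVIEW_HIGH_STAKES
    ):
        signals.append("high-stakes and unverified")
    return signals


def bucket(rb: Runbook, today: date) -> str:
    """Heuristic keep/fix/rewrite/retire bucket (step 2)."""
    signals = decay_signals(rb, today)
    if not signals:
        return "keep"
    # Orphaned, never used, and long unedited: probably nobody needs it.
    if rb.owner is None and rb.last_used is None and "stale edit" in signals:
        return "retire"
    # Wrong about the architecture, or repeatedly ignored: patching is not enough.
    if rb.retired_refs or rb.bypassed_incidents >= 2:
        return "rewrite"
    return "fix"


def is_dangerous(rb: Runbook) -> bool:
    """Confidently wrong: it still tells responders to act on retired systems."""
    return rb.retired_refs > 0


def review_due(rb: Runbook, today: date, deploys: List[date]) -> bool:
    """Step 5 cadence: due after a relevant deploy, after any use, or after the interval.

    `deploys` is a hypothetical list of deploy dates for the systems this runbook covers.
    """
    anchor = rb.last_verified or rb.last_edited
    interval = REVIEW_HIGH_STAKES if rb.high_stakes else REVIEW_DEFAULT
    deployed_since = any(d > anchor for d in deploys)
    used_since = rb.last_used is not None and rb.last_used > anchor
    return deployed_since or used_since or today - anchor > interval


if __name__ == "__main__":
    today = date(2024, 6, 1)
    books = [
        Runbook("db-failover", "dba-team", date(2024, 5, 1), date(2024, 5, 20),
                date(2024, 5, 15), high_stakes=True),
        Runbook("legacy-queue-drain", None, date(2022, 1, 10), None, None, retired_refs=3),
        Runbook("cache-flush", "platform", date(2023, 9, 1), date(2024, 4, 1), None,
                dead_links=1, bypassed_incidents=2),
    ]
    # Dangerous runbooks first, then the rest by number of decay signals.
    ranked = sorted(books, key=lambda r: (not is_dangerous(r), -len(decay_signals(r, today))))
    for rb in ranked:
        print(f"{rb.name}: {bucket(rb, today)}, dangerous={is_dangerous(rb)}, "
              f"signals={decay_signals(rb, today)}, review_due={review_due(rb, today, [])}")
```

A pass like this only surfaces what metadata can show. Whether a runbook's steps still match the current architecture, or why responders bypassed it, still needs the model (and a human) to read the runbook text and the incident timelines.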
