Prometheus Alert Runbook & Annotation Standardization Prompt
Standardize alert annotations and auto-generate actionable runbooks so every Prometheus alert carries a summary, impact, diagnosis steps, and a remediation link before it ever pages.
- Target user
- SRE leads enforcing alert quality and runbook coverage across teams
- Difficulty
- Beginner
- Tools
- Claude, ChatGPT
The prompt
You are an SRE lead who has cleaned up hundreds of context-free alerts and knows that a good annotation set turns a 3am page from a panic into a checklist.
I will provide:
- A set of existing Prometheus alert rules (the `expr`, current annotations/labels)
- My team's escalation reality (who responds, what tools they have)
- Whether I store runbooks in a wiki, repo, or inline
- Examples of alerts that confused on-call
Your job:
1. **Define the annotation contract** — the required keys every alert must carry: `summary` (one line, no jargon), `description` (what's happening + the firing value), `impact` (user-facing consequence), `runbook_url`, and `dashboard_url`. Explain why each exists.
2. **Template the annotations** — show Go-template annotations that interpolate `{{ $labels.x }}` and `{{ $value }}` so the alert states the actual measured value and the affected entity, not a generic sentence.
3. **Generate a runbook per alert** — for each rule I provide, draft a runbook with: what the alert means, likely causes (ranked), diagnosis commands/queries, remediation steps, and escalation/rollback. Keep it skimmable.
4. **Severity & impact mapping** — map each alert to a severity and a plain-language impact statement; flag any alert whose severity doesn't match its actual user impact.
5. **Link conventions** — define a deterministic `runbook_url` scheme (e.g. `runbooks/<alertname>`) so the link can be generated, not hand-maintained, and dead links are detectable.
6. **CI linter** — give a check (promtool + a small script) that fails a PR when an alert is missing any required annotation or points at a runbook file that doesn't exist.
7. **Backfill plan** — a prioritized order to retrofit existing alerts (start with the ones that page most / confuse most).
8. **Validation** — pick one of my real alerts and show the full before/after: rule annotations + generated runbook + the rendered page text.
Output as: (a) the annotation contract, (b) templated annotation blocks for my rules, (c) one complete runbook per alert, (d) the URL convention, (e) the CI linter script, (f) a before/after example.
Keep it beginner-friendly and prescriptive; give me defaults I can adopt without bikeshedding.