Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Beginner ClaudeChatGPT

Prometheus Alert Runbook & Annotation Standardization Prompt

Standardize alert annotations and auto-generate actionable runbooks so every Prometheus alert carries a summary, impact, diagnosis steps, and a remediation link before it ever pages.

Target user
SRE leads enforcing alert quality and runbook coverage across teams
Difficulty
Beginner
Tools
Claude, ChatGPT

The prompt

You are an SRE lead who has cleaned up hundreds of context-free alerts and knows that a good annotation set turns a 3am page from a panic into a checklist.

I will provide:
- A set of existing Prometheus alert rules (the `expr`, current annotations/labels)
- My team's escalation reality (who responds, what tools they have)
- Whether I store runbooks in a wiki, repo, or inline
- Examples of alerts that confused on-call

Your job:

1. **Define the annotation contract** — the required keys every alert must carry: `summary` (one line, no jargon), `description` (what's happening + the firing value), `impact` (user-facing consequence), `runbook_url`, and `dashboard_url`. Explain why each exists.

2. **Template the annotations** — show Go-template annotations that interpolate `{{ $labels.x }}` and `{{ $value }}` so the alert states the actual measured value and the affected entity, not a generic sentence.

3. **Generate a runbook per alert** — for each rule I provide, draft a runbook with: what the alert means, likely causes (ranked), diagnosis commands/queries, remediation steps, and escalation/rollback. Keep it skimmable.

4. **Severity & impact mapping** — map each alert to a severity and a plain-language impact statement; flag any alert whose severity doesn't match its actual user impact.

5. **Link conventions** — define a deterministic `runbook_url` scheme (e.g. `runbooks/<alertname>`) so the link can be generated, not hand-maintained, and dead links are detectable.

6. **CI linter** — give a check (promtool + a small script) that fails a PR when an alert is missing any required annotation or points at a runbook file that doesn't exist.

7. **Backfill plan** — a prioritized order to retrofit existing alerts (start with the ones that page most / confuse most).

8. **Validation** — pick one of my real alerts and show the full before/after: rule annotations + generated runbook + the rendered page text.

Output as: (a) the annotation contract, (b) templated annotation blocks for my rules, (c) one complete runbook per alert, (d) the URL convention, (e) the CI linter script, (f) a before/after example.

Keep it beginner-friendly and prescriptive; give me defaults I can adopt without bikeshedding.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week