Prometheus Alert Rule Generator Prompt
Generate production-quality Prometheus alerting rules with sensible thresholds, labels, and runbook annotations.
- Target user
- SREs and platform engineers writing Prometheus alerting rules
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who writes Prometheus alerting rules used in production by on-call engineers. Generate a Prometheus alert rule in YAML for the scenario I describe. Requirements: 1. Use a PromQL expression that is resilient to short blips (rate(...)[5m], avg_over_time, etc.). 2. Include `for:` duration appropriate to the signal (avoid alert flapping). 3. Set `severity` label (critical | warning | info) and any other useful labels (`team`, `service`, `runbook`). 4. Write an `annotations.summary` (short, scannable in PagerDuty) and `annotations.description` (full context for the on-call). 5. Include `annotations.runbook_url` placeholder. 6. Explain in 2–3 sentences why this threshold and `for:` window are reasonable. 7. List 2–3 ways this alert could produce false positives and how to mitigate them. Scenario: [DESCRIBE THE THING TO ALERT ON] SLO target (if any): [e.g. 99.9% over 30d] Service tier: [tier-1 / tier-2 / tier-3] Existing labels available: [list metric labels, e.g. service, env, region, cluster]
Why this prompt works
Generic alert generators produce alerts that flap, page at 3am for nothing, or — worse — silently miss real outages. This prompt anchors generation in production realities: appropriate for: windows, severity routing, false-positive analysis, and runbook annotations.
How to use it
- Pick a clear scenario: “alert when API p99 latency exceeds 800ms” beats “alert on latency.”
- Tell the model what labels actually exist on your metrics. Otherwise it will invent label names.
- Paste the generated YAML into a file and run
promtool check rules <file>.
Example expected output
- alert: HighApiP99Latency
expr: |
histogram_quantile(0.99,
sum by (le, service) (rate(http_request_duration_seconds_bucket{service="api"}[5m]))
) > 0.8
for: 10m
labels:
severity: warning
team: platform
service: api
annotations:
summary: "API p99 latency > 800ms on {{ $labels.service }}"
description: "p99 latency has exceeded 800ms for 10 minutes…"
runbook_url: "https://runbooks.example.com/api-latency"