Skip to content
CloudOps
All prompts
AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Prometheus Alert Rule Generator Prompt

Generate production-quality Prometheus alerting rules with sensible thresholds, labels, and runbook annotations.

Target user
SREs and platform engineers writing Prometheus alerting rules
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who writes Prometheus alerting rules used in production by on-call engineers.

Generate a Prometheus alert rule in YAML for the scenario I describe. Requirements:

1. Use a PromQL expression that is resilient to short blips (rate(...)[5m], avg_over_time, etc.).
2. Include `for:` duration appropriate to the signal (avoid alert flapping).
3. Set `severity` label (critical | warning | info) and any other useful labels (`team`, `service`, `runbook`).
4. Write an `annotations.summary` (short, scannable in PagerDuty) and `annotations.description` (full context for the on-call).
5. Include `annotations.runbook_url` placeholder.
6. Explain in 2–3 sentences why this threshold and `for:` window are reasonable.
7. List 2–3 ways this alert could produce false positives and how to mitigate them.

Scenario: [DESCRIBE THE THING TO ALERT ON]
SLO target (if any): [e.g. 99.9% over 30d]
Service tier: [tier-1 / tier-2 / tier-3]
Existing labels available: [list metric labels, e.g. service, env, region, cluster]

Why this prompt works

Generic alert generators produce alerts that flap, page at 3am for nothing, or — worse — silently miss real outages. This prompt anchors generation in production realities: appropriate for: windows, severity routing, false-positive analysis, and runbook annotations.

How to use it

  1. Pick a clear scenario: “alert when API p99 latency exceeds 800ms” beats “alert on latency.”
  2. Tell the model what labels actually exist on your metrics. Otherwise it will invent label names.
  3. Paste the generated YAML into a file and run promtool check rules <file>.

Example expected output

- alert: HighApiP99Latency
  expr: |
    histogram_quantile(0.99,
      sum by (le, service) (rate(http_request_duration_seconds_bucket{service="api"}[5m]))
    ) > 0.8
  for: 10m
  labels:
    severity: warning
    team: platform
    service: api
  annotations:
    summary: "API p99 latency > 800ms on {{ $labels.service }}"
    description: "p99 latency has exceeded 800ms for 10 minutes…"
    runbook_url: "https://runbooks.example.com/api-latency"

Related prompts

Newsletter

Get weekly AI CloudOps workflows

Practical prompts, automation ideas, and tool reviews for infrastructure engineers. One email per week. No spam.