Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Grafana Unified Alerting Prompt

Configure Grafana's unified alerting — contact points, notification policies, mute timings, multi-dimensional alerts, alert state.

Target user
SREs using Grafana for alerting
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who has migrated dashboards from Grafana's legacy alerting to unified alerting — and operated alerts at scale.

I will provide:
- The alert use case
- Current rule and notification config
- Symptom (alert not firing, too noisy, wrong destination)

Your job:

1. **Unified alerting model**:
   - **Alert rules** in a folder (groups)
   - **Contact points** (Slack, PagerDuty, etc.)
   - **Notification policies** route alerts to contact points
   - **Mute timings** for time-based suppression
   - **Silences** for ad-hoc suppression
2. **Alert rule types**:
   - **Grafana managed** — Grafana stores rule; works with any DS
   - **Mimir/Loki managed** — rule lives in Mimir/Loki ruler
   - **Prometheus rule** — via Prometheus
3. **For "multi-dimensional"**:
   - Single rule with label-based instances
   - Each unique label combo = separate alert
4. **For "alert not firing"**:
   - Check rule "State" (Normal/Pending/Alerting)
   - For=2m means must be active 2m
   - Query returns no data → No Data behavior
5. **For notification policies**:
   - Tree structure like Alertmanager
   - Match by labels
   - Group by labels
   - Default policy at root
6. **For mute timings**:
   - Time-based (weekends, business hours)
   - Multiple intervals per timing
7. **For state history**:
   - Grafana stores recent state changes
   - Useful for postmortems
8. **For migration from legacy**:
   - Legacy alerts attached to panels
   - Unified alerts standalone
   - Migration tool

Mark DESTRUCTIVE: changing contact point to wrong webhook (alerts misrouted), mute timing too broad (silence everything), rule deletion without backup.

---

Alert use case: [DESCRIBE]
Rule config:
```
[PASTE]
```
Symptom: [DESCRIBE]

Why this prompt works

Grafana alerting is powerful but has its own model. This prompt walks it.

How to use it

  1. Define contact points first.
  2. Notification policies for routing.
  3. Rules per use case.
  4. Test with manual fire.

Useful UI / API

# API: list rules
curl -u admin:pass http://grafana:3000/api/v1/provisioning/alert-rules | jq

# Test rule
# UI: Alert rule → Preview

# Pause / resume
curl -X POST -u admin:pass http://grafana:3000/api/v1/provisioning/alert-rules/<uid>/pause -d '{"isPaused":true}'

# Contact points
curl -u admin:pass http://grafana:3000/api/v1/provisioning/contact-points

# Notification policies
curl -u admin:pass http://grafana:3000/api/v1/provisioning/policies

Patterns

Multi-dimensional alert

# Provisioning YAML
groups:
- name: cpu-alerts
  folder: SRE
  interval: 1m
  rules:
  - title: HighCPU
    condition: B
    data:
    - refId: A
      datasourceUid: prometheus
      model:
        expr: 'avg by (instance)(rate(node_cpu_seconds_total{mode!="idle"}[5m]))'
    - refId: B
      datasourceUid: __expr__
      model:
        type: threshold
        conditions:
        - evaluator: { params: [0.8], type: gt }
    for: 5m
    labels:
      severity: warning
      team: platform
    annotations:
      summary: "High CPU on {{ $labels.instance }}"
      runbook: "https://runbooks.example.com/high-cpu"

Each instance value produces its own alert instance.

Notification policy

policies:
- receiver: default-slack
  group_by: [alertname, cluster]
  routes:
  - receiver: pagerduty-critical
    matchers:
    - severity = critical
    group_wait: 10s
  - receiver: slack-platform
    matchers:
    - team = platform
    continue: true
  - receiver: slack-payments
    matchers:
    - team = payments

Mute timing

mute_time_intervals:
- name: weekends
  time_intervals:
  - weekdays: [saturday, sunday]
- name: maintenance-window
  time_intervals:
  - times:
    - start_time: 02:00
      end_time: 04:00
    weekdays: [sunday]

Common findings this catches

  • Alert doesn’t fire → Check rule state in UI; query returns nothing.
  • Wrong recipient → policy match condition wrong.
  • Mute timing affects everything → scoped to right policy?
  • Alert always firing → No Data behavior set wrong.
  • State lost on restart → ephemeral storage; need DB.
  • Multi-dim alert explosion → cardinality on labels.
  • Slow rule eval → query expensive; recording rule.

When to escalate

  • Migration from legacy at scale — staged.
  • Cross-team policy design — coordination.
  • Multi-org Grafana alerting strategy — strategic.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week