Grafana Unified Alerting Prompt
Configure Grafana's unified alerting — contact points, notification policies, mute timings, multi-dimensional alerts, alert state.
- Target user: SREs using Grafana for alerting
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has migrated dashboards from Grafana's legacy alerting to unified alerting and operated alerts at scale.

I will provide:
- The alert use case
- Current rule and notification config
- Symptom (alert not firing, too noisy, wrong destination)

Your job:

1. **Unified alerting model**:
   - **Alert rules** live in folders and are evaluated in groups
   - **Contact points** (Slack, PagerDuty, etc.)
   - **Notification policies** route alerts to contact points
   - **Mute timings** for time-based suppression
   - **Silences** for ad-hoc suppression

2. **Alert rule types**:
   - **Grafana managed**: Grafana stores and evaluates the rule; works with any data source that supports alerting
   - **Mimir/Loki managed**: rule lives in the Mimir/Loki ruler
   - **Prometheus rule**: defined and evaluated in Prometheus itself

3. **For "multi-dimensional"**:
   - Single rule with label-based instances
   - Each unique label combination = separate alert instance

4. **For "alert not firing"**:
   - Check the rule's state (Normal/Pending/Alerting)
   - `for: 2m` means the condition must hold for 2m before Pending becomes Alerting
   - Query returns no data → the rule's No Data behavior applies

5. **For notification policies**:
   - Tree structure like Alertmanager
   - Match by labels
   - Group by labels
   - Default policy at root

6. **For mute timings**:
   - Time-based (weekends, business hours)
   - Multiple intervals per timing

7. **For state history**:
   - Grafana stores recent state changes
   - Useful for postmortems

8. **For migration from legacy**:
   - Legacy alerts are attached to panels
   - Unified alerts are standalone
   - Use the migration tool

Mark DESTRUCTIVE: changing a contact point to the wrong webhook (alerts misrouted), a mute timing that is too broad (silences everything), rule deletion without backup.

---

Alert use case: [DESCRIBE]

Rule config:
```
[PASTE]
```

Symptom: [DESCRIBE]
Why this prompt works
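Grafana unified alerting has its own model: rules, contact points, notification policies, mute timings, and silences are separate objects, and a symptom such as a missing or misrouted notification can originate in any of them. This prompt walks that model one object at a time.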
How to use it
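- Define contact points first, so policies have receivers to reference.
- Add notification policies to route alerts to those contact points by label.
- Write rules per use case, with labels the policies can match on.
- Test delivery before relying on it, for example with the contact point's Test button.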
Useful UI / API
# API: list rules
curl -u admin:pass http://grafana:3000/api/v1/provisioning/alert-rules | jq
# Test rule
# UI: Alert rule → Preview
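# Pause / resume: fetch the rule, flip isPaused, PUT it back
# (assumes your Grafana version exposes isPaused on provisioned rules;
#  X-Disable-Provenance keeps the rule editable in the UI afterwards)
curl -s -u admin:pass http://grafana:3000/api/v1/provisioning/alert-rules/<uid> | jq '.isPaused = true' | curl -X PUT -u admin:pass -H 'Content-Type: application/json' -H 'X-Disable-Provenance: true' -d @- http://grafana:3000/api/v1/provisioning/alert-rules/<uid>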
# Contact points
curl -u admin:pass http://grafana:3000/api/v1/provisioning/contact-points
# Notification policies
curl -u admin:pass http://grafana:3000/api/v1/provisioning/policies
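When the rule list is long, the raw JSON from the list-rules call is hard to scan. The following is a minimal Python sketch that groups rules by folder and rule group and marks paused ones. It assumes the response is a JSON array whose objects carry `folderUID`, `ruleGroup`, `title`, and `isPaused`; verify those field names against your Grafana version. `fetch_rules` and `summarize_rules` are hypothetical helper names, and the sample data is invented for illustration.

```python
import base64
import json
import urllib.request
from collections import defaultdict


def fetch_rules(base_url, user, password):
    """GET the provisioned alert rules (same endpoint as the curl call above)."""
    req = urllib.request.Request(f"{base_url}/api/v1/provisioning/alert-rules")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_rules(rules):
    """Group rule titles by (folderUID, ruleGroup); paused rules get a marker."""
    summary = defaultdict(list)
    for rule in rules:
        key = (rule.get("folderUID", "?"), rule.get("ruleGroup", "?"))
        title = rule.get("title", "<untitled>")
        if rule.get("isPaused", False):
            title += " (paused)"
        summary[key].append(title)
    return dict(summary)


# Offline sample shaped like the assumed API response (invented data).
sample = [
    {"title": "HighCPU", "folderUID": "sre", "ruleGroup": "cpu-alerts", "isPaused": False},
    {"title": "HighMem", "folderUID": "sre", "ruleGroup": "cpu-alerts", "isPaused": True},
]
print(summarize_rules(sample))
```

Against a live instance, `summarize_rules(fetch_rules("http://grafana:3000", "admin", "pass"))` would produce the same shape; a paused rule showing up here is a quick explanation for an alert that never fires.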
Patterns
Multi-dimensional alert
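# Provisioning YAML
apiVersion: 1
groups:
  - orgId: 1
    name: cpu-alerts
    folder: SRE
    interval: 1m
    rules:
      - uid: high-cpu                   # placeholder; must be unique within the org
        title: HighCPU
        condition: B
        data:
          - refId: A
            datasourceUid: prometheus   # placeholder; use your data source's UID
            relativeTimeRange: { from: 600, to: 0 }
            model:
              refId: A
              # Busy fraction per instance: 1 minus the average idle rate across CPUs
              expr: '1 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))'
              instant: true             # the threshold needs one value per series
          - refId: B
            datasourceUid: __expr__
            model:
              refId: B
              type: threshold
              expression: A             # apply the threshold to query A
              conditions:
                - evaluator: { params: [0.8], type: gt }
        for: 5m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          runbook: "https://runbooks.example.com/high-cpu"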
Each instance value produces its own alert instance.
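To make the "one instance per label set" behavior concrete, here is a small Python sketch of how a multi-dimensional rule fans out. It is a simplified model, not Grafana's evaluator: `evaluate_instances` is a hypothetical helper, and the series values are invented.

```python
def evaluate_instances(series, threshold):
    """Return one state per unique label set, like a multi-dimensional rule."""
    states = {}
    for labels, value in series:
        key = tuple(sorted(labels.items()))  # the label set identifies the instance
        states[key] = "breaching" if value > threshold else "normal"
    return states


# Hypothetical query result: one value per `instance` label.
series = [
    ({"instance": "web-1"}, 0.92),
    ({"instance": "web-2"}, 0.35),
    ({"instance": "db-1"}, 0.81),
]
states = evaluate_instances(series, threshold=0.8)
breaching = sorted(dict(k)["instance"] for k, s in states.items() if s == "breaching")
print(len(states), "instances; breaching:", breaching)
```

The instance count equals the number of distinct label sets the query returns, so adding another label to the `by (...)` clause multiplies the number of alert instances.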
Notification policy
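policies:
  - receiver: default-slack              # root policy: catches anything no child route matches
    group_by: [alertname, cluster]
    routes:
      - receiver: pagerduty-critical
        matchers:
          - severity = critical
        group_wait: 10s                  # no `continue`, so critical alerts stop at this route
      - receiver: slack-platform
        matchers:
          - team = platform
        continue: true                   # keep evaluating the sibling routes below after a match
      - receiver: slack-payments
        matchers:
          - team = payments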
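When debugging a "wrong recipient" symptom, it helps to trace an alert's labels through the tree by hand. The Python sketch below mirrors the route tree above and follows Alertmanager-style matching: children are tried in order, the first match wins unless it sets `continue`, and a node handles the alert itself when no child matches. It is a simplified model under stated assumptions: only `label = value` equality matchers are supported, every route names its own receiver, and `matches` and `resolve` are hypothetical helper names.

```python
def matches(route, labels):
    """True if every `label = value` matcher on the route equals the alert's label."""
    for matcher in route.get("matchers", []):
        name, _, value = matcher.partition("=")
        if labels.get(name.strip()) != value.strip():
            return False
    return True


def resolve(node, labels):
    """Walk the tree depth-first and return the receivers that get the alert."""
    receivers = []
    for child in node.get("routes", []):
        if matches(child, labels):
            receivers.extend(resolve(child, labels))
            if not child.get("continue", False):
                break  # first match wins unless the route sets continue
    # No child matched: the current node handles the alert itself.
    return receivers or [node["receiver"]]


root = {
    "receiver": "default-slack",
    "routes": [
        {"receiver": "pagerduty-critical", "matchers": ["severity = critical"]},
        {"receiver": "slack-platform", "matchers": ["team = platform"], "continue": True},
        {"receiver": "slack-payments", "matchers": ["team = payments"]},
    ],
}

for labels in (
    {"severity": "critical", "team": "platform"},
    {"severity": "warning", "team": "platform"},
    {"team": "search"},
):
    print(labels, "->", resolve(root, labels))
```

In this model a critical platform alert reaches only `pagerduty-critical`; if the team channel should also see it, the critical route would need `continue: true`.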
Mute timing
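# Provisioning YAML (Alertmanager's own config file calls this key mute_time_intervals)
muteTimes:
  - name: weekends
    time_intervals:
      - weekdays: [saturday, sunday]
  - name: maintenance-window
    time_intervals:
      - times:
          - start_time: '02:00'    # UTC unless `location` is set
            end_time: '04:00'
        weekdays: [sunday]
# A mute timing only takes effect once a notification policy references it by name.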
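A quick way to check whether a mute timing is broader than intended is to evaluate it against a few timestamps. The Python sketch below models the two timings above, assuming Alertmanager-style semantics: fields inside one interval are ANDed, intervals within a timing are ORed, times are UTC, and `end_time` is exclusive. Weekday ranges such as `monday:friday`, and the month, year, and day-of-month fields, are not handled; `interval_active` and `is_muted` are hypothetical helper names.

```python
from datetime import datetime, time, timezone

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def interval_active(interval, ts):
    """All fields present in one interval must match (logical AND)."""
    if "weekdays" in interval and WEEKDAYS[ts.weekday()] not in interval["weekdays"]:
        return False
    if "times" in interval:
        now = ts.time()
        in_range = any(
            time.fromisoformat(t["start_time"]) <= now < time.fromisoformat(t["end_time"])
            for t in interval["times"]
        )
        if not in_range:
            return False
    return True


def is_muted(timing, ts):
    """A timing mutes when any of its intervals is active (logical OR)."""
    return any(interval_active(i, ts) for i in timing["time_intervals"])


weekends = {"name": "weekends", "time_intervals": [{"weekdays": ["saturday", "sunday"]}]}
maintenance = {
    "name": "maintenance-window",
    "time_intervals": [
        {"times": [{"start_time": "02:00", "end_time": "04:00"}], "weekdays": ["sunday"]}
    ],
}

sunday_3am = datetime(2024, 1, 7, 3, 0, tzinfo=timezone.utc)  # a Sunday
monday_3am = datetime(2024, 1, 8, 3, 0, tzinfo=timezone.utc)  # a Monday
print(is_muted(maintenance, sunday_3am), is_muted(maintenance, monday_3am))
```

Note that `weekends` mutes the whole of Saturday and Sunday, which is exactly the kind of breadth the prompt flags as destructive when attached to a paging route.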
Common findings this catches
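- Alert doesn’t fire → check the rule state in the UI; the query may return no series, or the rule may still be Pending inside its `for` window.
- Wrong recipient → a notification policy matcher doesn’t match the alert’s labels, or an earlier route matches first.
- Mute timing affects everything → it is attached to a policy whose matchers are too broad; attach it only to the policy it should silence.
- Alert always firing → No Data or Error handling is set to Alerting, so an empty or failing query fires the rule.
- State lost on restart → Grafana’s database sits on ephemeral storage (for example a container without a volume); use a persistent database.
- Multi-dimensional alert explosion → high-cardinality labels create one instance per label combination; aggregate them away in the query.
- Slow rule evaluation → the query is expensive; precompute it with a recording rule.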
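Two of these findings, the alert that never fires and the alert that always fires, come down to how the pending period and the No Data setting interact. The Python sketch below is a simplified model of that state machine, not Grafana's implementation. It assumes three No Data options (`"NoData"`, `"OK"`, `"Alerting"`) and assumes that No Data mapped to Alerting still passes through Pending, which may differ between Grafana versions. `run_rule` is a hypothetical helper name.

```python
def run_rule(samples, threshold, for_seconds, interval_seconds, no_data_state="NoData"):
    """Return the state after each evaluation; None means the query returned no data."""
    states = []
    breach_started = None  # time of the first evaluation in the current breach
    for tick, value in enumerate(samples):
        now = tick * interval_seconds
        if value is None and no_data_state != "Alerting":
            breach_started = None
            states.append("NoData" if no_data_state == "NoData" else "Normal")
            continue
        # Missing data counts as a breach only when No Data maps to Alerting.
        breaching = value is None or value > threshold
        if not breaching:
            breach_started = None
            states.append("Normal")
            continue
        if breach_started is None:
            breach_started = now
        held = now - breach_started
        states.append("Alerting" if held >= for_seconds else "Pending")
    return states


# for: 2m with a 1m interval: two Pending evaluations before Alerting.
print(run_rule([0.9, 0.9, 0.9], threshold=0.8, for_seconds=120, interval_seconds=60))
# A dip resets the pending timer, so a flapping signal may never fire.
print(run_rule([0.9, 0.5, 0.9], threshold=0.8, for_seconds=120, interval_seconds=60))
# An empty query with No Data set to Alerting fires without any real breach.
print(run_rule([None, None, None], 0.8, 60, 60, no_data_state="Alerting"))
```

In this model a `for` value longer than the typical breach keeps the rule in Pending indefinitely, while a No Data setting of Alerting turns a broken query into a permanent page.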
When to escalate
- Migration from legacy alerting at scale: needs a staged rollout.
- Cross-team policy design: needs coordination between the owning teams.
- Multi-org Grafana alerting strategy: a strategic decision rather than a per-rule fix.
Related prompts
- Alert Fatigue Reduction Strategy Prompt: Reduce alert fatigue with SLO-based vs symptom-based alerts, severity tiers, runbook integration, and deprecation of noisy alerts.
- Alertmanager Routing, Grouping & Receivers Prompt: Design Alertmanager routes, including receivers (Slack, PagerDuty), grouping, inhibition, repeat intervals, and mute timings.
- Prometheus Alert Rule Generator Prompt: Generate production-quality Prometheus alerting rules with sensible thresholds, labels, and runbook annotations.