Grafana Notification Policies & Contact Points Design Prompt
Design Grafana Alerting notification policy trees and contact points — label-based routing, nested policies, mute timings, and grouping — so the right team gets paged through the right channel.
- Target user: Teams using Grafana-managed alerting (not standalone Alertmanager) for routing
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has designed Grafana Alerting notification policy trees for multi-team orgs and knows how the nested matcher model routes (and silently mis-routes) alerts.
I will provide:
- My teams/services and how they should be paged (Slack, PagerDuty, email, webhook)
- The labels available on my alert rules (team, severity, env, service)
- Current pain points (everything goes to one channel, severity ignored, noisy grouping)
- Whether I provision via UI, Terraform, or file provisioning
Your job:
1. **Explain the policy tree model**: every alert enters at the root (default) policy, then Grafana evaluates nested policies in order against the alert's labels; `continue` controls whether evaluation proceeds to the next sibling policy after a match. Make sure I understand "first matching sibling wins unless `continue` is set" before we design.
2. **Contact points first**: define one contact point per real destination (team-payments-pagerduty, team-payments-slack, etc.). Show the integration settings and how to template the message title/body with Go template syntax (`{{ }}`).
3. **Routing tree**: design nested policies: root catch-all → per-team by `team` label → per-severity by `severity` label. Provide the matcher for each node and which contact point it targets.
4. **Grouping**: set `group_by`, `group_wait`, `group_interval`, `repeat_interval` per node; explain why critical alerts get a short `repeat_interval` and info alerts get a long one.
5. **Mute timings**: define maintenance-window and off-hours-low-severity mute timings; attach them to the right policy nodes; explain the difference between a mute timing (a recurring schedule attached to a policy) and a silence (a one-off, time-bounded suppression matched by labels).
6. **Severity escalation**: show how SEV1 routes to PagerDuty while SEV3 routes to Slack only, using `continue` on the PagerDuty policy so a SEV1 also posts to the team channel.
7. **Provisioning as code**: translate the design into Terraform (`grafana_notification_policy`, `grafana_contact_point`, `grafana_mute_timing`) or YAML file provisioning, whichever I use; warn that provisioned policies are read-only in the UI by default.
8. **Validation**: give me 4 synthetic alerts (different team/severity/env) and trace exactly which contact point(s) each reaches and why.
Output as: (a) contact point definitions, (b) the full policy tree (visual + matchers), (c) grouping/timing settings per node, (d) mute timing defs, (e) Terraform or YAML provisioning, (f) the 4 routing trace examples.
Bias toward an explicit, testable tree over a clever flat config; call out any unreachable policy nodes.
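A tree like the one this prompt asks for can be sanity-checked offline before it is provisioned. The sketch below is a minimal Python model of the "first matching sibling wins unless `continue` is set" rule, assuming equality-only matchers and the Alertmanager-style behavior that a policy handles an alert itself only when none of its nested policies match. `Policy`, `route`, the `team-search-*` and `default-email` contact point names, and the `critical`/`info` severity values are hypothetical; this is an illustration, not Grafana's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    receiver: str                                        # contact point name
    matchers: dict[str, str] = field(default_factory=dict)  # equality matchers only (assumption)
    continue_: bool = False                              # keep evaluating later siblings after a match
    routes: list[Policy] = field(default_factory=list)   # nested policies, evaluated in order

    def matches(self, labels: dict[str, str]) -> bool:
        # An empty matcher set matches every alert (used here for the root policy).
        return all(labels.get(k) == v for k, v in self.matchers.items())


def route(policy: Policy, labels: dict[str, str]) -> list[str]:
    """Return the contact points an alert with these labels would reach."""
    if not policy.matches(labels):
        return []
    reached: list[str] = []
    for child in policy.routes:
        hit = route(child, labels)
        reached.extend(hit)
        if hit and not child.continue_:
            break  # first matching sibling wins unless it sets continue
    # If no nested policy matched, this policy handles the alert itself.
    return reached or [policy.receiver]


tree = Policy(
    receiver="default-email",  # root catch-all
    routes=[
        Policy(
            receiver="team-payments-slack",
            matchers={"team": "payments"},
            routes=[
                # continue_=True lets a critical alert also reach the Slack sibling below.
                Policy("team-payments-pagerduty",
                       {"team": "payments", "severity": "critical"},
                       continue_=True),
                Policy("team-payments-slack", {"team": "payments"}),
            ],
        ),
        Policy(
            receiver="team-search-slack",
            matchers={"team": "search"},
            routes=[Policy("team-search-pagerduty", {"severity": "critical"})],
        ),
    ],
)

synthetic_alerts = [
    {"team": "payments", "severity": "critical", "env": "prod"},
    {"team": "payments", "severity": "info", "env": "prod"},
    {"team": "search", "severity": "critical", "env": "staging"},
    {"team": "data", "severity": "warning", "env": "prod"},
]

for labels in synthetic_alerts:
    print(labels, "->", route(tree, labels))
```

In this model the critical payments alert reaches both PagerDuty and Slack, the info payments alert reaches Slack only, the critical search alert reaches PagerDuty only (no `continue`, so the team channel is skipped), and the unowned `team=data` alert falls back to the root contact point. Comparing a trace like this against the model's answer to step 8 is one way to spot mis-routed or unreachable nodes.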