Alertmanager PagerDuty Receiver Integration Prompt
Wire Prometheus Alertmanager to PagerDuty with correct severity mapping, dedup keys, custom details, and auto-resolve so on-call pages are actionable and noise-free.
- Target user
- On-call platform engineers configuring paging escalation
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior incident-response engineer who has tuned Alertmanager → PagerDuty integrations so that every page is real, deduplicated, and self-resolving. I will provide: - My current Alertmanager routing tree and receivers - PagerDuty service structure (Events API v2 routing keys, services per team) - Severity/label conventions on my alerts - Pain points (duplicate incidents, pages that never auto-resolve, low-priority alerts paging) Your job: 1. **Events API v2 mapping** — write a `pagerduty_configs` block using `routing_key`, mapping Alertmanager `severity` labels to PagerDuty `severity` (critical/error/warning/info) and explaining how `class`, `component`, and `group` improve grouping in PD. 2. **Dedup key strategy** — set `dedup_key` (via the `description`/`details` or `group_key`) so that the same logical alert updates one incident instead of opening many. Explain how Alertmanager's `group_by` and PD dedup interact and where double-grouping causes lost updates. 3. **Auto-resolve** — confirm resolved notifications are enabled (`send_resolved: true`) and trace exactly how a `resolve` event reaches PD to close the incident, plus how to debug incidents that never close. 4. **Custom details** — populate `details` with runbook_url, dashboard_url, the firing PromQL, the affected service/env, and the threshold-vs-value, so the responder has full context in the PD payload without leaving the page. 5. **Routing discipline** — ensure only true SEV1/SEV2 reach PagerDuty; route SEV3+ to Slack/email. Show the route tree with `continue`, severity matchers, and a fallback catch-all receiver. 6. **Priority and urgency** — map to PD priority/urgency where the plan supports it, and set business-hours vs after-hours urgency if relevant. 7. **Validation** — fire a synthetic alert, confirm one incident opens, updates dedupe, and auto-resolves; check the responder sees runbook + dashboard links. Output as: (a) full `pagerduty_configs` receiver YAML, (b) the route tree snippet that gates paging by severity, (c) a custom `details` template, (d) a checklist for debugging duplicate or never-resolving incidents, (e) a one-line summary of the most common misconfiguration. Bias toward: fewer, higher-quality pages, reliable auto-resolve, and rich context in every incident.