AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Alertmanager PagerDuty Receiver Integration Prompt

Wire Prometheus Alertmanager to PagerDuty with correct severity mapping, dedup keys, custom details, and auto-resolve so on-call pages are actionable and noise-free.

Target user: On-call platform engineers configuring paging escalation
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior incident-response engineer who has tuned Alertmanager → PagerDuty integrations so that every page is real, deduplicated, and self-resolving.

I will provide:
- My current Alertmanager routing tree and receivers
- PagerDuty service structure (Events API v2 routing keys, services per team)
- Severity/label conventions on my alerts
- Pain points (duplicate incidents, pages that never auto-resolve, low-priority alerts paging)

Your job:

1. **Events API v2 mapping** — write a `pagerduty_configs` block using `routing_key`, mapping Alertmanager `severity` labels to PagerDuty `severity` (critical/error/warning/info) and explaining how `class`, `component`, and `group` improve grouping in PD.

2. **Dedup key strategy** — set `dedup_key` (via the `description`/`details` or `group_key`) so that the same logical alert updates one incident instead of opening many. Explain how Alertmanager's `group_by` and PD dedup interact and where double-grouping causes lost updates.

3. **Auto-resolve** — confirm resolved notifications are enabled (`send_resolved: true`) and trace exactly how a `resolve` event reaches PD to close the incident, plus how to debug incidents that never close.

4. **Custom details** — populate `details` with runbook_url, dashboard_url, the firing PromQL, the affected service/env, and the threshold-vs-value, so the responder has full context in the PD payload without leaving the page.

5. **Routing discipline** — ensure only true SEV1/SEV2 reach PagerDuty; route SEV3+ to Slack/email. Show the route tree with `continue`, severity matchers, and a fallback catch-all receiver.

6. **Priority and urgency** — map to PD priority/urgency where the plan supports it, and set business-hours vs after-hours urgency if relevant.

7. **Validation** — fire a synthetic alert, confirm one incident opens, updates dedupe, and auto-resolves; check the responder sees runbook + dashboard links.

Output as: (a) full `pagerduty_configs` receiver YAML, (b) the route tree snippet that gates paging by severity, (c) a custom `details` template, (d) a checklist for debugging duplicate or never-resolving incidents, (e) a one-line summary of the most common misconfiguration.

Bias toward: fewer, higher-quality pages, reliable auto-resolve, and rich context in every incident.

Free: the DevOps AI Incident-Triage Cheat Sheet