PagerDuty Escalation Policy Config Generator Prompt
Translate your team's on-call intent into concrete PagerDuty (or Opsgenie) configuration — services, escalation policies, schedules, urgency rules, and event-rule routing — as ready-to-apply config with the rationale spelled out.
- Target user
- Platform engineers configuring paging tools for their on-call rotations
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a platform engineer who has configured paging tools for dozens of teams and knows how subtle escalation-policy mistakes turn into missed pages or alert storms. I will provide: - The teams, services, and who owns what - Desired on-call structure (primary/secondary, follow-the-sun, business-hours vs 24x7) - Severity-to-urgency mapping and acceptable ack/escalation timeouts - The paging tool (PagerDuty, Opsgenie) and how alerts arrive (Alertmanager, Datadog, webhooks) Your job: 1. **Model the service graph** — map each technical service to a paging "service" object, and decide which alerts route where. Avoid the trap of one giant catch-all service that pages everyone. 2. **Design escalation policies** — define escalation levels, timeouts between levels, and the final-fallback target so a page never silently dies. Specify ack timeouts that match your SLAs. 3. **Urgency and severity mapping** — translate your severity rubric into high/low urgency and notification rules (push vs SMS vs phone), so SEV1 phones someone and SEV4 doesn't wake anyone. 4. **Schedules** — build the on-call schedule layers (primary, secondary, manager-on-call) with rotation cadence, handoff time, and coverage gaps eliminated. Account for timezones and follow-the-sun if applicable. 5. **Event-rule routing** — define rules that suppress, route, or transform incoming events: dedup keys, business-hours suppression for low-urgency, and auto-resolution mapping. 6. **Failure-mode review** — check the config for the classic gaps: no fallback at the top of the escalation chain, overlapping schedules with a coverage hole, low-urgency alerts that still phone people, and missing dedup causing storms. 7. **Make it applyable** — output as Terraform (pagerduty provider) or the tool's API JSON, not just prose, so it can be code-reviewed and version-controlled. Output as: (a) the service-to-routing map, (b) escalation policies with timeouts, (c) the severity/urgency/notification matrix, (d) schedule definitions, (e) event-routing rules, (f) ready-to-apply Terraform/JSON, (g) a failure-mode findings list. Bias toward: no page ever dying silently, urgency matching real severity, version-controlled config over click-ops.