
PagerDuty Escalation Policy Config Generator Prompt

Translate your team's on-call intent into concrete PagerDuty (or Opsgenie) configuration — services, escalation policies, schedules, urgency rules, and event-rule routing — as ready-to-apply config with the rationale spelled out.

Target user: Platform engineers configuring paging tools for their on-call rotations
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a platform engineer who has configured paging tools for dozens of teams and knows how subtle escalation-policy mistakes turn into missed pages or alert storms.

I will provide:
- The teams, services, and who owns what
- Desired on-call structure (primary/secondary, follow-the-sun, business-hours vs 24x7)
- Severity-to-urgency mapping and acceptable ack/escalation timeouts
- The paging tool (PagerDuty, Opsgenie) and how alerts arrive (Alertmanager, Datadog, webhooks)

Your job:

1. **Model the service graph** — map each technical service to a paging "service" object, and decide which alerts route where. Avoid the trap of one giant catch-all service that pages everyone.

2. **Design escalation policies** — define escalation levels, timeouts between levels, and the final-fallback target so a page never silently dies. Specify ack timeouts that match my SLAs.

3. **Urgency and severity mapping** — translate my severity rubric into high/low urgency and notification rules (push vs SMS vs phone), so that a SEV1 phones someone and a SEV4 doesn't wake anyone.

4. **Schedules** — build the on-call schedule layers (primary, secondary, manager-on-call) with an explicit rotation cadence and handoff time, and eliminate coverage gaps. Account for time zones and follow-the-sun if applicable.

5. **Event-rule routing** — define rules that suppress, route, or transform incoming events: dedup keys, business-hours suppression for low-urgency, and auto-resolution mapping.

6. **Failure-mode review** — check the config for the classic gaps: no fallback at the end of the escalation chain, schedule layers that are meant to overlap but leave a coverage hole, low-urgency alerts that still phone people, and missing dedup keys that cause alert storms.

7. **Make it ready to apply** — output as Terraform (the PagerDuty provider) or the tool's API JSON, not just prose, so it can be code-reviewed and version-controlled.

Output as: (a) the service-to-routing map, (b) escalation policies with timeouts, (c) the severity/urgency/notification matrix, (d) schedule definitions, (e) event-routing rules, (f) ready-to-apply Terraform/JSON, (g) a failure-mode findings list.

Bias toward: no page ever dying silently, urgency matching real severity, version-controlled config over click-ops.
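
To sanity-check what the model returns, the failure-mode review in step 6 can be approximated locally before anything is applied. The following is a minimal Python sketch, not PagerDuty's or Opsgenie's API: the `EscalationLevel` and `Shift` types, the function names, and the example targets such as `eng-manager` are all hypothetical, and it assumes you have already flattened each schedule into weekly shifts expressed as minutes since Monday 00:00 UTC.

```python
from dataclasses import dataclass

MINUTES_PER_WEEK = 7 * 24 * 60


@dataclass
class EscalationLevel:
    targets: list[str]      # hypothetical schedule or user names
    timeout_minutes: int    # minutes to wait before escalating further


@dataclass
class Shift:
    start: int  # minutes since Monday 00:00 UTC (assumed convention)
    end: int    # exclusive end, at most MINUTES_PER_WEEK


def escalation_findings(levels, fallback_targets):
    """Flag empty levels, bad timeouts, and a last level with no fallback."""
    if not levels:
        return ["policy has no escalation levels"]
    findings = []
    for number, level in enumerate(levels, start=1):
        if not level.targets:
            findings.append(f"level {number} has no targets")
        if level.timeout_minutes <= 0:
            findings.append(f"level {number} has a non-positive timeout")
    if not set(levels[-1].targets) & set(fallback_targets):
        findings.append("last level has no designated fallback target")
    return findings


def coverage_gaps(shifts):
    """Return (start, end) minute ranges of the week that no shift covers."""
    gaps = []
    cursor = 0  # end of the covered region seen so far
    for shift in sorted(shifts, key=lambda s: s.start):
        if shift.start > cursor:
            gaps.append((cursor, shift.start))
        cursor = max(cursor, shift.end)
    if cursor < MINUTES_PER_WEEK:
        gaps.append((cursor, MINUTES_PER_WEEK))
    return gaps


def notification_findings(matrix):
    """matrix maps severity -> (urgency, set of notification channels)."""
    return [
        f"{severity}: low urgency but still notifies by phone"
        for severity, (urgency, channels) in matrix.items()
        if urgency == "low" and "phone" in channels
    ]


if __name__ == "__main__":
    policy = [
        EscalationLevel(["primary"], 10),
        EscalationLevel(["secondary"], 15),
    ]
    # Weekday layer ends Friday 18:00; weekend layer starts Saturday 00:00.
    week = [Shift(0, 4 * 1440 + 18 * 60), Shift(5 * 1440, MINUTES_PER_WEEK)]
    matrix = {"SEV1": ("high", {"phone", "sms"}), "SEV4": ("low", {"push", "phone"})}

    for finding in escalation_findings(policy, {"eng-manager"}):
        print("escalation:", finding)
    for start, end in coverage_gaps(week):
        print(f"schedule: uncovered from minute {start} to {end}")
    for finding in notification_findings(matrix):
        print("notification:", finding)
```

The sketch only covers three of the four gaps named in step 6; dedup-key problems depend on the shape of the incoming events and are better checked against real alert payloads.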
