AI for Incident Response

Escalation Matrix and On-Call Policy Builder Prompt

Design an escalation matrix and on-call escalation policy that routes incidents to the right responder at the right time, with sane timeouts, fallbacks, and severity-based skip-levels so nothing dies unacknowledged at 3am.

Target user
On-call program owners and SRE managers
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are an SRE manager who has designed on-call escalation for teams spanning timezones and severities, and you know the two failure modes: pages that die unacknowledged, and pages that wake the wrong people.

I will provide:
- Team structure, timezones, and on-call rotations
- Service tiers/SLOs and their owning teams
- Paging tooling and notification channels available
- Severity definitions and any contractual response SLAs

Your job:

1. **Escalation layers** — define the ordered tiers: primary on-call, secondary, team lead, IC pool, leadership. For each tier, specify the acknowledgement timeout before auto-escalation and the notification channels used, in order (push → SMS → phone).

2. **Severity-driven branching** — show how SEV1 skips slow tiers and pages IC + leadership immediately, while SEV3 stays within the primary tier with gentle timeouts. Build a matrix of severity × tier × timeout.

3. **Routing by service** — map each service tier to its owning rotation, and define the fallback when the owning team has no responder (catch-all rotation, never a dead end).

4. **No-dead-ends rule** — guarantee every path eventually reaches a human; define the final backstop that always answers.

5. **Timezone and follow-the-sun** — handle handoffs across regions and avoid paging someone at 3am when a daytime region is covering.

6. **Anti-fatigue guardrails** — limits on consecutive pages, mandatory rest after a rough on-call night, and auto-quieting of known-noisy alerts during a declared incident.

7. **Acknowledge and re-page logic** — what counts as acknowledged, and how an unacked or stalled incident re-pages and climbs.

8. **Validation** — replay recent incidents to confirm each would have reached an awake, responsible human within SLA.

Output as: (a) the severity × tier × timeout matrix, (b) per-service routing tables with fallbacks, (c) the escalation policy expressed as ordered rules ready to translate into your pager tool, (d) anti-fatigue guardrails, (e) a validation replay against recent incidents.

Bias toward: no dead ends, severity-appropriate urgency, protecting responder sleep, every path reaching an awake human within SLA.
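
To make deliverables (a) and (c) concrete, here is a minimal Python sketch of what the severity × tier × timeout matrix and the no-dead-ends rule could look like once expressed as ordered rules. Everything in it is an assumption for illustration: the tier names follow the prompt's layers, but the timeout values, the `catch_all` backstop name, and the helper functions are hypothetical and not tied to any particular pager tool.

```python
from typing import NamedTuple


class Step(NamedTuple):
    tiers: tuple[str, ...]   # tiers paged together at this step
    ack_timeout_min: int     # minutes to wait for an ack before climbing


# Hypothetical final backstop that every path must end at.
BACKSTOP = "catch_all"

# Illustrative values only; real timeouts come from your SLAs.
POLICY: dict[str, list[Step]] = {
    # SEV1 skips the slow tiers and pages IC and leadership at once.
    "SEV1": [Step(("primary", "ic_pool", "leadership"), 5)],
    "SEV2": [
        Step(("primary",), 10),
        Step(("secondary",), 10),
        Step(("team_lead",), 15),
    ],
    # SEV3 re-pages the primary once with gentle timeouts.
    "SEV3": [Step(("primary",), 30), Step(("primary",), 30)],
}


def escalation_path(severity: str) -> list[Step]:
    """Return the ordered steps for a severity, always ending at the backstop."""
    steps = list(POLICY[severity])
    if BACKSTOP not in steps[-1].tiers:
        steps.append(Step((BACKSTOP,), 0))
    return steps


def paged_after(severity: str, minutes_unacked: int) -> list[str]:
    """List the tiers paged once an incident has sat unacked this long."""
    paged: list[str] = []
    elapsed = 0  # minute at which the current step fires
    for step in escalation_path(severity):
        if elapsed > minutes_unacked:
            break
        for tier in step.tiers:
            if tier not in paged:
                paged.append(tier)
        elapsed += step.ack_timeout_min
    return paged


if __name__ == "__main__":
    for sev in POLICY:
        for minutes in (0, 10, 60):
            print(sev, minutes, paged_after(sev, minutes))
```

A replay in the spirit of step 8 would call `paged_after` with each past incident's severity and time to acknowledgement, then check that the tier that actually resolved it appears in the result. Appending the backstop inside `escalation_path`, rather than trusting each severity's rule list to include it, is one way to enforce the no-dead-ends rule structurally.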