AI for Incident Response

Incident Acknowledgment SLA Compliance Audit Prompt

Audit how reliably your on-call program meets page-acknowledgment and first-response SLAs, find where the clock is slipping, and design enforceable targets per severity.

Target user
SRE leads and incident program managers owning on-call SLAs
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are an SRE program lead who has rebuilt on-call acknowledgment SLAs for teams paging thousands of times a month. You are rigorous about separating "the page was late" from "the human was late."

I will provide:
- Per-page records (page time, ack time, escalation time, resolve time, severity)
- Current SLA targets (if any) per severity
- Escalation policy (primary → secondary → manager, timeouts)
- On-call rotation and team size
- Known pain points (missed pages, slow acks, over-escalation)

Do the following:

1. **Define the clock precisely**: distinguish page-sent → page-delivered → acknowledged → first-action → mitigated. Tell me which of these gaps my data can and cannot measure, and what instrumentation is missing.

2. **Compute compliance per severity**: for each severity, give p50/p90/p99 time-to-acknowledge and the % of pages meeting a candidate SLA. Flag the long-tail pages and group them by likely cause (asleep, no signal, tool failure, alert ignored).

3. **Root-cause the misses**: separate human-factor misses (notification settings, no backup, fatigue) from system misses (delivery delay, wrong escalation timeout, paging the wrong team).

4. **Recommend SLA targets**: propose realistic, severity-tiered ack and first-response targets, justified by the p90 figures from step 2 and industry norms, not aspiration. Specify when auto-escalation should fire for each severity.

5. **Design enforcement**: a weekly compliance report, a per-rotation scorecard, and a "miss review" ritual that is corrective, not punitive.

6. **Reduce the misses**: concrete fixes such as redundant notification channels, escalation-timeout tuning, ack-from-anywhere, and removing pages that should never have fired.

Output: (a) a compliance summary table per severity, (b) a ranked list of top miss causes with fixes, (c) proposed SLA targets with rationale, (d) the weekly report spec, (e) a 30-day rollout plan.

Be honest where the data is too thin to support a conclusion, and say what to instrument first.
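
Before pasting per-page records into the prompt, it can help to compute the step 2 numbers yourself so you can sanity-check the model's table. The sketch below is one possible way to do that, not part of the prompt. It assumes each record is a dict with hypothetical field names `page_time`, `ack_time` (ISO 8601 strings, `None` if never acknowledged) and `severity`, and that candidate SLA targets are given in seconds. It uses nearest-rank percentiles, and it treats never-acknowledged pages as an infinite time-to-acknowledge so they count as misses instead of silently dropping out.

```python
from collections import defaultdict
from datetime import datetime
from math import ceil, inf


def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, ceil(p / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def ack_compliance(pages, sla_seconds):
    """Summarize time-to-acknowledge per severity.

    pages: iterable of dicts with assumed keys page_time, ack_time, severity.
    sla_seconds: dict mapping severity -> candidate ack SLA in seconds.
    """
    tta_by_severity = defaultdict(list)
    for page in pages:
        if page["ack_time"] is None:
            # Never acknowledged: count as a miss with an unbounded delay.
            tta = inf
        else:
            paged = datetime.fromisoformat(page["page_time"])
            acked = datetime.fromisoformat(page["ack_time"])
            tta = (acked - paged).total_seconds()
        tta_by_severity[page["severity"]].append(tta)

    summary = {}
    for severity, ttas in tta_by_severity.items():
        ttas.sort()
        target = sla_seconds[severity]
        met = sum(1 for t in ttas if t <= target)
        summary[severity] = {
            "count": len(ttas),
            "p50": percentile(ttas, 50),
            "p90": percentile(ttas, 90),
            "p99": percentile(ttas, 99),
            "pct_within_sla": round(100 * met / len(ttas), 1),
        }
    return summary
```

With only a handful of pages per severity, p90 and p99 collapse onto the single worst page, which is exactly the "data is too thin" case the prompt asks the model to call out.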