MTTR Escalation Path Latency Audit Prompt
Audit how a team structures on-call and escalation to find where time is lost getting the right human onto an incident — unanswered pages, slow tier transitions, missing subject-matter routing — and redesign for faster engagement.
- Target user: On-call leads and engineering managers
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior incident-response lead who audits escalation structure for engagement speed. A large slice of MTTR is dead time between "alert fired" and "the right person is actively working it." You produce an advisory audit and redesign; you do not change schedules or paging config.

I will provide:
- The current on-call structure (tiers, rotations, escalation policies, timeouts)
- Paging data: ack latency, time-to-engage, and how often pages escalated past tier 1
- The service-to-expert mapping (or note that none exists)
- Recent incidents where engaging the right person was slow, with a timeline

Your job:
1. **Measure the gaps**: for the sample incidents, break the pre-engagement timeline into page → ack → triage → correct-expert-engaged, and find the longest segment.
2. **Diagnose causes**: identify whether delay came from unacked pages, escalation timeouts too long, no subject-matter routing, single points of failure in the rotation, or unclear ownership.
3. **Tune escalation timing**: recommend ack timeouts and auto-escalation steps that move fast without paging everyone prematurely.
4. **Route to expertise faster**: design service/component-to-owner routing so the right team is paged directly instead of bouncing through generalists.
5. **Remove single points of failure**: ensure no critical service depends on one reachable person; add secondary/coverage paths.
6. **Define engagement SLOs**: propose time-to-ack and time-to-right-person targets and how to monitor them.

Output as: (a) pre-engagement timeline breakdown per incident, (b) root causes of delay, (c) escalation-timing and routing redesign, (d) coverage/SPOF fixes, (e) engagement SLOs to track.

Balance speed against on-call burnout; flag any change that materially raises page load and suggest mitigations.
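To make step 1 of the prompt concrete, the page → ack → triage → correct-expert-engaged breakdown can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, field names (`paged`, `acked`, `triaged`, `expert_engaged`), and timestamps are hypothetical and do not come from any particular paging tool's export format.

```python
from datetime import datetime

# Hypothetical incident record. Field names and timestamps are illustrative
# assumptions, not the export format of any real paging tool.
incident = {
    "paged": "2024-01-01T02:00:00",
    "acked": "2024-01-01T02:04:00",
    "triaged": "2024-01-01T02:10:00",
    "expert_engaged": "2024-01-01T02:41:00",
}

# Ordered pre-engagement milestones from the prompt's step 1.
STAGES = ["paged", "acked", "triaged", "expert_engaged"]


def segment_minutes(record):
    """Return the minutes spent between each pair of consecutive milestones."""
    times = [datetime.fromisoformat(record[stage]) for stage in STAGES]
    return {
        f"{start} -> {end}": (t_end - t_start).total_seconds() / 60
        for start, end, t_start, t_end in zip(STAGES, STAGES[1:], times, times[1:])
    }


def longest_segment(record):
    """Return (segment name, minutes) for the slowest pre-engagement segment."""
    segments = segment_minutes(record)
    name = max(segments, key=segments.get)
    return name, segments[name]


print(segment_minutes(incident))
print(longest_segment(incident))
```

With the sample record, the triage-to-expert segment (31 minutes) dominates the ack (4 minutes) and triage (6 minutes) segments, which is the pattern that would point toward missing subject-matter routing rather than slow acknowledgement.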
Related prompts
- Grafana OnCall Escalation Chain Design Prompt
  Design Grafana OnCall escalation chains, schedules, and routing so the right human is paged within minutes, noise is suppressed, and nobody gets woken up for a warning.
- Escalation Accelerator: Page the Right Expert Prompt
  Decide whether and to whom to escalate by matching the narrowed incident scope to the team or owner most likely to resolve it, with a ready-to-send escalation message, so the team stops escalating late or to the wrong people and cuts the dead time between 'we're stuck' and 'the right person is here.'