MTTR Incident History Bottleneck Analysis Prompt
Analyze a batch of past incidents to find where MTTR is actually being spent across detect, engage, diagnose, mitigate, and verify, then target the phase that yields the biggest time savings.
- Target user: SREs and reliability leads
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior reliability analyst who mines incident history to find where time goes, so improvement effort lands on the real bottleneck instead of on intuition. You analyze and recommend; you change nothing operationally.
I will provide:
- A set of past incidents with timelines (impact started, detected, acked, diagnosed, mitigated, resolved) and severities
- Postmortems or notes describing what happened in each phase
- Service/team labels, deploy correlation, and detection source (alert vs. human)
- Any existing MTTR metrics or dashboards
Your job:
1. **Decompose each incident**: split its total duration into detect, engage, diagnose, mitigate, and verify phases, and flag incidents where phase boundaries are unclear.
2. **Aggregate by phase**: compute where time concentrates across the set (use medians and percentiles, not just averages) and which phase dominates total MTTR.
3. **Segment the data**: break results down by service, severity, time of day, and detection source to expose patterns (e.g., off-hours diagnosis is slow, or one service accounts for most mitigation time).
4. **Find recurring time sinks**: identify repeated causes of delay (missing runbook, slow escalation, hard-to-diagnose component, manual mitigation steps).
5. **Prioritize interventions**: rank fixes by estimated MTTR reduction × incident frequency, and name the specific change for each (better alert, runbook, rollback path, instrumentation, routing).
6. **Recommend tracking**: propose the phase-level metrics to instrument so this analysis becomes continuous.
Output as:
(a) a per-incident phase decomposition table
(b) an aggregate phase analysis with percentiles
(c) segment findings
(d) ranked interventions with rationale
(e) ongoing metrics to track
Call out where the underlying timeline data is incomplete or inconsistent rather than inventing numbers, and note your confidence in each conclusion.
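The decomposition and aggregation in steps 1 to 3 can also be pre-computed locally, or used to sanity-check the tables the model returns. The Python sketch below is a minimal illustration under stated assumptions: each incident is a dict of `datetime` milestones under hypothetical keys (`started`, `detected`, `acked`, `diagnosed`, `mitigated`, `resolved`) plus a label such as `source`; each phase is assumed to span two consecutive milestones (detect = started to detected, ..., verify = mitigated to resolved); and the three incidents are made-up sample data, not real history.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median, quantiles

# Hypothetical field names: map these to whatever your incident tracker exports.
MILESTONES = ["started", "detected", "acked", "diagnosed", "mitigated", "resolved"]
PHASES = ["detect", "engage", "diagnose", "mitigate", "verify"]


def decompose(incident):
    """Minutes spent in each phase; None where a boundary is missing or out of order."""
    phases = {}
    # Assumed mapping: each phase runs from one milestone to the next.
    for phase, start_key, end_key in zip(PHASES, MILESTONES, MILESTONES[1:]):
        start, end = incident.get(start_key), incident.get(end_key)
        if start is None or end is None or end < start:
            phases[phase] = None  # flag for review instead of guessing a value
        else:
            phases[phase] = (end - start).total_seconds() / 60
    return phases


def phase_percentiles(incidents):
    """Per-phase sample size, p50 and p90 in minutes, skipping unknown phases."""
    decomposed = [decompose(incident) for incident in incidents]
    summary = {}
    for phase in PHASES:
        values = [d[phase] for d in decomposed if d[phase] is not None]
        if len(values) < 2:
            summary[phase] = None  # too little data for a meaningful percentile
            continue
        # method="inclusive" keeps p90 within the observed min..max range
        p90 = quantiles(values, n=10, method="inclusive")[8]
        summary[phase] = {"n": len(values), "p50": median(values), "p90": p90}
    return summary


def by_segment(incidents, key):
    """Run phase_percentiles separately for each value of a label such as 'source'."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[incident.get(key, "unknown")].append(incident)
    return {segment: phase_percentiles(group) for segment, group in groups.items()}


def at(hhmm):
    """Example helper: a timestamp on an arbitrary fixed day."""
    h, m = map(int, hhmm.split(":"))
    return datetime(2024, 1, 1, h, m)


# Made-up sample incidents; B has no known impact start, so its detect phase is unknown.
incidents = [
    {"id": "A", "source": "alert", "started": at("02:00"), "detected": at("02:10"),
     "acked": at("02:15"), "diagnosed": at("02:45"), "mitigated": at("03:00"),
     "resolved": at("03:20")},
    {"id": "B", "source": "human", "started": None, "detected": at("10:00"),
     "acked": at("10:02"), "diagnosed": at("11:02"), "mitigated": at("11:10"),
     "resolved": at("11:15")},
    {"id": "C", "source": "alert", "started": at("13:30"), "detected": at("14:00"),
     "acked": at("14:20"), "diagnosed": at("14:30"), "mitigated": at("14:50"),
     "resolved": at("14:55")},
]

for incident in incidents:
    print(incident["id"], decompose(incident))
print(phase_percentiles(incidents))
print(by_segment(incidents, "source"))
```

A missing or out-of-order timestamp yields `None` for the affected phase, and that incident is left out of the phase's percentiles rather than imputed, in line with the prompt's instruction not to invent numbers. In this sample, diagnose has the largest p50 and p90, which is the kind of signal step 2 asks the model to surface.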
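Step 5's scoring rule (estimated MTTR reduction × incident frequency) is simple arithmetic, but writing it down makes the trade-off explicit: a modest fix for a frequent delay can outrank a large fix for a rare one. The sketch below is a hypothetical illustration; the `Intervention` class, its fields, the per-quarter frequency unit, and every name and number in `candidates` are invented placeholders, not findings from any incident set.

```python
from dataclasses import dataclass


# Hypothetical record for one candidate fix; every numeric field is an estimate you supply.
@dataclass
class Intervention:
    name: str                     # the specific change
    phase: str                    # targeted phase: detect, engage, diagnose, mitigate, verify
    minutes_saved: float          # estimated MTTR reduction per affected incident
    incidents_per_quarter: float  # how often the targeted delay occurs
    confidence: str               # "high", "medium", or "low"

    @property
    def score(self) -> float:
        # Expected minutes saved per quarter = reduction x frequency
        return self.minutes_saved * self.incidents_per_quarter


def rank(interventions):
    """Highest expected savings first."""
    return sorted(interventions, key=lambda item: item.score, reverse=True)


# Invented placeholder inputs for illustration only.
candidates = [
    Intervention("Runbook for cache-tier failover", "diagnose", 25, 6, "medium"),
    Intervention("Escalate to secondary after 5 min without ack", "engage", 10, 12, "high"),
    Intervention("One-click rollback for service X deploys", "mitigate", 40, 2, "low"),
]

for item in rank(candidates):
    print(f"{item.score:5.0f} min/quarter  {item.phase:<9} {item.name} [{item.confidence}]")
```

With these placeholder inputs the runbook (25 × 6 = 150 minutes per quarter) ranks above the escalation rule (10 × 12 = 120) and the rollback path (40 × 2 = 80). Keeping `confidence` beside the score reflects the prompt's request to state confidence, so a shaky estimate is not read as if it were precise.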
Related prompts
- Incident Metrics Trend Analysis Prompt
Analyze a portfolio of past incidents to surface MTTR, MTTD, and frequency trends, segment by service and cause, and recommend the highest-leverage interventions to bend the curves.
- MTTR Phase Decomposition and Bottleneck Analysis Prompt
Break MTTR into its constituent phases — detect, acknowledge, diagnose, mitigate, resolve — to find where time actually goes and target the slowest stage with concrete fixes.