
MTTR Incident History Bottleneck Analysis Prompt

Analyze a batch of past incidents to find where MTTR is actually being spent across detect, engage, diagnose, mitigate, and verify, then target the phase that yields the biggest time savings.

Target user: SREs and reliability leads
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior reliability analyst who mines incident history to find where time goes, so that improvement effort targets the real bottleneck rather than the one intuition suggests. You analyze and recommend; you change nothing operationally.

I will provide:
- A set of past incidents with timelines (impact start, detected, acked, diagnosed, mitigated, resolved) and severities
- Postmortems or notes describing what happened in each phase
- Service/team labels, deploy correlation, and detection source (alert vs human)
- Any existing MTTR metrics or dashboards

Your job:

1. **Decompose each incident** — split the duration into detect, engage, diagnose, mitigate, and verify phases; flag incidents where phase boundaries are unclear.
2. **Aggregate by phase** — compute where time concentrates across the set (use medians/percentiles, not just averages) and which phase dominates total MTTR.
3. **Segment the data** — break down by service, severity, time-of-day, and detection source to expose patterns (e.g., off-hours diagnosis is slow, one service eats most mitigate time).
4. **Find recurring time sinks** — identify repeated causes of delay (missing runbook, slow escalation, hard-to-diagnose component, manual mitigation steps).
5. **Prioritize interventions** — rank fixes by estimated MTTR reduction × incident frequency, and name the specific change for each (better alert, runbook, rollback path, instrumentation, routing).
6. **Recommend tracking** — propose the phase-level metrics to instrument so this analysis becomes continuous.

Output as: (a) per-incident phase decomposition table, (b) aggregate phase analysis with percentiles, (c) segment findings, (d) ranked interventions with rationale, (e) ongoing metrics to track.

Call out where the underlying timeline data is incomplete or inconsistent rather than inventing numbers; note confidence in each conclusion.
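
To sanity-check the numbers the model returns, the decomposition and percentile steps can be reproduced locally. The following is a minimal sketch and not part of the prompt. It assumes incidents are dictionaries of ISO 8601 timestamps, and that each one also records when impact began (`started`), since detect time cannot be measured without it. The field names, the phase-to-timestamp mapping, the sample incidents, and the `minutes_saved` / `incidents_per_quarter` keys are all hypothetical; adapt them to whatever your incident tracker exports.

```python
from datetime import datetime
from statistics import median, quantiles

# Assumed mapping: each phase is the gap between two timeline timestamps.
PHASES = [
    ("detect", "started", "detected"),
    ("engage", "detected", "acked"),
    ("diagnose", "acked", "diagnosed"),
    ("mitigate", "diagnosed", "mitigated"),
    ("verify", "mitigated", "resolved"),
]


def decompose(incident):
    """Minutes per phase; None where a boundary is missing or out of order."""
    out = {}
    for phase, start_key, end_key in PHASES:
        start, end = incident.get(start_key), incident.get(end_key)
        if start is None or end is None:
            out[phase] = None  # unclear boundary: flag it, do not guess
            continue
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        minutes = delta.total_seconds() / 60
        out[phase] = minutes if minutes >= 0 else None
    return out


def aggregate(incidents):
    """Median and p90 minutes per phase, with the count of usable incidents."""
    rows = [decompose(i) for i in incidents]
    summary = {}
    for phase, _, _ in PHASES:
        values = sorted(r[phase] for r in rows if r[phase] is not None)
        if not values:
            summary[phase] = {"n": 0, "median": None, "p90": None}
            continue
        # quantiles() needs at least two points on older Python versions.
        p90 = (quantiles(values, n=10, method="inclusive")[-1]
               if len(values) > 1 else values[0])
        summary[phase] = {"n": len(values), "median": median(values), "p90": p90}
    return summary


def rank_interventions(candidates):
    """Sort by estimated minutes saved per incident x incident frequency."""
    return sorted(
        candidates,
        key=lambda c: c["minutes_saved"] * c["incidents_per_quarter"],
        reverse=True,
    )


# Made-up sample data, for illustration only.
incidents = [
    {"started": "2024-01-01T10:00", "detected": "2024-01-01T10:05",
     "acked": "2024-01-01T10:10", "diagnosed": "2024-01-01T10:40",
     "mitigated": "2024-01-01T10:50", "resolved": "2024-01-01T11:00"},
    {"started": "2024-01-02T02:00", "detected": "2024-01-02T02:15",
     "acked": "2024-01-02T02:35", "diagnosed": "2024-01-02T03:35",
     "mitigated": "2024-01-02T03:45", "resolved": "2024-01-02T03:50"},
    {"started": None, "detected": "2024-01-03T09:00",
     "acked": "2024-01-03T09:02", "diagnosed": "2024-01-03T09:12",
     "mitigated": "2024-01-03T09:20", "resolved": "2024-01-03T09:25"},
]

summary = aggregate(incidents)
for phase, stats in summary.items():
    print(phase, stats)
```

Missing or out-of-order timestamps come back as `None` and are excluded from the aggregates, so the `n` column shows how much data each phase figure actually rests on. That mirrors the prompt's instruction to flag incomplete timelines rather than invent numbers.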

Related prompts

Incident Metrics Trend Analysis Prompt

MTTR Phase Decomposition and Bottleneck Analysis Prompt

