Incident Metrics Trend Analysis Prompt
Analyze a portfolio of past incidents to surface MTTR, MTTD, and frequency trends, segment by service and cause, and recommend the highest-leverage interventions to bend the curves.
- Target user: SRE managers and reliability analysts reviewing incident data
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a reliability data analyst who turns a spreadsheet of incidents into the three or four insights leadership actually needs. I will provide an incident dataset with fields such as: incident ID, service, severity, impact-started-at, detected-at, acknowledged-at, mitigated-at, resolved-at, detection source (alert or customer report), cause category, and a short description. Your analysis:

1. **Compute the core metrics**: for the period overall and per service, report MTTD (detected minus impact start), MTTA (acknowledged minus detected), MTTR (resolved minus impact start), incident frequency, and severity mix. Report the median AND p90, never just the mean, and explain how a few very long incidents drag the mean away from the typical case.
2. **Trend over time**: bucket by month or sprint and describe the direction of each metric. State clearly whether each is improving, flat, or degrading, and quantify the change.
3. **Segment for signal**: break MTTR and frequency down by service, severity, cause category, and time of day/day of week. Identify the top 3 segments driving total downtime, applying a Pareto lens (which 20% of services cause 80% of the pain).
4. **Find the stories**: call out outliers (the one incident that dominates MTTR), recurring causes (the same failure mode appearing repeatedly), and detection gaps (incidents found by customers rather than by alerts).
5. **Diagnose, do not just describe**: for each problem segment, propose the most likely lever: better detection (MTTD), faster mitigation (MTTR via runbooks/automation), or prevention (frequency).
6. **Recommend 3-5 interventions**: rank them by expected impact on total downtime. For each, name the metric it should move and a way to measure success next quarter.
7. **Flag data-quality issues**: missing timestamps, inconsistent severity labeling, or cause categories too coarse to be useful. Recommend fixes to the incident-tracking process.

Output: a metrics summary table, a short trend narrative, the Pareto findings, and the ranked recommendation list. Be concrete with numbers; never hand-wave. If the dataset is too small to show a trend, say so.
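To sanity-check the step 1 numbers the model reports, here is a minimal Python sketch. It assumes each incident is a dict with hypothetical keys `service`, `started_at`, `detected_at`, `acknowledged_at`, and `resolved_at` holding ISO-8601 strings. The p90 here uses the nearest-rank definition, one of several in common use, so it may differ slightly from interpolating percentile functions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def minutes(start: str, end: str) -> float:
    """Elapsed minutes between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def p90(values: list[float]) -> float:
    """Nearest-rank 90th percentile: the value at rank ceil(0.9 * n)."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # integer form of ceil(0.9 * n)
    return ordered[rank - 1]


def describe(values: list[float]) -> dict[str, float]:
    """Mean alongside median and p90, so any skew is visible."""
    return {"mean": mean(values), "median": median(values), "p90": p90(values)}


def per_service_metrics(incidents: list[dict]) -> dict[str, dict]:
    """Group incidents by service; summarize time to detect, acknowledge, resolve."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for inc in incidents:
        grouped[inc["service"]].append(inc)
    summary = {}
    for service, rows in grouped.items():
        summary[service] = {
            "count": len(rows),
            "ttd": describe([minutes(r["started_at"], r["detected_at"]) for r in rows]),
            "tta": describe([minutes(r["detected_at"], r["acknowledged_at"]) for r in rows]),
            "ttr": describe([minutes(r["started_at"], r["resolved_at"]) for r in rows]),
        }
    return summary


if __name__ == "__main__":
    # Nine short incidents plus one 15-hour outage.
    ttr_minutes = [20, 25, 30, 35, 40, 45, 50, 55, 60, 900]
    print(describe(ttr_minutes))  # mean 126, median 42.5, p90 60
```

In the demo, the mean (126 minutes) exceeds what nine of the ten incidents experienced. That gap is the reason step 1 asks for median and p90.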
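For step 2, this is a minimal sketch of bucketing by month and labeling the direction of a lower-is-better metric such as MTTR. Several choices here are illustrative assumptions, not standard thresholds:

- the least-squares slope
- the 5% "flat" band
- the three-bucket minimum

Incidents are assumed to carry hypothetical `started_at` and `resolved_at` keys holding ISO-8601 strings.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

MIN_BUCKETS = 3  # hypothetical cutoff: fewer buckets is "too small for a trend"


def monthly_median_ttr(incidents: list[dict]) -> dict[str, float]:
    """Median minutes to resolve, per calendar month of impact start."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for inc in incidents:
        start = datetime.fromisoformat(inc["started_at"])
        end = datetime.fromisoformat(inc["resolved_at"])
        buckets[start.strftime("%Y-%m")].append((end - start).total_seconds() / 60)
    return {month: median(vals) for month, vals in sorted(buckets.items())}


def ols_slope(ys: list[float]) -> float:
    """Least-squares slope of ys against bucket index 0..n-1."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def classify_trend(series: dict[str, float], flat_band: float = 0.05) -> tuple[str, float]:
    """Label a lower-is-better metric; also return relative change per bucket.

    Months with no incidents are simply absent, so buckets are treated as
    evenly spaced. Fill gaps first if that matters for your data.
    """
    values = list(series.values())
    if len(values) < MIN_BUCKETS:
        return "insufficient data", 0.0
    level = sum(values) / len(values)
    if level == 0:
        return "flat", 0.0
    relative = ols_slope(values) / level
    if abs(relative) < flat_band:
        return "flat", relative
    return ("improving" if relative < 0 else "degrading"), relative
```

For example, monthly medians of 100, 80, and 60 minutes give `("improving", -0.25)`: a 20-minute drop per month against an 80-minute average.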
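The Pareto lens in step 3 amounts to sorting segments by total downtime and taking the smallest head that reaches 80% of the total. This is a minimal sketch that measures downtime as impact start to resolution. The `service`, `started_at`, and `resolved_at` keys are hypothetical.

```python
from collections import Counter
from datetime import datetime


def downtime_by_service(incidents: list[dict]) -> Counter:
    """Total minutes from impact start to resolution, summed per service."""
    totals: Counter = Counter()
    for inc in incidents:
        start = datetime.fromisoformat(inc["started_at"])
        end = datetime.fromisoformat(inc["resolved_at"])
        totals[inc["service"]] += (end - start).total_seconds() / 60
    return totals


def pareto_head(totals: dict[str, float], share: float = 0.8) -> list[tuple[str, float]]:
    """Smallest set of top segments whose combined downtime reaches `share`.

    Returns (segment, cumulative share) pairs, largest contributor first.
    """
    grand_total = sum(totals.values())
    head: list[tuple[str, float]] = []
    running = 0.0
    for segment, minutes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * grand_total:
            break
        running += minutes
        head.append((segment, running / grand_total))
    return head
```

`pareto_head` accepts totals keyed by any segment, such as cause category or severity. Comparing the length of the head with the number of segments shows how closely the data follows the 80/20 shape.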
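The step 7 checks can also be run on the dataset before it goes to the model. This is a minimal sketch whose conventions are all hypothetical and should be replaced with your tracker's own:

- the lifecycle ordering of the timestamp fields
- the SEV1 to SEV4 severity scheme
- the set of catch-all cause labels

```python
from datetime import datetime

# Assumed lifecycle order: each timestamp should be no earlier than the previous one.
LIFECYCLE = ["started_at", "detected_at", "acknowledged_at", "mitigated_at", "resolved_at"]
SEVERITIES = {"SEV1", "SEV2", "SEV3", "SEV4"}  # hypothetical severity scheme
VAGUE_CAUSES = {"", "other", "unknown", "misc"}  # hypothetical catch-all labels


def record_issues(incident: dict) -> list[str]:
    """List data-quality problems found in a single incident record."""
    issues: list[str] = []
    seen: list[tuple[str, datetime]] = []
    for field in LIFECYCLE:
        value = incident.get(field)
        if not value:
            issues.append(f"missing {field}")
        else:
            seen.append((field, datetime.fromisoformat(value)))
    # Compare each present timestamp with the previous present one.
    for (prev_field, prev_ts), (field, ts) in zip(seen, seen[1:]):
        if ts < prev_ts:
            issues.append(f"{field} earlier than {prev_field}")
    if incident.get("severity") not in SEVERITIES:
        issues.append(f"unrecognized severity {incident.get('severity')!r}")
    return issues


def vague_cause_share(incidents: list[dict]) -> float:
    """Fraction of incidents whose cause category is a catch-all label."""
    if not incidents:
        return 0.0
    vague = sum(
        1 for inc in incidents
        if (inc.get("cause_category") or "").strip().lower() in VAGUE_CAUSES
    )
    return vague / len(incidents)
```

A high `vague_cause_share` is a concrete signal that cause categories are too coarse to support the segmentation in step 3.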