Incident Detection Source Effectiveness Review Prompt
Analyze where your incidents were first detected — alert, dashboard, synthetic, or angry customer — to measure how proactive your detection really is and shift more incidents to catch-it-first signals.
- Target user: SRE and monitoring teams improving proactive detection
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a monitoring lead who measures detection maturity by a brutal metric: what fraction of incidents you found before your customers did.

I will provide:
- A set of recent incidents with their first detection source (specific alert, dashboard, synthetic check, support ticket, customer report, social media, executive escalation)
- The detection timestamp vs the actual incident-start timestamp where known
- Current alerting rules and synthetic coverage
- Severity per incident

Run a detection source effectiveness review. Work through these steps:
1. **Classify detection sources**: bucket each incident as proactive (your monitoring caught it), reactive (a human or customer told you), or accidental (someone stumbled on it). Compute the proactive-detection rate overall and by severity.
2. **Measure detection lag**: for each incident, estimate the gap between incident start and detection, and note which source detected it. Find the slowest-to-detect categories.
3. **Diagnose reactive detections**: for every incident first detected by a customer or by support, identify why your monitoring missed it: no signal, threshold too loose, alert routed nowhere, or a signal that existed but was buried in noise.
4. **Find the high-leverage signals**: identify which new or tuned alerts/synthetics would have flipped the most reactive incidents to proactive, weighted by severity and frequency.
5. **Check the noise trade-off**: ensure proposed detections will not drown on-call in false positives; estimate the precision of each.
6. **Set a target**: propose a realistic proactive-detection-rate goal and a quarterly plan to reach it.

Output: (a) a detection-source breakdown with proactive rate by severity, (b) a detection-lag ranking, (c) the reactive-miss diagnosis per incident, (d) prioritized new/tuned signals with expected precision, (e) a target and quarterly improvement plan.

Separate conclusions backed by the incident data from hypotheses needing more evidence.
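Outside the prompt itself, the arithmetic in steps 1 and 2 is simple enough to check locally before trusting the model's numbers. The sketch below is one possible way to do that in Python, not part of the prompt. The field names (`severity`, `source`, `started`, `detected`), the source labels, and the source-to-bucket mapping are all assumptions (for example, treating `dashboard` as accidental is a judgment call), and the four incident records are made up purely for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Assumed mapping from first-detection source to bucket; adapt to your taxonomy.
SOURCE_BUCKET = {
    "alert": "proactive",
    "synthetic": "proactive",
    "dashboard": "accidental",  # assumption: someone happened to be looking
    "support_ticket": "reactive",
    "customer_report": "reactive",
    "social_media": "reactive",
    "executive_escalation": "reactive",
}

# Made-up example records; replace with an export from your incident tracker.
INCIDENTS = [
    {"id": "INC-1", "severity": "sev1", "source": "alert",
     "started": "2024-01-01T10:00:00", "detected": "2024-01-01T10:04:00"},
    {"id": "INC-2", "severity": "sev1", "source": "customer_report",
     "started": "2024-01-02T09:00:00", "detected": "2024-01-02T09:45:00"},
    {"id": "INC-3", "severity": "sev2", "source": "synthetic",
     "started": "2024-01-03T12:00:00", "detected": "2024-01-03T12:02:00"},
    {"id": "INC-4", "severity": "sev2", "source": "support_ticket",
     "started": None, "detected": "2024-01-04T15:30:00"},  # start time unknown
]


def classify(source):
    """Return the bucket for a detection source, or 'unclassified' if unknown."""
    return SOURCE_BUCKET.get(source, "unclassified")


def proactive_rate_by_severity(incidents):
    """Fraction of incidents detected proactively, overall and per severity."""
    totals = defaultdict(int)
    proactive = defaultdict(int)
    for inc in incidents:
        for key in ("overall", inc["severity"]):
            totals[key] += 1
            if classify(inc["source"]) == "proactive":
                proactive[key] += 1
    return {key: proactive[key] / totals[key] for key in totals}


def median_lag_minutes_by_source(incidents):
    """Median start-to-detection lag per source, slowest first.

    Incidents without a known start time are skipped rather than guessed.
    """
    lags = defaultdict(list)
    for inc in incidents:
        if inc.get("started") is None:
            continue
        delta = (datetime.fromisoformat(inc["detected"])
                 - datetime.fromisoformat(inc["started"]))
        lags[inc["source"]].append(delta.total_seconds() / 60)
    return sorted(((src, median(vals)) for src, vals in lags.items()),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(proactive_rate_by_severity(INCIDENTS))
    print(median_lag_minutes_by_source(INCIDENTS))
```

Comparing these figures with outputs (a) and (b) from the model is a quick way to catch miscounted or misclassified incidents. Skipping incidents with an unknown start time keeps the lag ranking honest, at the cost of under-representing the sources that tend to lack that timestamp.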