
Incident Detection Source Effectiveness Review Prompt

Analyze where each incident was first detected (alert, dashboard, synthetic check, or angry customer) to measure how proactive your detection really is, then shift more incidents toward signals that catch problems before customers do.

Target user: SRE and monitoring teams improving proactive detection
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a monitoring lead who measures detection maturity by a brutal metric: what fraction of incidents you found before your customers did.

I will provide:
- A set of recent incidents with their first detection source (specific alert, dashboard, synthetic check, support ticket, customer report, social media, executive escalation)
- The detection timestamp vs the actual incident-start timestamp where known
- Current alerting rules and synthetic coverage
- Severity per incident

Run a detection source effectiveness review. Work through these steps:

1. **Classify detection sources**: bucket each incident as proactive (your monitoring caught it), reactive (a human or customer told you), or accidental (someone stumbled on it). Compute the proactive-detection rate overall and by severity.

2. **Measure detection lag**: for each incident, estimate the gap between incident start and detection, and record which source detected it. Identify the incident categories and detection sources with the longest lag.

3. **Diagnose reactive detections**: for every incident first detected by a customer or by support, identify why your monitoring missed it: no signal existed, the threshold was too loose, the alert was routed nowhere, or the signal existed but was buried in noise.

4. **Find the high-leverage signals**: identify which new or tuned alerts and synthetic checks would have flipped the most reactive incidents to proactive, weighted by severity and frequency.

5. **Check the noise trade-off**: ensure the proposed detections will not drown on-call in false positives; estimate the precision of each (the share of its firings that would correspond to real incidents).

6. **Set a target**: propose a realistic proactive-detection-rate goal and a quarterly plan to reach it.

Output: (a) a detection-source breakdown with proactive rate by severity, (b) a detection-lag ranking, (c) the reactive-miss diagnosis per incident, (d) prioritized new/tuned signals with expected precision, (e) a target and quarterly improvement plan.

Separate conclusions backed by the incident data from hypotheses needing more evidence.
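Before pasting incident data into the prompt, it can help to compute the headline numbers yourself so you can sanity-check the model's breakdown. The following is a minimal Python sketch of steps 1 and 2 (proactive rate by severity and detection lag by source), not a definitive implementation. The record fields (`severity`, `source`, `started_at`, `detected_at`), the `SOURCE_BUCKET` mapping, and the sample incidents are all hypothetical; in particular, treating a dashboard find as "accidental" is an assumption you may want to change for your own taxonomy.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical mapping from first-detection source to bucket.
# Assumption: a dashboard find means someone happened to be looking.
SOURCE_BUCKET = {
    "alert": "proactive",
    "synthetic": "proactive",
    "dashboard": "accidental",
    "support_ticket": "reactive",
    "customer_report": "reactive",
    "social_media": "reactive",
    "executive_escalation": "reactive",
}


def classify(source):
    """Return the bucket for a source; unknown sources count as reactive."""
    return SOURCE_BUCKET.get(source, "reactive")


def proactive_rate_by_severity(incidents):
    """Fraction of proactively detected incidents, overall and per severity."""
    totals = defaultdict(int)
    proactive = defaultdict(int)
    for inc in incidents:
        for key in ("overall", inc["severity"]):
            totals[key] += 1
            if classify(inc["source"]) == "proactive":
                proactive[key] += 1
    return {key: proactive[key] / totals[key] for key in totals}


def median_lag_minutes_by_source(incidents):
    """Median start-to-detection lag per source, slowest first.

    Incidents with an unknown start time are skipped rather than guessed.
    """
    lags = defaultdict(list)
    for inc in incidents:
        if inc.get("started_at") is None:
            continue
        start = datetime.fromisoformat(inc["started_at"])
        detected = datetime.fromisoformat(inc["detected_at"])
        lags[inc["source"]].append((detected - start).total_seconds() / 60)
    ranked = sorted(((median(v), s) for s, v in lags.items()), reverse=True)
    return [(source, minutes) for minutes, source in ranked]


# Made-up sample data, for illustration only.
incidents = [
    {"id": "INC-1", "severity": "sev1", "source": "alert",
     "started_at": "2024-01-10T10:00:00", "detected_at": "2024-01-10T10:04:00"},
    {"id": "INC-2", "severity": "sev1", "source": "customer_report",
     "started_at": "2024-01-12T09:00:00", "detected_at": "2024-01-12T09:45:00"},
    {"id": "INC-3", "severity": "sev2", "source": "synthetic",
     "started_at": "2024-01-15T14:00:00", "detected_at": "2024-01-15T14:02:00"},
    {"id": "INC-4", "severity": "sev2", "source": "dashboard",
     "started_at": None, "detected_at": "2024-01-20T08:30:00"},
]

if __name__ == "__main__":
    print(proactive_rate_by_severity(incidents))
    print(median_lag_minutes_by_source(incidents))
```

The median is used instead of the mean so that one very late detection does not dominate a source's ranking. Incidents without a known start time still count toward the proactive rate but are excluded from the lag ranking.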
