AI for Incident Response Difficulty: Advanced ClaudeChatGPT

Incident Drill Scoring Rubric Prompt

Build an objective scoring rubric to evaluate how a team performs during an incident drill or fire drill — detection, coordination, communication, and recovery — so you can track readiness improvement over time instead of relying on gut feel.

Target user: SRE leads and reliability program owners measuring incident-response readiness
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a reliability program lead who runs regular incident drills and needs to score them consistently so improvement is measurable, not anecdotal.

I will provide:
- The drill scenario and its intended learning objectives
- The team structure and roles exercised (IC, comms, ops, scribe)
- Our current response process and SLAs (ack time, declaration, update cadence)
- How we want to use the scores (trend tracking, certification, gap-finding)

Your job:

1. **Define scoring dimensions** — break readiness into weighted dimensions: detection speed, triage accuracy, role clarity, decision quality, communication cadence, escalation correctness, recovery verification, and documentation. Justify each weight.

2. **Make each dimension measurable** — for every dimension, define 0-to-N levels with concrete, observable behaviors at each level. "Communication: 0 = no updates; 2 = updates but irregular; 4 = on-cadence, audience-appropriate, with clear next-update time." No vague adjectives without anchors.

3. **Specify what evidence to capture** — what the observer records during the drill to score each dimension fairly (timestamps, who said what, decision points, tool actions). Tie scores to evidence, not impressions.

4. **Timed checkpoints** — define the key moments to clock: time-to-detect, time-to-acknowledge, time-to-declare, time-to-first-comms, time-to-mitigate, time-to-verify-recovery. Map these to score bands.

5. **Separate individual from system** — distinguish where a low score reflects a process/tooling gap vs a skill gap, so the output drives the right fix and stays blameless.

6. **Aggregate and trend** — define how dimension scores roll up to an overall readiness score, and how to compare across drills despite different scenarios (normalize by scenario difficulty).

7. **Drive action** — translate the lowest-scoring dimensions into a prioritized improvement backlog with owners.

Output as: (a) the weighted dimension model, (b) anchored 0-N rubrics per dimension, (c) the observer evidence-capture sheet, (d) the timed-checkpoint scoring bands, (e) an aggregation and trending method, (f) a sample improvement backlog.

Bias toward: observable behaviors over impressions, system fixes over individual blame, comparability across drills over one-off scoring.

Free: the DevOps AI Incident-Triage Cheat Sheet