Incident Drill Scoring Rubric Prompt
Build an objective scoring rubric to evaluate how a team performs during an incident drill or fire drill — detection, coordination, communication, and recovery — so you can track readiness improvement over time instead of relying on gut feel.
- Target user: SRE leads and reliability program owners measuring incident-response readiness
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a reliability program lead who runs regular incident drills and needs to score them consistently so improvement is measurable, not anecdotal.

I will provide:
- The drill scenario and its intended learning objectives
- The team structure and roles exercised (IC, comms, ops, scribe)
- Our current response process and SLAs (ack time, declaration, update cadence)
- How we want to use the scores (trend tracking, certification, gap-finding)

Your job:
1. **Define scoring dimensions**: break readiness into weighted dimensions: detection speed, triage accuracy, role clarity, decision quality, communication cadence, escalation correctness, recovery verification, and documentation. Justify each weight.
2. **Make each dimension measurable**: for every dimension, define 0-to-N levels with concrete, observable behaviors at each level. For example: "Communication: 0 = no updates; 2 = updates but irregular; 4 = on-cadence, audience-appropriate, with clear next-update time." No vague adjectives without anchors.
3. **Specify what evidence to capture**: what the observer records during the drill to score each dimension fairly (timestamps, who said what, decision points, tool actions). Tie scores to evidence, not impressions.
4. **Timed checkpoints**: define the key moments to clock: time-to-detect, time-to-acknowledge, time-to-declare, time-to-first-comms, time-to-mitigate, time-to-verify-recovery. Map these to score bands.
5. **Separate individual from system**: distinguish where a low score reflects a process/tooling gap vs a skill gap, so the output drives the right fix and stays blameless.
6. **Aggregate and trend**: define how dimension scores roll up to an overall readiness score, and how to compare across drills despite different scenarios (normalize by scenario difficulty).
7. **Drive action**: translate the lowest-scoring dimensions into a prioritized improvement backlog with owners.

Output as: (a) the weighted dimension model, (b) anchored 0-N rubrics per dimension, (c) the observer evidence-capture sheet, (d) the timed-checkpoint scoring bands, (e) an aggregation and trending method, (f) a sample improvement backlog.

Bias toward: observable behaviors over impressions, system fixes over individual blame, comparability across drills over one-off scoring.
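
Outside the prompt itself, it can help to know roughly what the scoring arithmetic in steps 4 and 6 might look like, so you can sanity-check whatever the model proposes. The Python sketch below is one possible shape, not a prescribed method: the band thresholds, the dimension weights, the 0-4 scale, and the multiplicative difficulty factor are all illustrative assumptions, and the names `ACK_BANDS`, `band_score`, `readiness_score`, and `normalize_for_difficulty` are hypothetical.

```python
# Hypothetical score bands for one timed checkpoint (time-to-acknowledge):
# (upper bound in seconds, score). Thresholds are illustrative, not recommended SLAs.
ACK_BANDS = [(120, 4), (300, 3), (600, 2), (900, 1)]


def band_score(elapsed_seconds, bands):
    """Return the score of the first band whose upper bound covers the elapsed time.

    Bands must be sorted by ascending upper bound. Times beyond the last band score 0.
    """
    for upper_bound, score in bands:
        if elapsed_seconds <= upper_bound:
            return score
    return 0


def readiness_score(scores, weights, max_level=4):
    """Roll per-dimension scores (0..max_level) up to a weighted 0-100 score."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    weighted = sum(weights[dim] * scores[dim] / max_level for dim in scores)
    return 100 * weighted / total_weight


def normalize_for_difficulty(raw_score, difficulty):
    """Scale a raw score by a scenario difficulty factor, capped at 100.

    Assumption: difficulty is a multiplier where 1.0 means a baseline scenario
    and values above 1.0 mean a harder one. Other normalizations are possible.
    """
    return min(100.0, raw_score * difficulty)


if __name__ == "__main__":
    # Made-up drill observations, for illustration only.
    scores = {
        "detection_speed": band_score(95, ACK_BANDS),
        "communication_cadence": 2,
    }
    weights = {"detection_speed": 3, "communication_cadence": 2}
    raw = readiness_score(scores, weights)
    print(raw, normalize_for_difficulty(raw, difficulty=1.1))
```

A multiplicative difficulty factor is only one way to make drills comparable; if the model suggests a different normalization (for example, comparing each drill against a per-scenario baseline), check that it still keeps scores on a common scale before trending them.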