Slack-Based On-Call Training & Game Day Simulation Prompt

Run game-day exercises and on-call training drills via Slack — injected alerts, scripted scenarios, blast-radius-controlled chaos, scoring, and post-exercise debrief.

Target user
SRE leads and incident commanders running structured training programs
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior incident commander who has run quarterly game day exercises that measurably improved team MTTR through realistic Slack-based simulation.

I will provide:
- Team experience mix (junior / senior)
- Existing simulation tooling (Gremlin / LitmusChaos / nothing yet)
- Real services to simulate against (or staging)
- Existing real on-call patterns
- Pain points (untested runbooks, slow ramp for new engineers)

Your job:

1. **Exercise types**:
   - **Tabletop** — discussion-based; bot posts a scenario; team talks through response
   - **Light simulation** — bot injects fake alerts in a simulation channel; team responds without touching real systems
   - **Full game day** — controlled chaos against staging or a carefully scoped prod component; real responses
   - **Surprise drill** — unannounced short scenario; tests how quickly on-call notices and responds

2. **Scenario catalog** — build a library:
   - **Standard scenarios** — failed deploy rollback, DB read replica lag, certificate expiry, region failover, runaway query, cache poisoning, dependency outage
   - **Stack-specific** — Kubernetes pod eviction storm, ArgoCD sync conflict, Prometheus query overload
   - **Recent-real** — turn last quarter's real incidents into training scenarios

3. **Scenario format** — each:
   - **Setup** — fictional service + observed symptoms
   - **Injection sequence** — alerts to post, log lines to share, customer reports
   - **Expected response** — what good looks like (diagnostic commands, decisions, escalations)
   - **Scoring rubric** — points for time-to-acknowledge, time-to-mitigate, blameless conduct, communication quality
   - **Twists** — optional complications (red herring, late-discovery context, dependency issue)

4. **Bot orchestration**:
   - `/drill schedule <scenario> [participants] [time]` — schedule
   - `/drill start <scenario>` — begin now in a dedicated `#drill-<id>` channel
   - Bot posts alerts at scripted intervals (T+0, T+5m, T+12m...)
   - Bot reacts to participant messages (acknowledges fictional commands they "ran")
   - `/drill pause` / `/drill resume` for instructor control

5. **Realism cues**:
   - Use real alert formatting (same Block Kit as production alerts)
   - Realistic log lines (anonymized real logs from past incidents)
   - Time pressure (the scenario escalates at T+30m)
   - Stakeholder pressure (bot posts a simulated "Customer Success" message, still marked `[DRILL]`)

6. **Safety boundaries**:
   - Clear `[DRILL]` markers on every message
   - Simulation channel only; never inject into real prod channels
   - Bot identifies itself; never let participants think it's real
   - Aware of the real on-call rotation (use a separate roster for drills so the active on-call is never pulled in)
   - End-of-drill explicit "DRILL OVER" announcement

7. **Scoring**:
   - Team self-assesses against rubric
   - Optional: instructor scores
   - Compare to prior drills (track improvement)
   - Individual scores are private; team scores public

8. **Debrief structure**:
   - Immediate (30 min after): what went well / what the team struggled with
   - 24h after: written summary + action items
   - Action items routed to runbook updates, training topics, tooling improvements

9. **New-hire ramp**:
   - Days 1-30: shadow drills (observe, no participation)
   - Days 31-60: participate in a junior role (not IC)
   - Days 61-90: lead a tabletop
   - Days 91+: full on-call shadow → primary

10. **Anti-patterns to avoid**:
    - Drills marketed as "we're testing you" → defensive culture, no learning
    - No psychological safety → people don't reveal what they don't know
    - Same scenarios repeated → team becomes drill-savvy, not on-call-ready
    - No debrief / no action items → drills become theater
    - Prod chaos with an uncontrolled blast radius → the drill becomes a real incident

11. **Cadence**:
    - Weekly tabletop (lightweight, 30 min)
    - Monthly light simulation (90 min)
    - Quarterly full game day (4 hr)
    - Surprise drill every other week (15-30 min)

Output as: (a) exercise type matrix, (b) scenario template + 3 example scenarios, (c) bot orchestration commands, (d) realism + safety boundary rules, (e) scoring rubric, (f) debrief structure, (g) new-hire ramp plan, (h) cadence schedule.

Bias toward: psychological safety, realistic-but-safe simulation, every drill drives a real improvement, measurable progression over time.
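
Implementation note (not part of the prompt): the scripted injection timing in item 4 and the safety boundaries in item 6 can be enforced in code rather than left to convention. The following is a minimal sketch under stated assumptions, not a reference implementation. `Injection`, `render`, `assert_drill_channel`, and `build_timeline` are hypothetical names, the service names and channel ID are invented, and actual delivery to Slack is left out (a real bot might hand each entry to Slack's `chat.scheduleMessage` Web API method).

```python
from dataclasses import dataclass

DRILL_MARKER = "[DRILL]"          # marker required on every drill message
DRILL_CHANNEL_PREFIX = "#drill-"  # assumed naming scheme for drill channels


@dataclass(frozen=True)
class Injection:
    offset_min: int  # minutes after drill start (T+0, T+5m, ...)
    text: str        # alert, log line, or stakeholder message


def render(injection: Injection) -> str:
    """Return the message text, prefixed with the drill marker if missing."""
    if injection.text.startswith(DRILL_MARKER):
        return injection.text
    return f"{DRILL_MARKER} {injection.text}"


def assert_drill_channel(channel: str) -> None:
    """Refuse to target anything except a dedicated drill channel."""
    if not channel.startswith(DRILL_CHANNEL_PREFIX):
        raise ValueError(f"refusing to post drill content to {channel!r}")


def build_timeline(
    channel: str, injections: list[Injection], end_min: int
) -> list[tuple[int, str, str]]:
    """Build an ordered (offset, channel, text) plan ending in DRILL OVER."""
    assert_drill_channel(channel)
    ordered = sorted(injections, key=lambda i: i.offset_min)
    if ordered and end_min < ordered[-1].offset_min:
        raise ValueError("end_min must not precede the last injection")
    timeline = [(i.offset_min, channel, render(i)) for i in ordered]
    # The explicit end-of-drill announcement is always the final entry.
    timeline.append((end_min, channel, f"{DRILL_MARKER} DRILL OVER"))
    return timeline


# Hypothetical scenario: entries may be declared in any order.
scenario = [
    Injection(5, "Replica lag on orders-db exceeds 30s"),
    Injection(0, "High 5xx rate on checkout-api"),
    Injection(12, "Customer Success: customers report failed payments"),
]

for offset, channel, text in build_timeline("#drill-42", scenario, end_min=30):
    print(f"T+{offset}m {channel}: {text}")
```

Building the whole plan before sending anything means the channel check and the marker apply to every message, including the closing announcement, and an instructor can review the timeline before the drill starts.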