AI for Incident Response

Disaster Recovery Gameday and RTO Validation Prompt

Design a disaster-recovery gameday that actually validates your RTO/RPO by restoring from backups and failing over for real — instead of the tabletop fiction that backups 'probably' work.

Target user: SRE and platform teams who need to prove their DR plan rather than assume it
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a DR specialist who has discovered, the hard way, that untested backups are just hope and that most teams overstate their RTO by an order of magnitude. Help me design a disaster-recovery gameday that produces evidence, not vibes.

I will provide:
- The systems in scope (databases, object storage, stateful services, infra-as-code)
- Stated RTO/RPO targets and how they were derived
- Backup/restore mechanisms and where backups live
- Whether prior restores have ever been performed end-to-end

Do this:

1. **Pick a sharp scenario** — Choose one realistic disaster (region loss, ransomware-encrypted primary, accidental table drop, corrupted backup). Define the exact starting state and the success condition.

2. **Measure, don't assert** — Specify precisely what we will time: detection, decision, restore start, data restored, service healthy, traffic restored. The measured RTO is the only RTO that counts.

3. **Restore-from-zero test** — Force an actual restore from backup into a clean environment. Include verifying backup integrity, restore order for dependent data, and confirming application correctness, not just process-up.

4. **RPO truth** — Determine how much data was actually lost between the last good backup and the moment of the disaster, and whether that loss window fits within the stated RPO.

5. **Safety rails** — Run against an isolated environment; define blast-radius controls so the gameday itself can't cause a real outage. Include an abort trigger and rollback.

6. **Findings to action** — Provide a template for capturing where measured RTO exceeded target, which steps were undocumented, and which backups were unusable.

Output: the scenario brief, a timed run-of-show with roles, the measurement sheet, the safety/abort plan, and a findings template that converts gaps into owned action items.

Treat any step that 'should work but has never been tested' as a likely failure and design the gameday to expose it.
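
Separately from the prompt text, the measurement sheet from steps 2 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the milestone keys, function names, and all timestamps are hypothetical, and it assumes that RTO is measured from the disaster moment to traffic restored and that RPO is the gap between the last good backup and the disaster moment.

```python
from datetime import datetime, timedelta

# Milestones from step 2, in the order they are expected to occur.
# The key names are illustrative, not a standard.
MILESTONES = [
    "detection",
    "decision",
    "restore_start",
    "data_restored",
    "service_healthy",
    "traffic_restored",
]


def phase_durations(disaster_at, stamps):
    """Return the time taken to reach each milestone from the previous one."""
    durations = {}
    previous = disaster_at
    for name in MILESTONES:
        reached = stamps[name]
        if reached < previous:
            # A timestamp out of order usually means a recording mistake.
            raise ValueError(f"{name} is earlier than the preceding milestone")
        durations[name] = reached - previous
        previous = reached
    return durations


def measured_rto(disaster_at, stamps):
    """Assumption: RTO runs from the disaster moment to traffic restored."""
    return stamps["traffic_restored"] - disaster_at


def measured_rpo(disaster_at, last_good_backup_at):
    """Assumption: data-loss window is disaster moment minus last good backup."""
    return disaster_at - last_good_backup_at


# Hypothetical gameday record; every value here is made up for illustration.
disaster_at = datetime(2024, 1, 15, 10, 0)
last_good_backup_at = datetime(2024, 1, 15, 4, 0)
stamps = {
    "detection": datetime(2024, 1, 15, 10, 7),
    "decision": datetime(2024, 1, 15, 10, 20),
    "restore_start": datetime(2024, 1, 15, 10, 35),
    "data_restored": datetime(2024, 1, 15, 12, 5),
    "service_healthy": datetime(2024, 1, 15, 12, 30),
    "traffic_restored": datetime(2024, 1, 15, 12, 50),
}
rto_target = timedelta(hours=1)   # hypothetical stated target
rpo_target = timedelta(hours=4)   # hypothetical stated target

for name, spent in phase_durations(disaster_at, stamps).items():
    print(f"{name:<18} {spent}")
print("RTO exceeded:", measured_rto(disaster_at, stamps) > rto_target)
print("RPO exceeded:", measured_rpo(disaster_at, last_good_backup_at) > rpo_target)
```

Breaking the total into per-phase durations shows where the time went (for example, waiting on a decision versus the restore itself), which feeds directly into the findings template in step 6.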