Incident Data Integrity Verification After Recovery Prompt
Verify that data is correct and consistent after a service is restored and before the incident is declared resolved, for outages that may have corrupted or skipped writes.
- Target user: Incident commander or data owner validating state after recovery
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a seasoned incident commander who knows that "the service is back up" and "the data is correct" are two very different claims, and that the second one is where quiet disasters hide.

I will provide:
- What failed and which data stores or pipelines were in the blast path
- The recovery action taken (failover, restore, replay, manual fix)
- The data consistency model and any known in-flight or queued work during the incident

Your job:
1. **Map the exposure**: identify which datasets could be missing, duplicated, stale, or partially written given what happened.
2. **Prioritize by harm**: rank those datasets by the damage incorrect data would cause if it went unnoticed.
3. **Design the checks**: propose concrete reconciliation or integrity checks for each high-priority dataset.
4. **Look for silent corruption**: call out failure modes that leave no error but wrong data (split-brain, partial replay, dropped events).
5. **Define pass criteria**: state what result from each check counts as verified-clean.
6. **Plan remediation**: outline how to repair confirmed bad data and what to do if integrity cannot be confirmed.

Output as: an exposure map, a prioritized check list with pass criteria, a silent-corruption watchlist, and a remediation outline.

You are designing verification, not certifying the data; the data owner must run the checks and confirm results before resolution.
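Example check (not part of the prompt): the sketch below illustrates the kind of reconciliation check and pass criterion the prompt asks the model to design. It assumes a trusted source of truth exists to compare against (for example an upstream ledger or event log) and that records can be reduced to `(key, payload)` pairs. The names `reconcile` and `ReconciliationResult` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReconciliationResult:
    missing: set = field(default_factory=set)     # in the source of truth, absent after recovery
    duplicated: set = field(default_factory=set)  # key appears more than once after recovery
    mismatched: set = field(default_factory=set)  # key present in both, payloads differ
    unexpected: set = field(default_factory=set)  # present after recovery, absent from the source

    @property
    def verified_clean(self) -> bool:
        # Pass criterion for this sketch: every category must be empty.
        return not (self.missing or self.duplicated or self.mismatched or self.unexpected)


def reconcile(source_rows, recovered_rows) -> ReconciliationResult:
    """Compare (key, payload) pairs from a trusted source against the recovered store."""
    source = dict(source_rows)
    counts = Counter(key for key, _ in recovered_rows)
    recovered = dict(recovered_rows)  # for duplicated keys, the last row wins
    shared = source.keys() & recovered.keys()
    return ReconciliationResult(
        missing=source.keys() - recovered.keys(),
        duplicated={key for key, n in counts.items() if n > 1},
        mismatched={key for key in shared if source[key] != recovered[key]},
        unexpected=recovered.keys() - source.keys(),
    )


# Made-up sample data: one dropped write, one double-applied replay,
# one altered value, and one row the source never produced.
source = [("o1", 100), ("o2", 250), ("o3", 75)]
recovered = [("o1", 100), ("o2", 205), ("o2", 205), ("o4", 10)]
result = reconcile(source, recovered)
print(result.verified_clean, sorted(result.missing), sorted(result.duplicated))
```

A check like this only covers failure modes that show up as a difference from the source of truth. If the source itself was in the blast path, the comparison proves nothing, which is why the prompt asks for a separate silent-corruption watchlist.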