MySQL Backup & Point-in-Time Recovery Prompt
Design a backup strategy and a point-in-time recovery runbook using xtrabackup/mysqldump plus binlogs.
- Target user
- DBAs and SREs responsible for MySQL/MariaDB recovery
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior MySQL/MariaDB DBA who owns backup and recovery. You know physical backups (Percona XtraBackup / mariabackup) vs logical backups (mysqldump / mysqlpump / mydumper), how binary logs enable point-in-time recovery (PITR), GTID-based recovery, RPO/RTO trade-offs, and how to actually restore — not just back up. I will provide: - Database size, table engines, and acceptable RPO and RTO: [DESCRIBE] - Current backup method, schedule, and where backups live: [DESCRIBE] - Whether binary logging is enabled, the binlog_format, and GTID mode: [PASTE / DESCRIBE] - The recovery scenario to plan for (full host loss, accidental DROP/DELETE, restore to a specific timestamp): [DESCRIBE] Produce: 1. **Backup design.** Recommend physical vs logical (or both) for this size and RTO. Justify it — XtraBackup/mariabackup for fast restore of large datasets, mysqldump/mydumper for portability and selective restore. Define full vs incremental cadence and retention. 2. **Enable PITR.** Confirm binary logging and a sensible binlog retention/expiry; explain that backup + binlogs = restore to any point in time. Note GTID makes replaying cleaner. 3. **Point-in-time recovery runbook.** Step by step: restore the most recent full (and incrementals) to a scratch host, then replay binlogs with mysqlbinlog up to the target time or GTID — stopping just before the bad transaction for an accidental DROP/DELETE. 4. **Restore-to-scratch, not over prod.** Always recover to a separate host/instance, validate, then promote or export the needed data. Never restore directly over a live primary. 5. **Verification.** Define a restore-test cadence (an untested backup is not a backup) and checksums to confirm data integrity. Output: (a) Backup design with cadence/retention, (b) PITR prerequisites, (c) Step-by-step recovery runbook with exact commands, (d) Restore-validation steps, (e) RPO/RTO achieved, (f) Risks. Guardrails: always restore to a scratch instance and validate before touching prod; back up the current state (including binlogs) before any recovery attempt so you can retry; for accidental DELETE/DROP, stop binlog replay just before the offending GTID/position; test restores on a schedule — a backup you have never restored is unproven; never run recovery against a live primary without review.
Why this prompt works
Backups are easy; restores are where teams get hurt. This prompt is deliberately weighted toward recovery — it asks not just how to take a backup but how to restore to a precise point in time, which is the capability that actually matters during an incident. By making backup choice depend on the stated RPO and RTO, it produces a strategy fit for the database rather than a generic “run mysqldump nightly” that falls apart on a multi-terabyte InnoDB instance.
It encodes the mechanism that makes point-in-time recovery possible: a full physical or logical backup as a base, plus binary logs replayed with mysqlbinlog up to a target timestamp or GTID. For the most common real scenario — an accidental DROP or DELETE — the prompt’s instruction to stop replay just before the offending transaction is the difference between a clean recovery and re-applying the very mistake you are trying to undo.
The guardrails are the ones DBAs learn the hard way. Always restore to a scratch instance and validate before touching production, because restoring over a live primary turns a recoverable incident into a catastrophe. And it insists on scheduled restore tests, since an untested backup is just a hope — verification keeps a human in the loop and the recovery path real.