Backup and Restore Script with Rotation and Verification Prompt
Build a self-contained backup script that produces consistent, verified, rotated backups and a tested restore path, with checksums, Grandfather-Father-Son (GFS) retention, encryption, and a restore drill so backups are provably recoverable.
- Target user: SysAdmins and SREs who need backups they can actually restore from
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior backup engineer whose rule is: an untested backup is not a backup; it's a hope.

I will provide:
- What to back up (files, database, volume) and where it lives
- The destination (local disk, NFS, S3-compatible object store)
- RPO/RTO targets and retention requirements

Your job:
1. **Consistency first**: capture a consistent snapshot, not a live-moving target. For databases, use the native dump tool (`pg_dump`, or `mysqldump --single-transaction`) or a filesystem snapshot plus quiesce; for files, use `rsync --link-dest` or `tar` against a snapshot. Explain the consistency mechanism chosen.
2. **Backup creation**: write to a temp name, then atomically rename on success so a partial backup is never mistaken for a complete one. Compress (`zstd` or `gzip`) and stream where possible to avoid needing 2x disk space.
3. **Integrity verification**: generate a checksum (`sha256sum`) of every artifact and store it alongside. Immediately verify by reading the file back and recomputing. For archives, run `tar -tzf` (gzip) or `zstd -t` (zstd) to confirm the archive is structurally valid before declaring success.
4. **Encryption**: encrypt the artifact (`gpg` or `age`) before it leaves the host; keep the key OUT of the script (environment variable or secret store). Note that the restore path must have access to the key.
5. **Rotation / retention**: implement Grandfather-Father-Son (keep N daily, M weekly, K monthly) rather than a naive "delete older than X". Prune only AFTER a new backup verifies successfully, never before. Use a manifest, not just a glob by mtime.
6. **Restore path**: provide a matching `restore` subcommand that lists available backups, verifies the checksum before restoring, restores to a STAGING location by default (never clobber prod), and prints the promote step. Restore must be idempotent.
7. **Restore drill**: include a `--verify-restore` mode that restores the latest backup to a throwaway location and runs a smoke check (row count, file count, app health) to prove recoverability. Recommend running it on a schedule.
8. **Observability**: distinct exit codes per failure class, a one-line summary (size, duration, items, checksum), and a hook to emit metrics and alert on failure or on "no successful backup in N hours".

Output:
(a) the full backup+restore script with subcommands,
(b) the retention policy in plain English,
(c) the restore-drill snippet,
(d) a runbook: how to do a real restore under pressure.

Bias toward: verified-before-prune, restore-to-staging by default, and proving recoverability rather than assuming it.
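
Illustration (not part of the prompt text): steps 2 and 3 ask for a temp name, an atomic rename, and a read-back checksum. The Python sketch below shows one way that pattern could look. The function names `sha256_of` and `write_backup_atomically` and the `.partial` and `.sha256` suffixes are assumptions made for this example, and the payload is held in memory for brevity where a real script would stream it.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the file back from disk in chunks and return its hex SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_backup_atomically(data: bytes, final_path: Path) -> str:
    """Write to a temp name, verify by re-reading, then rename into place.

    Hypothetical helper: the final name only ever appears after the
    checksum has been verified and its sidecar file has been written.
    """
    final_path.parent.mkdir(parents=True, exist_ok=True)
    expected = hashlib.sha256(data).hexdigest()
    # Keep the temp file in the destination directory so the rename
    # stays on one filesystem and can be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=final_path.parent, suffix=".partial")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before verifying
        actual = sha256_of(tmp_path)
        if actual != expected:
            raise OSError(f"checksum mismatch after write: {tmp_path}")
        # Sidecar uses the "<hash>  <name>" layout that `sha256sum -c` reads.
        sidecar = final_path.with_name(final_path.name + ".sha256")
        sidecar.write_text(f"{actual}  {final_path.name}\n")
        os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # never leave a partial file behind
        raise
    return actual
```

Because the rename happens last, anything that lists the destination directory sees either no backup or a complete, checksummed one.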
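
Illustration (not part of the prompt text): step 5 asks for Grandfather-Father-Son retention with pruning gated on verification. The sketch below assumes one common reading of GFS: keep the N newest backups, plus the newest backup in each of the M most recent ISO weeks and each of the K most recent calendar months that contain a backup. The names `gfs_keep` and `prune_candidates` are hypothetical, and a real script would read dates from its manifest rather than take them as arguments.

```python
from datetime import date
from typing import Callable, Hashable, Iterable, List, Set


def gfs_keep(backup_dates: Iterable[date], daily: int, weekly: int, monthly: int) -> Set[date]:
    """Return the backup dates to keep under the GFS reading described above."""
    ordered = sorted(set(backup_dates), reverse=True)  # newest first

    def newest_per_bucket(key: Callable[[date], Hashable], limit: int) -> Set[date]:
        # Walk newest to oldest and keep the first (newest) backup seen
        # in each bucket until `limit` buckets have been filled.
        seen: List[Hashable] = []
        chosen: Set[date] = set()
        for d in ordered:
            bucket = key(d)
            if bucket in seen:
                continue
            if len(seen) == limit:
                break
            seen.append(bucket)
            chosen.add(d)
        return chosen

    keep = set(ordered[:daily])                                         # sons
    keep |= newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)  # fathers
    keep |= newest_per_bucket(lambda d: (d.year, d.month), monthly)     # grandfathers
    return keep


def prune_candidates(backup_dates: Iterable[date], keep: Set[date],
                     new_backup_verified: bool) -> List[date]:
    """List dates that may be deleted; nothing is offered unless the newest backup verified."""
    if not new_backup_verified:
        return []
    return sorted(set(backup_dates) - keep)
```

Returning an empty list when `new_backup_verified` is false is what enforces "verify, then prune": a failed run can never shrink the set of restorable backups.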
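
Illustration (not part of the prompt text): steps 6 and 7 ask for a checksum check before restoring and a drill that restores to a throwaway location with a smoke check. The sketch below assumes a `tar` archive with a `<name>.sha256` sidecar next to it and uses a file count as the smoke check. The names `verify_sidecar` and `restore_drill` and the `expected_min_files` threshold are hypothetical.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def verify_sidecar(archive: Path) -> bool:
    """Recompute SHA-256 and compare it to the first field of <archive>.sha256."""
    recorded = archive.with_name(archive.name + ".sha256").read_text().split()[0]
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == recorded


def restore_drill(archive: Path, expected_min_files: int) -> int:
    """Restore into a throwaway directory and return the number of files restored.

    Raises ValueError on a checksum mismatch and RuntimeError when the
    smoke check (a minimum file count, assumed here) fails.
    """
    if not verify_sidecar(archive):
        raise ValueError(f"checksum mismatch for {archive}")
    # The staging directory is temporary, so production paths are never touched.
    with tempfile.TemporaryDirectory(prefix="restore-drill-") as staging:
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:*") as tar:
            tar.extractall(staging, **kwargs)
        restored = sum(1 for p in Path(staging).rglob("*") if p.is_file())
    if restored < expected_min_files:
        raise RuntimeError(f"smoke check failed: {restored} files restored")
    return restored
```

A database drill would swap the file count for a row count or an application health probe, as the prompt suggests. The structure stays the same: verify, restore to staging, check, discard.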