
Backup and Restore Script with Rotation and Verification Prompt

Build a self-contained backup script that produces consistent, verified, rotated backups and a tested restore path — checksums, retention (GFS), encryption, and a restore drill so backups are provably recoverable.

Target user: SysAdmins and SREs who need backups they can actually restore from
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior backup engineer whose rule is: an untested backup is not a backup, it's a hope.

I will provide:
- What to back up (files, database, volume) and where it lives
- The destination (local disk, NFS, S3-compatible object store)
- RPO/RTO targets and retention requirements

Your job:

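1. **Consistency first** — capture a consistent snapshot, not a live-moving target. For databases use the native dump tool (`pg_dump`, which dumps from a single consistent snapshot, or `mysqldump --single-transaction`, which is only consistent for transactional engines such as InnoDB) or a filesystem snapshot + quiesce; for files, take a filesystem snapshot first and run `tar` or `rsync --link-dest` against the snapshot rather than the live tree. Explain the consistency mechanism chosen.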

2. **Backup creation** — write to a temp name, then atomically rename on success so a partial backup is never mistaken for complete. Compress (`zstd`/`gzip`) and stream where possible to avoid 2x disk.

3. **Integrity verification** — generate a checksum (`sha256sum`) of every artifact and store it alongside. Immediately verify by reading the file back and recomputing. For archives, also confirm the archive is structurally valid before declaring success: `tar -tzf` for gzip-compressed tarballs, `gzip -t` / `zstd -t` for the compressed stream itself.

4. **Encryption** — encrypt the artifact (`gpg` or `age`) before it leaves the host; keep the key OUT of the script (env/secret store). Note that restore needs access to the decryption key, so the restore path and the drill must state where the key comes from.

5. **Rotation / retention** — implement Grandfather-Father-Son (keep N daily, M weekly, K monthly) rather than naive "delete older than X". Prune only AFTER a new backup verifies successfully, never before. Use a manifest, not just glob-by-mtime.

6. **Restore path** — provide a matching `restore` subcommand that: lists available backups, verifies checksum before restoring, restores to a STAGING location by default (never clobber prod), and prints the promote step. Restore must be idempotent.

7. **Restore drill** — include a `--verify-restore` mode that restores the latest backup to a throwaway location and runs a smoke check (row count, file count, app health) to prove recoverability. Recommend running it on a schedule.

8. **Observability** — exit codes per failure class, a one-line summary (size, duration, items, checksum), and a hook to emit metrics/alert on failure or on "no successful backup in N hours".
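
As an illustration of steps 2 and 3 above, here is a minimal Python sketch of "write to a temp name, verify, then atomically rename". It is one possible shape, not a reference implementation. The function names (`sha256_of`, `create_verified_backup`, `verify_backup`), the `.partial` suffix, and the `.sha256` sidecar naming are assumptions made for this example; it also assumes a local destination directory and a gzip tarball.

```python
import hashlib
import os
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large archives never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_verified_backup(source_dir: Path, dest_dir: Path, name: str) -> Path:
    """Write <name>.tar.gz under a temporary name, check it, then rename it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final_path = dest_dir / f"{name}.tar.gz"
    partial_path = dest_dir / f".{name}.tar.gz.partial"  # hypothetical naming
    sidecar_path = dest_dir / f"{name}.tar.gz.sha256"    # hypothetical naming
    try:
        with partial_path.open("wb") as fh:
            with tarfile.open(fileobj=fh, mode="w:gz") as archive:
                archive.add(source_dir, arcname=source_dir.name)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before trusting them

        # Structural check: re-open the archive and walk every member header.
        with tarfile.open(partial_path, mode="r:gz") as archive:
            if not archive.getmembers():
                raise RuntimeError("archive contains no members")

        # Checksum is computed by reading the finished file back from disk.
        checksum = sha256_of(partial_path)
        # Same "<hex>  <file>" layout that `sha256sum -c` reads.
        sidecar_path.write_text(f"{checksum}  {final_path.name}\n")

        # Rename last: a file with the final name has passed every check above.
        os.replace(partial_path, final_path)
    except BaseException:
        partial_path.unlink(missing_ok=True)  # never leave a partial behind
        raise
    return final_path


def verify_backup(archive_path: Path) -> bool:
    """Recompute the checksum and compare it with the stored sidecar value."""
    sidecar_path = archive_path.with_name(archive_path.name + ".sha256")
    recorded = sidecar_path.read_text().split()[0]
    return sha256_of(archive_path) == recorded
```

A caller would run `create_verified_backup(Path("/srv/app"), Path("/backups"), "app-2024-03-31")` (paths are placeholders) and call `verify_backup` again before any restore. The rename is only atomic when the temporary file and the final file live on the same filesystem, which is why the sketch writes the partial file into the destination directory itself.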

Output: (a) the full backup+restore script with subcommands, (b) the retention policy in plain English, (c) the restore-drill snippet, (d) a runbook: how to do a real restore under pressure.

Bias toward: verified-before-prune, restore-to-staging by default, and proving recoverability rather than assuming it.
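
To make the retention rule in step 5 and the verified-before-prune bias concrete, here is a hedged Python sketch of a Grandfather-Father-Son selection driven by a manifest. The manifest shape (a list of dicts with `date` and `verified` keys), the function names `gfs_keep` and `plan_prune`, the default counts, and the one-backup-per-day assumption are all illustrative choices, not part of the prompt. Treating ISO weeks and calendar months as the "weekly" and "monthly" buckets is likewise just one reasonable interpretation.

```python
from datetime import date


def gfs_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates a GFS policy would retain."""
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])  # sons: the N most recent backups

    def newest_per_bucket(bucket_key, limit):
        """Pick the newest backup in each of the `limit` most recent buckets."""
        seen, chosen = set(), []
        for day in ordered:
            if len(chosen) >= limit:
                break
            key = bucket_key(day)
            if key not in seen:
                seen.add(key)
                chosen.append(day)
        return chosen

    # Fathers: newest backup of each ISO (year, week).
    keep.update(newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly))
    # Grandfathers: newest backup of each calendar (year, month).
    keep.update(newest_per_bucket(lambda d: (d.year, d.month), monthly))
    return keep


def plan_prune(manifest, daily=7, weekly=4, monthly=12):
    """Return the dates that are safe to delete, or [] if pruning is unsafe.

    `manifest` is a list of {"date": date, "verified": bool} entries
    (a hypothetical layout for this sketch).
    """
    if not manifest:
        return []
    newest = max(manifest, key=lambda entry: entry["date"])
    if not newest["verified"]:
        return []  # latest backup did not verify: delete nothing
    # Only verified backups may occupy a retention slot.
    verified_dates = [e["date"] for e in manifest if e["verified"]]
    keep = gfs_keep(verified_dates, daily, weekly, monthly)
    return sorted(e["date"] for e in manifest if e["date"] not in keep)
```

`plan_prune` only returns a plan; the actual deletion stays a separate step so the plan can be logged or dry-run first. Because unverified backups never occupy a retention slot in this sketch, an old backup that failed verification is scheduled for deletion once a newer verified one exists. That is a policy choice worth confirming against your own requirements.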
