IaC State Backup, Recovery & Import Prompt
Design backup, locking, recovery, and resource-import runbooks for IaC state (Terraform/OpenTofu/Pulumi) so a corrupted, lost, or out-of-band-modified state doesn't become an outage.
- Target user: Platform engineers hardening IaC state management and disaster recovery
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a platform engineer who has helped teams recover from the worst IaC failure modes: a deleted state file, a lock held by a dead CI job, a half-applied state, and resources created out-of-band that need importing. State is your most precious and fragile artifact, and you treat it accordingly. Keep the guidance tool-agnostic so it applies to Terraform, OpenTofu, or Pulumi.

I will provide:
- The tool and backend (S3+DynamoDB, GCS, azurerm, Terraform Cloud, Pulumi Cloud, etc.)
- How many workspaces/stacks and environments exist
- The incident or hardening goal (lost state, stuck lock, drift import, migration)

Your job:
1. **State backend hardening**: versioning and encryption on the bucket, object lock/MFA delete or an equivalent, a deny-delete bucket policy, and reliable state locking (a separate lock table, or the backend's native locking where it has one). State is the crown jewel; protect it like a database.
2. **Backup strategy**: automated versioned backups (bucket versioning plus a periodic snapshot to a separate account/project), a retention policy, and a TESTED restore procedure. Backups you've never restored don't count.
3. **Lock recovery**: how to diagnose and safely break a stale lock (first confirm that no apply is actually running), and how to prevent locks left behind by dead CI jobs (lock timeouts, CI concurrency controls).
4. **State recovery runbook**: step-by-step procedures for lost or corrupted state (restore from a previous version), a partial apply (reconcile via plan plus targeted operations), and "state says X, cloud says Y" (refresh and investigate before touching anything).
5. **Importing existing/out-of-band resources**: generate import blocks or `import` commands, run the iterative plan-until-empty-diff loop, and author config that matches reality. Cover bulk import for large estates.
6. **Migration safety**: when moving state between backends or splitting one state into many, back up first, use `state mv` or `moved` blocks, and verify with a no-op plan before anyone applies.
7. **Guardrails**: least-privilege access to the state backend, no local state for shared environments, and `prevent_destroy`/deletion protection on stateful resources so a bad apply can't cascade.

Output as: (a) the hardened backend config, (b) the backup and tested-restore procedure, (c) the lock-recovery runbook, (d) the import workflow with examples, (e) a pre-apply safety checklist.

Bias toward: backups you've actually restored, refresh-and-investigate before mutating, least-privilege state access, and plan-to-empty-diff before any destructive operation.
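The lock-recovery step in the prompt (confirm that no apply is running before breaking a stale lock) can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `LockInfo`, `lock_verdict`, and `force_unlock_command` are hypothetical names, the caller is assumed to look up whether the lock holder's CI job is still running through the CI system's own API (not shown), and the one-hour minimum age is an arbitrary placeholder. `force-unlock LOCK_ID` is the Terraform/OpenTofu subcommand; Pulumi handles stuck updates through a different mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LockInfo:
    """Hypothetical view of a held state lock (mirrors the ID/Who/Created
    fields Terraform prints when it cannot acquire a lock)."""
    lock_id: str
    who: str
    created: datetime  # acquisition time, same timezone convention as `now`


def lock_verdict(lock: LockInfo, holder_running: Optional[bool],
                 now: datetime,
                 min_age: timedelta = timedelta(hours=1)) -> str:
    """Return 'wait', 'investigate', or 'break' for a held lock.

    holder_running: True/False from the CI system's API (assumed), or None
    if the lock holder could not be identified at all.
    """
    if holder_running:
        return "wait"         # an apply may genuinely be in progress
    if holder_running is None:
        return "investigate"  # unknown holder: confirm by hand first
    if now - lock.created < min_age:
        return "wait"         # too recent: CI status may still be catching up
    return "break"


def force_unlock_command(lock: LockInfo, binary: str = "terraform") -> List[str]:
    """Build (but do not run) the unlock command for a human to review."""
    return [binary, "force-unlock", lock.lock_id]
```

Returning the command instead of executing it keeps a human in the loop, which matches the prompt's bias toward investigating before mutating.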
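For the "restore from a previous version" path in the prompt, a useful first step is choosing which backup version to restore. The sketch below assumes Terraform/OpenTofu-style state JSON (top-level `lineage`, `serial`, and `resources` keys) and assumes the caller has already fetched candidate versions newest first, for example from bucket object versions. `pick_restore_candidate` is a hypothetical helper, and Pulumi's checkpoint format differs. The function skips objects that don't parse and objects from a different lineage, then returns the newest usable version.

```python
import json
from typing import Iterable, Optional, Tuple


def pick_restore_candidate(
    versions: Iterable[Tuple[str, bytes]], expected_lineage: str
) -> Optional[Tuple[str, int]]:
    """Return (version_id, serial) of the newest usable state version.

    `versions` must be ordered newest first. A version is usable if it
    parses as JSON, belongs to the expected lineage, and has an integer
    `serial` and a `resources` list.
    """
    for version_id, raw in versions:
        try:
            state = json.loads(raw)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            continue        # truncated or corrupted object
        if not isinstance(state, dict):
            continue
        if state.get("lineage") != expected_lineage:
            continue        # a different state written to the same key
        serial = state.get("serial")
        if isinstance(serial, int) and isinstance(state.get("resources"), list):
            return version_id, serial
    return None
```

Picking a candidate is only half of a restore: run a plan against the restored state and investigate every diff before anyone applies.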
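For the bulk-import part of the prompt, import blocks can be generated from an inventory instead of written by hand. This sketch assumes the Terraform 1.5+/OpenTofu `import` block syntax (`to` and `id` arguments), an inventory of `(resource_type, cloud_id)` pairs exported from the cloud provider (not shown), and a hypothetical naming convention (`hcl_name`) that derives resource names from cloud IDs; adapt the naming to your own standards.

```python
import json
import re
from typing import Iterable, Tuple

# Terraform identifiers may contain letters, digits, underscores, and hyphens.
_INVALID = re.compile(r"[^A-Za-z0-9_-]")


def hcl_name(cloud_id: str) -> str:
    """Derive a resource name from a cloud ID (hypothetical convention)."""
    name = _INVALID.sub("_", cloud_id)
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "r_" + name  # identifiers must not start with a digit
    return name


def hcl_string(value: str) -> str:
    """Quote a string for HCL, escaping template sequences."""
    quoted = json.dumps(value)  # handles quotes, backslashes, control chars
    return quoted.replace("${", "$${").replace("%{", "%%{")


def import_blocks(inventory: Iterable[Tuple[str, str]]) -> str:
    """Render (resource_type, cloud_id) pairs as HCL import blocks."""
    blocks, seen = [], set()
    for rtype, cloud_id in inventory:
        address = f"{rtype}.{hcl_name(cloud_id)}"
        if address in seen:  # two IDs collapsed to the same name
            raise ValueError(f"duplicate address after renaming: {address}")
        seen.add(address)
        blocks.append(f"import {{\n  to = {address}\n  id = {hcl_string(cloud_id)}\n}}\n")
    return "\n".join(blocks)
```

Written to a file such as `imports.tf`, the blocks drive the plan-until-empty-diff loop. In Terraform 1.5+, `terraform plan -generate-config-out=generated.tf` drafts configuration for the imported resources, which you then edit until the plan shows only imports and no changes.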
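The prompt's migration-safety rule (verify with a no-op plan before anyone applies) can be enforced in CI with `plan -detailed-exitcode`. This flag makes the plan exit 0 when there are no changes, 1 on error, and 2 when changes are present. The minimal wrapper below is a sketch: `verify_noop_plan` and its parameters are hypothetical, and it assumes the `terraform` (or `tofu`) binary is on `PATH` and that the working directory is already initialized against the new backend.

```python
import subprocess
from typing import Sequence


def interpret_detailed_exitcode(code: int) -> str:
    """Map a `plan -detailed-exitcode` exit status to a verdict."""
    if code == 0:
        return "no-op"    # plan succeeded, empty diff
    if code == 2:
        return "changes"  # plan succeeded, diff is non-empty
    return "error"        # 1 (or anything else): the plan itself failed


def verify_noop_plan(workdir: str, binary: str = "terraform",
                     extra_args: Sequence[str] = ()) -> bool:
    """Return True only if the plan in `workdir` proposes no changes."""
    result = subprocess.run(
        [binary, "plan", "-detailed-exitcode", "-input=false", *extra_args],
        cwd=workdir, capture_output=True, text=True,
    )
    verdict = interpret_detailed_exitcode(result.returncode)
    if verdict == "error":
        raise RuntimeError(f"plan failed:\n{result.stderr}")
    if verdict == "changes":
        print(result.stdout)  # surface the unexpected diff for review
    return verdict == "no-op"
```

Treating exit code 1 as a hard failure, rather than as "not a no-op", keeps a broken backend configuration from being mistaken for an ordinary diff.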
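For the pre-apply safety checklist, one check that can be automated is flagging any plan that deletes or replaces resources. This sketch assumes the machine-readable plan that Terraform/OpenTofu produce with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. In that output, each `resource_changes` entry carries a `change.actions` list, and a replacement appears as `delete` plus `create`. `destructive_changes` is a hypothetical helper name.

```python
import json
from typing import List


def destructive_changes(plan_json: str) -> List[str]:
    """List resource addresses whose planned actions include a delete.

    Covers plain deletes (["delete"]) and replacements
    (["delete", "create"] or ["create", "delete"]).
    """
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append(f"{rc.get('address', '?')}: {'/'.join(actions)}")
    return flagged
```

In CI, a non-empty result could fail the job until a reviewer explicitly approves it. This check complements `prevent_destroy` and cloud-side deletion protection; it does not replace them.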