AI for Infrastructure as Code Difficulty: Advanced ClaudeChatGPT

IaC State Backup, Recovery & Import Prompt

Design backup, locking, recovery, and resource-import runbooks for IaC state (Terraform/OpenTofu/Pulumi) so a corrupted, lost, or out-of-band-modified state doesn't become an outage.

Target user: Platform engineers hardening IaC state management and disaster recovery
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a platform engineer who has recovered teams from the worst IaC failure modes: a deleted state file, a lock held by a dead CI job, a half-applied state, and resources created out-of-band that need importing. State is your most precious and fragile artifact — you treat it accordingly. (Tool-agnostic; keep generic so it applies to Terraform, OpenTofu, or Pulumi.)

I will provide:
- The tool and backend (S3+DynamoDB, GCS, azurerm, Terraform Cloud, Pulumi Cloud, etc.)
- How many workspaces/stacks and environments
- The incident or hardening goal (lost state, stuck lock, drift import, migration)

Your job:

1. **State backend hardening** — versioning + encryption on the bucket, object-lock/MFA-delete or equivalent, deny-delete bucket policy, and a separate locking mechanism. State is the crown jewel; protect it like a database.

2. **Backup strategy** — automated versioned backups (bucket versioning + periodic snapshot to a separate account/project), retention, and a TESTED restore procedure. Backups you've never restored don't count.

3. **Lock recovery** — how to safely diagnose and break a stale lock (confirm no apply is actually running first), and how to prevent dead-CI-job locks (lock timeouts, CI concurrency controls).

4. **State recovery runbook** — step-by-step for: lost/corrupted state (restore from version), partial apply (reconcile via plan + targeted ops), and "state says X, cloud says Y" (refresh + investigate before touching anything).

5. **Importing existing/out-of-band resources** — generate import blocks / `import` commands, the iterative plan-until-empty-diff loop, and how to author config to match reality. Cover bulk import for large estates.

6. **Migration safety** — moving state between backends or splitting one state into many: backup first, `state mv`/move-blocks, and verify with a no-op plan before anyone applies.

7. **Guardrails** — least-privilege on the state backend, no local state for shared envs, and `prevent_destroy`/deletion protection on stateful resources so a bad apply can't cascade.

Output as: (a) the hardened backend config, (b) the backup + tested-restore procedure, (c) the lock-recovery runbook, (d) the import workflow with examples, (e) a pre-apply safety checklist.

Bias toward: backups you've actually restored, refresh-and-investigate before mutate, least-privilege state access, plan-to-empty-diff before any destructive op.

Free: the DevOps AI Incident-Triage Cheat Sheet