Terraform CI Concurrency and Run Queueing Prompt
Prevent concurrent Terraform runs from colliding on the same state by designing locking, queueing, and serialization across CI pipelines.
- Target user: Platform teams running Terraform from shared CI
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a platform engineer who has untangled flaky Terraform pipelines where parallel runs corrupted plans and fought over state locks.

I will provide:
- The CI platform and how Terraform jobs are triggered (per-PR, per-merge, scheduled)
- The backend and its native locking story (S3+DynamoDB, GCS, Terraform Cloud, etc.)
- How environments and state files map to pipelines
- Symptoms (state lock timeouts, stale plans applied, two applies racing)

Your job:
1. **Diagnose the race**: identify where concurrency hurts, whether two PRs planning against the same state, an apply running while a newer plan is queued, or scheduled drift jobs colliding with merges.
2. **Lean on backend locks first**: confirm the backend's native state locking is enabled and correctly configured, and explain what it does and does not protect (it serializes state writes, not whole pipelines).
3. **Pipeline-level serialization**: design a concurrency group keyed by environment/state so only one run per state proceeds at a time, with newer runs queued or cancelling superseded ones. Give the concrete config for the CI platform.
4. **Plan/apply staleness**: enforce that an apply uses a plan generated against the current state, either by re-planning before apply or by using a saved plan with a freshness/lock check that fails if state moved underneath it.
5. **PR plans vs main applies**: separate read-only PR plans (safe to run in parallel) from mutating applies (must be serialized), and prevent a merge from applying a plan built against a now-outdated main.
6. **Lock hygiene**: give safe guidance on stuck locks: how to inspect lock metadata, when `force-unlock` is acceptable, and why blindly force-unlocking can corrupt state.
7. **Scheduled jobs**: make drift-detection and apply jobs mutually exclusive per environment so a 2am drift scan never races a deploy.
8. **Observability**: add logging of who holds the lock and how long runs queue, so contention is visible rather than mysterious.

Output as: (a) a diagnosis of my race conditions, (b) the concurrency-group config for my CI platform, (c) the re-plan-before-apply guardrail, (d) a stuck-lock runbook. Prefer serializing applies over clever parallelism that risks state.
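
Illustration of guardrail (c), separate from the prompt text: a minimal sketch of a pipeline-level freshness check, assuming the pipeline saves the output of `terraform state pull` at plan time and pulls it again just before apply. It compares the `lineage` and `serial` fields of the two state snapshots. The names `StateFingerprint`, `fingerprint`, and `plan_is_fresh` are hypothetical, and the sketch assumes a non-empty state. Terraform also rejects a saved plan at apply time when the state has changed, so a check like this mainly lets the pipeline fail early with a clearer message.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StateFingerprint:
    """Identity of a state snapshot: lineage names the state, serial counts writes."""

    lineage: str
    serial: int


def fingerprint(state_json: str) -> StateFingerprint:
    """Extract lineage and serial from the JSON printed by `terraform state pull`."""
    doc = json.loads(state_json)
    return StateFingerprint(lineage=doc["lineage"], serial=int(doc["serial"]))


def plan_is_fresh(at_plan: StateFingerprint, at_apply: StateFingerprint) -> bool:
    """Treat the plan as fresh only if lineage and serial are both unchanged."""
    return at_plan == at_apply


if __name__ == "__main__":
    # Hypothetical snapshots: one captured at plan time, one just before apply.
    planned = fingerprint('{"version": 4, "lineage": "abc", "serial": 7}')
    current = fingerprint('{"version": 4, "lineage": "abc", "serial": 8}')
    if not plan_is_fresh(planned, current):
        print("stale plan: state changed since planning; re-plan before apply")
```

In a pipeline, the apply job would exit non-zero when `plan_is_fresh` returns false, which sends the run back through the plan step instead of applying against state that moved.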