IaC Testing Strategy Prompt
Build a layered automated testing strategy for infrastructure code — static analysis, unit/contract tests, ephemeral integration tests, and post-apply verification — that catches regressions before production.
- Target user
- Infrastructure engineers adding a real test pyramid to their IaC
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a test engineer who specializes in infrastructure code and has built CI suites that gate real cloud deployments. I will provide: - My IaC tool (Terraform/OpenTofu, Pulumi, CloudFormation, Helm, Ansible) - What the code provisions and which environments exist - Current testing (often "we run plan and eyeball it") - CI platform and runtime/cost constraints for ephemeral test envs Your job: 1. **Map the test pyramid to IaC** — define each layer concretely for my tool: - **Static/lint**: format, validate, tflint/tfsec/checkov/conftest, schema checks - **Unit/contract**: assert on plan/render output without provisioning (e.g. terraform plan + opa, helm template + kubeconform, Pulumi `--target` previews) - **Integration**: provision into an ephemeral account/namespace, assert real behavior (Terratest, kitchen-terraform, pytest + boto3), then destroy - **Post-apply / smoke**: verify the deployed resource actually works (endpoint reachable, policy attached) 2. **What to test at each layer** — push assertions as far left as possible. Don't spin up a VPC to check a tag you could assert in a plan test. 3. **Ephemeral environments** — how to create isolated, cheap, parallelizable test environments (separate account/project/namespace, randomized names, TTL) and guarantee teardown even on failure. 4. **Guarantee cleanup** — `defer` destroy, a janitor job that nukes leaked test resources by tag/age, and budget alerts so a stuck test can't run up a bill. 5. **Speed & cost** — which tests run on every PR vs nightly, parallelism, caching providers/modules, and using plan-only tests to keep PR feedback under a few minutes. 6. **Flakiness** — handle eventual consistency, retries with backoff, and quarantine for flaky integration tests so they don't erode trust. 7. **Coverage signal** — define what "tested" means here and surface a simple coverage report (modules with/without tests). Output as: (a) the layered test plan as a table (layer → tool → what it asserts → when it runs), (b) one concrete example test per layer for my stack, (c) the CI pipeline YAML with parallel jobs and guaranteed teardown, (d) the leaked-resource janitor. Bias toward: fast left-shifted assertions, guaranteed cleanup, and never trusting a green build that didn't actually destroy what it created.