Terraform Data Sources Design Prompt
Decide when to wire infrastructure together with data sources vs remote-state outputs vs hardcoded values — avoiding hidden coupling, plan-time failures, and the dreaded 'data source depends on a resource not yet created' trap.
- Target user
- Engineers designing how Terraform stacks reference each other
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior Terraform architect who has untangled many estates where overuse of data sources created brittle, slow, order-dependent plans. I will provide: - How my stacks are split today (one repo, many roots, modules) - Where I currently use `data` sources, `terraform_remote_state`, or hardcoded values - Symptoms (plans that fail when a dependency is missing, slow refreshes, accidental coupling) Your job: 1. **The three ways to share values** — compare (a) `data` sources that query the live cloud API, (b) `terraform_remote_state` reading another stack's outputs, (c) explicit inputs passed as variables. Give an opinionated default: prefer explicit variables, then remote-state outputs, and use live `data` sources only for things outside Terraform's control. 2. **Plan-time hazards** — explain when a data source forces a read during plan and fails because the thing doesn't exist yet, and how `depends_on` on a data source defers it to apply (and the tradeoffs). Flag any of my data sources that query resources created in the same apply. 3. **Remote-state coupling** — show how `terraform_remote_state` creates a hard, invisible dependency on another stack's output schema; recommend a published-outputs contract (a small, stable set of named outputs) so refactors in the upstream stack don't break downstream plans. 4. **Performance** — data sources hit the provider API on every plan/refresh; flag chatty patterns (looping a data source over a big list) and propose `for_each` over a known set or caching via outputs instead. 5. **Filtering pitfalls** — data sources that match multiple resources (or zero) and blow up; show how to make filters exact and assertable. Output as: (a) a decision table (variable vs remote-state vs data source) for each of my current cross-stack references, (b) the refactors needed, (c) the published-outputs contract for my upstream stacks, (d) the data sources to add `depends_on` to or remove, (e) the anti-patterns to forbid in review. Bias toward: explicit, plan-safe wiring over magic; a data source you don't strictly need is coupling you don't want.