Writing Terraform Data Source Queries With AI Instead of Hardcoding IDs

There’s a particular smell in Terraform that always means trouble later: a hardcoded ami = "ami-0abc123def" sitting in a resource block. It works today. Six months from now the AMI is deprecated, or you copy the module to another region where that ID means nothing, and suddenly your “infrastructure as code” is infrastructure as a brittle string literal. The same rot hits hardcoded subnet IDs, security group ARNs, hosted zone IDs — anything that should be looked up but got pasted instead.

The fix is data sources, and writing them is fiddly: every provider has its own filter syntax and quirks, and the docs are a tab you keep forgetting to open. This is squarely an AI sweet spot. The model is a fast junior engineer who’s read the provider docs a thousand times and can draft the right data block in seconds. As always, it drafts and you verify, and the proof is a plan that resolves to the right value. The model never applies, never holds credentials, never writes state.

Why hardcoded IDs are technical debt

Three failure modes, all avoidable:

They rot. AMIs get deprecated, resources get replaced, and the literal ID stops pointing at anything valid.
They don’t travel. A subnet ID is region- and account-specific. Copy the module elsewhere and it’s wrong.
They hide intent. ami-0abc123 tells a reader nothing. “Latest Ubuntu 22.04 LTS from Canonical” tells them everything.

A data source fixes all three: it expresses what you want and lets Terraform resolve which one at plan time.

Hand the AI the hardcoded value and the intent

The model can’t guess what an opaque ID means, so tell it. Give it the literal value and what it should represent:

“This resource hardcodes ami = "ami-0abc123def", which is the latest Ubuntu 22.04 LTS AMI from Canonical in us-east-1. Replace it with an aws_ami data source that looks this up dynamically using the correct owner and name filters, so it stays current and works across regions. Use the standard Canonical owner ID. Add a comment explaining the filter.”

A clean result:

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "app" {
  ami = data.aws_ami.ubuntu.id   # was a hardcoded literal
  # instance_type and the other existing arguments stay as they were
}

The model usually gets Canonical’s owner ID and the AMI name pattern right, since both are well documented and widely repeated. Recall is not proof, though, so treat them as part of the draft that the next step verifies.

Verify the lookup resolves to what you expect

A data source that returns the wrong thing is worse than a hardcoded ID, because it changes silently. So the verification is specific: confirm the lookup resolves, and resolves to the value you intended.

terraform plan

A data source is read during plan, so the plan run logs the read along with the ID it resolved to. For a swap-in like this, the stronger signal is that the plan shows no change to the instance, because the looked-up AMI matches the one already deployed. That no-change result is your proof the data source returns the same value the literal did. If the plan suddenly wants to replace the instance, the lookup found a different AMI, and you investigate the filter before going anywhere near apply.
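If you want the resolved value spelled out in the plan rather than inferred from a no-change result, one option is a temporary output. This is a minimal sketch that assumes the data "aws_ami" "ubuntu" block shown earlier lives in the root module; the output names are made up for illustration.

```hcl
# Temporary outputs for review; assumes data "aws_ami" "ubuntu" from earlier
# in this article is declared in the same (root) module.
output "resolved_ubuntu_ami_id" {
  description = "AMI ID the lookup resolved to in this region"
  value       = data.aws_ami.ubuntu.id
}

output "resolved_ubuntu_ami_name" {
  description = "Full AMI name, useful for checking the release and build"
  value       = data.aws_ami.ubuntu.name
}
```

With these in place, the plan should list the values under its output changes, so you can compare the ID against the literal you removed. Delete the outputs once the swap is reviewed if you don't want them in the module's interface.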

Pro Tip: most_recent = true is convenient and occasionally dangerous, because a new upstream image silently changes your plan. For anything where reproducibility matters, ask the model to filter to a specific version string instead, so the lookup is deterministic.
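A pinned lookup might look like the sketch below. The date suffix is a hypothetical placeholder, not a real build; substitute the exact name of the image you have validated. The assumption is that the same AMI name is published in each region you deploy to, which is worth confirming for your regions.

```hcl
data "aws_ami" "ubuntu_pinned" {
  owners = ["099720109477"] # Canonical

  filter {
    name = "name"
    # Exact name, no wildcard. "20240101" is a placeholder build date:
    # replace it with the build you actually tested.
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240101"]
  }
}
```

Because the name has no wildcard, the lookup either finds that one image or fails the plan, so an upstream release can no longer change the result. The trade-off is that you now update the pin deliberately, which is the point.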

Watch for the over-eager filter

The common AI mistake here is a filter that’s too loose and matches multiple resources. A singular data source such as aws_subnet fails the plan when that happens, and an aws_ami lookup with most_recent = true quietly picks whichever match is newest, which may not be the one you meant. A data "aws_subnet" filtered only by VPC will match every subnet in that VPC. When you review, check that each data source has enough filters to resolve to exactly one result:

data "aws_subnet" "private_a" {
  vpc_id            = var.vpc_id
  availability_zone = "us-east-1a"
  tags = { Tier = "private" }   # narrow it to one
}

If the AI’s draft is under-specified, the plan will tell you — either an error about multiple matches or a resolved value that’s clearly wrong. Tighten the filter, re-plan, repeat. The plan is the feedback loop.
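One way to make the single-match expectation explicit is a postcondition on the plural aws_subnets data source, which returns a list instead of failing on multiple matches. This is a sketch assuming Terraform 1.2 or later (where postconditions are available) and the same tag and availability zone used above; the data source name is illustrative.

```hcl
variable "vpc_id" {
  type        = string
  description = "VPC to search for the private subnet"
}

data "aws_subnets" "private_a" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
  filter {
    name   = "availability-zone"
    values = ["us-east-1a"]
  }
  tags = { Tier = "private" }

  lifecycle {
    # Fail the plan with a readable message unless exactly one subnet matches.
    postcondition {
      condition     = length(self.ids) == 1
      error_message = "Expected exactly one private subnet in us-east-1a for this VPC."
    }
  }
}
```

The error message then names the expectation directly, which tends to be easier to act on than the provider's generic multiple-match error.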

Use data sources to break cross-stack hardcoding

The other place literals breed is across stack boundaries — one configuration hardcodes a VPC ID or subnet ARN that another configuration created. The clean fix is a data source (or a remote state lookup) so the consumer queries the producer instead of copying its output by hand. The AI is good at drafting either form once you tell it where the value comes from:

“This module hardcodes vpc_id = "vpc-0abc123", which is created by our networking stack and tagged Name = core-vpc. Replace it with an aws_vpc data source that looks it up by that tag so this module isn’t pinned to a literal ID.”

data "aws_vpc" "core" {
  tags = { Name = "core-vpc" }
}

resource "aws_subnet" "app" {
  vpc_id = data.aws_vpc.core.id   # was vpc-0abc123
  # cidr_block and the other existing arguments stay as they were
}

The verification is identical: terraform plan should resolve the lookup and show no change, proving the data source returns the same VPC the literal named. If your producer stack exposes the value as an output instead, ask the model for a terraform_remote_state data source. Be deliberate about the trade-off, though: that couples the consumer to the producer’s state, which not every team wants.
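For reference, the remote state form could be sketched like this. Everything specific here is an assumption for illustration: an S3 backend, the bucket and key names, the region, the subnet CIDR, and a producer output named vpc_id. Adjust all of them to match how your networking stack actually stores state and names its outputs.

```hcl
# Hypothetical backend settings; the producer stack must declare
# output "vpc_id" for the reference below to resolve.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"   # placeholder bucket name
    key    = "network/terraform.tfstate" # placeholder state key
    region = "us-east-1"
  }
}

resource "aws_subnet" "app" {
  vpc_id     = data.terraform_remote_state.network.outputs.vpc_id
  cidr_block = "10.0.1.0/24" # placeholder CIDR
}
```

Note that whoever runs the consumer's plan needs read access to the producer's state, which is part of the coupling to weigh.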

Beware data sources that mask missing dependencies

One real footgun the AI won’t warn you about unless asked: a data source reads at plan time, so if it looks up something created in the same apply, the lookup can fail or return stale data. A data "aws_instance" querying an instance this same configuration creates is a dependency-ordering bug waiting to happen — you want a direct resource reference there, not a lookup. When you review the model’s draft, check that each data source reads something that already exists independently of this run. If it’s reading something this config also creates, push back and use a resource reference instead. The model optimizes for “replace the literal”; you supply the judgment about whether a lookup is even the right tool.
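To make the distinction concrete, here is a sketch of the preferred shape when the same configuration creates the thing being referenced. The CIDR ranges are placeholders, and the VPC is created here purely to illustrate the same-config case.

```hcl
resource "aws_vpc" "core" {
  cidr_block = "10.0.0.0/16" # placeholder range
  tags       = { Name = "core-vpc" }
}

# Avoid in this situation: a data "aws_vpc" lookup by the Name tag would
# query for a VPC that this same configuration is about to create.

resource "aws_subnet" "app" {
  # Direct reference: Terraform infers the dependency and creates the VPC first.
  vpc_id     = aws_vpc.core.id
  cidr_block = "10.0.1.0/24" # placeholder range
}
```

The direct reference gives Terraform an explicit edge in its dependency graph, so ordering is handled for you instead of depending on what happens to exist when the lookup runs.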

Keep the model in draft-only mode

The boundaries, same as ever:

The AI reads the hardcoded value, your stated intent, and provider docs. It drafts HCL. It has no cloud credentials, no state, no apply.
Every data source swap is verified with terraform plan on a human’s authenticated session — and a no-change plan (or a knowingly-correct change) is the acceptance test.
A human reviews the filters before merge. An under-specified lookup is a silent landmine, and only a person who knows the environment can confirm it resolves to the intended resource.

I keep the hardcoded-to-data-source prompts in the prompt library, and the Terraform prompt pack includes a data-source-conversion prompt with the single-match filtering guidance built in. For the human review of these swaps, the code review dashboard keeps the diff and the plan output next to the approval. Whichever assistant you draft with — Cursor inline or Claude AI in a workspace — the verify step is identical.

Conclusion

Hardcoded IDs are Terraform’s quietest form of debt: they work right up until they don’t, and they don’t travel. AI is excellent at converting them to data source lookups because it knows the provider filter syntax cold — but a lookup that resolves wrong is worse than the literal it replaced, so the verification is non-negotiable. Give the model the value and the intent, let it draft, and trust a clean plan to confirm it resolved correctly. The AI does the fiddly filter-writing; you confirm the result and own the apply; and your modules stop breaking every time an AMI gets deprecated. More HCL guides are in the Terraform category.
