Right-Sizing Terraform-Managed Resources With AI From Real Metrics

The first time I actually looked at utilization on our Terraform-managed fleet, I found a tier of m6i.2xlarge instances running at 8% CPU because someone sized them for a launch that happened two years ago. The HCL said instance_type = "m6i.2xlarge" and nobody had reason to question it. Multiply that by a few dozen resources and you’re burning real money on capacity nobody uses.

Right-sizing is a great AI task because it’s pattern-matching over numbers: given utilization metrics and a current size, suggest a better one. The model is a fast junior engineer who’ll happily crunch the data and draft the HCL change. The boundaries are firm: it reads metrics and current config, it suggests, a human reviews, and the change ships through the normal plan/review/apply gate. The AI never applies, never holds cloud credentials, and never writes state.

Right-sizing without metrics is guessing

The number-one way to do this wrong is to ask an AI “is this instance too big?” with no data. It’ll guess, and its guess is worthless. Right-sizing is only valid against real utilization. So step one is gathering metrics — separately, by a human or a read-only system — before any model is involved.

Pull the relevant stats for the resources in question. For an AWS instance over the last two weeks:

# -d '14 days ago' is GNU date syntax; on macOS/BSD use: date -u -v-14d +%FT%TZ
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time "$(date -u -d '14 days ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" \
  --period 3600 --statistics Average Maximum \
  > cpu.json

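Do the same for memory and for any disk or network metric that bounds the workload. EC2 does not publish memory utilization to CloudWatch by default, so memory figures exist only if you run the CloudWatch agent or another collector. This data is the only thing that makes the AI’s suggestion meaningful.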
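Raw datapoints are too noisy to paste into a prompt, so I reduce them to a handful of numbers first. Here is a minimal Python sketch of that step, assuming cpu.json has the shape get-metric-statistics returns (a Datapoints list whose entries carry Average and Maximum). The function name summarize_datapoints, the expected_points parameter, and the inline sample values are mine, for illustration only.

```python
import json

def summarize_datapoints(datapoints, expected_points):
    """Reduce CloudWatch datapoints to the few numbers a prompt needs."""
    if not datapoints:
        return {"samples": 0, "coverage": 0.0, "avg": None, "peak": None}
    averages = [p["Average"] for p in datapoints]
    maxima = [p["Maximum"] for p in datapoints]
    return {
        "samples": len(datapoints),
        # Share of the expected hourly points that actually exist.
        "coverage": round(len(datapoints) / expected_points, 2),
        "avg": round(sum(averages) / len(averages), 1),
        "peak": round(max(maxima), 1),
    }

# Inline sample shaped like the CLI output; the values are made up.
sample = json.loads("""
{"Label": "CPUUtilization", "Datapoints": [
  {"Timestamp": "2024-01-01T00:00:00Z", "Average": 6.0, "Maximum": 11.0, "Unit": "Percent"},
  {"Timestamp": "2024-01-01T01:00:00Z", "Average": 8.0, "Maximum": 23.0, "Unit": "Percent"},
  {"Timestamp": "2024-01-01T02:00:00Z", "Average": 10.0, "Maximum": 15.0, "Unit": "Percent"}
]}
""")

summary = summarize_datapoints(sample["Datapoints"], expected_points=4)
print(summary)
```

For the real file you would load cpu.json with json.load and pass expected_points=336 (14 days of hourly points). A low coverage value is the signal that the data is too thin to act on.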

Pair metrics with the Terraform resource

The model needs both halves: what’s deployed and how it’s actually used. Give it the resource block and the summarized metrics together:

resource "aws_instance" "api" {
  instance_type = "m6i.2xlarge"   # 8 vCPU, 32 GiB
  # ...
}

“This instance runs m6i.2xlarge. Over 14 days, average CPU is 8%, peak 23%; average memory 31%, peak 44%. Suggest a right-sized instance_type from the same family that leaves comfortable headroom (peak under ~70% on the new size). Show the HCL change and the approximate monthly cost difference. If the metrics are too sparse to be confident, say so instead of guessing.”

That last clause matters — two weeks of a spiky batch workload isn’t enough to downsize, and a good prompt makes the model admit when the data is thin rather than confidently shrink a resource into instability.

Pro Tip: Always size for the peak, not the average. An instance averaging 8% but peaking at 90% during a nightly job will fall over if you size to the average. Make the model justify its suggestion against the Maximum statistic, and check that it did.
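To check the model’s arithmetic, I find it useful to project the observed peaks onto each candidate size myself. The sketch below assumes CPU load scales linearly with vCPU count and that the memory working set stays the same in absolute terms; both are simplifications. The pick_size function and the 70% ceiling default are illustrative, and the table lists only three m6i sizes.

```python
# (name, vCPU, memory GiB) for a few m6i sizes, smallest first.
M6I_SIZES = [
    ("m6i.large", 2, 8),
    ("m6i.xlarge", 4, 16),
    ("m6i.2xlarge", 8, 32),
]

def pick_size(current, peak_cpu_pct, peak_mem_pct, ceiling_pct=70.0):
    """Return the smallest size whose projected peaks stay under the ceiling."""
    specs = {name: (vcpu, mem) for name, vcpu, mem in M6I_SIZES}
    cur_vcpu, cur_mem = specs[current]
    for name, vcpu, mem in M6I_SIZES:
        # Same absolute load on fewer resources means a higher percentage.
        cpu = peak_cpu_pct * cur_vcpu / vcpu
        mem_pct = peak_mem_pct * cur_mem / mem
        if cpu < ceiling_pct and mem_pct < ceiling_pct:
            return name, round(cpu, 1), round(mem_pct, 1)
    # Nothing fits under the ceiling: keep what is deployed.
    return current, peak_cpu_pct, peak_mem_pct

print(pick_size("m6i.2xlarge", 23, 44))
print(pick_size("m6i.2xlarge", 23, 30))
```

With the figures from the prompt above (peak CPU 23%, peak memory 44%), CPU alone would allow m6i.xlarge at a projected 46%, but memory would land at 88% and break the ceiling, so the sketch keeps m6i.2xlarge. That is exactly the kind of result to compare against the model’s answer: if it recommends halving the instance, ask it to account for the memory peak.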

Treat the suggestion as a PR, not a command

The model’s output is a proposed HCL change. It enters your normal workflow with no special privileges:

# After applying the AI's suggested edit to the .tf file:
terraform plan

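A change to instance_type plans as either an in-place update or a replacement, and which one you get matters a lot. For aws_instance the AWS provider normally handles it in place, but in place still means a stop and start: expect downtime, and expect anything on instance store (ephemeral) disks to be gone afterward. A replacement destroys and recreates the resource. Replacing a stateless API box is fine; replacing or restarting something with local state or an attached ephemeral disk is not. Read the plan. If it’s a replacement, you decide whether the downtime and risk are worth the savings. The AI suggested the size; the plan tells you the cost of getting there; you make the call.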
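Reading the plan by eye works for one resource. For a quick mechanical check, you can save the plan with terraform plan -out=tfplan, export it with terraform show -json tfplan, and look at the actions list on each entry under resource_changes. The following is a rough sketch of that check; classify_changes and the sample addresses are hypothetical, and the sample JSON is hand-written in the plan’s shape rather than real output.

```python
import json

def classify_changes(plan):
    """Map each changed resource address to 'replace' or its single action."""
    kinds = {}
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will change for this resource
        # A replacement shows up as a delete/create pair, in either order.
        if "create" in actions and "delete" in actions:
            kinds[rc["address"]] = "replace"
        else:
            kinds[rc["address"]] = actions[0]
    return kinds

# Hand-written sample in the plan JSON shape; addresses are hypothetical.
sample_plan = json.loads("""
{"resource_changes": [
  {"address": "aws_instance.api", "change": {"actions": ["update"]}},
  {"address": "aws_instance.worker", "change": {"actions": ["delete", "create"]}},
  {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}}
]}
""")

changes = classify_changes(sample_plan)
needs_scrutiny = [addr for addr, kind in changes.items() if kind == "replace"]
print(changes)
print(needs_scrutiny)
```

This only flags which resources deserve a closer look. It does not replace reading the plan, and an "update" on an instance can still involve a stop and start.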

Cross-check the cost claim

Models are decent at relative cost reasoning and unreliable on exact dollar figures. Don’t trust the number it prints — verify it with a tool built for it:

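# On the base branch: record a cost baseline
infracost breakdown --path . --format json --out-file infracost-base.json
# On the branch with the right-sizing change: diff against that baseline
infracost diff --path . --compare-to infracost-base.json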

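Infracost computes the delta from the provider’s published prices, which makes it far more dependable than a figure the model recalls. It is still an estimate (usage-based charges need usage data), but it is the number to quote in the PR. Use the AI to find the over-provisioned resource and propose the new size; use deterministic tooling to confirm the savings are real. This split, AI for the judgment-y suggestion and tools for the precise numbers, keeps you from shipping a “30% cheaper” change that’s actually 4%.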
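The comparison itself is simple arithmetic, and it is worth making explicit in review. Here is a small sketch, assuming you have copied the before and after monthly totals out of the cost tool’s output by hand. The function names, the five-point tolerance, and the dollar figures are all made up for illustration.

```python
def savings_pct(past_monthly, new_monthly):
    """Percentage saved relative to the current monthly cost."""
    return round((past_monthly - new_monthly) / past_monthly * 100, 1)

def claim_holds(claimed_pct, past_monthly, new_monthly, tolerance_pts=5.0):
    """True if the model's claimed savings is within tolerance of the measured one."""
    measured = savings_pct(past_monthly, new_monthly)
    return abs(claimed_pct - measured) <= tolerance_pts

# Hypothetical totals: the model claimed 30%, the tool says $1000 -> $960.
print(savings_pct(1000.0, 960.0))
print(claim_holds(30.0, 1000.0, 960.0))
```

When the claim does not hold, the measured figure goes in the PR description, not the model’s.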

Right-size storage and concurrency, not just compute

Compute gets all the attention, but the same metrics-then-suggest pattern applies to over-provisioned storage, IOPS, Lambda memory, and database instance classes, often with bigger wins. A 1 TB gp3 volume sitting 6% full, or a Lambda with 2 GB of memory that uses 180 MB, is the same waste in a different shape. (One caveat for storage: EBS volumes can grow in place but not shrink, so a smaller volume means a new volume and a data migration.) Give the model the provisioned value and the actual usage:

“This aws_lambda_function has memory_size = 2048. Over 30 days, max memory used was 184 MB and p99 duration is well under the timeout. Suggest a memory_size that leaves headroom, and note that Lambda CPU scales with memory so flag if the lower memory might slow it down.”

That CPU-scales-with-memory caveat is exactly the provider-specific nuance to make the model account for: drop Lambda memory too far and you slow execution, which can cost more on duration. The model will usually account for this when the prompt asks, but don’t rely on it; verify the resulting change against your latency expectations, not just the cost number.
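The trade-off can be put in numbers. Lambda bills duration in GB-seconds, so lower memory only saves money if the function does not slow down by more than the memory shrank. The sketch below is a rough illustration: the 50% headroom, the 64 MB rounding step, and the function names are my assumptions, and the per-request charge is ignored.

```python
import math

LAMBDA_MIN_MB = 128     # Lambda's memory floor
LAMBDA_MAX_MB = 10240   # and ceiling

def suggest_memory(max_used_mb, headroom=0.5, step_mb=64):
    """Peak usage plus headroom, rounded up to a tidy step, within Lambda's limits."""
    target = max_used_mb * (1 + headroom)
    rounded = math.ceil(target / step_mb) * step_mb
    return min(LAMBDA_MAX_MB, max(LAMBDA_MIN_MB, rounded))

def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation, ignoring the per-request charge."""
    return (memory_mb / 1024) * (duration_ms / 1000)

new_size = suggest_memory(184)
print(new_size)
# Break-even slowdown: beyond this factor the smaller size costs more.
print(2048 / new_size)
```

For the 184 MB peak in the prompt above, this suggests 320 MB, and the break-even slowdown from 2048 MB is 6.4x. Cost is rarely the binding constraint at that ratio; latency is, which is why the measured duration after the change matters more than the estimate.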

Batch the analysis, apply one at a time

It’s efficient to have the AI analyze the whole fleet’s metrics in one pass and produce a ranked list of right-sizing opportunities — biggest savings first, with the risk (in-place vs. replacement) noted per item. That’s a great way to find the wins. But applying them is the opposite: one resource per PR, each through its own plan and review.

The reason is the same as every other AI-IaC workflow: a batched analysis is cheap to be wrong about because nothing happens from it, while a batched apply turns one bad suggestion into a multi-resource incident. Let the model survey broadly and act narrowly. The ranked list tells you where to spend effort; the per-resource gate keeps any single mistake contained and recoverable.
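Survey broadly, act narrowly can be sketched as a ranked list that fans out into one branch per resource. Everything here is hypothetical: the Opportunity fields, the rightsize/ branch naming, and the sample savings figures.

```python
from dataclasses import dataclass

@dataclass
class Opportunity:
    address: str            # Terraform resource address
    monthly_savings: float  # estimated, to be confirmed per PR
    action: str             # "update" or "replace", taken from the plan

def rank(opportunities):
    """Biggest estimated savings first."""
    return sorted(opportunities, key=lambda o: o.monthly_savings, reverse=True)

def pr_branches(opportunities):
    """One branch name per resource, in ranked order."""
    return [f"rightsize/{o.address.replace('.', '-')}" for o in rank(opportunities)]

# Made-up survey results.
found = [
    Opportunity("aws_instance.api", 140.0, "update"),
    Opportunity("aws_db_instance.main", 310.0, "replace"),
    Opportunity("aws_lambda_function.resize", 12.5, "update"),
]

for branch in pr_branches(found):
    print(branch)
```

The list is only a work queue. Each branch still gets its own plan, its own cost check, and its own review.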

Keep it observe-only on the AI’s side

The architecture mirrors every other safe AI-IaC pattern:

Metrics are gathered by a human or a read-only collector. The model receives summarized numbers, never live cloud access.
The AI reads metrics and HCL and emits a proposed diff. It has no backend, no state, no credentials, and no ability to apply.
Every suggestion goes through plan and human review before apply. Right-sizing that triggers a replacement gets extra scrutiny.

I pull these metric summaries from whatever observability I already run — the monitoring alerts dashboard is where I notice the chronically idle resources worth investigating in the first place. The right-sizing prompts live in the prompt library, and the Terraform prompt pack includes a metrics-to-HCL right-sizing prompt with the peak-headroom and sparse-data rules baked in.

Conclusion

Over-provisioned Terraform resources are quiet, persistent waste, and AI is a strong tool for finding them — provided you feed it real utilization data and treat its output as a proposal. Gather metrics first, let the model suggest a size and rough savings, confirm the dollars with infracost, and run the change through your normal plan-and-review gate. The AI does the analysis and drafts the HCL; humans verify the savings and own the apply; and the model never gets near your credentials. That’s how you reclaim the budget without trading it for an outage. More cost guides are in the Terraform category.
