GCP Cost Optimization With AI: CUDs and Rightsizing

The first GCP bill I was asked to “do something about” was $38k a month, and the finance team’s only question was “why is it going up?” I opened the billing console and was met with a sea of SKUs — N2 Instance Core running in Americas, Network Inter Region Egress, dozens of them — with no obvious story. The hard part of GCP cost work isn’t knowing the levers; it’s that the data is a high-cardinality pile of line items and the savings are buried in patterns no single dashboard surfaces. This is a data-analysis problem, and that’s the shape of problem AI is genuinely strong at — as long as I give it the real billing export and verify the numbers before I commit a dollar.

Get the billing export, not the console summary

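The console rounds and rolls up. Export Cloud Billing data to BigQuery (the standard usage cost export is enough for a service-and-SKU view; the detailed export adds resource-level rows) and pull the raw breakdown so the AI reasons over actual numbers: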

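-- Last 30 days of spend by service and SKU, highest cost first.
-- Aliases differ from the export's own service/sku columns so GROUP BY stays unambiguous.
SELECT
  service.description AS service_name,
  sku.description AS sku_name,
  ROUND(SUM(cost), 2) AS cost,
  -- credits is a repeated field, so sum it per row before aggregating.
  ROUND(SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)), 2) AS credits
FROM `my-proj.billing_export.gcp_billing_export_v1_XXXXXX`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service_name, sku_name
ORDER BY cost DESC
LIMIT 50;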

Hand that table to the model:

Prompt: “Here is my GCP billing breakdown by service and SKU for the last 30 days (cost and credits columns). Identify the top cost drivers, flag any SKU that looks like waste (idle, over-provisioned, or egress that should be internal), and separate steady-state spend that’s a candidate for committed-use discounts from variable spend that isn’t. Don’t recommend a CUD for anything spiky.”

The model is good at the narrative the console hides: “70% of your compute is steady N2 in one region — that’s CUD-eligible; your egress spike is cross-region traffic that a placement change would remove.” That framing — committable baseline vs. variable — is the whole game, and it’s tedious to derive by hand from 50 SKUs.

CUDs: commit to the floor, never the peak

The cardinal rule of committed-use discounts is that you commit to your baseline usage, the floor you’ll never go below, and pay on-demand above it. Over-commit and you pay for capacity you don’t use; under-commit and you leave the discount on the table. I have AI find the floor from real hourly usage:

Prompt: “Here is hourly vCPU usage for N2 instances in us-central1 over 30 days (pasted). Find the consistent baseline — the level usage essentially never drops below. Recommend a committed-use commitment at or just under that floor, and estimate monthly savings at the 1-year CUD discount rate. Show your reasoning so I can check the floor myself.”

I always recompute the floor myself before buying. A CUD is a binding one- or three-year financial commitment, and an AI that misreads a daily dip can cost real money. The model proposes; I confirm the floor against the raw data and own the purchase.
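For that manual check, a minimal Python sketch like the one below is usually enough. It is not how GCP or the model computes anything. It takes the floor as a low percentile of hourly usage, so one brief dip does not drag the commitment down, and it prices a candidate commitment so that over- and under-committing can be compared. The function names, the 2% quantile, and the rate and discount values in the usage note are all illustrative assumptions; substitute the real figures from the pricing page.

```python
import math


def usage_floor(hourly_vcpus, quantile=0.02):
    """Return a low-percentile usage level to treat as the committable floor.

    `quantile` is an assumed tolerance: the share of lowest hours ignored as dips.
    """
    if not hourly_vcpus:
        raise ValueError("need at least one hourly sample")
    ordered = sorted(hourly_vcpus)
    # Nearest-rank style index: skip the lowest `quantile` share of samples.
    index = min(len(ordered) - 1, math.floor(quantile * len(ordered)))
    return ordered[index]


def cost_with_commitment(hourly_vcpus, committed, on_demand_rate, discount):
    """Total cost over the sampled hours for a given committed vCPU count.

    `on_demand_rate` (price per vCPU-hour) and `discount` (0..1) are
    placeholders supplied by the caller, not real GCP prices.
    """
    committed_rate = on_demand_rate * (1 - discount)
    total = 0.0
    for used in hourly_vcpus:
        total += committed * committed_rate                  # paid whether used or not
        total += max(0, used - committed) * on_demand_rate   # overflow at on-demand price
    return total
```

With the made-up samples [40, 40, 80, 80], a rate of 1.0, and a discount of 0.25, committing to 40 costs 200.0, while committing to 0 or to 80 both cost 240.0. That is the floor rule in miniature: the commitment that is always used is the one that pays.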

# After confirming the floor, purchase the commitment.
# --plan takes 12-month or 36-month; memory is given with explicit units.
gcloud compute commitments create n2-baseline-1yr \
  --region=us-central1 \
  --resources=vcpu=40,memory=160GB \
  --plan=12-month \
  --type=general-purpose-n2

Rightsizing: trust GCP’s recommender, then sanity-check

GCP’s own recommender watches actual utilization and proposes machine-type changes. Pull them and let AI prioritize by impact:

gcloud recommender recommendations list \
  --project=my-proj --location=us-central1-a \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --format=json

Prompt: “Here are GCP machine-type rightsizing recommendations (JSON). Rank them by monthly savings. For each, tell me the current and proposed machine type and the observed CPU/memory headroom. Flag any instance where the proposed size leaves under 20% memory headroom — I don’t want to rightsize into OOM kills.”

That memory-headroom guardrail is mine to insist on. The recommender optimizes for utilization; it doesn’t know which workloads spike. AI applies my guardrail across the whole list so I’m not reviewing each instance for the same risk.
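The guardrail itself is simple arithmetic, and I can spot-check the model's flags with a rough Python sketch. It assumes each recommendation has already been reduced to a flat record with the hypothetical fields `instance`, `proposed_memory_gb`, and `peak_memory_gb`. The recommender's raw JSON is nested differently, and peak memory has to come from monitoring data, so treat the record shape as an assumption.

```python
def memory_headroom(proposed_memory_gb, peak_memory_gb):
    """Fraction of the proposed machine's memory still free at observed peak."""
    return (proposed_memory_gb - peak_memory_gb) / proposed_memory_gb


def split_by_headroom(recommendations, min_headroom=0.20):
    """Return (safe, needs_review) instance names using the headroom guardrail.

    Each record is a dict with hypothetical keys: instance,
    proposed_memory_gb, peak_memory_gb.
    """
    safe, needs_review = [], []
    for rec in recommendations:
        headroom = memory_headroom(rec["proposed_memory_gb"], rec["peak_memory_gb"])
        if headroom >= min_headroom:
            safe.append(rec["instance"])
        else:
            needs_review.append(rec["instance"])
    return safe, needs_review
```

An instance peaking at 7 GB with a proposed 8 GB machine has 12.5% headroom and lands in the review list; one peaking at 10 GB on a proposed 16 GB machine has 37.5% and passes.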

Find the silent waste

The biggest wins are often things nobody is watching: unattached persistent disks, reserved static IPs that nothing uses, oversized snapshots. I enumerate the disks and addresses and triage:

gcloud compute disks list --filter="-users:*" \
  --format="table(name, zone, sizeGb, type)"
gcloud compute addresses list --filter="status=RESERVED" \
  --format="table(name, region, addressType)"

Prompt: “Here are unattached GCP disks and reserved-but-unused IP addresses. Estimate the monthly cost of each at standard pricing, total it, and give me the gcloud delete commands — but mark any disk over 500GB or named like a backup for manual review before deletion.”

The “mark backups for manual review” clause keeps the model from generating a delete command for something I actually need. It drafts the cleanup; I read the list before anything is destroyed.
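The same review rule can be applied mechanically before anything reaches the model, as in this hedged Python sketch. It assumes the disk list was exported with `--format=json` and that each record carries `name`, `sizeGb` (as a string), and `zone` (as a full resource URL). The backup name hints are my own guess at naming conventions, not anything GCP defines. The sketch only builds command strings; it never runs them.

```python
# Assumed substrings that suggest a disk is a backup; adjust to local naming.
BACKUP_HINTS = ("backup", "bak", "snapshot", "restore")


def triage_disks(disks, max_auto_size_gb=500):
    """Return (delete_commands, manual_review_names) for unattached disks."""
    delete_commands, manual_review = [], []
    for disk in disks:
        name = disk["name"]
        size_gb = int(disk["sizeGb"])
        # The zone may be a full URL; keep only the final path segment.
        zone = disk["zone"].rsplit("/", 1)[-1]
        looks_like_backup = any(hint in name.lower() for hint in BACKUP_HINTS)
        if size_gb > max_auto_size_gb or looks_like_backup:
            manual_review.append(name)
        else:
            delete_commands.append(
                f"gcloud compute disks delete {name} --zone={zone}"
            )
    return delete_commands, manual_review
```

Substring matching over-flags on purpose: a disk wrongly held for review costs a minute, while a backup wrongly deleted costs far more.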

Catch regressions automatically

Once it’s clean, the goal is to keep it clean. I have AI write the anomaly query I run weekly:

Prompt: “Write a BigQuery query against my billing export that compares this week’s spend per service to the prior week and flags any service that grew more than 25%. Output service, last-week cost, this-week cost, and percent change.”
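To check whatever query the model writes, I find it useful to run the same comparison on a small sample by hand. The Python sketch below is one way to do that, assuming per-service totals for each week have already been pulled into two dicts. The function name and the handling of brand-new services (reported with no percentage, since there is nothing to divide by) are my own choices, not part of the prompt.

```python
def flag_growth(last_week, this_week, threshold_pct=25.0):
    """Return rows (service, last_cost, this_cost, pct_change) that exceed the threshold.

    Services with no spend last week are flagged with pct_change=None,
    because percent growth from zero is undefined.
    """
    flagged = []
    for service, current in this_week.items():
        previous = last_week.get(service, 0.0)
        if previous <= 0:
            if current > 0:
                flagged.append((service, previous, current, None))
            continue
        pct = (current - previous) / previous * 100
        if pct > threshold_pct:
            flagged.append((service, previous, current, round(pct, 1)))
    # Largest current spend first, so the most expensive regressions lead.
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

If the SQL and this function disagree on the same sample, one of them mishandles an edge case, usually a service that appears in only one of the two weeks.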

The discipline that makes this safe

Cost work with AI is powerful because the bill is fundamentally a dataset, and models excel at finding patterns in datasets. But every recommendation here moves money or deletes resources, so the rule is absolute: AI analyzes and drafts, I verify the numbers against the raw export, and I personally approve every commitment and every deletion. A CUD bought on a hallucinated baseline is an expensive mistake the model never has to live with — you do.

The reusable prompts are in my prompts library, and the GCP with AI series covers the services that usually top the bill. The waste is in the data. AI just reads it faster than you will.