CloudOps
Azure with AI By James Joyner IV · · 11 min read

Azure Cost Management With AI: Rightsizing, Reservations, and Killing Waste

Most Azure overspend is idle resources and on-demand VMs that should be reserved. Here's how AI reads cost exports, finds rightsizing wins, and models reservations before you commit.

  • #azure
  • #ai
  • #finops
  • #cost-management
  • #reservations

The finance team flagged a 30% jump in the Azure bill and asked what changed. Nothing dramatic had changed. There was no new product and no traffic spike. What had happened was the slow accumulation every Azure account suffers: a dev environment left running for a whole quarter, three oversized VMs nobody downsized after a load test, an unattached managed disk left behind by every VM someone had deleted, and a pile of on-demand compute that should have been on reservations a year ago. None of it was a single bad decision. It was the absence of anyone looking.

That last part is the real problem. Azure cost data is rich — Cost Management exports every line item — but nobody has the patience to read a multi-megabyte usage CSV and connect a meter to a fixable behavior. AI does have that patience. It will read the export, group spend by what’s actionable, and model a reservation break-even without you building a spreadsheet. It does not approve the purchase or run the resize. You verify the numbers and make the call; it does the analysis nobody otherwise does.

Pull the data, then point AI at the actionable spend

Get the actual cost breakdown out of Azure first. The CLI can list usage line items, and Cost Management can export the full detail:

# Usage line items for the current subscription, month to date
az consumption usage list \
  --start-date 2026-06-01 --end-date 2026-06-21 \
  --include-meter-details \
  --query "[].{resource:instanceName, meter:meterDetails.meterName, cost:pretaxCost, unit:meterDetails.unit}" \
  -o table

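For anything serious, configure a scheduled Cost Management export to a storage account. Exports can deliver actual or amortized cost; choose amortized so reservation purchases are spread across the resources that consume them, and the line items carry your resource tags. Then hand a sample to AI: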

Prompt: “Here is a sample of an Azure Cost Management export (amortized). Group the spend into three buckets: (1) compute I could rightsize or shut down, (2) on-demand usage that’s a reservation or savings-plan candidate, (3) storage and orphaned resources. For each bucket give me the rough monthly total and the top five line items by cost. Ignore anything tagged env=prod for the shutdown bucket.”

The win here is the grouping. A raw export is meaningless; spend organized by “what can I actually do about this” turns it into a worklist. AI does that triage in seconds, and you verify the totals against the Cost Management portal before trusting any of it.
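Recomputing a few of the AI’s numbers yourself is a quick way to do that check. The sketch below is a minimal example with hypothetical function names; it assumes a simplified CSV with three columns (resource, meterCategory, cost) and no quoted fields. Real Cost Management exports have many more columns and quote values that contain commas, so adjust the column positions or use a proper CSV parser before pointing it at an actual export.

```shell
# Cross-check sketch for AI-reported totals.
# ASSUMPTION: a simplified CSV with header "resource,meterCategory,cost" and no
# quoted fields or embedded commas. Real exports need different column numbers.

# Total spend: sum column 3, skipping the header row.
total_cost() {
  awk -F, 'NR > 1 { sum += $3 } END { printf "%.2f\n", sum }' "$1"
}

# Spend per meter category, highest first.
cost_by_category() {
  awk -F, 'NR > 1 { sum[$2] += $3 }
    END { for (c in sum) printf "%s,%.2f\n", c, sum[c] }' "$1" |
    sort -t, -k2,2 -rn
}

# Top N resources by summed cost (default 5), highest first.
top_resources() {
  local file=$1 n=${2:-5}
  awk -F, 'NR > 1 { sum[$1] += $3 }
    END { for (r in sum) printf "%s,%.2f\n", r, sum[r] }' "$file" |
    sort -t, -k2,2 -rn |
    awk -v n="$n" 'NR <= n'   # keep the first N rows without closing the pipe early
}

# Usage: total_cost export.csv; cost_by_category export.csv; top_resources export.csv 5
```

If your locally computed totals and the AI’s buckets disagree, trust neither until the Cost Management portal settles it.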

Rightsizing: use the metrics, not a guess

The temptation is to eyeball VM sizes and downsize the ones that “look big.” Don’t — pull the actual utilization and let the data decide. Azure Advisor already computes rightsizing recommendations; start there, then verify with raw metrics:

az advisor recommendation list --category Cost \
  --query "[].{resource:impactedValue, problem:shortDescription.problem, savings:extendedProperties.savingsAmount}" -o table

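# Resolve the VM's full resource ID ($RG and $VM_NAME are placeholders)
VM_ID=$(az vm show --resource-group "$RG" --name "$VM_NAME" --query id -o tsv)

# Hourly CPU average and peak for that VM over the last month
az monitor metrics list --resource "$VM_ID" \
  --metric "Percentage CPU" --interval PT1H \
  --start-time 2026-05-21T00:00:00Z --end-time 2026-06-21T00:00:00Z \
  --aggregation Average Maximum --query "value[0].timeseries[0].data" -o json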

Feed the metric series to AI and make it reason about the peak, not just the average — the average is what gets people in trouble:

Prompt: “Here is 30 days of hourly average and maximum CPU for a Standard_D8s_v5 VM. Average is ~8%, but I see periodic maximums near 70%. Is this safe to downsize to a D4s_v5, or do those peaks need the headroom? Recommend a target SKU, state your assumption about the peaks, and tell me what metric I should also check (memory, IOPS) before committing.”

A VM at 8% average but 70% peak is not a free downsize — the peak might be a nightly batch job that needs the cores. AI flags that tension instead of naively chasing the average, which is exactly the mistake a rushed human makes. Memory matters too, and on Azure VMs memory isn’t a default guest metric, so the AI nudging you to check it is doing real work.
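You can quantify that tension before asking. As a rough approximation, halving the vCPU count doubles the CPU percentage for the same work, so a 70% peak on the 8-vCPU D8s_v5 projects to about 140% on the 4-vCPU D4s_v5, which means saturation. The sketch below summarizes hourly rows of average and maximum CPU and applies that projection. The function name is hypothetical, and both the linear-scaling model and the 80% threshold are simplifying assumptions, not Azure guidance; they ignore memory, disk, and burst behavior entirely.

```shell
# Input, one "average<TAB>maximum" row per hour (requires an authenticated az session):
#   az monitor metrics list --resource "$VM_ID" --metric "Percentage CPU" \
#     --interval PT1H --aggregation Average Maximum \
#     --start-time 2026-05-21T00:00:00Z --end-time 2026-06-21T00:00:00Z \
#     --query "value[0].timeseries[0].data[].[average, maximum]" -o tsv > cpu.tsv

# cpu_headroom FILE CURRENT_VCPUS TARGET_VCPUS [THRESHOLD_PCT]
# Skips rows without two numeric values, then reports the average, the peak,
# the peak projected onto the target size, and how many hours would exceed
# the threshold there.
# ASSUMPTION: CPU load scales linearly with vCPU count; threshold defaults to 80.
cpu_headroom() {
  awk -F'\t' -v cur="$2" -v tgt="$3" -v lim="${4:-80}" '
    $1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.]+$/ {
      a = $1 + 0; m = $2 + 0
      n++; sum += a
      if (n == 1 || m > peak) peak = m
      if (m * cur / tgt > lim) hot++    # hours that would run hot on the target
    }
    END {
      if (n == 0) { print "no data"; exit 1 }
      printf "avg=%.1f peak=%.1f projected_peak=%.1f hot_hours=%d\n",
        sum / n, peak, peak * cur / tgt, hot
    }' "$1"
}

# Usage for the D8s_v5 to D4s_v5 question: cpu_headroom cpu.tsv 8 4
```

A nonzero hot_hours count is the cue to find out what runs during those hours before picking a smaller SKU.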

Reservations and savings plans: model the break-even before you commit

Reservations are the biggest single lever and the scariest, because a one- or three-year commitment on the wrong SKU is hard to unwind: reservation exchanges and refunds are limited, and savings plans can’t be canceled at all. This is pure math, which makes it ideal AI territory, so model it before you buy:

# Reservations you already own; check these before buying more
az reservations reservation-order list -o table

# Azure's own reservation recommendations, based on a 7-, 30-, or 60-day lookback of your usage
az consumption reservation recommendation list \
  --query "[].{sku:skuName, term:term, savings:netSavings, recommendedQty:recommendedQuantity}" -o table

Prompt: “I run six Standard_D4s_v5 VMs on-demand 24/7 in East US. A 1-year reserved instance is roughly 40% off and a 3-year is roughly 60% off on-demand. Model the break-even point in months for each term, and tell me the risk: if I might retire two of these VMs in eight months, does the 1-year reservation still pay off? Show the arithmetic.”

The arithmetic is the whole point. A reservation that breaks even at month seven is a no-brainer if the workload is stable, and a trap if you’re retiring the VMs at month eight. AI lays out the break-even and the downside scenario; you confirm the discount percentages against the live pricing (they shift by region and SKU) and decide based on how confident you are the workload stays put. Never let an AI estimate of a discount stand in for the real quote in the portal.
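The arithmetic the prompt asks for is small enough to check yourself. A minimal sketch with hypothetical function names, using the prompt’s rough 40% and 60% discounts only as placeholder inputs (substitute the real quote) and measuring cost in on-demand VM-months, where 1.0 is one VM running on demand for one month. In those units a reservation costs (1 - discount) * term per instance, which is also the number of months of continuous use it takes to break even.

```shell
# Reservation break-even sketch. Costs are in on-demand VM-months
# (1.0 = one VM on demand for one month). Discounts are ASSUMED inputs.

# breakeven_months DISCOUNT_PCT TERM_MONTHS
# Months of continuous use after which the reservation beats on-demand.
breakeven_months() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (1 - d / 100) * t }'
}

# compare_retirement DISCOUNT_PCT TERM_MONTHS VMS RETIRED RETIRE_MONTH
# Total cost over the term when RETIRED of VMS are switched off at RETIRE_MONTH:
#   on_demand         - no reservation, pay only for the months each VM runs
#   reserve_all       - reserve every VM for the full term
#   reserve_survivors - reserve only the VMs that stay, run the rest on demand
compare_retirement() {
  awk -v d="$1" -v t="$2" -v n="$3" -v r="$4" -v m="$5" 'BEGIN {
    rate = 1 - d / 100
    on_demand = n * m + (n - r) * (t - m)
    reserve_all = n * rate * t
    reserve_survivors = (n - r) * rate * t + r * m
    printf "on_demand=%.1f reserve_all=%.1f reserve_survivors=%.1f\n",
      on_demand, reserve_all, reserve_survivors
  }'
}
```

For the prompt’s scenario, breakeven_months 40 12 prints 7.2 (and breakeven_months 60 36 prints 14.4), while compare_retirement 40 12 6 2 8 prints on_demand=64.0 reserve_all=43.2 reserve_survivors=44.8. Retiring two VMs at month eight still leaves the full 1-year reservation ahead, because month eight is past the 7.2-month break-even. The model leaves out exchanges, refunds, and reapplying unused capacity to other matching VMs; it answers only the arithmetic question.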

Orphans and schedules: the boring wins

The least glamorous savings are the most reliable. Unattached disks, idle public IPs, empty App Service plans, and dev environments running nights and weekends. Find them:

# Managed disks attached to nothing
az disk list --query "[?diskState=='Unattached'].{name:name, rg:resourceGroup, sizeGB:diskSizeGb, sku:sku.name}" -o table

# Public IPs not associated with any NIC, load balancer, or NAT gateway
az network public-ip list --query "[?ipConfiguration==null && natGateway==null].{name:name, rg:resourceGroup}" -o table
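Managed disks bill on their provisioned size whether or not anything is attached, so a useful first number is how much idle capacity sits in each SKU. A minimal sketch with a hypothetical function name that totals a tab-separated list of SKU and size; price the result against current managed disk pricing for your region, since disks are billed by size tier rather than strictly per GB.

```shell
# summarize_disks: read "sku<TAB>sizeGB" rows on stdin and print the number of
# disks and total provisioned GB per SKU, sorted by SKU name.
summarize_disks() {
  awk -F'\t' '$2 ~ /^[0-9]+$/ { n[$1]++; gb[$1] += $2 }
    END { for (s in n) printf "%s count=%d total_gb=%d\n", s, n[s], gb[s] }' |
    sort
}

# Usage (requires an authenticated az session):
#   az disk list --query "[?diskState=='Unattached'].[sku.name, diskSizeGb]" -o tsv | summarize_disks
```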

Hand the lists to AI to estimate the monthly bleed and draft an auto-shutdown plan for non-prod: “Here are unattached disks and idle IPs with sizes — estimate monthly cost and write the az commands to schedule dev VMs to deallocate at 7 p.m. and start at 8 a.m. weekdays.” Deallocate, not stop — a stopped-but-allocated VM still bills for compute, a distinction that costs people real money. AI knows it; verify it once and move on.
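The commands the AI drafts usually boil down to listing VMs by tag and then deallocating or starting them. A minimal sketch, assuming non-prod VMs carry an env=dev tag (following the env tag convention from the earlier prompt) and that something like cron or an Azure Automation runbook calls it at 7 p.m. and 8 a.m. on weekdays; the function name is hypothetical. It calls az vm deallocate rather than az vm stop for the billing reason above.

```shell
# dev_vm_schedule ACTION [ENV_TAG]
# ACTION: "deallocate" (evening) or "start" (morning). ENV_TAG defaults to "dev".
# ASSUMPTION: non-prod VMs are tagged env=<ENV_TAG>; requires an authenticated az session.
dev_vm_schedule() {
  local action=$1 env=${2:-dev} ids
  case "$action" in
    deallocate|start) ;;
    *) echo "usage: dev_vm_schedule deallocate|start [env-tag]" >&2; return 2 ;;
  esac
  # Resource IDs of every VM carrying the tag, one per line.
  ids=$(az vm list --query "[?tags.env=='${env}'].id" -o tsv)
  if [ -z "$ids" ]; then
    echo "no VMs tagged env=${env}"
    return 0
  fi
  # Word-splitting $ids is intentional: --ids takes a space-separated list.
  # shellcheck disable=SC2086
  az vm "$action" --ids $ids --no-wait
}
```

Wrapped in a script, it could run from cron at 0 19 * * 1-5 (deallocate) and 0 8 * * 1-5 (start). Cron uses the host’s time zone, so confirm it matches the team’s working hours.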

Stay in control of the spend

The rule holds: AI analyzes, you authorize. Cost work is unusually safe to let AI drive analysis on, because every number it produces is checkable against the portal. But a resize reboots the VM and a three-year reservation locks in spend, so the human approval step is non-negotiable. The loop that works: export the cost data, let AI bucket it into actionable groups, verify the totals, model each reservation’s break-even with the real discount, and act on the boring orphan-cleanup wins first because they carry the least risk. Do that quarterly and the slow 30% creep doesn’t get a chance to build.

The FinOps prompts I use for reading Azure cost exports are in the prompts library, and there’s more Azure operations material in the Azure category. The data to control Azure spend has always been there — what was missing is someone willing to read it. That’s the job AI is genuinely good at.
