Aurora Serverless v2 Scaling and Cost With AI: ACU Min, Max, and the Bill

A team handed me an Aurora Serverless v2 cluster they were sure was misconfigured, because the bill looked like a provisioned instance that never slept. It was — and the reason was a single number. They’d set the minimum capacity to a value high enough that the cluster never actually scaled down, so they were paying for elasticity they never used. Serverless v2 is one of those services where the marketing (“scales automatically”) hides the one decision that matters: the ACU floor. Get it wrong and you’ve bought a provisioned instance with a serverless markup. Get it right and you’ve genuinely matched spend to load. AI is useful here precisely because the sizing question is a reasoning problem over your own metrics, not a lookup — a model can read the capacity-utilization history and argue about where the floor should sit, faster than I can eyeball CloudWatch. What it can’t do is know that your “quiet” Sunday is actually the weekly batch job, so I keep ownership of the final numbers.

The line I hold: AI reads the ACU utilization and proposes min/max with the cost and cold-start trade-offs spelled out. I set the actual values, because the model doesn’t know our latency SLOs or which workloads can tolerate scaling lag.

Understand what an ACU actually costs

An Aurora Capacity Unit is roughly 2 GiB of memory with associated CPU and network. Serverless v2 bills per ACU-hour, metered per second, and it scales in steps as small as half an ACU. The critical mechanics: the cluster never scales below your configured minimum, it scales up under memory or CPU pressure, and it scales down much more conservatively than it scales up, because shedding memory means evicting buffer pool. That asymmetry is why a spiky workload can hold elevated capacity longer than you’d expect.

Check the current configuration before touching anything:

aws rds describe-db-clusters \
  --db-cluster-identifier app-prod \
  --query 'DBClusters[0].ServerlessV2ScalingConfiguration'
# -> { "MinCapacity": 8.0, "MaxCapacity": 64.0 }

A minimum of 8 ACUs means the cluster never bills for less than 8 ACUs (roughly 16 GiB of memory), around the clock, whether or not a single query is running. That floor is the line item people miss.
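To put a number on that floor, here is a minimal sketch of the arithmetic. The 0.12 USD per ACU-hour rate and the 730-hour month are assumptions for illustration, not figures from this cluster; substitute the rate for your region and storage configuration. The function name floor_cost is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: what MinCapacity alone bills per month, before any load-driven scaling.
# ASSUMPTION: the rate passed in (0.12 USD per ACU-hour below) is a placeholder;
# look up the current price for your region and configuration.

# floor_cost <min_acu> <usd_per_acu_hour> [hours_per_month, default 730]
floor_cost() {
  local min_acu="$1" rate="$2" hours="${3:-730}"
  awk -v a="$min_acu" -v r="$rate" -v h="$hours" \
    'BEGIN { printf "%.2f\n", a * r * h }'
}

floor_cost 8 0.12   # current floor from the describe-db-clusters output
floor_cost 6 0.12   # a candidate lower floor
```

At that assumed rate, the always-on portion of the bill is about 700.80 USD a month at 8 ACUs and 525.60 USD at 6, before any capacity the workload actually drives.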

Read the real utilization

The metric that tells the truth is ServerlessDatabaseCapacity — the actual ACUs in use over time — alongside ACUUtilization, the percentage of your configured max. Pull a couple weeks of it:

# The date invocations below use GNU date (-d).
# On macOS/BSD, use: date -u -v-14d +%Y-%m-%dT%H:%M:%SZ
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ServerlessDatabaseCapacity \
  --dimensions Name=DBClusterIdentifier,Value=app-prod \
  --start-time "$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Minimum Average Maximum \
  --query 'Datapoints | sort_by(@, &Timestamp)[].{t:Timestamp,min:Minimum,avg:Average,max:Maximum}' \
  --output table

What you’re looking for: the floor your workload actually hits at its quietest, the typical operating band, and the peaks. Capacity can never drop below the configured MinCapacity, so the signal for an over-high floor is a Minimum that sits pinned at MinCapacity for long stretches: the workload would have released capacity and you’re paying for it anyway. If the Minimum stays well above MinCapacity, the floor isn’t what is driving the bill. If the Maximum regularly pins at your MaxCapacity, you’re capping scale-up and queries are queuing.

Let AI argue the floor and ceiling

This is where a model is genuinely good — it can reason about the relationship between the floor, the cold-start behavior, and the bill, given your actual datapoints.

Prompt: Here’s two weeks of ServerlessDatabaseCapacity for app-prod. Min across the window is 6.5 ACU, average 11, p95 around 22, max 31. Current config is min 8, max 64. We have an OLTP API with a p99 latency SLO of 200ms. Where should min and max sit, and what’s the cost implication?

Reply: Two things stand out. First, your floor is doing very little: the observed trough is 6.5 ACU, so a min of 8 sits close to it and you’re not over-flooring as much as it looks, but you’re also not getting scale-down benefit because there’s no idle period. That’s a signal this cluster isn’t really a “scales to near-zero” candidate; the floor is load-bearing. Don’t drop min below ~6 or you’ll add buffer-pool churn that hurts your p99 during the morning ramp. Second, your max of 64 is far above the observed max of 31. That headroom costs nothing when unused, but it also means a runaway query could scale you to 64 ACU and a surprising bill before you notice. I’d set min to 6, max to 40, which keeps roughly 30% headroom over observed peak without leaving a 64-ACU ceiling open. The cost win here is modest because your workload has no genuine idle, which is the real finding: if you can’t tolerate a higher floor and there’s no quiet period, a right-sized provisioned instance with a reservation may beat Serverless v2 on this cluster. Verify the trough is genuinely the floor and not an artifact of an hourly period masking a 3am batch dip.

That last caveat is exactly why I verify rather than apply. The model flagged that hourly aggregation can hide a real idle window — so I re-pulled at a 300-second period before trusting the “no idle” conclusion. For more sizing patterns, the AWS guides cover the broader rightsizing picture, and the database and cost prompts collection has reusable starting points for capacity reviews.
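One practical wrinkle with the finer re-pull: GetMetricStatistics returns at most 1,440 datapoints per call, and 14 days at a 300-second period is 4,032 datapoints, so the window has to be split. A minimal sketch of that splitting follows. The function names chunk_count and chunk_windows are hypothetical, and times are handled as epoch seconds for simplicity.

```shell
#!/usr/bin/env bash
# Sketch: split a lookback window so each GetMetricStatistics call
# stays within the per-call datapoint limit.

MAX_POINTS=1440   # CloudWatch limit on datapoints returned per call

# chunk_count <window_seconds> <period_seconds>
# Prints how many calls are needed to cover the window.
chunk_count() {
  local window="$1" period="$2"
  local points=$(( (window + period - 1) / period ))
  echo $(( (points + MAX_POINTS - 1) / MAX_POINTS ))
}

# chunk_windows <end_epoch> <window_seconds> <period_seconds>
# Prints one "start_epoch end_epoch" pair per call, oldest first.
chunk_windows() {
  local end="$1" window="$2" period="$3"
  local span=$(( MAX_POINTS * period ))   # seconds covered by one full call
  local s=$(( end - window )) e
  while [ "$s" -lt "$end" ]; do
    e=$(( s + span ))
    if [ "$e" -gt "$end" ]; then e="$end"; fi
    echo "$s $e"
    s="$e"
  done
}

chunk_count $(( 14 * 86400 )) 300
```

Each pair printed by chunk_windows can then drive one get-metric-statistics call with --period 300, converting the epoch values to the same timestamp format used earlier (with GNU date, for example, date -u -d "@$s" +%Y-%m-%dT%H:%M:%SZ).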

Apply the change without a restart

Serverless v2 scaling config updates apply without downtime — the floor and ceiling change in place:

aws rds modify-db-cluster \
  --db-cluster-identifier app-prod \
  --serverless-v2-scaling-configuration MinCapacity=6,MaxCapacity=40 \
  --apply-immediately

In Terraform, the same intent lives in the cluster resource:

resource "aws_rds_cluster" "app_prod" {
  cluster_identifier = "app-prod"
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned" # serverless v2 uses provisioned mode

  serverlessv2_scaling_configuration {
    min_capacity = 6.0
    max_capacity = 40.0
  }
}

resource "aws_rds_cluster_instance" "app_prod" {
  cluster_identifier = aws_rds_cluster.app_prod.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.app_prod.engine
}

Note the quirk that trips people up: Serverless v2 runs in provisioned engine mode with a db.serverless instance class. The old serverless engine mode is v1, which is a different and inferior beast — slower to scale, with cold pauses. Don’t mix them up in a copied module.

Know when provisioned wins

Serverless v2 is not free elasticity: you pay a premium per ACU-hour over the equivalent provisioned instance. It pays off when your load has genuine variance: dev/test clusters that idle overnight, workloads with sharp daily peaks, multi-tenant systems where individual tenants are bursty. It loses when load is flat and predictable, because then you’re paying the serverless premium to “scale” between two numbers that are nearly the same. The clean decision rule I use: if the ratio of your p95 capacity to your trough capacity is small, meaning capacity barely moves between the quietest and busiest hours, a provisioned instance with a reserved-instance commitment will almost always be cheaper. If that ratio is large, or the trough is genuinely near-idle for hours a day, Serverless v2 earns its premium.
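The ratio in that rule is simple enough to script against the numbers you pulled. This is a rough sketch: the function names capacity_ratio and leans_provisioned are hypothetical, and the cutoff is an argument you supply, since the rule above does not fix a specific threshold.

```shell
#!/usr/bin/env bash
# Sketch: p95-to-trough capacity ratio as a first-pass signal.

# capacity_ratio <p95_acu> <trough_acu>
# Prints the ratio to two decimals; fails if the trough is not positive.
capacity_ratio() {
  awk -v p="$1" -v t="$2" \
    'BEGIN { if (t <= 0) exit 1; printf "%.2f\n", p / t }'
}

# leans_provisioned <p95_acu> <trough_acu> <threshold>
# Exit status 0 when the ratio is below the caller-chosen threshold,
# i.e. the load looks flat. The threshold is an ASSUMPTION you pick;
# it is not a value from AWS or from this article.
leans_provisioned() {
  awk -v p="$1" -v t="$2" -v k="$3" \
    'BEGIN { exit !(t > 0 && p / t < k) }'
}

# p95 and trough from the two-week window in the prompt above
capacity_ratio 22 6.5
```

The ratio alone does not settle the question. It says nothing about how many hours a day the cluster spends near the trough, so treat it as a prompt to look at the time series, not as a verdict.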

The honest version of “scales automatically” is “scales between the two numbers you chose.” AI makes reading the utilization and reasoning about those numbers fast, and it’s good at catching the floor-is-load-bearing case that quietly turns a serverless cluster into an expensive provisioned one. But the floor encodes your latency tolerance and your willingness to trade cold-start risk for cost — that’s your call. Set it from your own metrics, verify the trough is real, and revisit it when the workload’s shape changes.
