Azure Container Apps Scaling With AI: Tame KEDA and Skip the Cold-Start Surprise

We moved a customer-facing API to Azure Container Apps for the serverless simplicity, left the scaling at defaults, and got paged a week later for latency. The defaults included min-replicas of zero, so every time traffic went quiet the app scaled to nothing — and the next customer request paid the full cold-start tax while a container spun up. Container Apps had done exactly what we configured. We just hadn’t realized that scale-to-zero, the feature that makes it cheap, is the wrong default for a latency-sensitive API. That tension — cost versus cold start — is the heart of running Container Apps well.

Container Apps abstracts away most of Kubernetes, but it surfaces KEDA scaling, and KEDA is where the decisions live. Pick the wrong scaler and your workers don’t keep up; pick the wrong minimum and you either pay for idle replicas or eat cold starts. AI is useful here because matching a scaler to a workload and choosing a sane replica floor are pattern decisions it can reason about from your traffic shape — and it knows the platform’s real constraints, like the fixed CPU/memory combinations that trip people up.

Match the scaler to the workload

The KEDA scaler is the most important configuration choice. An HTTP scaler suits a synchronous API. A queue-length scaler (Service Bus or Storage Queue) suits a background worker — it scales on backlog, not on CPU. A cron scaler suits scheduled batch work. Using a CPU scaler on a queue worker, or an HTTP scaler on a batch job, gives you scaling that lags the actual load.

Prompt: “I have two Container Apps: a customer-facing REST API and a background worker that drains a Service Bus queue. Recommend the right KEDA scaler and min/max replicas for each. The API is latency-sensitive; the worker is bursty and tolerant of a few seconds of lag. Explain the cold-start trade-off for the min-replica choice.”

A good answer pins the API to a minimum of one replica to avoid cold starts and uses an HTTP scaler, while letting the worker scale from zero on queue length because a few seconds of lag is fine there:

# Latency-sensitive API: never scale to zero, HTTP-based
az containerapp update -n api-prod -g rg-apps \
  --min-replicas 1 --max-replicas 10 \
  --scale-rule-name http-rule --scale-rule-type http \
  --scale-rule-http-concurrency 50

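# Bursty worker: scale from zero on queue depth
# The Service Bus scaler needs credentials to read the queue; "sb-connection" is an
# assumed name for a Container Apps secret that holds the connection string
az containerapp update -n worker-prod -g rg-apps \
  --min-replicas 0 --max-replicas 20 \
  --scale-rule-name queue-rule --scale-rule-type azure-servicebus \
  --scale-rule-metadata "queueName=jobs" "messageCount=5" \
  --scale-rule-auth "connection=sb-connection"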

That asymmetry — floor of one for the API, floor of zero for the worker — is the whole game, and it comes straight from the workload’s latency tolerance. This kind of fit-to-workload reasoning runs through the broader Azure platform work.
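To make a setting like messageCount=5 concrete, here is a rough sketch of the arithmetic behind a queue-length rule: the target replica count is roughly the backlog divided by the per-replica message target, rounded up and clamped to the min/max bounds. The helper name estimate_replicas is hypothetical, and the real scaling loop also applies polling intervals and cooldown windows that this ignores.

```shell
#!/bin/sh
# Hypothetical helper: approximate the replica count a queue-length rule
# aims for. Ignores polling interval, cooldown, and scale-up rate limits.
# usage: estimate_replicas <backlog> <messages_per_replica> <min> <max>
estimate_replicas() {
  backlog=$1; target=$2; min=$3; max=$4
  # Ceiling division: one replica per `target` messages of backlog
  want=$(( (backlog + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

estimate_replicas 0 5 0 20     # empty queue, floor of zero: prints 0
estimate_replicas 42 5 0 20    # 42 messages at 5 per replica: prints 9
estimate_replicas 500 5 0 20   # large burst hits the max-replica cap: prints 20
```

Running the numbers this way before a deploy is a quick check that max-replicas is high enough for your worst expected burst.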

Respect the CPU/memory combinations

Container Apps doesn’t let you request arbitrary CPU and memory; it allows specific combinations (on the Consumption plan, for example, 0.5 vCPU with 1 GiB or 1.0 vCPU with 2 GiB). Request an invalid combination and the deploy fails; request too little and you get throttled or OOM-killed under load. AI can recall these constraints and sanity-check your spec before you deploy.

Prompt: “My Container App requests 0.75 vCPU and 1.5 GiB. Is that a valid Container Apps CPU/memory combination? If not, recommend the nearest valid combination that fits a moderately memory-hungry .NET API, and explain how to confirm I’m not getting OOM-killed under load.”
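A pre-deploy check along these lines can catch a bad pairing before the deploy does. This is a minimal sketch that assumes the Consumption-plan rule: CPU in 0.25 vCPU steps with memory fixed at 2 GiB per vCPU. The function name is_valid_combo is hypothetical, and the sketch does not enforce an upper bound or cover Dedicated workload profiles, whose limits differ.

```shell
#!/bin/sh
# Hypothetical pre-deploy check. Assumes the Consumption-plan rule that CPU
# comes in 0.25 vCPU steps with memory fixed at 2 GiB per vCPU. No upper
# bound is checked, and Dedicated workload profiles follow other limits.
# usage: is_valid_combo <vcpu> <memory_gib>   (exit status 0 = valid)
is_valid_combo() {
  awk -v cpu="$1" -v mem="$2" 'BEGIN {
    steps = cpu / 0.25
    ok = (cpu > 0) && (steps == int(steps)) && (mem == 2 * cpu)
    exit (ok ? 0 : 1)
  }'
}

is_valid_combo 0.75 1.5 && echo "0.75 vCPU / 1.5 GiB: valid"
is_valid_combo 0.75 2.0 || echo "0.75 vCPU / 2.0 GiB: invalid under this rule"
```

Under that assumption the check reduces to two conditions, so it is cheap to run in CI against the values in your deployment spec.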

Secrets through Key Vault and managed identity

Inline secrets in the Container App work, but Key Vault references via managed identity are the better pattern — the secret lives in one place, rotates centrally, and never sits in the app’s configuration. This ties straight into the managed-identity patterns worth adopting across Azure.

Prompt: “Show me how to wire a Container App to read a database password from Key Vault using a user-assigned managed identity and a Key Vault secret reference, instead of storing it as an inline Container Apps secret. Give the least-privilege role the identity needs.”

The role is Key Vault Secrets User, scoped to the vault: narrow, specific, and exactly the kind of least-privilege assignment that the review prompts in the prompts library help you verify.
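One way the wiring might look with the az CLI is sketched below. The function names, the db-password secret name, and the DB_PASSWORD variable are hypothetical, every argument is a placeholder you supply, and the role assignment assumes the vault uses the Azure RBAC permission model rather than access policies.

```shell
#!/bin/sh
# Build the value for `az containerapp secret set --secrets`, which takes
# name=keyvaultref:<secret-uri>,identityref:<identity-resource-id>.
# usage: kv_secret_ref <secret_name> <key_vault_secret_uri> <identity_resource_id>
kv_secret_ref() {
  printf '%s=keyvaultref:%s,identityref:%s\n' "$1" "$2" "$3"
}

# Hypothetical wiring for a database password; nothing here runs until you
# call the function with your own values.
# usage: wire_db_password <app> <rg> <secret_uri> <identity_id> <identity_principal_id> <vault_id>
wire_db_password() {
  app=$1; rg=$2; uri=$3; identity_id=$4; principal_id=$5; vault_id=$6
  # Least privilege: read-only access to secrets, scoped to this one vault
  az role assignment create --role "Key Vault Secrets User" \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal --scope "$vault_id"
  # Attach the user-assigned identity to the app
  az containerapp identity assign -n "$app" -g "$rg" --user-assigned "$identity_id"
  # Register the secret as a Key Vault reference instead of an inline value
  az containerapp secret set -n "$app" -g "$rg" \
    --secrets "$(kv_secret_ref db-password "$uri" "$identity_id")"
  # Expose it to the container as an environment variable
  az containerapp update -n "$app" -g "$rg" \
    --set-env-vars "DB_PASSWORD=secretref:db-password"
}
```

The role assignment comes first on purpose: the app can only resolve the reference once the identity is allowed to read the secret.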

Roll out risky changes with a traffic-split canary

Container Apps revisions are your safety mechanism, and single-revision mode throws it away. In single-revision mode a new image replaces all traffic at once, and if it’s broken you’re already down with no easy rollback. Multiple-revision mode with a traffic split lets you send a small slice to the new revision, watch it, and shift the rest only when it’s healthy.

# Send 10% to the new revision, keep 90% on the known-good one
az containerapp ingress traffic set -n api-prod -g rg-apps \
  --revision-weight api-prod--rev-old=90 api-prod--rev-new=10

Prompt: “I’m deploying a risky change to a Container App. Give me a canary rollout plan using multiple-revision mode and traffic splitting: start at 10% to the new revision, what to watch before increasing, and the exact command to roll back to 100% on the old revision if it misbehaves.”

AI lays out the canary steps and the rollback command; you watch the metrics and make the call to shift traffic. A broken revision behind a 10% split is a contained incident. Behind a single-revision swap it’s an outage.
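The shift and the rollback are the same command with different weights, so a small helper can keep the two numbers consistent. This is a sketch only: canary_weights is a hypothetical name and the revision names are the placeholders used above. It assumes the app is already in multiple-revision mode (az containerapp revision set-mode -n api-prod -g rg-apps --mode multiple).

```shell
#!/bin/sh
# Hypothetical helper: turn a canary percentage into the two
# --revision-weight arguments, rejecting anything outside 0-100.
# usage: canary_weights <old_revision> <new_revision> <new_percent>
canary_weights() {
  old=$1; new=$2; pct=$3
  case $pct in
    ''|*[!0-9]*) echo "percent must be a non-negative integer" >&2; return 1 ;;
  esac
  if [ "$pct" -gt 100 ]; then
    echo "percent must be between 0 and 100" >&2; return 1
  fi
  echo "$old=$((100 - pct)) $new=$pct"
}

# Canary at 10%: prints api-prod--rev-old=90 api-prod--rev-new=10
canary_weights api-prod--rev-old api-prod--rev-new 10
# Rollback: prints api-prod--rev-old=100 api-prod--rev-new=0
canary_weights api-prod--rev-old api-prod--rev-new 0
```

To apply a step, pass the output unquoted so it splits into two arguments: az containerapp ingress traffic set -n api-prod -g rg-apps --revision-weight $(canary_weights api-prod--rev-old api-prod--rev-new 10).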

The discipline

Container Apps is simple until the defaults bite. Match the scaler to the workload, set a replica floor that reflects your latency tolerance, respect the valid CPU/memory combinations, source secrets from Key Vault through managed identity, and roll risky changes out behind a traffic-split canary. AI reasons about the scaling trade-offs and recalls the platform constraints; you load-test the cold-start path and own the traffic shifts. Get those right and you keep the serverless cost benefits without the serverless surprises. More platform material is in the Azure category, and the Container Apps config review prompt is ready in the prompts library.
