Event-Driven Autoscaling in Kubernetes With KEDA
CPU-based autoscaling can't see your queue backlog. KEDA scales on the metric that actually matters — and can scale all the way to zero.
- #kubernetes
- #keda
- #autoscaling
- #hpa
- #scaling
- #messaging
The Horizontal Pod Autoscaler is great until your workload is event-driven. A queue consumer sitting at 40% CPU while ten thousand messages pile up behind it is not “fine”, but to an HPA watching CPU, it looks perfectly healthy. The metric you care about isn’t on the pod; it’s the depth of the queue. KEDA exists to scale on exactly that kind of external signal, and it can scale to zero when there’s nothing to do.
I’ve used KEDA to cut idle spend on bursty workers and to keep up with traffic that HPA simply couldn’t see. Here’s how it works and where it bites.
What KEDA adds to HPA
KEDA doesn’t replace the HPA; it feeds it. KEDA runs as an operator that watches an external source (Kafka consumer lag, RabbitMQ queue depth, a Prometheus query, a cron schedule, cloud queue metrics). It exposes that value through the Kubernetes external metrics API and creates and manages an HPA that scales on it. It also handles the one thing a standard HPA can’t: activating a workload from zero replicas and scaling it back down to zero.
So the division of labor is: KEDA decides “there is work,” HPA decides “how many replicas for that load.”
Installing
```shell
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
kubectl get pods -n keda
```
You want three components healthy: the operator, the metrics adapter, and the admission webhook.
A ScaledObject for queue depth
The core resource is ScaledObject. Here’s a worker that scales on RabbitMQ queue length:
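```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker
  namespace: orders
spec:
  scaleTargetRef:
    name: order-worker        # Deployment in the same namespace
  minReplicaCount: 0          # allow scale-to-zero
  maxReplicaCount: 30
  pollingInterval: 15         # seconds between checks of the queue
  cooldownPeriod: 120         # seconds after the last active poll before scaling to 0
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: orders
        mode: QueueLength
        value: "20"           # target messages per replica
      authenticationRef:
        name: rabbitmq-auth   # TriggerAuthentication holding the connection string
```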
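Read that as: keep roughly 20 messages per replica. At a backlog of 200 messages, the HPA that KEDA manages targets 10 replicas. Beyond 600 messages, maxReplicaCount holds it at 30. Once the queue is empty, KEDA waits out the 120-second cooldownPeriod and then scales the Deployment to zero.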
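The arithmetic is simple enough to sketch. The Python below is a simplified model, not KEDA's code. It assumes the trigger counts as active when the backlog exceeds activationValue (default 0). It also assumes the HPA KEDA manages computes ceil(backlog / value), clamped between minReplicaCount and maxReplicaCount. desired_replicas is a hypothetical helper, and the model leaves out the HPA's tolerance band, its stabilization windows, and the cooldown delay.

```python
import math


def desired_replicas(backlog: int, value_per_replica: int,
                     min_count: int, max_count: int,
                     activation_value: int = 0) -> int:
    """Simplified (hypothetical) model of a QueueLength ScaledObject.

    Assumes activation means backlog > activation_value and that the HPA
    computes ceil(backlog / value_per_replica). Ignores HPA tolerance,
    stabilization windows, and the cooldownPeriod delay before zero.
    """
    if backlog <= activation_value:
        # Trigger inactive: fall back to minReplicaCount (0 here).
        return min_count
    raw = math.ceil(backlog / value_per_replica)
    # Trigger active: at least one replica, never above maxReplicaCount.
    return max(max(min_count, 1), min(max_count, raw))


# order-worker: value "20", minReplicaCount 0, maxReplicaCount 30
for backlog in (0, 7, 200, 1000):
    print(backlog, desired_replicas(backlog, 20, 0, 30))
```

With these settings, a backlog of 7 still gets one replica and 200 gets 10. A backlog of 1,000 is held at 30 by maxReplicaCount, even though the raw ratio asks for 50.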
The auth reference points at a TriggerAuthentication so the connection string isn’t inline:
```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
  namespace: orders
spec:
  secretTargetRef:
    # Feed the scaler's "host" parameter from the Secret's connectionString key
    - parameter: host
      name: rabbitmq-secret
      key: connectionString
```
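The Secret holds the full AMQP URI under connectionString. One detail that bites: characters such as @, : or / in the password must be percent-encoded, and the default vhost / is written as %2F. The small sketch below builds the URI and the base64 value a Secret's data field expects. The helper names, the keda-reader user, and the rabbitmq.orders host are all hypothetical. With stringData instead of data, you can skip the base64 step and let Kubernetes encode the value.

```python
import base64
from urllib.parse import quote


def amqp_connection_string(user: str, password: str, host: str,
                           port: int = 5672, vhost: str = "/") -> str:
    """Build an AMQP URI for the hypothetical rabbitmq-secret.

    Credentials and vhost are percent-encoded so '@', ':' or '/' in a
    password can't break URI parsing; the default vhost "/" becomes "%2F".
    """
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""),
        host, port, quote(vhost, safe=""))


def secret_data_value(plain: str) -> str:
    """Kubernetes Secret 'data' values are base64-encoded strings."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


uri = amqp_connection_string("keda-reader", "p@ss:word/1", "rabbitmq.orders")
print(uri)  # amqp://keda-reader:p%40ss%3Aword%2F1@rabbitmq.orders:5672/%2F
print(secret_data_value(uri))
```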
Scaling to zero — the headline feature
minReplicaCount: 0 is what makes KEDA worth adopting on its own. A nightly batch consumer, a webhook processor, a per-tenant worker that’s only busy during business hours: all of these can sit at zero replicas and cost nothing, then spin up on the first poll after a message arrives.
Two things to understand about scale-to-zero:
- There’s a cold-start cost. The first message after idle waits for the next poll (up to pollingInterval seconds) plus the time for a pod to schedule and start. For latency-sensitive synchronous traffic, scale-to-zero is the wrong tool. For async work, it’s perfect.
- The activation threshold is separate from the scaling threshold. activationValue controls when KEDA wakes a workload from zero; the trigger’s value controls scaling once it’s running. Setting these thoughtfully avoids flapping between 0 and 1 on a trickle of traffic.
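To see the trade-off, here is a toy simulation using the order-worker timings (15-second polls, 120-second cooldown). It is a hypothetical model with stated assumptions. The trigger is active when backlog > activationValue. A running replica drains the queue before the next poll. The workload drops to zero once no poll has been active for the full cooldown. simulate_scale_to_zero is an illustrative helper, not part of KEDA.

```python
def simulate_scale_to_zero(arrivals, activation_value,
                           polling_interval=15, cooldown_period=120):
    """Toy model of KEDA's 0 <-> 1 decisions (hypothetical helper).

    arrivals[i] is the number of messages published before poll i.
    Assumptions: the trigger is active when backlog > activation_value;
    a running replica drains the whole queue before the next poll;
    the workload returns to zero once no poll has been active for
    cooldown_period seconds. Returns (wakeups, backlog_left_at_end).
    """
    backlog = 0
    running = False
    last_active = 0
    wakeups = 0
    for i, new_messages in enumerate(arrivals):
        now = i * polling_interval
        backlog += new_messages
        if backlog > activation_value:
            last_active = now
            if not running:
                running = True  # wake from zero: 0 -> 1
                wakeups += 1
        elif running and now - last_active >= cooldown_period:
            running = False     # cooldown elapsed: 1 -> 0
        if running:
            backlog = 0         # consumer catches up before the next poll
    return wakeups, backlog


# One stray message every 3 minutes (every 12th poll), four times.
trickle = ([1] + [0] * 11) * 4
print(simulate_scale_to_zero(trickle, activation_value=0))  # (4, 0)
print(simulate_scale_to_zero(trickle, activation_value=5))  # (0, 4)
```

With activationValue at 5, the workload never wakes for the strays. The catch is that those four messages sit in the queue until later traffic pushes the backlog over the threshold. That is the trade: fewer cold starts in exchange for extra latency on low-volume messages.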
Scaling on a Prometheus query
Not everything is a queue. If the signal you care about lives in Prometheus — requests-per-second, in-flight jobs, a custom business metric — scale on it directly:
```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(rate(http_requests_total{app="checkout"}[2m]))
      threshold: "100"
```
This is the most flexible trigger and the one I reach for when the “right” metric is something only my app knows about.
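One detail worth spelling out: threshold is a per-replica target. With the trigger's default metricType of AverageValue, the HPA sizes the workload as roughly ceil(query result / threshold). The query should therefore return the total load across pods, which is why the example wraps it in sum(...). Here is a sketch with made-up numbers: four pods at 120 req/s each. replicas_for_total is a hypothetical helper, and the model ignores HPA tolerance.

```python
import math


def replicas_for_total(total_rps: float, threshold: float) -> int:
    """Simplified AverageValue math: treat the query result as total
    load and divide by the per-replica threshold (hypothetical helper)."""
    return math.ceil(total_rps / threshold)


# Hypothetical: 4 checkout pods each serving 120 req/s.
per_pod = [120.0, 120.0, 120.0, 120.0]

total = sum(per_pod)                   # what a sum(rate(...)) query returns
average = sum(per_pod) / len(per_pod)  # what an avg(...) query would return

print(replicas_for_total(total, 100))    # 5: scales out to absorb 480 req/s
print(replicas_for_total(average, 100))  # 2: would scale in while overloaded
```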
ScaledJobs for batch work
For workloads where each unit of work is a discrete job rather than a long-running consumer, use ScaledJob instead of ScaledObject. Rather than maintaining a pool of replicas, KEDA schedules Kubernetes Jobs in proportion to the pending work, and each one runs to completion. This is the better fit for video encoding, report generation, or anything where a pod does one task and exits.
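The sizing differs from a ScaledObject because Jobs that are already running account for part of the backlog. Below is a rough model, assuming behaviour along the lines of KEDA's default ScaledJob scaling strategy. It sizes the Job count to the backlog, caps it at maxReplicaCount, and subtracts the Jobs already running. jobs_to_create and the numbers are hypothetical; check the ScaledJob documentation for the exact scalingStrategy you configure.

```python
import math


def jobs_to_create(queue_length: int, target_per_job: int,
                   running_jobs: int, max_replica_count: int) -> int:
    """Rough ScaledJob model (assumption): size the Job count to the
    backlog, cap it at maxReplicaCount, and subtract running Jobs."""
    max_scale = min(max_replica_count, math.ceil(queue_length / target_per_job))
    return max(0, max_scale - running_jobs)


# Hypothetical video-encoding queue: one message per Job, at most 10 Jobs.
print(jobs_to_create(queue_length=25, target_per_job=1,
                     running_jobs=4, max_replica_count=10))  # 6
print(jobs_to_create(queue_length=3, target_per_job=1,
                     running_jobs=4, max_replica_count=10))  # 0
```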
Operational gotchas
- Don’t put an HPA and a KEDA ScaledObject on the same Deployment. KEDA manages the HPA for you; a second one fights it. Pick one.
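- Mind the cooldown. KEDA’s cooldownPeriod only governs the last step, from one replica to zero; scaling between 1 and N is the HPA’s job and follows the HPA’s own scale-down behavior. Too short a cooldown and you thrash between zero and one replica; too long and you pay for idle capacity. I start at 300 seconds (also KEDA’s default) and tune from there.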
- Watch your max. A misconfigured maxReplicaCount plus a sudden backlog can scale you straight into your cluster’s resource ceiling and leave a pile of Pending pods. Cap it where your nodes can actually keep up.
- Authentication scoping. The TriggerAuthentication Secret should be least-privilege: read-only access to queue metrics, nothing more.
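For the max, a back-of-envelope check helps. The sketch assumes you have already summed spare allocatable CPU and memory (allocatable minus what's already requested) across the nodes this workload can land on. It also assumes you know the pod's resource requests. All numbers and the helper name are hypothetical. It ignores per-node fragmentation (each pod must fit on a single node) and any cluster autoscaler, so treat the result as an upper bound, not a target.

```python
def safe_max_replicas(spare_cpu_m: int, spare_mem_mib: int,
                      pod_cpu_request_m: int, pod_mem_request_mib: int,
                      headroom_pct: int = 20) -> int:
    """Back-of-envelope ceiling for maxReplicaCount (hypothetical helper).

    Keeps headroom_pct of spare capacity free for other workloads, then
    returns how many pods fit by CPU and by memory, whichever is lower.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    usable_pct = 100 - headroom_pct
    by_cpu = spare_cpu_m * usable_pct // (100 * pod_cpu_request_m)
    by_mem = spare_mem_mib * usable_pct // (100 * pod_mem_request_mib)
    return max(0, min(by_cpu, by_mem))


# Hypothetical cluster: 12 spare cores and 16 GiB spare memory;
# each order-worker pod requests 500m CPU and 1 GiB memory.
print(safe_max_replicas(12000, 16384, 500, 1024))  # 12 (memory-bound)
```

Here memory, not CPU, is the binding limit. A maxReplicaCount of 30 on this cluster would ask for far more pods than it can schedule.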
Check what’s happening:
```shell
kubectl get scaledobject -n orders
kubectl get hpa -n orders   # the HPA KEDA generated (named keda-hpa-order-worker)
kubectl describe scaledobject order-worker -n orders
```
When KEDA is the wrong answer
If your service is CPU- or memory-bound and request-driven, plain HPA is simpler and you don’t need KEDA. Reach for KEDA when the thing that should drive scaling lives outside the pod — a queue, a stream, a schedule, a business metric. That’s the line.
Before rolling a ScaledObject into production, sanity-check the thresholds and the max replicas against your real traffic shape. Our AI code review is handy for catching an unbounded max or a trigger value that would scale far harder than your cluster can absorb.
KEDA turns “scale on the metric that matters” from a custom-metrics-adapter science project into a few lines of YAML. For more scaling and operations guides, see the Kubernetes & Helm category.
Autoscaling configuration interacts with cost and capacity. Validate thresholds and replica caps against your own workload before applying to production.