Event-Driven Autoscaling in Kubernetes With KEDA

The Horizontal Pod Autoscaler is great until your workload is event-driven. A queue consumer sitting at 40% CPU while ten thousand messages pile up behind it is not “fine”, but to an HPA watching CPU it looks perfectly healthy. The metric you care about isn’t on the pod; it’s the depth of the queue. KEDA exists to scale on exactly that kind of external signal, and it can scale to zero when there’s nothing to do.

I’ve used KEDA to cut idle spend on bursty workers and to keep up with traffic that HPA simply couldn’t see. Here’s how it works and where it bites.

What KEDA adds to HPA

KEDA doesn’t replace the HPA; it feeds it. KEDA runs as an operator that watches an external source (Kafka consumer lag, a RabbitMQ queue depth, a Prometheus query, a cron schedule, cloud queue metrics), exposes that value as an external metric through its own metrics server, and creates and manages an HPA that scales on it. It also handles the one thing the HPA can’t: activating a workload from zero replicas and scaling it back down to zero.

So the division of labor is: KEDA decides “there is work,” HPA decides “how many replicas for that load.”

Installing

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
kubectl get pods -n keda

You want three components healthy: the operator, the metrics adapter, and the admission webhook.

A ScaledObject for queue depth

The core resource is ScaledObject. Here’s a worker that scales on RabbitMQ queue length:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker
  namespace: orders
spec:
  scaleTargetRef:
    name: order-worker
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15
  cooldownPeriod: 120
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: orders
      mode: QueueLength
      value: "20"
    authenticationRef:
      name: rabbitmq-auth

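Read that as: keep roughly 20 messages per replica. With a backlog of 200 messages, the HPA that KEDA manages targets 10 replicas (200 / 20); at 1,000 messages it would want 50 but stops at maxReplicaCount, 30. At zero messages, after the cooldown, KEDA scales the Deployment to zero.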
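To make the arithmetic concrete, here is a small Python model of what the HPA behind that trigger does with the queue length. It is a simplified sketch, not KEDA code: it assumes the trigger’s value acts as a per-replica average target (KEDA’s default AverageValue metric type), and it ignores the HPA’s tolerance band and scale-down stabilization. The function name is hypothetical.

```python
import math


def desired_replicas(queue_length: int, value: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count for an AverageValue target.

    Simplified model: no HPA tolerance band, no scale-down stabilization,
    and an empty queue is treated as "go to minReplicaCount".
    """
    if queue_length <= 0:
        # Nothing to do: KEDA drops to minReplicaCount once cooldownPeriod passes.
        return min_replicas
    # AverageValue target: total metric divided by the per-replica target, rounded up.
    wanted = math.ceil(queue_length / value)
    # While the workload is active the HPA keeps at least 1 replica,
    # and never exceeds maxReplicaCount.
    floor = max(min_replicas, 1)
    return max(floor, min(wanted, max_replicas))


# Values from the order-worker ScaledObject: value "20", min 0, max 30.
for backlog in (0, 5, 200, 1000):
    print(backlog, desired_replicas(backlog, 20, 0, 30))
# prints: 0 0 / 5 1 / 200 10 / 1000 30
```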

The auth reference points at a TriggerAuthentication so the connection string isn’t inline:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
  namespace: orders
spec:
  secretTargetRef:
  - parameter: host
    name: rabbitmq-secret
    key: connectionString

Scaling to zero — the headline feature

minReplicaCount: 0 is what makes KEDA worth adopting on its own. A nightly batch consumer, a webhook processor, a per-tenant worker that’s only busy during business hours: all of these can sit at zero replicas and cost nothing, then spin up as soon as KEDA’s next poll sees a message waiting.

Two things to understand about scale-to-zero:

There’s a cold-start cost. The first message after idle waits for KEDA’s next poll (up to pollingInterval seconds), then for a pod to schedule and start. For latency-sensitive synchronous traffic, scale-to-zero is the wrong tool. For async work, it’s perfect.
The activation threshold is separate from the scaling threshold. activationValue (called activationThreshold on some scalers, such as Prometheus) controls when KEDA wakes a workload from zero and defaults to 0; the trigger’s value controls scaling once it’s running. Setting these thoughtfully avoids flapping between 0 and 1 on a trickle of traffic.
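The split between the two thresholds can be sketched the same way. This Python model is an illustration, not KEDA’s implementation: it assumes activation means the queue length is strictly greater than activationValue, uses a hypothetical activationValue of 5 (the order-worker example doesn’t set one, so it would default to 0), and leaves out the cooldown that eventually takes an idle, running workload back to zero.

```python
import math


def next_replicas(current: int, queue_length: int, value: int,
                  activation_value: int, max_replicas: int) -> int:
    """Simplified model of KEDA activation plus HPA scaling (cooldown ignored)."""
    active = queue_length > activation_value
    if current == 0:
        # Activation phase: KEDA only wakes the workload to 1 replica,
        # and only when the metric exceeds activationValue.
        return 1 if active else 0
    # Scaling phase: once running, the HPA sizes the workload on `value`.
    # An empty queue leaves 1 replica; dropping to 0 is the cooldown's job.
    return max(1, min(math.ceil(queue_length / value), max_replicas))


# A trickle of 3 messages with activationValue 5 leaves the worker at zero,
print(next_replicas(0, 3, 20, 5, 30))    # 0
# 6 messages wake it to one replica,
print(next_replicas(0, 6, 20, 5, 30))    # 1
# and once it is running, 200 messages scale it to 10.
print(next_replicas(1, 200, 20, 5, 30))  # 10
```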

Scaling on a Prometheus query

Not everything is a queue. If the signal you care about lives in Prometheus — requests-per-second, in-flight jobs, a custom business metric — scale on it directly:

  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(rate(http_requests_total{app="checkout"}[2m]))
      threshold: "100"

This is the most flexible trigger and the one I reach for when the “right” metric is something only my app knows about.
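Before wiring a query into a trigger, it helps to run it yourself and confirm it returns one number. KEDA’s Prometheus scaler expects a single value and, as far as I know, errors on multi-series results, so an un-aggregated query is a common mistake. The sketch below calls Prometheus’s standard /api/v1/query HTTP endpoint using only the Python standard library; the helper names are hypothetical, and treating an empty result as zero is a choice of this sketch.

```python
import json
import urllib.parse
import urllib.request


def parse_single_value(body: str) -> float:
    """Extract one number from a Prometheus /api/v1/query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] == "scalar":
        # Scalar results are a bare [timestamp, "value"] pair.
        return float(data["result"][1])
    series = data["result"]
    if len(series) == 0:
        return 0.0  # no series matched; this sketch treats that as zero load
    if len(series) > 1:
        raise ValueError(f"expected 1 series, got {len(series)}; aggregate with sum()")
    # Instant-vector samples are [timestamp, "value"]; the value is a string.
    return float(series[0]["value"][1])


def query_prometheus(server: str, query: str) -> float:
    """Run an instant query against a Prometheus server and return one value."""
    url = f"{server}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_single_value(resp.read().decode("utf-8"))


# Example (needs network access to your Prometheus):
# rps = query_prometheus("http://prometheus.monitoring:9090",
#                        'sum(rate(http_requests_total{app="checkout"}[2m]))')
```

Dividing the returned rate by the trigger’s threshold of 100 and rounding up gives a rough idea of the replica count the HPA will aim for.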

ScaledJobs for batch work

For workloads where each unit of work is a discrete job rather than a long-running consumer, use ScaledJob instead of ScaledObject. Rather than maintaining a pool of replicas, KEDA creates Kubernetes Jobs in proportion to the pending work and lets each one run to completion. This is the better fit for video encoding, report generation, or anything where a pod does one task and exits.
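How many Jobs appear at once is governed by the ScaledJob’s scalingStrategy. As a rough sketch, assuming the default strategy (which, as I understand it, takes the target from the queue length and maxReplicaCount and subtracts Jobs that are already running), the calculation looks like this in Python. The accurate and custom strategies account for pending Jobs differently, and the function name is hypothetical.

```python
import math


def jobs_to_create(queue_length: int, value: int,
                   max_replicas: int, running_jobs: int) -> int:
    """Simplified model of ScaledJob's default strategy: target minus running."""
    # One Job per `value` units of pending work, capped at maxReplicaCount.
    target = min(math.ceil(queue_length / value), max_replicas)
    # Jobs already running count against the target; never go negative.
    return max(0, target - running_jobs)


# 12 pending messages, one message per Job, a cap of 10, 4 Jobs already running.
print(jobs_to_create(12, 1, 10, 4))  # 6
```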

Operational gotchas

Don’t put an HPA and a KEDA ScaledObject on the same Deployment. KEDA manages the HPA for you; a second one fights it. Pick one.
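Mind the cooldown. cooldownPeriod only governs the last step, from one replica to zero; scaling down between N and 1 replicas is the HPA’s job, tunable through advanced.horizontalPodAutoscalerConfig.behavior. Too short a cooldown and you bounce between zero and one on bursty traffic; too long and you pay for idle capacity. I start at 300 seconds (KEDA’s default) and tune from there.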
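Watch your max. If you omit maxReplicaCount, KEDA defaults it to 100. An oversized cap plus a sudden backlog can scale you straight into your cluster’s resource ceiling, leaving pods Pending and piling load onto whatever the workers share, such as the database or the broker itself. Cap it where your nodes and downstream dependencies can actually keep up.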
Authentication scoping. The TriggerAuthentication Secret should be least-privilege — read-only queue metrics, nothing more.
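The cooldown trade-off is easiest to see on a traffic trace. This Python sketch models only the scale-to-zero step: it walks a list of queue-length samples taken every pollingInterval seconds, drops the workload to zero once nothing has been active for cooldownPeriod seconds, and counts how often a later message has to wake it again (each wake is a cold start). The trace and the function are made up for illustration.

```python
def count_cold_starts(samples, polling_interval, cooldown, activation_value=0):
    """Count how often a workload is scaled to zero and then woken again.

    samples: queue length seen at each poll, starting at t=0.
    Simplified model: the workload is running at t=0, and the cooldown
    is only checked on poll ticks.
    """
    running = True
    last_active = 0
    cold_starts = 0
    for i, queue_length in enumerate(samples):
        now = i * polling_interval
        if queue_length > activation_value:
            if not running:
                cold_starts += 1  # woken from zero: a new pod must start
                running = True
            last_active = now
        elif running and now - last_active >= cooldown:
            running = False  # idle for a full cooldownPeriod: scale to zero
    return cold_starts


# Hypothetical trace: a burst of 30 messages every 150 s, polled every 15 s.
trace = ([30] + [0] * 9) * 4
print(count_cold_starts(trace, 15, 120))  # 3
print(count_cold_starts(trace, 15, 300))  # 0
```

With bursts every 150 seconds, a 120-second cooldown pays three cold starts over this trace, while 300 seconds keeps the worker warm at the cost of idle time between bursts.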

Check what’s happening:

kubectl get scaledobject -n orders
kubectl get hpa -n orders        # the HPA KEDA generated (named keda-hpa-order-worker)
kubectl describe scaledobject order-worker -n orders

When KEDA is the wrong answer

If your service is CPU- or memory-bound and request-driven, plain HPA is simpler and you don’t need KEDA. Reach for KEDA when the thing that should drive scaling lives outside the pod — a queue, a stream, a schedule, a business metric. That’s the line.

Before rolling a ScaledObject into production, sanity-check the thresholds and the max replicas against your real traffic shape. In particular, look for a max left at its default or set too high, and for a trigger value that would scale far harder than your cluster can absorb.
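A back-of-the-envelope version of that check can be scripted. The Python sketch below assumes you supply the peak backlog from your own metrics, plus the pod’s CPU request and the node pool’s allocatable CPU (in millicores) from your own manifests and nodes. The function, its 80% headroom default, and the example numbers are all hypothetical.

```python
import math


def check_max_replicas(max_replicas, value, peak_backlog,
                       pod_cpu_request_m, allocatable_cpu_m, headroom_pct=80):
    """Return warnings for a replica cap that doesn't fit traffic or capacity.

    Only CPU requests are considered; memory, other workloads, and
    DaemonSets are ignored, so `fits` is an optimistic upper bound.
    """
    warnings = []
    # Replicas the trigger would request at peak, before maxReplicaCount applies.
    wanted_at_peak = math.ceil(peak_backlog / value)
    # Pods whose CPU requests fit in the allocatable CPU, keeping headroom.
    fits = (allocatable_cpu_m * headroom_pct) // (100 * pod_cpu_request_m)
    if max_replicas > fits:
        warnings.append(f"maxReplicaCount {max_replicas} exceeds the ~{fits} pods that fit")
    if wanted_at_peak > max_replicas:
        warnings.append(f"peak backlog wants {wanted_at_peak} replicas; "
                        f"the cap of {max_replicas} will drain it more slowly")
    return warnings


# Hypothetical numbers: value 20, max 30, a peak of 1,000 messages,
# 500m CPU per pod, 12 cores (12000m) allocatable across the node pool.
for warning in check_max_replicas(30, 20, 1000, 500, 12000):
    print(warning)
```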

KEDA turns “scale on the metric that matters” from a custom-metrics-adapter science project into a few lines of YAML.
