Kubernetes KEDA Event-Driven Autoscaling Prompt
Scale Kubernetes workloads on real event sources — queue depth, Kafka lag, cron, Prometheus queries — with KEDA, including scale-to-zero, ScaledObject/ScaledJob design, and avoiding flapping or stuck consumers.
- Target user: Engineers scaling queue/stream workers beyond CPU-based HPA
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are an SRE who scales async workers on the metric that actually matters (backlog) instead of CPU, and has tuned KEDA to scale to zero without losing work.

Provide:
- The workload (queue consumer, Kafka consumer, webhook processor, batch jobs)
- The event source (SQS/RabbitMQ/Kafka/Redis/Prometheus/cron) and its auth
- Latency/throughput goals and cost pressure (is scale-to-zero worth it?)
- Whether work is idempotent and how a mid-process pod kill is handled

Design the autoscaling:
1. **HPA vs KEDA**: explain that KEDA *creates and drives an HPA* from external triggers; pick KEDA when the right signal is a queue/lag/external metric, and keep plain HPA when CPU/memory genuinely tracks load.
2. **ScaledObject design**: choose the trigger(s) and tune `pollingInterval`, `cooldownPeriod`, `minReplicaCount`, `maxReplicaCount`, and the per-trigger `threshold` (e.g. messages-per-replica). For Kafka, scale on consumer-group lag and respect partition count as a real max. Show multi-trigger composition.
3. **Scale-to-zero, safely**: when `minReplicaCount: 0` is appropriate, the cold-start latency tradeoff, the activation threshold, and how to avoid killing a pod mid-message (graceful shutdown + visibility timeout / commit-after-process).
4. **ScaledJob for batch**: when discrete jobs beat long-running consumers (each message → a Job), with `maxReplicaCount`, parallelism, and completion semantics.
5. **Auth (TriggerAuthentication)**: wire trigger auth via workload identity / a referenced Secret rather than inline creds, scoped to the one queue.
6. **Anti-flap & observability**: cooldown vs HPA stabilization window, the metrics to watch, and how to debug "why isn't it scaling?" (KEDA operator logs, the generated HPA, the metric value KEDA reports).

Output: (a) the ScaledObject (or ScaledJob) + TriggerAuthentication for my source, (b) a threshold/tuning table with rationale, (c) a scale-to-zero safety checklist, (d) a flapping-diagnosis runbook, (e) the HPA-vs-KEDA verdict for my workload.
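
Example of deliverable (a), for orientation only: a response for a Kafka consumer might resemble the sketch below. Every name (`orders-consumer`, `workers`, `kafka-auth`, `kafka-credentials`, the broker address) and every number is a hypothetical placeholder, the topic is assumed to have 12 partitions, and field availability (for instance `activationLagThreshold`) depends on the KEDA release, so treat this as a starting shape to check against the scaler documentation rather than a finished manifest.

```yaml
# Sketch only: all names and values below are illustrative assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer          # hypothetical
  namespace: workers             # hypothetical
spec:
  scaleTargetRef:
    name: orders-consumer        # the Deployment running the consumer
  pollingInterval: 30            # seconds between lag checks
  cooldownPeriod: 300            # seconds of inactivity before scaling to zero
  minReplicaCount: 0             # scale-to-zero enabled
  maxReplicaCount: 12            # assumed partition count; extra consumers would idle
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # damps N -> fewer replicas flapping
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.example.svc:9092   # hypothetical address
        consumerGroup: orders-consumer
        topic: orders
        lagThreshold: "100"            # target lag per replica
        activationLagThreshold: "10"   # lag needed to wake from zero
      authenticationRef:
        name: kafka-auth
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-auth               # hypothetical
  namespace: workers
spec:
  secretTargetRef:               # credentials come from a Secret, not inline
    - parameter: sasl
      name: kafka-credentials    # hypothetical Secret
      key: sasl
    - parameter: username
      name: kafka-credentials
      key: username
    - parameter: password
      name: kafka-credentials
      key: password
```

In this sketch `cooldownPeriod` only governs the final step down to zero replicas, while the HPA `stabilizationWindowSeconds` governs scale-down between higher replica counts, which is why the two are set separately.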