
Kubernetes KEDA Event-Driven Autoscaling Prompt

Scale Kubernetes workloads on real event sources — queue depth, Kafka lag, cron, Prometheus queries — with KEDA, including scale-to-zero, ScaledObject/ScaledJob design, and avoiding flapping or stuck consumers.

Target user: Engineers scaling queue/stream workers beyond CPU-based HPA
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an SRE who scales async workers on the metric that actually matters — backlog — instead of CPU, and has tuned KEDA to scale to zero without losing work.

Provide:
- The workload (queue consumer, Kafka consumer, webhook processor, batch jobs)
- The event source (SQS/RabbitMQ/Kafka/Redis/Prometheus/cron) and its auth
- Latency/throughput goals and cost pressure (is scale-to-zero worth it?)
- Whether work is idempotent and how a mid-process pod kill is handled

Design the autoscaling:

1. **HPA vs KEDA** — explain that KEDA *creates and drives an HPA* from external triggers; pick KEDA when the right signal is a queue/lag/external metric, and keep plain HPA when CPU/memory genuinely tracks load.

2. **ScaledObject design** — choose the trigger(s) and tune `pollingInterval`, `cooldownPeriod`, `minReplicaCount`, `maxReplicaCount`, and the per-trigger `threshold` (e.g. messages-per-replica). For Kafka, scale on consumer-group lag and treat the partition count as the real ceiling on useful replicas, since consumers beyond it sit idle. Show multi-trigger composition.

3. **Scale-to-zero, safely** — when `minReplicaCount: 0` is appropriate, the cold-start latency tradeoff, the activation threshold, and how to avoid killing a pod mid-message (graceful shutdown + visibility timeout / commit-after-process).

4. **ScaledJob for batch** — when discrete jobs beat long-running consumers (each message → a Job), with `maxReplicaCount`, parallelism, and completion semantics.

5. **Auth (TriggerAuthentication)** — wire trigger auth via workload identity / a referenced Secret rather than inline creds, scoped to the one queue.

6. **Anti-flap & observability** — `cooldownPeriod` (which only delays the final scale to zero) vs the HPA stabilization window (which governs scaling between 1 and N), the metrics to watch, and how to debug "why isn't it scaling?" (KEDA operator logs, the generated HPA, the metric value KEDA reports).

Output: (a) the ScaledObject (or ScaledJob) + TriggerAuthentication for my source, (b) a threshold/tuning table with rationale, (c) a scale-to-zero safety checklist, (d) a flapping-diagnosis runbook, (e) the HPA-vs-KEDA verdict for my workload.
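
When reviewing the threshold/tuning table the prompt returns, it can help to sanity-check the numbers by hand. The sketch below approximates how a per-replica `threshold`, an activation threshold, `minReplicaCount`/`maxReplicaCount`, and a Kafka partition count combine into a replica count. It is a simplified model, not KEDA's code: the function name and parameters are hypothetical, and it ignores `pollingInterval`, `cooldownPeriod`, the HPA tolerance band, and stabilization windows.

```python
import math


def desired_replicas(metric_value, threshold, min_replicas, max_replicas,
                     activation_threshold=0, partition_cap=None):
    """Approximate steady-state replica count for one KEDA trigger.

    Hypothetical helper for sanity-checking tuning values; timing effects
    (polling, cooldown, HPA stabilization and tolerance) are not modeled.
    """
    # For Kafka, replicas beyond the partition count would sit idle,
    # so the smaller of the two limits is the effective ceiling.
    ceiling = max_replicas if partition_cap is None else min(max_replicas, partition_cap)

    # The 0 <-> 1 decision belongs to KEDA: a trigger counts as active only
    # when the metric is strictly greater than the activation threshold.
    if min_replicas == 0 and metric_value <= activation_threshold:
        return 0

    # Between 1 and N the generated HPA targets `threshold` units of work
    # per replica (an AverageValue target), rounding up.
    wanted = math.ceil(metric_value / threshold)

    # The HPA never goes below 1 replica, even when minReplicaCount is 0.
    floor = max(min_replicas, 1)
    return max(floor, min(wanted, ceiling))


if __name__ == "__main__":
    # 1000 messages of lag, 10 per replica, but only 12 partitions.
    print(desired_replicas(1000, 10, 0, 50, partition_cap=12))
    # 40 messages with activation at 50: stays at zero despite the backlog.
    print(desired_replicas(40, 10, 0, 20, activation_threshold=50))
```

The second call shows why the activation threshold deserves its own row in the tuning table: with `minReplicaCount: 0`, a backlog below the activation value is left unprocessed even though the per-replica threshold alone would ask for several pods.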
