Auto-Scaling Cost vs Latency Tuning Prompt
Tune auto-scaling parameters to balance cost against latency and reliability, choosing the right metrics, thresholds, and cooldowns to avoid flapping and over-provisioning.
- Target user: SRE and platform engineers optimizing scaling behavior and cloud spend
- Difficulty: Advanced
- Tools: Claude, Gemini
The prompt
You are a senior reliability and cost engineer who tunes auto-scaling for the right balance of latency, reliability, and spend.

I will provide:
- The workload profile (traffic shape, spikiness, warm-up time per instance)
- The current scaling config (HPA/KEDA, ASG, or cloud autoscaler) and metrics used
- Latency/SLO targets and the cost budget
- Observed problems (flapping, slow scale-up, idle over-provisioning)

Your job:
1. **Pick scaling signals**: choose between CPU, RPS, queue depth, p95 latency, or custom KEDA metrics, and explain why the current signal may be wrong.
2. **Set thresholds and targets**: recommend target utilization, scale-out/in thresholds, and min/max bounds tied to the SLO.
3. **Stabilize**: tune cooldowns, stabilization windows, and step/percent policies to stop flapping.
4. **Handle warm-up**: account for instance/pod warm-up and connection draining to avoid cold-start latency during scale-up.
5. **Cut cost**: propose scheduled scaling for predictable cycles, spot/preemptible usage, and scale-to-zero where safe.
6. **Predictive option**: assess whether predictive/scheduled scaling beats reactive for this traffic shape.
7. **Validate**: define a load test and the dashboards/alerts to confirm the new config holds the SLO.

Output as: (a) the recommended scaling config, (b) the signal/threshold rationale, (c) a cost-vs-latency trade-off table, (d) a load-test and rollback plan.

Roll out changes to min/max bounds gradually and keep the prior config ready to restore; never let cost-driven minimums drop below what the SLO requires during peak.
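When reviewing the model's answer to steps 2 and 3 of the prompt, it can help to see how a target threshold becomes a replica count and how a stabilization window damps flapping. The Python sketch below follows the replica formula documented for the Kubernetes HPA (desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1) and a scale-down window that acts on the highest recommendation seen recently. The names `desired_replicas` and `ScaleDownStabilizer` are hypothetical, and the 0.1 tolerance and 300-second window are assumed defaults to verify against your own cluster; ASG and other autoscalers behave differently.

```python
import math
from collections import deque


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """HPA-style recommendation, clamped to the min/max bounds.

    tolerance=0.1 is an assumed default; confirm it for your cluster.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: hold steady to avoid needless churn.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


class ScaleDownStabilizer:
    """Acts on the highest recommendation seen within the window.

    Scale-ups pass through immediately; scale-downs wait until every
    higher recommendation has aged out of the window.
    window_seconds=300 is an assumed value, not taken from the prompt.
    """

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommendation)

    def stabilize(self, now, recommendation):
        self._history.append((now, recommendation))
        # Drop recommendations older than the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        return max(rec for _, rec in self._history)


if __name__ == "__main__":
    # 4 pods at 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=20))
    stabilizer = ScaleDownStabilizer(window_seconds=300)
    for t, rec in [(0, 10), (60, 4), (301, 4)]:
        print(t, stabilizer.stabilize(t, rec))
```

A lower target utilization leaves more headroom for warm-up at higher cost, while a longer window trades slower cost recovery for fewer scale-in/scale-out oscillations; both are the knobs the trade-off table in the prompt's output is meant to expose.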