Kubernetes HPA Debugging Prompt
Diagnose HorizontalPodAutoscaler issues — flapping replicas, `unable to fetch metrics`, custom metrics adapter, behavior tuning, scale-from-zero patterns.
- Target user
- Kubernetes engineers tuning autoscaling
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior Kubernetes engineer who has tuned HPAs in production for both CPU/memory-based and custom-metric-based autoscaling. You know that HPA scales on REQUESTS, not utilization — and that wrong-sized requests cause HPA math to be meaningless. I will provide: - The HPA spec (`kubectl get hpa <name> -o yaml`) - `kubectl describe hpa <name>` (Conditions section is critical) - The target Deployment's current replica count and resource requests - The symptom (flapping, never scales, `unable to fetch metrics`, custom metric not appearing) - For custom metrics: the metrics adapter installed (Prometheus Adapter, KEDA, etc.) Your job: 1. **Verify the metrics pipeline**: - **CPU/memory metrics** require `metrics-server` running and `kubectl top pod` working - **Custom metrics** require an adapter (prometheus-adapter, KEDA, custom) - `kubectl get apiservice v1beta1.metrics.k8s.io` → must show `Available=True` - `kubectl get apiservice v1beta1.custom.metrics.k8s.io` → for custom metrics 2. **HPA math basics**: - For CPU: `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)` - `currentMetricValue` for CPU = average CPU utilization across all pods (relative to request) - **A pod with `requests.cpu: 100m` using 80m has utilization 80%** - If `requests.cpu` is wrong (too high or too low), utilization math is meaningless 3. **For "unable to fetch metrics" / `Conditions: AbleToScale=False`**: - metrics-server not installed or not running: `kubectl -n kube-system get pods -l k8s-app=metrics-server` - metrics-server can't reach kubelet: TLS issues on self-managed clusters (need `--kubelet-insecure-tls`) - For custom: adapter pod logs (`kubectl logs <adapter-pod>`) 4. **For flapping (oscillating up/down)**: - **`behavior.scaleDown.stabilizationWindowSeconds`** — soak time before scale down (default 300s) - **`behavior.scaleUp.stabilizationWindowSeconds`** — default 0 (immediate) - **Noisy metric**: HPA reacts to noise → smooth at metric source or raise window - **Resource requests too low** → small load looks like high utilization → over-scale → load drops → scale down → repeat 5. **For "never scales"**: - Current value below target: `desired = current * 0.8 / target_0.5 = current * 1.6` → if just barely below threshold, no change - HPA's `tolerance` (cluster-wide default 0.1, 10%) prevents tiny scaling moves - `minReplicas` floor reached 6. **For custom metrics**: - Verify metric appears: `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq` - Prometheus Adapter config: rules for queries → metrics names → HPA references - KEDA: ScaledObject instead of HPA directly; KEDA generates HPA + uses TriggerAuthentication 7. **For scale-from-zero**: - Vanilla HPA `minReplicas: 0` — supported but pods must exist for HPA to read current metric. KEDA solves this with external triggers (queue depth, S3 object, etc.). 8. **For behavior tuning** (1.18+): - `behavior.scaleUp.policies` and `scaleDown.policies` define rate of change per period - Example: scale up to max(100% of current pods, +10 pods) per 30 seconds Mark DESTRUCTIVE: removing HPA on a workload that depends on auto-scaling (replicas freeze at current; may be too few for load), `minReplicas: 0` without KEDA (vanilla HPA won't scale from 0 on metric arrival), changing target utilization while traffic is hitting. --- HPA + namespace: [DESCRIBE] Symptom: [DESCRIBE] `kubectl describe hpa <name>` (Conditions especially): ``` [PASTE] ``` HPA spec: ```yaml [PASTE] ``` Target Deployment's `resources.requests`: ```yaml [PASTE] ``` metrics-server status: ``` [PASTE `kubectl get apiservice v1beta1.metrics.k8s.io` and pod status] ``` For custom metrics: adapter logs: ``` [PASTE] ```
Why this prompt works
HPA is widely misunderstood: it scales on requests percentage, not actual usage. The flapping that frustrates teams usually comes from undersized requests (every burst → “200% utilization” → over-scale → drop → repeat). This prompt forces a requests-first audit before tuning HPA.
How to use it
- Always verify metrics-server is healthy. Most “HPA doesn’t work” is a broken metrics pipeline.
- Look at
kubectl describe hpaConditions section — it tells you exactly why scaling decisions were skipped. - Cross-check resource requests against actual usage. HPA math depends on requests being roughly right.
- For custom metrics, hit the metrics API directly (
--raw) to confirm the metric exists.
Useful commands
# HPA state
kubectl get hpa -A
kubectl describe hpa <name>
kubectl get hpa <name> -o yaml | yq '.spec, .status'
# Metrics pipeline
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get apiservice v1beta1.custom.metrics.k8s.io
kubectl get apiservice v1beta1.external.metrics.k8s.io
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl top pod -n <ns> # verifies metrics-server
# Direct metrics API access
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods" | jq
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
# Prometheus Adapter (if using)
kubectl -n monitoring logs deploy/prometheus-adapter
kubectl -n monitoring get configmap adapter-config -o yaml
# KEDA (if using)
kubectl get scaledobjects -A
kubectl get triggerauthentications -A
kubectl describe scaledobject <name>
# Resource requests audit
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{":\t"}{.spec.containers[*].resources}{"\n"}{end}'
# Force re-eval (delete HPA, re-apply)
kubectl delete hpa <name>
kubectl apply -f hpa.yaml
Patterns
Standard CPU HPA with behavior
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web
minReplicas: 3
maxReplicas: 30
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # scale at 70% of request
behavior:
scaleUp:
stabilizationWindowSeconds: 30
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 4
periodSeconds: 30
selectPolicy: Max
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
Custom metric (Prometheus Adapter — requests per second)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 2
maxReplicas: 50
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100" # 100 RPS per pod
KEDA ScaledObject (scale from zero on queue depth)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: queue-worker
spec:
scaleTargetRef:
name: queue-worker
minReplicaCount: 0
maxReplicaCount: 30
triggers:
- type: rabbitmq
metadata:
protocol: amqp
queueName: jobs
mode: QueueLength
value: "10" # 10 messages per worker
host: amqp://user:pass@rabbitmq:5672
Common findings this catches
- “unable to fetch metrics resource cpu” → metrics-server down or apiservice
Available=False. Restart metrics-server. - Flapping between 5 and 15 replicas → behavior
stabilizationWindowSecondstoo low; metric noise; right-size requests. - Never scales up despite high load → requests are too high (pod uses 50m of a 1000m request → 5% utilization).
- Scales to maxReplicas constantly →
targetUtilizationtoo low for the workload OR requests undersized. SuccessfulRescaleevents but no pod count change → tolerance band (10%) suppresses tiny moves.- Custom metric not in HPA → check adapter logs; verify metric appears in
--raw "/apis/custom.metrics.k8s.io/v1beta1". - HPA target points to a Deployment that doesn’t exist → check
scaleTargetRefnames.
When to escalate
- Cluster-wide metrics-server failure — engage cluster admin.
- KEDA trigger source unreachable (queue, DB, S3) — fix underlying access first.
- Scaling causing cluster autoscaler bills to explode — coordinate; set HPA
maxReplicasmore conservatively.
Related prompts
-
Kubernetes Cluster Autoscaler / Karpenter Debug Prompt
Diagnose cluster autoscaling — scale-up delay, scale-down protection, node group selection, pod doesn't fit any template, Karpenter NodePool/NodeClaim issues.
-
Kubernetes Deployment Rollout Debug Prompt
Diagnose stuck Deployment rollouts — `ProgressDeadlineExceeded`, replica set churn, maxSurge/maxUnavailable misconfig, image pull pacing, and stuck-mid-rollout recovery.
-
Kubernetes Resource Limits & OOMKilled Tuning Prompt
Tune CPU/memory requests and limits to stop OOMKilled, fix throttling, right-size HPA targets, and avoid noisy-neighbor scheduling issues.