Kubernetes HPA Debugging Prompt

Diagnose HorizontalPodAutoscaler issues — flapping replicas, `unable to fetch metrics`, custom metrics adapter, behavior tuning, scale-from-zero patterns.

Target user: Kubernetes engineers tuning autoscaling
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Kubernetes engineer who has tuned HPAs in production for both CPU/memory-based and custom-metric-based autoscaling. You know that HPA measures resource utilization as a percentage of REQUESTS, not as absolute usage or a percentage of limits, and that wrong-sized requests make the HPA math meaningless.

I will provide:
- The HPA spec (`kubectl get hpa <name> -o yaml`)
- `kubectl describe hpa <name>` (Conditions section is critical)
- The target Deployment's current replica count and resource requests
- The symptom (flapping, never scales, `unable to fetch metrics`, custom metric not appearing)
- For custom metrics: the metrics adapter installed (Prometheus Adapter, KEDA, etc.)

Your job:

1. **Verify the metrics pipeline**:
   - **CPU/memory metrics** require `metrics-server` running and `kubectl top pod` working
   - **Custom metrics** require an adapter (prometheus-adapter, KEDA, custom)
   - `kubectl get apiservice v1beta1.metrics.k8s.io` → must show `Available=True`
   - `kubectl get apiservice v1beta1.custom.metrics.k8s.io` → for custom metrics
2. **HPA math basics**:
   - For CPU: `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`
   - `currentMetricValue` for CPU = average CPU utilization across all pods (relative to request)
   - **A pod with `requests.cpu: 100m` using 80m has utilization 80%**
   - If `requests.cpu` is wrong (too high or too low), utilization math is meaningless
3. **For "unable to fetch metrics" / `Conditions: ScalingActive=False` (reason `FailedGetResourceMetric`)**:
   - metrics-server not installed or not running: `kubectl -n kube-system get pods -l k8s-app=metrics-server`
   - metrics-server can't reach kubelet: TLS issues on self-managed clusters (`--kubelet-insecure-tls` works around this on dev clusters; in production, fix the kubelet serving certificates instead)
   - For custom: adapter pod logs (`kubectl logs <adapter-pod>`)
4. **For flapping (oscillating up/down)**:
   - **`behavior.scaleDown.stabilizationWindowSeconds`** — soak time before scale down (default 300s)
   - **`behavior.scaleUp.stabilizationWindowSeconds`** — default 0 (immediate)
   - **Noisy metric**: HPA reacts to noise → smooth at metric source or raise window
   - **Resource requests too low** → small load looks like high utilization → over-scale → load drops → scale down → repeat
5. **For "never scales"**:
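   - Current value within tolerance of target: `ratio = currentMetricValue / desiredMetricValue`, e.g. `0.52 / 0.50 = 1.04` → inside the band, so no change
   - HPA's `tolerance` (cluster-wide default 0.1, 10%) suppresses scaling while the ratio stays between 0.9 and 1.1
   - `maxReplicas` ceiling (when not scaling up) or `minReplicas` floor (when not scaling down) already reached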
6. **For custom metrics**:
   - Verify metric appears: `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq`
   - Prometheus Adapter config: rules for queries → metrics names → HPA references
   - KEDA: ScaledObject instead of HPA directly; KEDA generates HPA + uses TriggerAuthentication
7. **For scale-from-zero**:
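   - Vanilla HPA `minReplicas: 0` is only accepted when the alpha `HPAScaleToZero` feature gate is enabled and the HPA uses at least one Object or External metric; resource and pod metrics cannot drive it because no pods exist to report them. KEDA solves this with external triggers (queue depth, S3 object, etc.).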
8. **For behavior tuning** (1.18+):
   - `behavior.scaleUp.policies` and `scaleDown.policies` define rate of change per period
   - Example: allow scale-up by at most max(100% of current pods, 10 pods) per 30 seconds

Mark DESTRUCTIVE: removing an HPA from a workload that depends on autoscaling (replicas freeze at the current count, which may be too few for the load), setting `minReplicas: 0` without KEDA (vanilla HPA cannot scale from 0 on resource or pod metrics), changing target utilization under live traffic.

---

HPA + namespace: [DESCRIBE]
Symptom: [DESCRIBE]
`kubectl describe hpa <name>` (Conditions especially):
```
[PASTE]
```
HPA spec:
```yaml
[PASTE]
```
Target Deployment's `resources.requests`:
```yaml
[PASTE]
```
metrics-server status:
```
[PASTE `kubectl get apiservice v1beta1.metrics.k8s.io` and pod status]
```
For custom metrics: adapter logs:
```
[PASTE]
```

Why this prompt works

HPA is widely misunderstood: it scales on usage expressed as a percentage of requests, not on absolute usage. The flapping that frustrates teams usually comes from undersized requests (every burst → “200% utilization” → over-scale → drop → repeat). This prompt forces a requests-first audit before tuning HPA.
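
To make the requests-first point concrete, here is a minimal sketch of the replica formula in shell. The function name `hpa_desired_replicas` and its argument order are illustrative assumptions, not part of any Kubernetes tooling. The sketch applies only the default 10% tolerance and ignores `minReplicas`, `maxReplicas`, unready pods, and behavior policies.

```shell
# hpa_desired_replicas CURRENT_REPLICAS AVG_USAGE_M REQUEST_M TARGET_PCT [TOLERANCE]
# Hypothetical helper: prints ceil(current * utilization / target), or the
# current count when the ratio falls inside the tolerance band.
hpa_desired_replicas() {
  awk -v cur="$1" -v usage="$2" -v req="$3" -v target="$4" -v tol="${5:-0.1}" 'BEGIN {
    # utilization% / target% written as a single division to limit float error
    ratio = (usage * 100) / (req * target)
    diff = ratio - 1
    if (diff < 0) diff = -diff
    if (diff <= tol) { print cur; exit }   # inside the tolerance band: no change
    desired = cur * ratio
    rounded = int(desired)
    if (rounded < desired) rounded++       # ceil()
    print rounded
  }'
}
```

With 5 replicas averaging 80m CPU against a 70% target, `hpa_desired_replicas 5 80 100 70` prints 6 when the request is 100m, while `hpa_desired_replicas 5 80 40 70` prints 15 when the request is undersized at 40m. The load is the same and only the request differs, which is how undersized requests produce large swings.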

How to use it

  1. Always verify metrics-server is healthy. Most “HPA doesn’t work” reports trace back to a broken metrics pipeline.
  2. Read the Conditions section of kubectl describe hpa; it explains why scaling decisions were made or skipped.
  3. Cross-check resource requests against actual usage. HPA math depends on requests being roughly right.
  4. For custom metrics, hit the metrics API directly (--raw) to confirm the metric exists.
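
The four steps above can be bundled into one pass that collects everything the prompt asks for. This is a sketch under assumptions: `hpa_preflight` is a hypothetical helper name, and it assumes `kubectl` is already pointed at the right cluster. It only reads state and changes nothing.

```shell
# hpa_preflight NAMESPACE HPA_NAME
# Hypothetical read-only helper that gathers the inputs for the prompt.
hpa_preflight() {
  ns="$1"; name="$2"

  echo "== 1. metrics pipeline =="
  # Expect Available=True for the resource metrics API
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'

  echo "== 2. HPA conditions =="
  kubectl -n "$ns" get hpa "$name" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}): {.message}{"\n"}{end}'

  echo "== 3. requests vs actual usage =="
  kubectl -n "$ns" top pod
  kubectl -n "$ns" get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests}{"\n"}{end}'

  echo "== 4. custom metrics API (only if an adapter is installed) =="
  kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" \
    || echo "custom metrics API not available"
}
```

Run it as `hpa_preflight <ns> <name>` and paste the output into the prompt's placeholders.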

Useful commands

# HPA state
kubectl get hpa -A
kubectl describe hpa <name>
kubectl get hpa <name> -o yaml | yq '.spec, .status'

# Metrics pipeline
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get apiservice v1beta1.custom.metrics.k8s.io
kubectl get apiservice v1beta1.external.metrics.k8s.io
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl top pod -n <ns>                   # verifies metrics-server

# Direct metrics API access
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods" | jq
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq

# Prometheus Adapter (if using)
kubectl -n monitoring logs deploy/prometheus-adapter
kubectl -n monitoring get configmap adapter-config -o yaml

# KEDA (if using)
kubectl get scaledobjects -A
kubectl get triggerauthentications -A
kubectl describe scaledobject <name>

# Resource requests audit
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{":\t"}{.spec.containers[*].resources}{"\n"}{end}'

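# Recreate the HPA (DESTRUCTIVE: replicas freeze at the current count until it is re-applied).
# Rarely needed: the controller re-evaluates on every sync period (15s by default).
kubectl delete hpa <name>
kubectl apply -f hpa.yaml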
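
For the requests audit, usage from kubectl top pod is reported in millicores (for example 80m), while requests may be written as whole or fractional cores (1, 0.5). The sketch below normalizes both before comparing. The helper names `to_millicores` and `utilization_pct` are assumptions for illustration, and only the plain-number and m-suffixed forms are handled.

```shell
# to_millicores QUANTITY   e.g. 100m -> 100, 1 -> 1000, 0.5 -> 500
# Hypothetical helper; other quantity suffixes are not handled.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

# utilization_pct USAGE REQUEST -> usage as an integer percent of the request
# (integer division, so the result is truncated)
utilization_pct() {
  usage_m="$(to_millicores "$1")"
  request_m="$(to_millicores "$2")"
  echo $(( usage_m * 100 / request_m ))
}
```

`utilization_pct 50m 1` prints 5, so an oversized request hides real load from the HPA, and `utilization_pct 80m 100m` prints 80.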

Patterns

Standard CPU HPA with behavior

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70           # scale at 70% of request
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
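
To see what the scaleUp policies above permit, the following sketch computes the highest replica count reachable in one period when selectPolicy is Max. The name `scale_up_limit` is hypothetical, and the calculation is simplified: it ignores replicas already added earlier in the same period and does not apply the maxReplicas cap.

```shell
# scale_up_limit CURRENT PERCENT PODS
# Hypothetical helper: upper bound after one period with selectPolicy: Max.
scale_up_limit() {
  current="$1"; percent="$2"; pods="$3"
  # Percent policy, rounded up so small deployments can still grow
  by_percent=$(( (current * (100 + percent) + 99) / 100 ))
  # Pods policy: a fixed number of extra pods
  by_pods=$(( current + pods ))
  if [ "$by_percent" -gt "$by_pods" ]; then
    echo "$by_percent"
  else
    echo "$by_pods"
  fi
}
```

With the values from the manifest, `scale_up_limit 3 100 4` prints 7 (the Pods policy wins at small sizes) and `scale_up_limit 10 100 4` prints 20 (the Percent policy wins once the deployment is larger).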

Custom metric (Prometheus Adapter — requests per second)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"              # 100 RPS per pod
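
Two small helpers can make this pattern easier to check. The first builds the custom metrics API path for a per-pod metric so it can be queried directly. The second shows the arithmetic behind an AverageValue target. Both function names are illustrative assumptions, and the arithmetic ignores the tolerance band and the minReplicas/maxReplicas bounds.

```shell
# custom_metric_path NAMESPACE METRIC
# Hypothetical helper: API path that lists METRIC for every pod in NAMESPACE.
custom_metric_path() {
  printf '/apis/custom.metrics.k8s.io/v1beta1/namespaces/%s/pods/*/%s\n' "$1" "$2"
}

# desired_for_average_value TOTAL TARGET_PER_POD
# ceil(TOTAL / TARGET_PER_POD), using integer arithmetic.
desired_for_average_value() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

Query the adapter with `kubectl get --raw "$(custom_metric_path <ns> http_requests_per_second)"`. If the metric is missing there, the HPA cannot see it either. For the target above, `desired_for_average_value 1250 100` prints 13: 1,250 RPS in total at 100 RPS per pod needs 13 pods.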

KEDA ScaledObject (scale from zero on queue depth)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: jobs
      mode: QueueLength
      value: "10"                        # 10 messages per worker
      host: amqp://user:pass@rabbitmq:5672   # placeholder; prefer hostFromEnv or a TriggerAuthentication over inline credentials
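
As a rough model of how this ScaledObject sizes the worker pool, the sketch below maps queue length to replicas. It is a simplification with a hypothetical function name: in practice KEDA handles the 0-to-1 activation itself, hands steady-state scaling to the HPA it generates, and waits for its cooldown period before returning to zero.

```shell
# keda_queue_replicas QUEUE_LENGTH VALUE MAX_REPLICAS
# Hypothetical helper: ceil(queue / value), 0 when empty, capped at the max.
keda_queue_replicas() {
  queue="$1"; per_worker="$2"; max="$3"
  if [ "$queue" -le 0 ]; then
    echo 0          # empty queue: scale to zero
    return
  fi
  want=$(( (queue + per_worker - 1) / per_worker ))
  if [ "$want" -gt "$max" ]; then
    want="$max"     # never exceed maxReplicaCount
  fi
  echo "$want"
}
```

With the manifest's values, `keda_queue_replicas 25 10 30` prints 3, an empty queue prints 0, and a backlog of 1000 messages is capped at 30.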

Common findings this catches

  • “unable to fetch metrics resource cpu” → metrics-server down or apiservice Available=False. Restart metrics-server.
  • Flapping between 5 and 15 replicas → scaleDown stabilizationWindowSeconds too low, a noisy metric, or undersized requests; raise the window and right-size requests.
  • Never scales up despite high load → requests are too high (pod uses 50m of a 1000m request → 5% utilization).
  • Scales to maxReplicas constantly → target averageUtilization too low for the workload OR requests undersized.
  • Metric sits slightly off target but the replica count never changes → tolerance band (10%) suppresses tiny moves, so no SuccessfulRescale event is emitted.
  • Custom metric not in HPA → check adapter logs; verify metric appears in --raw "/apis/custom.metrics.k8s.io/v1beta1".
  • HPA target points to a Deployment that doesn’t exist → check scaleTargetRef names.
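
The flapping finding is easier to reason about with a model of the scale-down stabilization window: when scaling down, the HPA acts on the highest recommendation it computed during the window, not on the latest one. The sketch below assumes the recommendations are passed in as arguments, and `stabilized_recommendation` is a hypothetical name.

```shell
# stabilized_recommendation RECOMMENDATION...
# Hypothetical helper: prints the highest recommendation in the window,
# which is the value the HPA acts on when scaling down.
stabilized_recommendation() {
  highest=0
  for rec in "$@"; do
    if [ "$rec" -gt "$highest" ]; then
      highest="$rec"
    fi
  done
  echo "$highest"
}
```

If a 300s window holds the recommendations 5 15 6 14 5, `stabilized_recommendation 5 15 6 14 5` prints 15, so the workload stays at 15 instead of dropping and re-scaling. If the window is so short that it holds only the latest value, `stabilized_recommendation 5` prints 5 and the replica count follows every dip.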

When to escalate

  • Cluster-wide metrics-server failure — engage cluster admin.
  • KEDA trigger source unreachable (queue, DB, S3) — fix underlying access first.
  • Scaling causing cluster autoscaler bills to explode — coordinate; set HPA maxReplicas more conservatively.
