You are a senior Kubernetes engineer who has tuned HPAs in production for both CPU/memory-based and custom-metric-based autoscaling. You know that HPA measures CPU/memory utilization relative to REQUESTS, not limits or node capacity, and that wrong-sized requests make the HPA math meaningless.

I will provide:
- The HPA spec (`kubectl get hpa <name> -o yaml`)
- `kubectl describe hpa <name>` (Conditions section is critical)
- The target Deployment's current replica count and resource requests
- The symptom (flapping, never scales, `unable to fetch metrics`, custom metric not appearing)
- For custom metrics: the metrics adapter installed (Prometheus Adapter, KEDA, etc.)

Your job:

1. **Verify the metrics pipeline**:
   - **CPU/memory metrics** require `metrics-server` running and `kubectl top pod` working
   - **Custom metrics** require an adapter (prometheus-adapter, KEDA, custom)
   - `kubectl get apiservice v1beta1.metrics.k8s.io` → must show `Available=True`
   - `kubectl get apiservice v1beta1.custom.metrics.k8s.io` → for custom metrics

2. **HPA math basics**:
   - For CPU: `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`
   - `currentMetricValue` for CPU = average CPU utilization across all pods (relative to request)
   - **A pod with `requests.cpu: 100m` using 80m has utilization 80%**
   - If `requests.cpu` is wrong (too high or too low), utilization math is meaningless

3. **For "unable to fetch metrics" / `Conditions: ScalingActive=False`** (reason such as `FailedGetResourceMetric`):
   - metrics-server not installed or not running: `kubectl -n kube-system get pods -l k8s-app=metrics-server`
   - metrics-server can't reach kubelet: TLS issues on self-managed clusters (`--kubelet-insecure-tls` is a common workaround, but it disables kubelet certificate verification; prefer properly signed kubelet serving certificates)
   - For custom: adapter pod logs (`kubectl logs <adapter-pod>`)

4. **For flapping (oscillating up/down)**:
   - **`behavior.scaleDown.stabilizationWindowSeconds`**: soak time before scale down (default 300s)
   - **`behavior.scaleUp.stabilizationWindowSeconds`**: default 0 (immediate)
   - **Noisy metric**: HPA reacts to noise → smooth at metric source or raise window
   - **Resource requests too low** → small load looks like high utilization → over-scale → load drops → scale down → repeat

5. **For "never scales"**:
   - Current value at or below target: there is nothing to scale up for
   - Ratio inside the tolerance band: HPA skips scaling when `currentMetricValue / desiredMetricValue` is within 1.0 ± `tolerance` (cluster-wide default 0.1, 10%). With a 50% target, 54% utilization gives a ratio of 1.08 → no change; 80% gives 1.6 → `desired = ceil(current × 1.6)`
   - `minReplicas` floor (or `maxReplicas` ceiling) reached: Conditions show `ScalingLimited=True`

6. **For custom metrics**:
   - Verify metric appears: `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq`
   - Prometheus Adapter config: rules for queries → metrics names → HPA references
   - KEDA: ScaledObject instead of HPA directly; KEDA generates HPA + uses TriggerAuthentication

7. **For scale-from-zero**:
   - Vanilla HPA `minReplicas: 0`: only accepted with the alpha `HPAScaleToZero` feature gate enabled and at least one Object or External metric configured, because with zero pods there are no pod metrics to read. KEDA solves this with external triggers (queue depth, S3 object, etc.).

8. **For behavior tuning** (1.18+):
   - `behavior.scaleUp.policies` and `scaleDown.policies` define rate of change per period
   - Example: scale up to max(100% of current pods, +10 pods) per 30 seconds

Mark DESTRUCTIVE: removing HPA on a workload that depends on auto-scaling (replicas freeze at current; may be too few for load), `minReplicas: 0` without KEDA (vanilla HPA won't scale from 0 on metric arrival unless the `HPAScaleToZero` gate and an Object/External metric are in place), changing target utilization while traffic is hitting.

---

HPA + namespace: [DESCRIBE]

Symptom: [DESCRIBE]

`kubectl describe hpa <name>` (Conditions especially):
```
[PASTE]
```

HPA spec:
```yaml
[PASTE]
```

Target Deployment's `resources.requests`:
```yaml
[PASTE]
```

metrics-server status:
```
[PASTE `kubectl get apiservice v1beta1.metrics.k8s.io` and pod status]
```

For custom metrics: adapter logs:
```
[PASTE]
```

Why this prompt works

HPA is widely misunderstood: it computes utilization as a percentage of each pod's resource requests, not of its limits or of node capacity. The flapping that frustrates teams usually comes from undersized requests (every burst → “200% utilization” → over-scale → drop → repeat). This prompt forces a requests-first audit before tuning HPA.
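To make the requests-first point concrete, here is a minimal Python sketch of the replica calculation for a CPU utilization target. The function name `desired_replicas` and the sample numbers are illustrative assumptions, and the sketch simplifies the real controller (one container per pod, no missing metrics, no not-ready pods).

```python
import math


def desired_replicas(current_replicas, avg_usage_millicores, request_millicores,
                     target_utilization_pct, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target."""
    # Utilization is measured against the request, not the limit.
    utilization_pct = 100.0 * avg_usage_millicores / request_millicores
    ratio = utilization_pct / target_utilization_pct
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Same workload (5 pods averaging 160m CPU, 70% target), two request sizes.
undersized = desired_replicas(5, 160, 100, 70)   # 160% utilization
right_sized = desired_replicas(5, 160, 250, 70)  # 64% utilization
print(undersized, right_sized)  # prints "12 5"
```

With a 100m request the same load reads as 160% utilization and more than doubles the Deployment; with a 250m request it sits inside the tolerance band and nothing changes.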

How to use it

Always verify metrics-server is healthy. Most “HPA doesn’t work” reports trace back to a broken metrics pipeline.
Read the Conditions section of kubectl describe hpa; it states why a scaling decision was skipped.
Cross-check resource requests against actual usage. HPA math depends on requests being roughly right.
For custom metrics, hit the metrics API directly (--raw) to confirm the metric exists.
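The Conditions check can also be scripted. The sketch below assumes the `status.conditions` layout of an `autoscaling/v2` HPA as returned by `kubectl get hpa <name> -o json`, and lists the conditions that are not in their healthy state. The sample payload is hand-written for illustration rather than captured from a cluster, and `blocking_conditions` is a hypothetical helper name.

```python
import json

# Hand-written sample; real messages from your cluster will differ.
SAMPLE_HPA_JSON = """
{
  "status": {
    "conditions": [
      {"type": "AbleToScale", "status": "True", "reason": "SucceededGetScale",
       "message": "the HPA controller was able to get the target's current scale"},
      {"type": "ScalingActive", "status": "False", "reason": "FailedGetResourceMetric",
       "message": "the HPA was unable to compute the replica count"},
      {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange",
       "message": "the desired count is within the acceptable range"}
    ]
  }
}
"""


def blocking_conditions(hpa_json):
    """Return (type, reason, message) for each condition that looks unhealthy."""
    hpa = json.loads(hpa_json)
    problems = []
    for cond in hpa.get("status", {}).get("conditions", []):
        # ScalingLimited=True means the HPA is clamped at min/max replicas,
        # so its healthy value is "False"; the other two should be "True".
        healthy = "False" if cond["type"] == "ScalingLimited" else "True"
        if cond["status"] != healthy:
            problems.append((cond["type"], cond.get("reason", ""), cond.get("message", "")))
    return problems


for cond_type, reason, message in blocking_conditions(SAMPLE_HPA_JSON):
    print(f"{cond_type}: {reason} ({message})")
```

In this sample only `ScalingActive` is reported, which points at the metrics pipeline rather than at the HPA spec.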

Useful commands

# HPA state
kubectl get hpa -A
kubectl describe hpa <name>
kubectl get hpa <name> -o yaml | yq '.spec, .status'

# Metrics pipeline
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get apiservice v1beta1.custom.metrics.k8s.io
kubectl get apiservice v1beta1.external.metrics.k8s.io
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl top pod -n <ns>                   # verifies metrics-server

# Direct metrics API access
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods" | jq
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq

# Prometheus Adapter (if using)
kubectl -n monitoring logs deploy/prometheus-adapter
kubectl -n monitoring get configmap adapter-config -o yaml   # ConfigMap name and namespace depend on how the adapter was installed

# KEDA (if using)
kubectl get scaledobjects -A
kubectl get triggerauthentications -A
kubectl describe scaledobject <name>

# Resource requests audit
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{.metadata.name}{":\t"}{.spec.containers[*].resources}{"\n"}{end}'

# Force re-eval (delete HPA, re-apply). DESTRUCTIVE: replicas freeze at the current count until the HPA is re-applied
kubectl delete hpa <name>
kubectl apply -f hpa.yaml
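When comparing usage from the metrics API with the requests audit, the units rarely match: metrics-server typically reports CPU in nanocores (for example `3971250n`) while requests are written as `250m` or `0.5`. A small sketch, assuming only the `n`, `u`, `m` and plain-core forms of a CPU quantity, that normalizes both to millicores. The helper names are hypothetical.

```python
def cpu_to_millicores(quantity):
    """Convert a Kubernetes CPU quantity string to millicores.

    Handles the n (nano), u (micro) and m (milli) suffixes and plain cores.
    """
    divisors = {"n": 1_000_000, "u": 1_000, "m": 1}
    suffix = quantity[-1]
    if suffix in divisors:
        return float(quantity[:-1]) / divisors[suffix]
    return float(quantity) * 1000.0


def utilization_pct(usage, request):
    """CPU utilization as the HPA sees it: usage relative to the request."""
    return 100.0 * cpu_to_millicores(usage) / cpu_to_millicores(request)


# Usage as reported by the metrics API vs. the request in the pod spec.
print(round(utilization_pct("180500000n", "250m"), 1))  # prints 72.2
```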

Patterns

Standard CPU HPA with behavior

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70           # scale at 70% of request
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

Custom metric (Prometheus Adapter — requests per second)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"              # 100 RPS per pod
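For a `Pods` metric with an `AverageValue` target, requests play no part: the controller averages the raw per-pod values and compares that average with the target. A rough sketch under that assumption, with hypothetical names and made-up sample values:

```python
import math


def desired_for_average_value(per_pod_values, target_average,
                              min_replicas, max_replicas, tolerance=0.1):
    """Approximate HPA replica math for a Pods metric with an AverageValue target."""
    current = len(per_pod_values)
    ratio = (sum(per_pod_values) / current) / target_average
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # inside the tolerance band, no change
    else:
        desired = math.ceil(current * ratio)
    # The result is always clamped to the HPA's configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Two pods at roughly 250 RPS each against a 100 RPS per-pod target.
print(desired_for_average_value([240, 260], 100, 2, 50))    # prints 5
# A spike far above capacity is capped at maxReplicas.
print(desired_for_average_value([6000, 6200], 100, 2, 50))  # prints 50
```

If the second case shows up in production, the HPA reports `ScalingLimited=True` and the fix is capacity planning rather than HPA tuning.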

KEDA ScaledObject (scale from zero on queue depth)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: queue-worker
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: jobs
      mode: QueueLength
      value: "10"                        # 10 messages per worker
      host: amqp://user:pass@rabbitmq:5672   # example only; prefer hostFromEnv or a TriggerAuthentication over inline credentials
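As a rough mental model of how this ScaledObject behaves, the sketch below estimates the steady-state replica count for a given queue depth. It assumes an empty queue means "inactive" (activation threshold of 0) and ignores the cooldown period and HPA tolerance. `keda_target_replicas` and its parameters are hypothetical names for illustration, not KEDA's API.

```python
import math


def keda_target_replicas(queue_length, messages_per_worker,
                         min_replicas, max_replicas, activation_threshold=0):
    """Estimate the steady-state replica count for a queue-length trigger."""
    # Inactive trigger: the workload falls back to its minimum (possibly zero).
    if queue_length <= activation_threshold:
        return min_replicas
    # Active trigger: at least one replica, then roughly queue / per-worker target.
    desired = math.ceil(queue_length / messages_per_worker)
    return max(1, min_replicas, min(max_replicas, desired))


for depth in (0, 1, 95, 1000):
    print(depth, keda_target_replicas(depth, 10, 0, 30))
# prints:
# 0 0
# 1 1
# 95 10
# 1000 30
```

The step from 0 to 1 comes from the trigger becoming active; from 1 upward the generated HPA does the scaling.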

Common findings this catches

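“unable to fetch metrics resource cpu” → metrics-server down or apiservice Available=False. Check the metrics-server pods and logs, then restart the deployment if needed.
Flapping between 5 and 15 replicas → scaleDown stabilizationWindowSeconds too low, a noisy metric, or undersized requests. Raise the window, smooth the metric at the source, and right-size requests.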
Never scales up despite high load → requests are too high (pod uses 50m of a 1000m request → 5% utilization).
Scales to maxReplicas constantly → averageUtilization target too low for the workload OR requests undersized.
Metric sits slightly off target but the replica count never changes → tolerance band (10% by default) suppresses small moves.
Custom metric not in HPA → check adapter logs; verify metric appears in --raw "/apis/custom.metrics.k8s.io/v1beta1".
HPA target points to a Deployment that doesn’t exist → Conditions show AbleToScale=False (reason FailedGetScale); check the scaleTargetRef kind and name.

When to escalate

Cluster-wide metrics-server failure — engage cluster admin.
KEDA trigger source unreachable (queue, DB, S3) — fix underlying access first.
Scaling causing cluster autoscaler bills to explode — coordinate; set HPA maxReplicas more conservatively.
