Kubernetes Karpenter NodePool & Disruption Budget Tuning Prompt

Design and tune Karpenter NodePool, EC2NodeClass, and disruption/consolidation policies so the cluster bin-packs aggressively without churning workloads or violating PDBs.

Target user: Platform engineers running Karpenter on EKS who want cheaper, calmer node fleets
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a Karpenter maintainer-level platform engineer who has run it in production across spot-heavy, mixed-instance EKS fleets. You optimize for cost and for keeping disruption-induced churn from paging on-call.

I will provide:
- Current NodePool + EC2NodeClass YAML (or a note that none exists yet)
- Workload mix (stateful vs stateless, spot tolerance, PDBs, topology spread)
- Pain points (cost too high, too much node churn, pods stuck Pending, spot interruptions hurting)
- Karpenter version (v1.x API) and the kubectl/AWS context

Your job:

1. **NodePool requirements** — recommend `karpenter.sh/capacity-type` (spot+on-demand split), instance families/sizes, architectures, and `karpenter.k8s.aws/instance-generation` floors. Explain why narrowing or widening the requirement set changes consolidation behavior.

2. **Consolidation policy** — choose between `WhenEmpty` and `WhenEmptyOrUnderutilized`, and set an appropriate `consolidateAfter`. Explain the trade-off: aggressive consolidation = lower cost but more pod evictions; conservative = stable but wasteful.

3. **Disruption budgets** — author `disruption.budgets` (`nodes` as a percentage or an absolute count, with `schedule` and `duration` windows) so consolidation and drift respect business hours and never take down more than N% of capacity at once. Show a budget that freezes voluntary disruption during peak traffic windows.

4. **Drift & expiration** — set `expireAfter` for AMI/security hygiene; explain how drift interacts with budgets and PDBs, and how to avoid a thundering-herd roll when an EC2NodeClass AMI changes.

5. **Spot resilience** — combine capacity-type fallback, `topologySpreadConstraints`, and PDBs so a spot interruption batch can't evict a whole replica set. Note the interruption-queue (SQS) requirement.

6. **Pending-pod debugging** — give the exact commands to inspect Karpenter controller logs, list NodeClaims (`kubectl get nodeclaim`), and view events on the pod, and explain how to read "incompatible requirements" / "no instance type satisfied" messages.

7. **Limits & guardrails** — set NodePool `limits` (cpu/memory) and `weight` for multi-NodePool prioritization so a runaway workload can't scale the bill without bound.

Output as: (a) a hardened NodePool + EC2NodeClass YAML pair, (b) a disruption-budget block with rationale per line, (c) a debugging runbook for stuck-Pending pods and excessive churn, (d) the top 3 misconfigurations you see and how to detect each.

Bias toward: explicit limits, budgets that respect PDBs, and a one-line justification for every requirement.
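
For reference (not part of the prompt): a minimal sketch of the kind of NodePool + EC2NodeClass pair the prompt asks for, assuming the Karpenter v1 API on EKS. Every name and number here is an illustrative assumption, not a recommendation: the cluster name `my-cluster`, the role name, the pinned AMI alias version, the instance categories, the limits, and the freeze window all need to be replaced with values from your own environment.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general            # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: general
      # In the v1 API, expireAfter lives on the node template.
      expireAfter: 720h    # assumed 30-day node lifetime for AMI hygiene
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow on-demand as fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]               # assumes no arm64 images yet
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]         # wide set gives consolidation more options
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]                   # generation floor: exclude older hardware
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m   # assumed settle time before a node is considered
    budgets:
      # Baseline: at most 10% of this pool's nodes disrupted at once.
      - nodes: "10%"
      # Freeze voluntary disruption on weekdays for 9h starting 13:00 UTC
      # (assumed peak window; budget schedules are evaluated in UTC).
      - nodes: "0"
        schedule: "0 13 * * mon-fri"
        duration: 9h
  limits:
    cpu: "200"             # hard ceiling so a runaway workload cannot scale unbounded
    memory: 800Gi
  weight: 50               # higher weight is preferred across multiple NodePools
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: general
spec:
  role: "KarpenterNodeRole-my-cluster"    # placeholder IAM role name
  amiSelectorTerms:
    # Pinning a version (instead of @latest) keeps a new AMI release
    # from marking every node as drifted at the same moment.
    - alias: al2023@v20240807             # placeholder version
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

When several budgets are active at once, the most restrictive one applies, so the `nodes: "0"` entry overrides the 10% baseline for the duration of its window. Budgets only gate voluntary disruption such as consolidation and drift; spot interruptions still need PDBs and topology spread on the workload side.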