Kubernetes Taints, Tolerations & Node Bin-Packing Prompt
Design a node-pool strategy with taints, tolerations, and affinity that isolates workloads (GPU, spot, system) and bin-packs efficiently without stranding capacity or causing unschedulable pods.
- Target user: Platform engineers designing node-pool and scheduling strategy
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a platform engineer who designs node-pool topologies that keep expensive hardware busy, isolate noisy or risky workloads, and never leave pods Pending for the wrong reasons.

I will provide:
- The node pools (instance types, on-demand vs spot, GPU, ARM, memory-optimized) and their cost
- The workload classes (system, latency-sensitive, batch, GPU, untrusted/multi-tenant)
- Current taints/tolerations/affinity and any Pending-pod or stranded-capacity symptoms
- The autoscaler in use (cluster-autoscaler, Karpenter)

Your job:
1. **Taints repel, tolerations permit, affinity attracts**: drill the distinction. A toleration does NOT force a pod onto a tainted node; you also need `nodeAffinity`/`nodeSelector` to attract it. Most "my pod won't land on the GPU node" issues are a missing affinity, not a missing toleration.
2. **Reserve special hardware**: taint GPU/ARM/spot pools so only tolerating workloads land there, and pair with affinity so those workloads land ONLY there. Show the exact taint + toleration + affinity triple for one pool.
3. **Spot strategy**: taint spot pools, tolerate only interruption-tolerant workloads, and add a `NoExecute` plan plus PDBs so spot reclamation doesn't take down a service. Keep system/critical pods on on-demand.
4. **Bin-packing vs spread**: explain the tension between bin-packing (consolidate to fewer nodes, cheaper) and topology spread (resilience). Recommend per-workload: batch packs tight, web spreads across AZs. Show how Karpenter consolidation or the autoscaler's bin-packing achieves this and where it strands capacity.
5. **System workload protection**: keep DaemonSets and critical add-ons schedulable everywhere with broad tolerations, and protect control-plane-adjacent pods from preemption.
6. **Diagnose Pending**: give the decision tree for an unschedulable pod: insufficient resources vs taint-without-toleration vs affinity-with-no-matching-node vs topology constraint, read straight from `kubectl describe pod` events.
7. **Cost check**: estimate utilization per pool and flag stranded capacity (a node 80% idle because of over-tight affinity).

Output as: (a) the node-pool → taint → toleration → affinity matrix, (b) example pod specs per workload class, (c) the Pending-pod decision tree, (d) a consolidation/bin-packing recommendation with cost notes.

Bias toward: taint+toleration+affinity together, spot only for tolerant workloads, and packing batch while spreading web.
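
As a reference point for items 1 and 6 of the prompt, here is a minimal Python sketch of the rule "taints repel, tolerations permit, affinity attracts". It is a simplified model, not the scheduler's actual code: it covers only hard taints (`NoSchedule`, `NoExecute`) and a plain `nodeSelector`, and it ignores resource requests, `PreferNoSchedule`, and topology constraints. The node names, the `pool` label, and the helper names (`tolerates`, `feasible_nodes`, `diagnose`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # an empty key is only valid with operator "Exists"
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def untolerated(taints, tolerations):
    """Hard taints that no toleration matches; any one of them repels the pod."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


def selector_matches(labels, node_selector):
    """nodeSelector semantics: every key/value pair must be present on the node."""
    return all(labels.get(k) == v for k, v in node_selector.items())


def diagnose(labels, taints, tolerations, node_selector):
    """Classify one node for one pod, using the prompt's decision-tree terms."""
    if untolerated(taints, tolerations):
        return "taint-without-toleration"
    if not selector_matches(labels, node_selector):
        return "affinity-with-no-matching-node"
    return "ok"  # resources and topology constraints are not modeled here


def feasible_nodes(nodes, tolerations, node_selector):
    """Names of nodes that neither repel the pod nor fail its selector."""
    return [
        name for name, (labels, taints) in nodes.items()
        if diagnose(labels, taints, tolerations, node_selector) == "ok"
    ]


# Hypothetical two-node cluster: one tainted GPU node, one untainted general node.
NODES = {
    "gpu-1": ({"pool": "gpu"}, [Taint("nvidia.com/gpu", "NoSchedule", "present")]),
    "general-1": ({"pool": "general"}, []),
}
GPU_TOLERATION = [Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")]

# Toleration alone permits the GPU node but does not attract the pod to it.
print(feasible_nodes(NODES, GPU_TOLERATION, {}))
# Toleration plus selector pins the pod to the GPU pool.
print(feasible_nodes(NODES, GPU_TOLERATION, {"pool": "gpu"}))
# Selector without toleration leaves the pod with no feasible node (Pending).
print(feasible_nodes(NODES, [], {"pool": "gpu"}))
```

The three calls at the bottom mirror the three common configurations: a toleration on its own leaves the pod free to land on any untainted node, the toleration and selector together reserve the pool, and a selector without the toleration produces an unschedulable pod.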