Kubernetes PDB unhealthyPodEvictionPolicy Prompt
Stop PodDisruptionBudgets from deadlocking node drains when pods are already broken, by choosing the right unhealthyPodEvictionPolicy and minAvailable/maxUnavailable math.
- Target user: Engineers whose node drains hang on PDBs
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior reliability engineer. My node drain has been stuck for 20 minutes because a PodDisruptionBudget refuses to let an already-crashing pod be evicted. I want a fix and a policy that doesn't recreate this.

I will provide:
- The PDB spec (minAvailable / maxUnavailable, selector)
- The Deployment/StatefulSet it guards (replicas, current ready count)
- The drain or eviction error and which pods are unhealthy

Your job:
1. **Explain the deadlock**: with the default `IfHealthyBudget` policy, the eviction API evicts a not-Ready pod only while the budget is currently met (healthy pods at or above the required count). Once enough pods are not-ready that the healthy count falls below the budget, even evicting a BROKEN pod is refused, and the drain hangs indefinitely.
2. **Introduce `unhealthyPodEvictionPolicy`**:
   - `IfHealthyBudget` (default): only evict unhealthy pods if the budget is currently met
   - `AlwaysAllow`: always permit eviction of pods that are not Ready, regardless of budget

   Explain when `AlwaysAllow` is correct (a broken pod isn't serving anyway, so evicting it can only help) and when it's risky.
3. **Check the budget math**: a PDB with `minAvailable` equal to (or above) the replica count makes EVERY voluntary eviction of a healthy pod impossible; show the corrected value that allows one disruption at a time.
4. **Diagnose the specific stuck drain**: map ready/unhealthy counts to why the eviction API is returning 429/`Cannot evict`, and the precise change that unblocks it.
5. **Produce the fixed PDB** with `unhealthyPodEvictionPolicy` set and corrected budget, plus how to verify with `kubectl get pdb` (ALLOWED DISRUPTIONS column) and a dry-run eviction.
6. **Mark anything** that lowers availability guarantees, and confirm the workload can tolerate the disruptions the new budget permits.

Output format: deadlock explanation, the budget-math fix, the corrected PDB YAML, and verification commands. Do not drain anything. Give me commands to inspect and the change to apply myself.

---

PDB spec:
```yaml
[PASTE]
```

Guarded workload (replicas, ready count): [DESCRIBE]

Drain/eviction error + unhealthy pods: [DESCRIBE]
Why this prompt works
The classic PDB deadlock is one of the most frustrating ways a routine node drain turns into a 2 a.m. incident: pods are already crashing, the healthy count has fallen below what the budget requires, and the eviction API refuses to remove even a broken pod, because under the default policy a not-Ready pod can be evicted only while the budget is currently met. Operators end up force-deleting pods or editing PDBs under pressure. The unhealthyPodEvictionPolicy field exists specifically to break this cycle, but it is relatively recent, and many PDBs in the wild were written before it existed.
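To make the deadlock concrete, here is a minimal Python sketch of the eviction decision for a single running pod. It is a simplified model of the documented behaviour, not the API server's code: `PdbStatus` and `eviction_allowed` are hypothetical names, the two fields mirror `status.currentHealthy` and `status.desiredHealthy`, and edge cases such as pending or terminated pods are left out.

```python
from dataclasses import dataclass


@dataclass
class PdbStatus:
    current_healthy: int  # Ready pods right now (status.currentHealthy)
    desired_healthy: int  # Ready pods the budget requires (status.desiredHealthy)

    @property
    def disruptions_allowed(self) -> int:
        # Never negative: a budget that is already violated allows zero disruptions.
        return max(0, self.current_healthy - self.desired_healthy)


def eviction_allowed(pod_ready: bool, status: PdbStatus,
                     policy: str = "IfHealthyBudget") -> bool:
    """Simplified model: may this running pod be evicted through the eviction API?"""
    if not pod_ready:
        if policy == "AlwaysAllow":
            # Not-Ready pods are evictable regardless of the budget.
            return True
        if status.current_healthy >= status.desired_healthy:
            # IfHealthyBudget: not-Ready pods are evictable only while the budget is met.
            return True
    # Ready pods (and not-Ready pods that fell through) need a spare disruption.
    return status.disruptions_allowed > 0


# Assumed scenario: 3 replicas, minAvailable: 2, two pods crash-looping, so 1 Ready.
stuck = PdbStatus(current_healthy=1, desired_healthy=2)
print(eviction_allowed(False, stuck))                 # False: broken pod is stuck
print(eviction_allowed(False, stuck, "AlwaysAllow"))  # True: broken pod can go
print(eviction_allowed(True, stuck, "AlwaysAllow"))   # False: healthy pod stays protected
```

Under this model, `AlwaysAllow` changes the outcome only for pods that are not Ready; the one healthy pod is still guarded by the budget.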
This prompt works because it names the deadlock precisely and then offers the two policies with honest guidance on when each is correct. AlwaysAllow is the right answer surprisingly often — a not-Ready pod isn’t serving traffic, so evicting it can only improve things — but only if your readiness probe actually means “broken” and not “still warming up.” The prompt also catches the other half of the problem: budget math where minAvailable equals the replica count, which forbids every eviction silently. Many “PDB bug” reports are really this arithmetic.
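The arithmetic behind that second failure mode is small enough to check by hand. The sketch below assumes integer budgets only (percentage values involve rounding that is not modelled here), and `allowed_disruptions` is a hypothetical helper name, not a Kubernetes API.

```python
def allowed_disruptions(replicas, ready, min_available=None, max_unavailable=None):
    """Approximate the ALLOWED DISRUPTIONS value for an integer budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        # maxUnavailable is counted against the expected number of pods.
        desired_healthy = max(0, replicas - max_unavailable)
    return max(0, ready - desired_healthy)


print(allowed_disruptions(3, 3, min_available=3))    # 0: every eviction is refused
print(allowed_disruptions(3, 3, min_available=2))    # 1: one disruption at a time
print(allowed_disruptions(3, 3, max_unavailable=1))  # 1: same effect, scales with replicas
print(allowed_disruptions(3, 2, max_unavailable=1))  # 0: one pod is already down
```

One way to read the output: `maxUnavailable: 1` keeps allowing a single disruption when the replica count changes, whereas a fixed `minAvailable` has to be revisited each time the workload is rescaled.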
The verification step grounds the fix: the ALLOWED DISRUPTIONS column and a dry-run eviction tell you immediately whether the drain will proceed. The assistant proposes the corrected PDB and inspection commands; you apply the change after confirming the workload tolerates the disruptions it now permits. More disruption-budget design lives in the Kubernetes & Helm guides and the prompt library.
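The same numbers behind the ALLOWED DISRUPTIONS column are available in the PDB's status, so they can be read programmatically. The following is a rough sketch that summarizes the JSON printed by `kubectl get pdb <name> -o json`; `summarize_pdb` and the sample object (including the name `web-pdb`) are made up for illustration, and the verdict wording is this sketch's own interpretation.

```python
import json


def summarize_pdb(pdb_json: str) -> str:
    """Summarize one PodDisruptionBudget object given as a JSON string."""
    pdb = json.loads(pdb_json)
    name = pdb["metadata"]["name"]
    spec = pdb.get("spec", {})
    status = pdb.get("status", {})
    # An unset policy behaves like IfHealthyBudget.
    policy = spec.get("unhealthyPodEvictionPolicy", "IfHealthyBudget")
    allowed = status.get("disruptionsAllowed", 0)
    current = status.get("currentHealthy", 0)
    desired = status.get("desiredHealthy", 0)
    if allowed > 0:
        verdict = "healthy pods can be evicted"
    elif policy == "AlwaysAllow" or current >= desired:
        verdict = "only not-Ready pods can be evicted"
    else:
        verdict = "evictions are blocked"
    return f"{name}: allowed={allowed} healthy={current}/{desired} policy={policy} -> {verdict}"


# Hypothetical sample resembling a stuck PDB (3 expected pods, 1 Ready, minAvailable: 2).
sample = json.dumps({
    "metadata": {"name": "web-pdb"},
    "spec": {"minAvailable": 2},
    "status": {"currentHealthy": 1, "desiredHealthy": 2,
               "disruptionsAllowed": 0, "expectedPods": 3},
})
print(summarize_pdb(sample))
# web-pdb: allowed=0 healthy=1/2 policy=IfHealthyBudget -> evictions are blocked
```

This only inspects state and changes nothing in the cluster, which fits the prompt's rule that the operator applies the fix.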
Related prompts
- Kubernetes Descheduler Strategy & Rebalancing Prompt: Design and tune a Kubernetes Descheduler configuration to fix node imbalance, evict pods violating affinity/topology rules, and reclaim stranded capacity, without fighting your autoscaler or HPA.
- Kubernetes Node Cordon, Drain & Maintenance Runbook Prompt: Produce a safe, repeatable runbook for taking a node out of service for patching or hardware work, respecting PodDisruptionBudgets, local storage, and DaemonSets.
- Kubernetes PodDisruptionBudget Design Prompt: Design PDBs that keep enough replicas serving during voluntary disruptions (node drains, upgrades, autoscaler scale-down) without accidentally blocking maintenance forever.