DevOps AI ToolKit
AI for Kubernetes & Helm · By James Joyner IV · 9 min read

PDBs That Don't Deadlock With unhealthyPodEvictionPolicy

A PodDisruptionBudget can refuse to evict a pod that's already crashing, hanging your node drain forever. The unhealthyPodEvictionPolicy field breaks the deadlock.

  • #kubernetes-helm
  • #ai
  • #pdb
  • #eviction
  • #drain

The drain had been stuck for twenty minutes. We were rolling node maintenance, kubectl drain was waiting on a single pod, and that pod was — of all things — already crash-looping. The PodDisruptionBudget guarding its Deployment was refusing to let the eviction API remove it. So the broken pod, which wasn’t serving anyone, was blocking the maintenance of an entire node. We force-deleted it under time pressure, which is exactly the kind of thing you don’t want to be doing during a maintenance window.

That deadlock has a real name and a real fix. The unhealthyPodEvictionPolicy field on a PDB exists specifically so an already-broken pod can’t hold your drains hostage, and most PDBs in the wild predate it.

Why the budget refuses to evict a broken pod

The eviction API — which kubectl drain uses — only removes a pod if doing so keeps the PDB satisfied. That’s the whole point of a PDB: don’t let voluntary disruptions drop you below your availability floor.

Now consider what happens when pods are already unhealthy. Say you have a Deployment with 3 replicas and a PDB of minAvailable: 2. Two pods are crash-looping, so only 1 is Ready and you're already below the floor. The drain tries to evict one of the crashing pods, and the eviction API does the arithmetic: the PDB's status shows currentHealthy (1) below desiredHealthy (2), so the budget is not met and the eviction is refused. The fact that the pod it's refusing to evict is itself broken doesn't help under the default logic, which lets a not-Ready pod go only while the budget is met (with exactly 2 Ready, the crashing pod would have been evicted). Once the budget is breached, nothing can be evicted, including the things that are broken.

Cannot evict pod as it would violate the pod's disruption budget.

That’s the deadlock. A broken pod, which contributes nothing to availability, is protected as if it were a healthy one.
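The decision can be sketched as a small shell function. This is a toy model under stated assumptions, not the API server's code: the function name can_evict and its arguments are hypothetical, and the two counts stand in for the PDB's status.currentHealthy and status.desiredHealthy fields.

```shell
# Toy model of the eviction decision for one pod (not the real API server code).
# Usage: can_evict <pod_ready: true|false> <currentHealthy> <desiredHealthy> [policy]
can_evict() {
  pod_ready=$1
  current=$2
  desired=$3
  policy=${4:-IfHealthyBudget}   # the default when the field is unset

  if [ "$pod_ready" = "false" ]; then
    # AlwaysAllow: a not-Ready pod bypasses the budget entirely.
    if [ "$policy" = "AlwaysAllow" ]; then
      echo allowed
      return 0
    fi
    # IfHealthyBudget: a not-Ready pod may go only while the budget is met.
    if [ "$current" -ge "$desired" ] && [ "$desired" -gt 0 ]; then
      echo allowed
      return 0
    fi
  fi

  # Everything else needs spare budget: healthy pods above the floor.
  if [ $((current - desired)) -gt 0 ]; then
    echo allowed
  else
    echo blocked
  fi
}
```

With one Ready pod and a floor of two, can_evict false 1 2 prints blocked, which is the deadlock. can_evict false 2 2 prints allowed, because the budget is still met, and can_evict false 1 2 AlwaysAllow prints allowed regardless.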

The field that fixes it

unhealthyPodEvictionPolicy controls how the eviction API treats not-Ready pods:

Value | Behavior
IfHealthyBudget (default) | Evict not-Ready pods only if the budget is currently met (status.currentHealthy is at least status.desiredHealthy). This is the deadlock source.
AlwaysAllow | Always permit eviction of not-Ready pods, regardless of the budget.

With AlwaysAllow, the logic becomes: a pod that isn’t Ready isn’t serving traffic anyway, so evicting it can’t reduce real availability — let it go. The crash-looping pod gets evicted, the drain proceeds, and the Deployment reschedules it (hopefully onto a healthier node):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 2
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: web
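If the PDB already exists, the field can be added with a merge patch instead of re-applying the whole manifest. A minimal sketch, assuming kubectl is pointed at the right cluster and namespace; the wrapper name preview_always_allow is hypothetical, and the server-side dry run keeps it inspect-only:

```shell
# Preview adding the field to an existing PDB without changing anything.
# --dry-run=server asks the API server to validate the patch but not persist it.
preview_always_allow() {
  pdb_name=$1
  kubectl patch pdb "$pdb_name" --type=merge --dry-run=server \
    -p '{"spec":{"unhealthyPodEvictionPolicy":"AlwaysAllow"}}'
}
```

Run it as preview_always_allow web. Removing --dry-run=server from the command applies the change for real.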

The one caveat: AlwaysAllow is only safe if your readiness probe accurately means “broken.” If the probe reports not-Ready during a long, legitimate warmup, AlwaysAllow will happily evict pods that were merely slow to start. Make sure not-Ready means not-working before you flip it.
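One way to see which pods AlwaysAllow would expose is to list every pod the PDB selects alongside its Ready condition. A minimal sketch, assuming kubectl is pointed at the right cluster and namespace; the function name ready_report is hypothetical, and app=web is the selector from the PDB above:

```shell
# List each pod matching a label selector with its Ready condition (True/False).
# Read-only: this only queries the API server.
ready_report() {
  selector=$1
  kubectl get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}
```

Run it as ready_report app=web. Any pod showing False would be evictable under AlwaysAllow, so if some of those are merely warming up, fix the probe before changing the policy.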

Check the budget math too

Plenty of “PDB bugs” aren’t the deadlock at all — they’re arithmetic. A PDB whose minAvailable equals (or exceeds) the replica count forbids every voluntary eviction, because there’s never any slack:

# The column that tells you the truth
kubectl get pdb web
# NAME   MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# web    3               N/A               0                     12d

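ALLOWED DISRUPTIONS: 0 means every eviction of a healthy pod will be refused, so no drain can finish against this PDB until you give it slack, typically by lowering minAvailable to replicas - 1 or expressing it as maxUnavailable: 1. The same 0 also appears when the arithmetic is fine but too few pods are currently Ready, so compare it against the pod states before blaming the spec. If that column shows at least 1, the budget permits progress.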
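The arithmetic behind that column is simple enough to sketch. This assumes an integer minAvailable (percentage values need rounding that is left out here), and the function name allowed_disruptions is hypothetical:

```shell
# Toy arithmetic for ALLOWED DISRUPTIONS with an integer minAvailable:
# healthy pods minus the floor, never below zero.
allowed_disruptions() {
  healthy=$1
  min_available=$2
  slack=$((healthy - min_available))
  if [ "$slack" -lt 0 ]; then
    slack=0
  fi
  echo "$slack"
}
```

allowed_disruptions 3 3 prints 0, matching the PDB shown above, while allowed_disruptions 3 2 prints 1. On a live cluster the same number can be read directly with kubectl get pdb web -o jsonpath='{.status.disruptionsAllowed}'.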

Prompt: Here is a PDB with minAvailable: 2 guarding a 3-replica Deployment, and a node drain that’s been stuck for 20 minutes on a crash-looping pod. Explain exactly why the eviction API is refusing, give me the corrected PDB, and the commands to verify the drain will now proceed. Inspect-and-apply-myself — don’t drain anything.

Output (excerpt): With fewer than 2 of 3 pods Ready you’re below the minAvailable floor, so the default IfHealthyBudget policy refuses to evict even the broken pod. Add unhealthyPodEvictionPolicy: AlwaysAllow so not-Ready pods can be evicted regardless of budget. Verify kubectl get pdb shows ALLOWED DISRUPTIONS >= 1 for the healthy case, and confirm the readiness probe genuinely means “broken” before applying.

This is a good AI-assisted task because it’s a logic puzzle with a known shape: map the ready/unhealthy counts to why the eviction API is returning what it is, then prescribe the field. I keep it advisory — the assistant explains and writes the corrected PDB, and I confirm the workload tolerates the disruptions the new budget permits before I apply it and run the drain. PDBs only gate voluntary disruptions, so they’re never a substitute for replica-level redundancy against node failure. More disruption-budget design lives in the Kubernetes & Helm guides.

Wrapping up

A PodDisruptionBudget is supposed to protect availability, not deadlock your maintenance, but with the default IfHealthyBudget policy a single already-broken pod can hang a drain indefinitely once the budget has fallen below its floor. Set unhealthyPodEvictionPolicy: AlwaysAllow so not-Ready pods can always be evicted (after confirming your readiness probe really means broken), and check the ALLOWED DISRUPTIONS column to catch the budget-math version of the same problem. Let an AI assistant diagnose the arithmetic and draft the fix while you verify and apply. More node-maintenance and scheduling guides are in the Kubernetes & Helm guides, with reusable prompts in the prompt library.
