AI for Kubernetes & Helm · By James Joyner IV · 11 min read

Kubernetes PriorityClass and Preemption: Who Gets Evicted First

When a node fills up, Kubernetes decides which pods survive. Learn how PriorityClass and preemption work, the traps that cause cascading evictions, and how to set them safely.

  • #kubernetes
  • #scheduling
  • #priorityclass
  • #preemption
  • #reliability

The first time I really understood preemption was at 2 a.m., watching a routine batch job quietly delete three replicas of a payment service. Nothing was broken, exactly. The scheduler did precisely what I had told it to do — I just hadn’t realized I’d told it anything. PriorityClass is one of those Kubernetes features that sits dormant and harmless until the cluster gets tight, and then it decides, in milliseconds, which of your workloads live and which get killed. If you’ve never set it deliberately, the cluster is making that call with defaults you didn’t choose.

This guide walks through how priority and preemption actually work, the failure modes that bite teams, and how to roll it out without turning a capacity crunch into an outage.

What priority actually controls

A PriorityClass is a cluster-scoped object that maps a name to an integer. Pods reference it by name, and the scheduler reads the resulting number in two situations: deciding scheduling order when multiple pods are pending, and deciding whom to evict (preempt) when a high-priority pod can’t fit anywhere.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Customer-facing services that must not be evicted by batch work."

Higher numbers win. A pod with value 1000000 will be scheduled ahead of a pending pod at 0, and if the cluster is full, the scheduler may evict lower-priority pods to make room for it. That eviction is preemption, and it is not graceful in the way a rolling update is — it deletes victim pods to free their requests.
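A class has no effect until a pod names it. As a minimal sketch, a Deployment would reference the class above through `priorityClassName` in its pod template. The workload name `payments-api`, its labels, the image, and the request sizes are placeholders for illustration, not values from a real cluster.

```yaml
# Hypothetical Deployment; only priorityClassName ties it to the class above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api            # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      priorityClassName: critical-service   # must match an existing PriorityClass
      containers:
        - name: api
          image: registry.example.com/payments-api:1.0.0   # placeholder image
          resources:
            requests:           # the scheduler fits and preempts based on requests
              cpu: 250m
              memory: 256Mi
```

At admission, the API server resolves the class name and fills in the pod's numeric `spec.priority`. A pod that names a class which does not exist is rejected.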

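Pro Tip: The two built-in classes system-cluster-critical (2000000000) and system-node-critical (2000001000) exist for a reason. The API server rejects user-defined values above 1000000000, so you cannot outrank them by number, but keep your application tiers far below that ceiling anyway, and never assign the system classes themselves to application pods. If you do, a misbehaving app pod competes on equal or better footing with CoreDNS or the CNI and can take the whole node’s networking down with it.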

How preemption picks its victims

When a high-priority pod is unschedulable, the scheduler looks for a node where evicting one or more lower-priority pods would let the new pod fit. It tries to minimize disruption: fewest victims, lowest priority, and it respects PodDisruptionBudgets on a best-effort basis. Crucially, it only ever preempts pods with a strictly lower priority value. Two pods at the same priority never preempt each other — they just queue.

You can watch this happen:

kubectl get events --field-selector reason=Preempted -A
# and on the victim's side, while the pod is still terminating:
kubectl describe pod <victim> -n <namespace> | grep -A5 Events

The victim gets a normal termination with its terminationGracePeriodSeconds, so a well-behaved app still drains connections. But the pod is going away. If it’s part of a Deployment, the controller creates a replacement, and the scheduler has to place it, possibly on a node where it in turn preempts something of lower priority. That’s the cascade I mentioned: a single high-priority pod can ripple through a tight cluster.
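Because a preempted pod goes through ordinary termination, the usual drain settings apply. The fragment below is a sketch of a pod template that gives a victim time to finish in-flight work. The 30-second grace period, the `preStop` sleep, and the container details are illustrative assumptions, and the hook assumes the image ships a `sleep` binary.

```yaml
# Fragment of a pod template (goes under spec.template in a Deployment).
spec:
  terminationGracePeriodSeconds: 30   # total shutdown budget, including the preStop hook
  containers:
    - name: worker                    # placeholder container
      image: registry.example.com/worker:1.0.0   # placeholder image
      lifecycle:
        preStop:
          exec:
            # Brief pause so load balancers can stop sending traffic before SIGTERM.
            command: ["sleep", "10"]
```

There is a trade-off: the preempting pod has to wait for its victims to finish terminating, so a long grace period on low-priority workloads delays the high-priority pod that needs the space.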

The preemptionPolicy: Never escape hatch

Not every important pod should be allowed to evict others. A common, sensible pattern is high scheduling priority but no preemption:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-but-polite
value: 500000
preemptionPolicy: Never

A pod in this class jumps the scheduling queue: it is placed ahead of lower-priority pending pods, but it waits for capacity rather than killing anything to get it. For batch workloads that are urgent but not life-or-death, this avoids the worst surprises while still front-running noisy neighbors.
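As a sketch of how a workload would opt in, a Job can reference the class by name. The Job name, image, and request sizes here are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-reconcile       # placeholder name
spec:
  template:
    spec:
      priorityClassName: high-but-polite   # jumps the queue, never preempts
      restartPolicy: Never
      containers:
        - name: reconcile
          image: registry.example.com/reconcile:1.0.0   # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```

The politeness only goes one way: pods in a non-preempting class can still be preempted themselves by higher-priority pods whose class uses PreemptLowerPriority.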

A sane tiering scheme

Resist the urge to invent fifteen priority levels. Three or four tiers cover almost everything, and they should be spaced far apart so you can insert values later without renumbering:

| Tier | Value | Use |
|---|---|---|
| platform-critical | 1000000 | Ingress, DNS add-ons, cert-manager |
| customer-facing | 100000 | Public APIs, checkout, auth |
| internal-services | 10000 | Async workers, internal dashboards |
| batch | 0 | Reports, backfills, CI jobs |
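Written out as manifests, the tiers in the table might look like the following sketch. The names and values come from the table; the descriptions are illustrative, and `batch` is left with `globalDefault: false` on the assumption that a separate default class is defined.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: platform-critical
value: 1000000
description: "Ingress, DNS add-ons, cert-manager."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: customer-facing
value: 100000
description: "Public APIs, checkout, auth."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: internal-services
value: 10000
description: "Async workers, internal dashboards."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch
value: 0
globalDefault: false            # assumption: a dedicated default class exists
description: "Reports, backfills, CI jobs."
```

The factor-of-ten gaps leave plenty of room to slot in a new tier later without touching the existing values.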

Set globalDefault: true on exactly one class, usually a low one like batch or a dedicated default at value 0, so any pod that forgets to specify a class lands at the bottom rather than inheriting something dangerous. If no globalDefault exists, unspecified pods get priority 0 anyway, but being explicit avoids confusion. Note that a global default only applies to pods created after the class exists; pods that are already running keep the priority they were admitted with.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 0
globalDefault: true
description: "Fallback for pods that do not set a priorityClassName."

Where AI genuinely helps — and where it must not touch

Priority schemes are an excellent task to hand an AI assistant for drafting, because the work is pattern-heavy and tedious: generating consistent YAML, auditing which Deployments lack a priorityClassName, and reasoning about whether a proposed value sits safely below the system classes. I’ll paste a directory of manifests into a model and ask it to summarize the implied priority ordering and flag anything assigned above 1,000,000. It’s fast and surprisingly good at catching the off-by-a-digit value that would let a cron job preempt the API gateway.

Treat the model as a sharp junior engineer: great at the first draft and the consistency pass, not the one who runs kubectl apply. Everything it produces here mutates scheduling behavior across the whole cluster, so it stays human-in-the-loop. I review every value by hand, and I never hand the model a kubeconfig or production credentials — it gets the YAML text, nothing that can touch a live cluster. For building that review habit there’s a structured flow in the code review dashboard, and reusable starting points in the prompt library and prompt packs.

A prompt I reach for:

Here are 20 Deployment manifests and the PriorityClass definitions they
reference. List each workload's priorityClassName (or "unset"). Flag any
value > 1000000, any duplicate tier with inconsistent values, and any
customer-facing service left at default.
Do not output kubectl commands; just the analysis table.

Rolling it out without an outage

Introduce priority gradually. Start by creating the classes and assigning them to a non-critical tier first, then watch for unexpected preemption events for a week before promoting your important services. Pair high-priority classes with PodDisruptionBudgets so voluntary disruptions still respect availability, and make sure your critical pods have honest resources.requests. Scheduling and preemption math is driven by requests, not actual usage, so an under-requested critical pod can be packed onto a node without the headroom it really needs and end up starved despite its priority.
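A PodDisruptionBudget for a critical service could be sketched as below. The `app: payments-api` label and the `minAvailable` figure are hypothetical and should match your own Deployment's labels and replica count.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb        # placeholder name
spec:
  minAvailable: 2               # illustrative; keep below the replica count
  selector:
    matchLabels:
      app: payments-api         # hypothetical label on the protected pods
```

The scheduler honors a budget only on a best-effort basis during preemption, so this reduces rather than eliminates the chance that a protected replica is chosen as a victim.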

# Audit current usage before and after a rollout
kubectl get pods -A --no-headers -o custom-columns=\
'NS:.metadata.namespace,POD:.metadata.name,CLASS:.spec.priorityClassName,PRIO:.spec.priority' \
  | sort -k4,4n

If you see preemption you didn’t expect, the fix is usually capacity, not priority. Priority decides who loses when there isn’t enough room; it never creates room. A cluster that preempts constantly is a cluster that needs more nodes, and that’s a signal worth acting on rather than papering over with ever-higher numbers.

Wrapping up

PriorityClass is a small feature with cluster-wide blast radius. Keep the tiers few and widely spaced, anchor everything safely below the system classes, use preemptionPolicy: Never for urgent-but-polite workloads, and lean on an AI assistant to draft and audit the YAML while you keep your own hands on anything that applies it. Set it deliberately, and a capacity crunch becomes a controlled degradation instead of a 2 a.m. mystery. For more on keeping workloads scheduled where you want them, browse the rest of the Kubernetes & Helm guides.
