AI for Kubernetes & Helm · By James Joyner IV · 9 min read

Kubernetes Error Guide: 'PIDPressure True' Node Condition and Eviction Taint

Fix Kubernetes PIDPressure: the pid.available eviction threshold, fork bombs, the pid-pressure taint, podPidsLimit, and processes exhausting node PIDs.

  • #kubernetes-helm
  • #troubleshooting
  • #errors
  • #node

Exact Error Message

When the number of available process IDs on a node falls below the kubelet threshold, the node reports PIDPressure=True, gains a taint, and may evict pods:

$ kubectl describe node worker-5
Conditions:
  Type             Status  Reason                     Message
  ----             ------  ------                     -------
  PIDPressure      True    KubeletHasInsufficientPID  kubelet has insufficient PID available
Taints:            node.kubernetes.io/pid-pressure:NoSchedule

Events:
  Type     Reason               Age   From     Message
  ----     ------               ----  ----     -------
  Warning  EvictionThresholdMet 12s   kubelet  Attempting to reclaim pids
  Warning  Evicted              10s   kubelet  The node was low on resource: pids.

Workloads on the node may also fail to start new processes:

fork: retry: Resource temporarily unavailable
sh: can't fork: Resource temporarily unavailable

What the Error Means

Every process and thread on a Linux node consumes a process ID, and the kernel caps the total via kernel.pid_max. The kubelet monitors pid.available — the number of PIDs still free — and when it drops below the eviction threshold (commonly configured as pid.available<10% or an absolute count), it sets PIDPressure=True.
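The threshold check described above can be sketched as a small shell function. This is an illustration of the arithmetic only, not the kubelet's implementation; the function name pid_pressure and the /proc sources in the comments are assumptions made for this example.

```shell
# pid_pressure MAX USED PCT
# Prints "true" when free PIDs (MAX - USED) fall below PCT percent of MAX.
pid_pressure() {
  max=$1 used=$2 pct=$3
  avail=$((max - used))
  if [ $((avail * 100)) -lt $((max * pct)) ]; then
    echo true
  else
    echo false
  fi
}

# On a Linux node the inputs could be read from /proc (assumed sources):
#   max=$(cat /proc/sys/kernel/pid_max)
#   used=$(cut -d' ' -f4 /proc/loadavg | cut -d/ -f2)   # total tasks, threads included
pid_pressure 32768 30000 10   # prints: true (2768 free is under 10% of 32768)
```

Integer arithmetic keeps the sketch portable to any POSIX shell, at the cost of not handling fractional percentages.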

As with disk and memory pressure, the control plane's node controller then taints the node with node.kubernetes.io/pid-pressure:NoSchedule so that no new pods are scheduled there, and the kubelet can evict pods to reclaim PIDs. PID exhaustion is dangerous because once the table is full, nothing on the node (including the kubelet and SSH) can fork, so the node can become effectively unmanageable.

Common Causes

  • Fork bomb or runaway thread creation — a buggy or malicious process spawning processes/threads without bound.
  • Zombie/defunct accumulation — a parent that never reaps children leaves PIDs consumed.
  • Thread-heavy apps — JVMs or connection-per-thread servers creating thousands of threads, each a PID.
  • No per-pod PID limit — without podPidsLimit, one pod can consume the node’s entire PID space.
  • Low kernel.pid_max on dense nodes running many containers.
  • Subprocess leaks — apps that spawn shell/helper processes per request and never clean them up.

How to Reproduce the Error

On a disposable test node, run a controlled fork loop inside a pod (do this only in a sandbox):

apiVersion: v1
kind: Pod
metadata:
  name: pid-hog
spec:
  containers:
    - name: hog
      image: busybox
      command: ["sh", "-c", "i=0; while true; do sleep 600 & i=$((i+1)); done"]

kubectl apply -f pid-hog.yaml
kubectl describe node <NODE> | grep -A2 PIDPressure

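Each backgrounded sleep consumes a PID; as pid.available falls under the threshold, the kubelet flips PIDPressure=True, the node is tainted, and the kubelet begins evicting pods. If podPidsLimit is already set on the node, the pod hits its own limit first and the node condition may never fire.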

Diagnostic Commands

# Confirm the condition and taint
kubectl describe node <NODE> | grep -A2 'PIDPressure\|Taints'

# Eviction events
kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp

# Inspect kernel PID limits and current usage on the node (read-only)
kubectl debug node/<NODE> -it --image=busybox -- cat /proc/sys/kernel/pid_max
# Process count only; threads also consume PIDs, so treat this as a lower bound
kubectl debug node/<NODE> -it --image=busybox -- sh -c 'ls /proc | grep -c "^[0-9]"'

# Find the command names with the most processes
kubectl debug node/<NODE> -it --image=busybox -- sh -c 'ps -o comm | sort | uniq -c | sort -rn | head'

# Check the configured PID eviction threshold and podPidsLimit
kubectl get --raw "/api/v1/nodes/<NODE>/proxy/configz" | grep -oiE '"(podPidsLimit|pid\.available)":[^,}]*'

Comparing live process count against kernel.pid_max confirms how close the node is to exhaustion and which workload is responsible.
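Zombie accumulation is easier to spot when defunct processes are grouped by the parent that fails to reap them. The following is a minimal sketch; the function name zombies_by_parent is hypothetical, and the ps invocation in the comment assumes a procps-style ps on the node.

```shell
# zombies_by_parent: reads "PPID STAT" pairs on stdin and prints
# "<zombie count> <parent pid>" lines, largest count first.
zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Typical use from a node shell (assumes procps-style column selection):
#   ps -eo ppid=,stat= | zombies_by_parent | head
```

A parent PID that shows up with a large count is the process to inspect, since zombies disappear only when their parent reaps them or exits.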

Step-by-Step Resolution

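1. Confirm PID exhaustion is the cause. Look for fork: Resource temporarily unavailable in app logs and the PIDPressure=True condition together. EAGAIN on fork is the usual signature of hitting a process limit; seeing it alongside the node condition points to node-wide PID exhaustion rather than a per-pod limit.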

2. Identify the offending pod. Use a node debug shell to count processes per container; the pod with thousands of processes or growing zombie count is the culprit. Kill or restart it to release PIDs immediately.
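One way to count tasks per pod is to read the pids.current file that the kernel's pids cgroup controller maintains for each cgroup. The sketch below assumes the pod cgroup directories contain "pod" followed by the pod UID, which is common but depends on the cgroup driver; the function name top_pid_cgroups and the ROOT path are illustrative.

```shell
# top_pid_cgroups ROOT [N]
# Lists the N cgroups under ROOT with the highest pids.current, as
# "<task count> <cgroup dir>". Only paths containing "pod<hex>" are considered,
# so pod-level and container-level cgroups both appear.
top_pid_cgroups() {
  root=$1
  limit=${2:-5}
  find "$root" -type f -name pids.current -path '*pod[0-9a-f]*' 2>/dev/null |
    while read -r f; do
      printf '%s %s\n' "$(cat "$f")" "$(dirname "$f")"
    done | sort -rn | head -n "$limit"
}

# From a node debug shell the host cgroup tree is usually under /host (assumed path):
#   top_pid_cgroups /host/sys/fs/cgroup 10
```

The UID embedded in the directory name can then be matched against kubectl get pods -A -o custom-columns=NAME:.metadata.name,UID:.metadata.uid to find the pod.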

3. Set a per-pod PID limit. Configure the kubelet’s podPidsLimit so no single pod can consume the node’s whole PID space:

# kubelet config (KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: 1024

Pods that try to exceed the limit get fork failures contained to themselves instead of taking down the node.
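After restarting the kubelet, it is worth confirming the running configuration actually carries the new value. A small sketch, assuming the configz endpoint returns JSON containing a podPidsLimit field; the helper name pod_pids_limit is hypothetical.

```shell
# pod_pids_limit: reads kubelet configz JSON on stdin and prints the
# podPidsLimit value (-1 means no per-pod limit is configured).
pod_pids_limit() {
  grep -oE '"podPidsLimit":[[:space:]]*-?[0-9]+' | grep -oE -- '-?[0-9]+$'
}

# Usage against a live node:
#   kubectl get --raw "/api/v1/nodes/<NODE>/proxy/configz" | pod_pids_limit
```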

4. Fix the application. Cap thread pools, reap child processes (run the app under an init such as tini or dumb-init, or set shareProcessNamespace: true so the pod's pause container reaps orphans), and eliminate per-request subprocess leaks.

5. Raise kernel.pid_max if legitimately needed. On dense nodes with many well-behaved containers, increasing pid_max gives more headroom — but this masks leaks, so fix the workload first.
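Whether pid_max needs raising can be estimated with simple worst-case arithmetic: every pod at its limit, plus a reserve for system daemons. The sketch below uses illustrative numbers (110 pods, a 1024 per-pod limit, and two pid_max values); none of them are taken from a specific cluster, and pid_headroom is a hypothetical helper.

```shell
# pid_headroom PID_MAX MAX_PODS POD_PIDS_LIMIT [RESERVED]
# Prints the PIDs left over if every pod used its full limit.
# A negative result means the per-pod limits oversubscribe the node.
pid_headroom() {
  reserved=${4:-0}
  echo $(( $1 - reserved - $2 * $3 ))
}

pid_headroom 32768 110 1024           # prints: -79872
pid_headroom 4194304 110 1024 1000    # prints: 4080664
```

A negative figure does not mean the node will run out, only that the per-pod limit alone cannot guarantee it will not.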

6. Wait for the taint to clear. Once free PIDs rise back above the threshold and the eviction pressure transition period (5 minutes by default) has passed, the kubelet clears PIDPressure, the control plane removes the node.kubernetes.io/pid-pressure taint, and scheduling resumes.

Prevention and Best Practices

  • Always set podPidsLimit at the kubelet level so one pod cannot exhaust node PIDs — this is the single most effective safeguard.
  • Run apps with a proper init/reaper so zombies are collected and defunct PIDs do not accumulate.
  • Bound thread pools and connection-per-thread servers; treat threads as the PID-consuming resources they are.
  • Monitor PID usage against kernel.pid_max (for example with node_exporter's node_processes_* metrics) and alert before the eviction threshold is reached.
  • Set kernel.pid_max appropriately for node density, but never as a substitute for fixing a leak.
  • Test workloads under load for unbounded process growth before production.

Frequently Asked Questions

Why does PIDPressure make the whole node unstable, not just one pod? PIDs are a node-wide kernel resource. Once kernel.pid_max is reached, no process can fork — including the kubelet, container runtime, and your SSH session. That is why a single fork bomb can render an entire node unmanageable, and why podPidsLimit per pod is essential.

What counts as a PID — processes or threads? Both. On Linux each thread has a task ID drawn from the same space governed by pid_max. A single process with 10,000 threads consumes roughly 10,000 PIDs, so thread-heavy apps trip PIDPressure even with few “processes”.
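Thread counts are visible in the Threads: field of /proc/<pid>/status, so summing that field gives a task total that includes threads. A minimal sketch; the function name total_threads is made up for this example.

```shell
# total_threads: reads one or more /proc/<pid>/status documents on stdin
# and prints the sum of their "Threads:" fields.
total_threads() {
  awk '/^Threads:/ { sum += $2 } END { print sum + 0 }'
}

# Single process:  total_threads < /proc/<PID>/status
# Whole node:      cat /proc/[0-9]*/status 2>/dev/null | total_threads
```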

My app logs fork: retry: Resource temporarily unavailable but the node looks fine. That EAGAIN means the pod’s podPidsLimit (or a cgroup pids limit) was hit, not necessarily the whole node. Check the pod’s PID limit; per-pod limits cause this without raising the node condition.
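To tell a cgroup limit apart from node exhaustion, compare pids.current with pids.max in the pod's cgroup directory on the node. The sketch below assumes you have already located that directory; pids_limit_hit is a hypothetical helper, and the literal value "max" in pids.max means no limit is set at that level.

```shell
# pids_limit_hit DIR
# Reads DIR/pids.current and DIR/pids.max and prints one of:
#   unlimited  (pids.max is the literal string "max")
#   at-limit   (current task count has reached the limit)
#   ok         (still below the limit)
pids_limit_hit() {
  cur=$(cat "$1/pids.current")
  max=$(cat "$1/pids.max")
  if [ "$max" = "max" ]; then
    echo unlimited
  elif [ "$cur" -ge "$max" ]; then
    echo at-limit
  else
    echo ok
  fi
}

# Usage (the directory is a placeholder for the pod's cgroup path):
#   pids_limit_hit <POD_CGROUP_DIR>
```

An "unlimited" result at the container level does not rule out a limit on a parent cgroup, so check the pod-level directory as well.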

How does the kubelet decide which pod to evict for PIDs? PIDs have no requests or limits in the pod spec, so the usual usage-versus-request ranking does not apply. According to the Kubernetes node-pressure eviction documentation, the kubelet uses the pods' relative Priority to determine eviction order under PID starvation, so lower-priority pods are evicted first.
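As a rough illustration of ranking eviction candidates, the sketch below sorts pods by priority first and uses process count only as a tiebreaker. Both the input format and the tiebreak rule are assumptions made for this example, not a description of the kubelet's code; rank_for_pid_eviction is a hypothetical name.

```shell
# rank_for_pid_eviction: reads "NAME PRIORITY PROCESSES" lines on stdin and
# prints them lowest priority first; within equal priority, the pod with
# the most processes comes first (assumed tiebreak).
rank_for_pid_eviction() {
  sort -k2,2n -k3,3nr
}

printf 'api 1000 50\nbatch 0 20\nhog 0 9000\n' | rank_for_pid_eviction
# prints:
#   hog 0 9000
#   batch 0 20
#   api 1000 50
```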
