DevOps AI ToolKit
AI for Kubernetes & Helm · By James Joyner IV · 9 min read

Kubernetes Error Guide: 'NotReady' Node Status and Kubelet Conditions

Fix Kubernetes nodes stuck in NotReady: diagnose down kubelets, dead containerd, CNI not ready, resource-pressure taints, expired certs, and clock skew.

  • #kubernetes
  • #troubleshooting
  • #errors
  • #nodes

Exact Error Message

A node drops out of the cluster and kubectl get nodes reports NotReady:

NAME        STATUS     ROLES    AGE    VERSION
node-2      NotReady   <none>   42d    v1.30.2

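kubectl describe node node-2 exposes the underlying condition reason. When the kubelet is still reporting but the pod network is down, Ready is False (timestamp columns omitted here):

Conditions:
  Type             Status    Reason                       Message
  ----             ------    ------                       -------
  Ready            False     KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

When the kubelet has stopped reporting altogether, the conditions read Unknown instead:

Conditions:
  Type             Status    Reason                       Message
  ----             ------    ------                       -------
  MemoryPressure   Unknown   NodeStatusUnknown            Kubelet stopped posting node status.
  Ready            Unknown   NodeStatusUnknown            Kubelet stopped posting node status.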

What the Error Means

A node’s Ready condition is set to True only while the kubelet keeps posting a healthy heartbeat to the API server and reports that the container runtime and pod network are usable. When the kubelet cannot confirm a healthy runtime or pod network, it sets Ready=False with reason KubeletNotReady. When the kubelet stops posting status entirely (process dead, host down, or partitioned from the API server), the node-lifecycle controller flips every condition to Unknown after the node-monitor-grace-period, with the message Kubelet stopped posting node status. That grace period is 40s by default in the v1.30 release shown here; check your own version, since later releases raised the default to 50s.

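The scheduler places no new pods on a NotReady node. Its existing pods are evicted once the tolerationSeconds on the automatically added NoExecute taint runs out (300s by default), and their controllers recreate them elsewhere. Taint-based eviction replaced the older pod-eviction-timeout flag, which no longer governs this. The distinction matters: KubeletNotReady means the kubelet is alive but unhappy with the runtime/CNI, while NodeStatusUnknown means the control plane lost contact with the kubelet altogether.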
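
The Ready status alone is enough to pick a first move. As a minimal sketch, the distinction above can be written as a small POSIX shell helper. The function name classify_ready and its hint strings are illustrative assumptions, not kubectl output:

```shell
# classify_ready STATUS [REASON]
# Maps a node's Ready condition to a first triage hint.
# Hypothetical helper for illustration; the hint text is not produced by kubectl.
classify_ready() {
  case "$1" in
    True)    echo "ready: no action needed" ;;
    False)   echo "kubelet alive: inspect runtime and CNI (reason: ${2:-none})" ;;
    Unknown) echo "heartbeat lost: inspect kubelet process, host, and network path" ;;
    *)       echo "unrecognized status: $1" ;;
  esac
}

# Example use against a live cluster (not run here):
#   status=$(kubectl get node node-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
#   reason=$(kubectl get node node-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}')
#   classify_ready "$status" "$reason"
```

Keeping the function free of kubectl calls means it can be checked with plain strings before it is wired to a real cluster.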

Common Causes

  • Kubelet down or crash-looping. The kubelet service is stopped, masked, or restarting on bad config/flags, so no heartbeat is posted.
  • Container runtime down. containerd (or CRI-O) is dead or its socket is unreachable; the kubelet cannot create or inspect containers.
  • CNI not ready. The network plugin has not initialized (cni plugin not initialized), so the runtime reports NetworkReady=false.
  • Resource-pressure taints. Disk, memory, or PID pressure makes the kubelet report DiskPressure, MemoryPressure, or PIDPressure conditions, and the control plane adds matching taints such as node.kubernetes.io/disk-pressure.
  • Kubelet certificate expiry. The kubelet client cert expired and was not rotated, so the API server rejects its status updates (x509: certificate has expired).
  • Network partition to the API server. A firewall, routing, or security-group change blocks the node from reaching the API server on 6443.
  • Clock skew / NTP drift. A node clock far from the API server’s breaks TLS validity windows and token/lease timing.

How to Reproduce the Error

Stop the container runtime on a worker node and watch the node go NotReady:

# On the node — simulate a runtime outage
sudo systemctl stop containerd

# From a control-plane host
kubectl get nodes -w
NAME        STATUS     ROLES    AGE    VERSION
node-2      Ready      <none>   42d    v1.30.2
node-2      NotReady   <none>   42d    v1.30.2

The kubelet loses its connection to the runtime as soon as containerd stops, and within ~40s the node reports NotReady. Restarting containerd (sudo systemctl start containerd) returns it to Ready.

Diagnostic Commands

# Cluster view and the node's conditions
kubectl get nodes -o wide
kubectl describe node node-2 | grep -A12 Conditions
kubectl describe node node-2 | grep -i taint
# On the affected node — are kubelet and the runtime up?
systemctl status kubelet --no-pager
systemctl status containerd --no-pager
journalctl -u kubelet -n 80 --no-pager
journalctl -u containerd -n 40 --no-pager
# Resource pressure (disk / memory / PID)
df -h /var/lib/kubelet /var/lib/containerd /
free -m
ps -eL --no-headers | wc -l; cat /proc/sys/kernel/pid_max   # threads in use vs. kernel PID limit
# Certificate expiry and API-server reachability
sudo openssl x509 -enddate -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem
curl -k --max-time 5 https://<API_SERVER>:6443/healthz
timedatectl status | grep -i 'synchronized\|NTP'

A kubelet journal full of Failed to get system container stats, cni plugin not initialized, or use of closed network connection points straight at the runtime, CNI, or partition cause respectively.
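
The mapping from journal message to likely cause can be scripted. The sketch below is one possible way to do it, assuming the kubelet log arrives on standard input. The function name triage_kubelet_log and the cause labels are made up for illustration; the match strings are the ones quoted in this guide.

```shell
# triage_kubelet_log < kubelet journal text
# Prints one likely cause per known message found in the log.
# Hypothetical helper; the cause labels are illustrative.
triage_kubelet_log() {
  tkl_log=$(cat)   # read the whole log from stdin
  tkl_hit=0
  while IFS='|' read -r tkl_pat tkl_cause; do
    case "$tkl_log" in
      *"$tkl_pat"*) echo "$tkl_cause"; tkl_hit=1 ;;   # literal substring match
    esac
  done <<'EOF'
Failed to get system container stats|container runtime
cni plugin not initialized|CNI plugin
use of closed network connection|connection to API server
x509: certificate has expired|kubelet certificate
EOF
  [ "$tkl_hit" -eq 1 ] || echo "no known pattern"
}

# Example use on the affected node (not run here):
#   journalctl -u kubelet -n 80 --no-pager | triage_kubelet_log
```

A match is only a hint about where to look first; an unmatched log still needs to be read by hand.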

Step-by-Step Resolution

  1. Read the condition reason first. KubeletNotReady with a CNI message is a network-plugin problem; NodeStatusUnknown is a lost-heartbeat problem. This decides where you log in.

    kubectl describe node node-2 | grep -A12 Conditions
  2. Confirm kubelet and runtime are running. Restart whichever is down.

    sudo systemctl status kubelet containerd --no-pager
    sudo systemctl restart containerd
    sudo systemctl restart kubelet
  3. Check for CNI readiness. If the message is cni plugin not initialized, confirm the CNI config and binaries exist and the CNI pods are Ready. The label below is Calico's; substitute the namespace and label of the plugin you run.

    ls /etc/cni/net.d/ /opt/cni/bin/
    kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
  4. Relieve resource pressure. If a disk-pressure or memory-pressure taint is present, free space (image/log GC) or memory; the taint clears automatically once usage drops below the threshold.

    sudo crictl rmi --prune
    sudo journalctl --vacuum-size=200M
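  5. Rotate an expired kubelet certificate. If openssl shows an expired cert, re-bootstrap the kubelet and enable rotation so it does not recur. Move the stale cert aside rather than deleting it so you can back out. Re-bootstrapping only works if the node still has a valid bootstrap kubeconfig or token; otherwise the node typically has to be re-joined.

    sudo mv /var/lib/kubelet/pki/kubelet-client-current.pem /var/lib/kubelet/pki/kubelet-client-current.pem.bak
    sudo systemctl restart kubelet   # re-requests a CSR; approve if needed
    kubectl get csr | grep Pending
    kubectl certificate approve <CSR_NAME>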
  6. Drain safely if the node needs work or replacement. Cordon first so nothing new schedules, then drain to evict gracefully.

    kubectl cordon node-2
    kubectl drain node-2 --ignore-daemonsets --delete-emptydir-data --grace-period=120
    # after repair
    kubectl uncordon node-2
  7. Verify recovery.

    kubectl get node node-2 -w
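
For step 4, it helps to know which filesystems are close to the kubelet's eviction thresholds before pruning anything. The helper below is a rough sketch that reads df -P output and lists mounts above a usage percentage. The name disk_pressure_candidates and the 90% default are assumptions chosen to approximate the kubelet's default hard threshold of nodefs.available<10%; imagefs.available defaults to <15%, so a lower value such as 85 may suit the image filesystem.

```shell
# disk_pressure_candidates [MAX_USE_PCT] < output of `df -P`
# Prints "<mount> <use%>" for each filesystem above MAX_USE_PCT (default 90).
# Hypothetical helper; assumes mount points contain no spaces.
disk_pressure_candidates() {
  awk -v max="${1:-90}" '
    NR > 1 {                      # skip the df header line
      use = $5
      sub(/%/, "", use)           # "95%" -> "95"
      if (use + 0 > max + 0) print $6, $5
    }'
}

# Example use on the affected node (not run here):
#   df -P /var/lib/kubelet /var/lib/containerd / | disk_pressure_candidates
```

The percentage df reports is rounded, so treat a mount near the limit as a candidate rather than proof of pressure.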

Prevention and Best Practices

  • Run node-level alerts on the Ready condition and on kubelet/containerd service state so you catch NotReady before pods evict.
  • Enable automatic kubelet certificate rotation (--rotate-certificates) and alert 14 days before any node cert expires.
  • Set conservative evictionHard/evictionSoft thresholds and monitor disk on /var/lib/containerd and /var/lib/kubelet; image and log growth is the most common pressure cause.
  • Enforce time sync (chrony/NTP) on every node and alert on un-synchronized clocks.
  • Keep the API-server endpoint reachable from all nodes through firewall/security-group rules that survive infrastructure changes.
  • Always cordon then drain before maintenance so workloads move off gracefully instead of being force-evicted.

See more in Kubernetes & Helm guides.
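
The 14-day certificate alert recommended above comes down to simple epoch arithmetic. The following is a minimal sketch under stated assumptions: the function names days_until and cert_expiring_soon are hypothetical, and the commented usage relies on GNU date to parse the notAfter string that openssl prints.

```shell
# days_until EXPIRY_EPOCH [NOW_EPOCH]
# Prints whole days remaining; shell division truncates toward zero.
days_until() {
  du_now=${2:-$(date +%s)}
  echo $(( ($1 - du_now) / 86400 ))
}

# cert_expiring_soon EXPIRY_EPOCH [NOW_EPOCH] [WINDOW_DAYS]
# Succeeds (exit 0) when expiry falls inside the window, default 14 days.
cert_expiring_soon() {
  ces_now=${2:-$(date +%s)}
  ces_window=${3:-14}
  [ $(( $1 - ces_now )) -lt $(( ces_window * 86400 )) ]
}

# Example use on a node (not run here; assumes GNU date):
#   end=$(sudo openssl x509 -enddate -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem | cut -d= -f2)
#   cert_expiring_soon "$(date -d "$end" +%s)" && echo "kubelet cert expires within 14 days"
```

If only a yes/no answer is needed, openssl x509 -checkend 1209600 -noout -in <cert> gives the same 14-day check through its exit status without any date parsing.
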

Related Errors

  • KubeletNotReady: container runtime network not ready. The CNI sub-cause, covered in the networkPlugin cni failed to set up pod guide.
  • ImagePullBackOff / ErrImagePull. Appears once a node is Ready again but cannot fetch images; see the ImagePullBackOff guide.
  • node.kubernetes.io/unreachable:NoExecute taint. Added automatically when a node's Ready condition is Unknown (node.kubernetes.io/not-ready when it is False); triggers pod eviction.
  • FailedScheduling: node(s) had untolerated taint. Pods cannot land on a pressure-tainted node.

Frequently Asked Questions

Why does my node show Ready=Unknown instead of Ready=False? Unknown means the API server stopped receiving heartbeats from the kubelet entirely (Kubelet stopped posting node status) — the process is dead, the host is down, or there is a network partition. False means the kubelet is alive and reporting but the runtime or CNI is unhealthy. Check connectivity for Unknown; check the runtime for False.

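How long before pods are evicted from a NotReady node? For a lost heartbeat, the node-lifecycle controller waits node-monitor-grace-period (40s by default in v1.30) to flag the node, then adds the node.kubernetes.io/unreachable NoExecute taint. Pods are evicted once the tolerationSeconds on their toleration for that taint runs out, which is 300s (5 minutes) by default, so roughly 340s after the last heartbeat. When Ready is False instead, the same 300s default applies to the node.kubernetes.io/not-ready taint.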

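Will restarting the kubelet disrupt running pods? No. Restarting kubelet does not stop running containers; the runtime (containerd) keeps them alive, and the kubelet re-syncs state on startup. Restarting containerd normally leaves containers running as well, because each one is supervised by its own shim process, but the kubelet cannot reach the runtime until it is back, so exec sessions, log streams, and probes can fail briefly.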

Is it safe to delete a NotReady node object? Only after you confirm the node is permanently gone and pods have rescheduled. kubectl delete node removes the object and orphans any still-running pods on that host; cordon and drain first when the node is recoverable.
