Kubernetes Error Guide: 'NotReady' Node Status and Kubelet Conditions
Fix Kubernetes nodes stuck in NotReady: diagnose down kubelets, dead containerd, CNI not ready, resource-pressure taints, expired certs, and clock skew.
- #kubernetes
- #troubleshooting
- #errors
- #nodes
Exact Error Message
A node drops out of the cluster and kubectl get nodes reports NotReady:
NAME STATUS ROLES AGE VERSION
node-2 NotReady <none> 42d v1.30.2
kubectl describe node node-2 exposes the underlying condition reason:
Conditions:
Type Status Reason Message
---- ------ ------ -------
Ready False KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
If the kubelet stops posting status altogether, every condition flips to Unknown instead:
Conditions:
Type Status Reason Message
---- ------ ------ -------
MemoryPressure Unknown NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown NodeStatusUnknown Kubelet stopped posting node status.
What the Error Means
A node’s Ready condition is set to True only while the kubelet keeps posting a healthy heartbeat to the API server and reports that the container runtime and pod network are usable. When the kubelet cannot confirm a healthy runtime it sets Ready=False with reason KubeletNotReady. When the kubelet stops posting status entirely (process dead, host down, or partitioned from the API server), the node-lifecycle controller flips every condition to Unknown after the node-monitor-grace-period (40s by default) with the message Kubelet stopped posting node status.
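The scheduler stops placing new pods on a NotReady node, because the node controller adds a node.kubernetes.io/not-ready (Ready=False) or node.kubernetes.io/unreachable (Ready=Unknown) NoSchedule taint. Once the matching NoExecute taint has been in place for 300s (the default tolerationSeconds, which replaced the old pod-eviction-timeout flag), the node's pods are evicted and controller-managed pods are recreated elsewhere. The distinction matters: KubeletNotReady means the kubelet is alive but unhappy with the runtime or CNI, while NodeStatusUnknown means the control plane has lost contact with the kubelet altogether.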
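The True/False/Unknown split above can be turned into a small triage helper. This is a minimal sketch: ready_verdict is a hypothetical function name, and the jsonpath queries in the usage comment assume kubectl access to the cluster.

```shell
# Hypothetical helper: map the Ready condition's status and reason to a next step.
ready_verdict() {
  status="$1"
  reason="$2"
  case "$status" in
    True)    echo "healthy: kubelet is posting status" ;;
    False)   echo "kubelet alive ($reason): check the container runtime and CNI on the node" ;;
    Unknown) echo "no heartbeat ($reason): check the kubelet process, the host, and API-server reachability" ;;
    *)       echo "unexpected Ready status: '$status'" >&2
             return 1 ;;
  esac
}

# Usage (needs kubectl access):
#   s=$(kubectl get node node-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
#   r=$(kubectl get node node-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}')
#   ready_verdict "$s" "$r"
```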
Common Causes
- Kubelet down or crash-looping. The kubelet service is stopped, masked, or restarting on bad config or flags, so no heartbeat is posted.
- Container runtime down. containerd (or CRI-O) is dead or its socket is unreachable, so the kubelet cannot create or inspect containers.
- CNI not ready. The network plugin has not initialized (cni plugin not initialized), so the runtime reports NetworkReady=false.
- Resource-pressure taints. Disk, memory, or PID pressure makes the kubelet report DiskPressure, MemoryPressure, or PIDPressure, and the node controller adds matching node.kubernetes.io/disk-pressure-style NoSchedule taints. On their own these block scheduling rather than set Ready=False, but they often accompany a failing node.
- Kubelet certificate expiry. The kubelet client certificate expired and was not rotated, so the API server rejects its status updates (x509: certificate has expired).
- Network partition to the API server. A firewall, routing, or security-group change blocks the node from reaching the API server on port 6443.
- Clock skew / NTP drift. A node clock far from the API server’s breaks TLS validity windows and token/lease timing.
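Each of these conditions maps to a fixed taint key applied by the node controller, and the taint is what the scheduler actually reacts to. The lookup below is a sketch of the well-known keys; condition_taint is a hypothetical name, and the mapping is worth checking against the taint reference for your Kubernetes version.

```shell
# Hypothetical lookup: node condition -> taint key (and effects) the node controller applies.
condition_taint() {
  case "$1" in
    DiskPressure)       echo "node.kubernetes.io/disk-pressure (NoSchedule)" ;;
    MemoryPressure)     echo "node.kubernetes.io/memory-pressure (NoSchedule)" ;;
    PIDPressure)        echo "node.kubernetes.io/pid-pressure (NoSchedule)" ;;
    NetworkUnavailable) echo "node.kubernetes.io/network-unavailable (NoSchedule)" ;;
    Ready=False)        echo "node.kubernetes.io/not-ready (NoSchedule, NoExecute)" ;;
    Ready=Unknown)      echo "node.kubernetes.io/unreachable (NoSchedule, NoExecute)" ;;
    *)                  return 1 ;;
  esac
}

# Compare with what is actually on the node (needs kubectl access):
#   kubectl get node node-2 -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
```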
How to Reproduce the Error
Stop the container runtime on a worker node and watch the node go NotReady:
# On the node — simulate a runtime outage
sudo systemctl stop containerd
# From a control-plane host
kubectl get nodes -w
NAME STATUS ROLES AGE VERSION
node-2 Ready <none> 42d v1.30.2
node-2 NotReady <none> 42d v1.30.2
As soon as the kubelet's runtime health check fails, the node reports NotReady (typically within a minute). Starting containerd again (sudo systemctl start containerd) returns it to Ready.
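To measure how long the transition takes on your cluster instead of eyeballing the watch output, a small polling loop is enough. This sketch assumes the default kubectl get node column layout (STATUS in the second column); wait_for_node_status is a hypothetical helper, and it strips the ,SchedulingDisabled suffix a cordoned node shows.

```shell
# Hypothetical helper: poll until NODE's STATUS matches WANT; print elapsed seconds.
# Args: NODE WANT [TIMEOUT_SECONDS=120] [POLL_INTERVAL_SECONDS=2]
wait_for_node_status() {
  node="$1"; want="$2"; timeout="${3:-120}"; interval="${4:-2}"
  start=$(date +%s)
  while :; do
    status=$(kubectl get node "$node" --no-headers 2>/dev/null | awk '{print $2}')
    status="${status%%,*}"            # "NotReady,SchedulingDisabled" -> "NotReady"
    elapsed=$(( $(date +%s) - start ))
    if [ "$status" = "$want" ]; then
      echo "$elapsed"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s; last status: ${status:-none}" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage: stop containerd on node-2, then from a control-plane host:
#   wait_for_node_status node-2 NotReady 300
```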
Diagnostic Commands
# Cluster view and the node's conditions
kubectl get nodes -o wide
kubectl describe node node-2 | grep -A12 Conditions
kubectl describe node node-2 | grep -i taint
# On the affected node — are kubelet and the runtime up?
systemctl status kubelet --no-pager
systemctl status containerd --no-pager
journalctl -u kubelet -n 80 --no-pager
journalctl -u containerd -n 40 --no-pager
# Resource pressure (disk / memory / PID)
df -h /var/lib/kubelet /var/lib/containerd /
free -m
ps -eLf --no-headers | wc -l     # threads in use; each one consumes a PID
cat /proc/sys/kernel/pid_max     # kernel PID limit
# Certificate expiry and API-server reachability
sudo openssl x509 -enddate -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem
curl -k --max-time 5 https://<API_SERVER>:6443/healthz
timedatectl status | grep -i 'synchronized\|NTP'
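In the kubelet journal, errors dialing the runtime socket (for containerd, /run/containerd/containerd.sock) point at the runtime, cni plugin not initialized points at the CNI, x509: certificate has expired points at the certificate (or a skewed clock), and use of closed network connection or timeouts to port 6443 point at a partition.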
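When the journal is long, those patterns can be tallied automatically. The sketch below is heuristic: the match strings are the messages quoted in this guide plus common Go network errors, not an exhaustive list, and classify_kubelet_log is a hypothetical name.

```shell
# Hypothetical filter: read kubelet log lines on stdin, print "<cause>: <line>"
# for lines matching a known NotReady pattern; other lines are dropped.
classify_kubelet_log() {
  while IFS= read -r line; do
    case "$line" in
      *"cni plugin not initialized"*|*NetworkPluginNotReady*)
        echo "cni: $line" ;;
      *"certificate has expired"*)            # expired cert, or a skewed clock
        echo "cert: $line" ;;
      *containerd.sock*|*"container runtime is down"*)
        echo "runtime: $line" ;;
      *"use of closed network connection"*|*":6443"*"i/o timeout"*|*":6443"*"connection refused"*)
        echo "apiserver-link: $line" ;;
    esac
  done
}

# Usage on the affected node:
#   journalctl -u kubelet -n 500 --no-pager | classify_kubelet_log | cut -d: -f1 | sort | uniq -c
```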
Step-by-Step Resolution
- Read the condition reason first. KubeletNotReady with a CNI message is a network-plugin problem; NodeStatusUnknown is a lost-heartbeat problem. This decides where you log in.
kubectl describe node node-2 | grep -A12 Conditions
- Confirm kubelet and runtime are running. Restart whichever is down, runtime first.
sudo systemctl status kubelet containerd --no-pager
sudo systemctl restart containerd
sudo systemctl restart kubelet
- Check CNI readiness. If the message is cni plugin not initialized, confirm the CNI config and binaries exist and the CNI pods are Ready. The namespace and label below match a manifest-based Calico install; adjust them for your plugin.
ls /etc/cni/net.d/ /opt/cni/bin/
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
- Relieve resource pressure. If a disk-pressure or memory-pressure taint is present, free disk (image and log GC) or memory. The kubelet clears the condition, and the taint goes with it, once usage stays below the eviction threshold for the eviction-pressure transition period (5 minutes by default).
sudo crictl rmi --prune
sudo journalctl --vacuum-size=200M
- Rotate an expired kubelet certificate. If openssl shows an expired certificate, enable rotation or re-bootstrap. Deleting the current certificate only works if the kubelet still has a valid bootstrap kubeconfig to request a new one.
sudo rm /var/lib/kubelet/pki/kubelet-client-current.pem
sudo systemctl restart kubelet   # re-requests a CSR; approve it if it stays Pending
kubectl get csr | grep Pending
kubectl certificate approve <CSR_NAME>
- Drain safely if the node needs work or replacement. Cordon first so nothing new schedules, then drain to evict gracefully.
kubectl cordon node-2
kubectl drain node-2 --ignore-daemonsets --delete-emptydir-data --grace-period=120
# after repair
kubectl uncordon node-2
- Verify recovery.
kubectl get node node-2 -w
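The on-node checks from the runtime, CNI, and resource-pressure steps can be bundled into one read-only pass to run before restarting anything. A sketch under these assumptions: systemd manages kubelet and containerd, CNI config lives in /etc/cni/net.d, and a 10% free-disk floor stands in for the kubelet's default nodefs.available<10% hard-eviction threshold. df's Use% only approximates the kubelet's own calculation, and all function names are hypothetical.

```shell
# Hypothetical helper: systemd state of a unit ("active", "inactive", "failed", ...).
service_state() {
  s=$(systemctl is-active "$1" 2>/dev/null) || true
  echo "${s:-unknown}"
}

# Hypothetical helper: succeed if DIR holds at least one CNI config file.
cni_config_present() {
  dir="${1:-/etc/cni/net.d}"
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    if [ -f "$f" ]; then return 0; fi
  done
  return 1
}

# Hypothetical helper: percent of the filesystem holding PATH that is still free.
disk_free_pct() {
  used=$(df -P "$1" 2>/dev/null | awk 'NR==2 {sub(/%/, "", $5); print $5}')
  [ -n "$used" ] || return 1
  echo $(( 100 - used ))
}

# Read-only triage pass; MIN_FREE_PCT defaults to 10.
node_triage() {
  min_free="${1:-10}"
  for svc in containerd kubelet; do
    echo "service $svc: $(service_state "$svc")"
  done
  if cni_config_present /etc/cni/net.d; then
    echo "cni config: present"
  else
    echo "cni config: missing in /etc/cni/net.d"
  fi
  for p in /var/lib/kubelet /var/lib/containerd; do
    [ -d "$p" ] || continue
    pct=$(disk_free_pct "$p") || continue
    if [ "$pct" -lt "$min_free" ]; then
      echo "disk $p: ${pct}% free, below ${min_free}%"
    else
      echo "disk $p: ${pct}% free"
    fi
  done
}

# Usage on the affected node:
#   node_triage 10
```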
Prevention and Best Practices
- Run node-level alerts on the Ready condition and on kubelet/containerd service state so you catch NotReady before pods are evicted.
- Enable automatic kubelet certificate rotation (--rotate-certificates, or rotateCertificates: true in the kubelet config file) and alert 14 days before any node certificate expires.
- Set conservative evictionHard/evictionSoft thresholds and monitor disk on /var/lib/containerd and /var/lib/kubelet; image and log growth is the most common cause of pressure.
- Enforce time sync (chrony/NTP) on every node and alert on unsynchronized clocks.
- Keep the API-server endpoint reachable from all nodes through firewall/security-group rules that survive infrastructure changes.
- Always cordon then drain before maintenance so workloads move off gracefully instead of being force-evicted.
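For the certificate-expiry alert, a cron-friendly check takes only a few lines. This sketch assumes GNU date (to parse openssl's notAfter format) and the kubelet certificate path used earlier in this guide; cert_days_left and check_kubelet_cert are hypothetical names. For a plain yes/no answer over the same 14-day window, openssl x509 -checkend 1209600 -noout -in <file> exits non-zero when the certificate expires within that many seconds.

```shell
# Hypothetical helper: whole days from NOW until the notAfter date.
# $1: a "notAfter=..." line as printed by `openssl x509 -enddate -noout`
# $2: optional "now" as epoch seconds (defaults to the current time)
cert_days_left() {
  end=$(date -d "${1#notAfter=}" +%s) || return 1   # GNU date
  now="${2:-$(date +%s)}"
  echo $(( (end - now) / 86400 ))                   # truncates toward zero
}

# Hypothetical check: return 0 if the cert is valid for at least WARN_DAYS more days,
# 1 if it expires sooner (or already has), 2 if it could not be read.
check_kubelet_cert() {
  pem="${1:-/var/lib/kubelet/pki/kubelet-client-current.pem}"
  warn_days="${2:-14}"
  line=$(openssl x509 -enddate -noout -in "$pem" 2>/dev/null) || return 2
  days=$(cert_days_left "$line") || return 2
  if [ "$days" -lt "$warn_days" ]; then
    echo "WARN: $pem expires in $days day(s)"
    return 1
  fi
  echo "OK: $pem valid for $days more day(s)"
}

# Usage (root is usually needed to read the kubelet's PKI directory):
#   check_kubelet_cert /var/lib/kubelet/pki/kubelet-client-current.pem 14
```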
Related Errors
- KubeletNotReady: container runtime network not ready. The CNI sub-cause, covered in the networkPlugin cni failed to set up pod guide.
- ImagePullBackOff / ErrImagePull. Appears once a node is Ready again but cannot fetch images; see the ImagePullBackOff guide.
- node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute taints. Added automatically by the node controller when Ready is False or Unknown respectively, triggering pod eviction once the toleration expires.
- FailedScheduling: node(s) had untolerated taint. Pods cannot land on a pressure-tainted node.
Frequently Asked Questions
Why does my node show Ready=Unknown instead of Ready=False?
Unknown means the API server stopped receiving heartbeats from the kubelet entirely (Kubelet stopped posting node status) — the process is dead, the host is down, or there is a network partition. False means the kubelet is alive and reporting but the runtime or CNI is unhealthy. Check connectivity for Unknown; check the runtime for False.
How long before pods are evicted from a NotReady node?
The node-lifecycle controller waits node-monitor-grace-period (40s) before marking a silent node, then pods become eligible for eviction once the tolerationSeconds on the auto-added node.kubernetes.io/unreachable (or node.kubernetes.io/not-ready) taint runs out. The DefaultTolerationSeconds admission plugin sets that to 300s (5 minutes) by default.
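For a node that has stopped heartbeating, the default timeline is simple arithmetic: last heartbeat + grace period + toleration. This sketch assumes the 40s and 300s defaults; the node controller also rate-limits evictions, so treat the result as the earliest point eviction can start, not a guarantee. eviction_seconds_left is a hypothetical helper, and the usage comment assumes GNU date can parse the Lease's RFC 3339 renewTime.

```shell
# Hypothetical helper: seconds until pods on a silent node become eligible for eviction.
# Args: LAST_HEARTBEAT_EPOCH NOW_EPOCH [GRACE_SECONDS=40] [TOLERATION_SECONDS=300]
eviction_seconds_left() {
  hb="$1"; now="$2"; grace="${3:-40}"; tol="${4:-300}"
  left=$(( hb + grace + tol - now ))
  if [ "$left" -lt 0 ]; then left=0; fi
  echo "$left"
}

# Example: 100s after the last heartbeat, with the defaults, prints 240:
#   eviction_seconds_left 1000 1100
# Against a live node, using its Lease in kube-node-lease as the heartbeat:
#   hb=$(kubectl get lease node-2 -n kube-node-lease -o jsonpath='{.spec.renewTime}')
#   eviction_seconds_left "$(date -d "$hb" +%s)" "$(date +%s)"
```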
Will restarting the kubelet disrupt running pods?
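No. Restarting the kubelet does not stop running containers; the runtime (containerd) keeps them alive, and the kubelet re-syncs state on startup. Restarting containerd usually leaves running containers up as well, because each one is held by its containerd-shim process and containerd's stock systemd unit uses KillMode=process. While containerd is down, however, exec, logs, exec-based probes, and new container starts fail.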
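Whether containers ride out a containerd restart depends on how systemd stops the unit. A quick check, assuming systemd manages containerd: with KillMode=process, systemd signals only the containerd daemon and leaves the per-container shim processes alone. containerd_killmode is a hypothetical name.

```shell
# Hypothetical check: report the KillMode systemd uses for the containerd unit.
containerd_killmode() {
  mode=$(systemctl show -p KillMode --value containerd 2>/dev/null)
  case "$mode" in
    process) echo "KillMode=process: running containers should survive a containerd restart" ;;
    "")      echo "could not read KillMode (is containerd managed by systemd?)"
             return 1 ;;
    *)       echo "KillMode=$mode: a containerd restart may also stop shims and their containers" ;;
  esac
}

# Usage on the node:
#   containerd_killmode
```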
Is it safe to delete a NotReady node object?
Only after you confirm the node is permanently gone and pods have rescheduled. kubectl delete node removes the object and orphans any still-running pods on that host; cordon and drain first when the node is recoverable.
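Before deleting the Node object, it helps to confirm that nothing but DaemonSet pods (which drain also ignores) is still bound to it. A sketch assuming kubectl access; pods_left_on_node is a hypothetical helper, and bare pods without an owner are reported too.

```shell
# Hypothetical helper: print namespace/name of non-DaemonSet pods still bound to NODE.
pods_left_on_node() {
  kubectl get pods -A --field-selector "spec.nodeName=$1" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.metadata.ownerReferences[0].kind}{"\n"}{end}' \
    | awk 'NF > 0 && $2 != "DaemonSet" {print $1}'
}

# Usage: delete the Node object only when nothing else is left on it.
#   [ -z "$(pods_left_on_node node-2)" ] && kubectl delete node node-2
```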