kube-state-metrics vs node_exporter: Monitoring Kubernetes

The most common Kubernetes monitoring mistake I see is treating kube-state-metrics and node_exporter as interchangeable. They are not. One reads the Kubernetes API and tells you what the cluster believes should be true; the other reads the kernel and tells you what’s actually happening on the metal. The interesting incidents live in the gap between those two stories, and you can only see the gap if you understand both exporters.

Two exporters, two questions

node_exporter runs on each node and reads the Linux kernel’s own statistics (from /proc and /sys): CPU, memory, disk, network. Physical reality. It knows nothing about pods, deployments, or namespaces.
kube-state-metrics (KSM) watches the Kubernetes API server and emits the state of objects: how many replicas a Deployment wants vs has, whether a pod is Pending, when a Job last succeeded, what a container’s resource requests are. It knows nothing about actual CPU usage.

KSM does not measure resource consumption. That’s the line people cross. “How much CPU is this pod using” is not a KSM question — that comes from cAdvisor (container_cpu_usage_seconds_total), exposed by the kubelet. KSM tells you what the pod asked for, not what it used.
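To make the request-vs-usage distinction concrete, here is a minimal sketch of how a cumulative counter like container_cpu_usage_seconds_total turns into "cores used", which you can then set against a CPU request from KSM. The sample values and the 0.5-core request are hypothetical, and the sketch ignores counter resets and the extrapolation that PromQL’s rate() performs.

```python
# Minimal sketch: turn two samples of a cumulative CPU counter (the kind
# cAdvisor exports as container_cpu_usage_seconds_total) into average cores
# used, then compare that with a CPU request (the kind KSM reports).
# All sample values are hypothetical.


def cores_used(t0: float, cpu_s0: float, t1: float, cpu_s1: float) -> float:
    """Average cores consumed between two counter samples.

    The counter accumulates CPU-seconds; its increase divided by the
    wall-clock seconds elapsed is cores. This is roughly what PromQL's
    rate() computes, minus reset handling and extrapolation.
    """
    if t1 <= t0:
        raise ValueError("samples must be in time order")
    return (cpu_s1 - cpu_s0) / (t1 - t0)


def usage_vs_request(used_cores: float, requested_cores: float) -> float:
    """Usage as a fraction of the request; above 1.0 means bursting past it."""
    return used_cores / requested_cores


# Hypothetical container: 90 CPU-seconds consumed over a 300 s window.
used = cores_used(t0=0.0, cpu_s0=1200.0, t1=300.0, cpu_s1=1290.0)
print(f"used {used:.2f} cores")  # used 0.30 cores
print(f"{usage_vs_request(used, 0.5):.0%} of a 0.5-core request")  # 60% of a 0.5-core request
```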

What kube-state-metrics is uniquely good at

The killer KSM metrics are the ones about desired vs actual, because that’s where Kubernetes silently fails:

# Deployments not at their desired replica count
kube_deployment_spec_replicas
  != kube_deployment_status_replicas_available

# Pods stuck pending (scheduling failure)
kube_pod_status_phase{phase="Pending"} == 1

# Containers that keep restarting
rate(kube_pod_container_status_restarts_total[15m]) > 0

A deployment that wants 5 replicas and has 3 available is degraded right now — and nothing on a CPU dashboard will tell you. This is the question KSM exists to answer.
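The `!=` comparison above is vector matching: PromQL pairs series from both sides that have identical label sets and keeps the left-hand series whose values differ. A minimal sketch of that behaviour, with labels reduced to a hypothetical (namespace, deployment) key and made-up replica counts:

```python
# Minimal sketch of `spec_replicas != status_replicas_available`: pair series
# on identical labels (reduced here to namespace + deployment) and keep the
# pairs whose values differ. Data is hypothetical.

Key = tuple[str, str]  # (namespace, deployment)


def under_replicated(
    spec: dict[Key, int], available: dict[Key, int]
) -> dict[Key, tuple[int, int]]:
    """Return {key: (wanted, available)} for deployments whose counts differ.

    A deployment with no series on the right-hand side has no partner, so,
    as in PromQL, it silently drops out instead of being reported.
    """
    return {
        key: (wanted, available[key])
        for key, wanted in spec.items()
        if key in available and available[key] != wanted
    }


spec = {("shop", "api"): 5, ("shop", "web"): 3}
available = {("shop", "api"): 3, ("shop", "web"): 3}
print(under_replicated(spec, available))  # {('shop', 'api'): (5, 3)}
```

The "no partner, no result" rule is worth remembering: if one side of a comparison is missing entirely, the alert stays quiet rather than firing.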

The metric that prevents most CrashLoops

Many “why is my pod OOMKilled” incidents come down to a memory limit set too close to real usage, and KSM hands you the inputs to catch it before it pages. Join KSM’s memory limits against cAdvisor’s usage:

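# Containers using >90% of their memory limit
# (matched per container, because the limit is enforced per container;
# containers without a limit have no right-hand series and drop out)
sum by (namespace, pod, container) (
  container_memory_working_set_bytes{container!=""}
)
/
sum by (namespace, pod, container) (
  kube_pod_container_resource_limits{resource="memory"}
)
> 0.9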

That join — usage from cAdvisor, the limit from KSM — is the single most useful Kubernetes query I run. It’s literally impossible with either exporter alone, which is the whole point of this article.
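Here is a minimal sketch of that join, keyed on a hypothetical (namespace, pod, container) tuple. The usage dict stands in for cAdvisor’s container_memory_working_set_bytes and the limits dict for KSM’s kube_pod_container_resource_limits{resource="memory"}. All names and byte counts are made up for illustration.

```python
# Minimal sketch of the usage/limit join. `usage` stands in for cAdvisor's
# working-set bytes, `limits` for KSM's memory limits; values are hypothetical.

Key = tuple[str, str, str]  # (namespace, pod, container)


def near_memory_limit(
    usage: dict[Key, float], limits: dict[Key, float], threshold: float = 0.9
) -> dict[Key, float]:
    """Return {key: usage/limit} for containers above `threshold`.

    A container with no memory limit has no series on the right-hand side,
    so it falls out of the division, just as it does in PromQL.
    """
    ratios = {k: usage[k] / limits[k] for k in usage if k in limits}
    return {k: r for k, r in ratios.items() if r > threshold}


MiB = 1024 * 1024
usage = {
    ("shop", "api-7f9c", "api"): 470 * MiB,
    ("shop", "api-7f9c", "sidecar"): 40 * MiB,
    ("shop", "web-5d2b", "web"): 300 * MiB,  # no limit set
}
limits = {
    ("shop", "api-7f9c", "api"): 512 * MiB,
    ("shop", "api-7f9c", "sidecar"): 128 * MiB,
}
print(near_memory_limit(usage, limits))  # {('shop', 'api-7f9c', 'api'): 0.91796875}
```

Note what the unlimited web container shows: the join can only flag containers that have a limit. An unlimited container is bounded by the node instead, which is where node_exporter comes back in.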

CPU throttling: the invisible latency tax

Pods don’t get OOMKilled for CPU; they get throttled, silently, and your latency creeps up with no obvious cause. cAdvisor exposes the throttling counter:

# Fraction of CPU periods that were throttled
rate(container_cpu_cfs_throttled_periods_total[5m])
/
rate(container_cpu_cfs_periods_total[5m])
> 0.25

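CFS throttling only happens when a container has a CPU limit, so cross-reference KSM’s CPU limit (kube_pod_container_resource_limits{resource="cpu"}) to see how tight that limit is. A container throttled in 25% or more of its periods under a tight CPU limit is starved: raise the limit or remove it.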
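Why the limit is the cause: under CFS bandwidth control, a CPU limit of L cores becomes a quota of L × period of CPU time per scheduling period (the period defaults to 100 ms), and a period counts as throttled when the container burns through its quota before the period ends. A minimal sketch of that arithmetic and of the throttled-periods ratio, using hypothetical counter values:

```python
# Minimal sketch of CFS quota arithmetic and the throttling ratio.
# Counter values are hypothetical.

CFS_PERIOD_MS = 100.0  # kernel default cpu.cfs_period_us = 100000


def quota_ms(limit_cores: float, period_ms: float = CFS_PERIOD_MS) -> float:
    """CPU time the container may use per period under a given limit."""
    return limit_cores * period_ms


def throttled_fraction(
    throttled_0: int, periods_0: int, throttled_1: int, periods_1: int
) -> float:
    """Ratio of counter increases, as in rate(throttled) / rate(periods).

    The window length cancels out, so only the deltas matter.
    """
    d_periods = periods_1 - periods_0
    if d_periods <= 0:
        return 0.0  # PromQL would give NaN here; treat "no periods" as 0
    return (throttled_1 - throttled_0) / d_periods


print(quota_ms(0.5))  # 50.0
frac = throttled_fraction(1_000, 10_000, 1_900, 12_500)
print(f"{frac:.0%} of periods throttled")  # 36% of periods throttled
print("starved" if frac > 0.25 else "ok")  # starved
```

A half-core limit leaves 50 ms of CPU per 100 ms period, so a request handler that needs 80 ms of CPU in a single burst gets paused until the next period, and that pause shows up as latency.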

Node-level reconciliation: where the two meet

Here’s the join that closes the loop between the two exporters. Compare what Kubernetes allocated (KSM) against what the node is actually using (node_exporter):

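# Memory requested (KSM) as a fraction of node capacity (node_exporter).
# Assumes node_exporter series carry a node label holding the Kubernetes
# node name; many setups only have instance, so relabel if needed.
sum by (node) (
  kube_pod_container_resource_requests{resource="memory"}
)
/ on (node)
node_memory_MemTotal_bytes

# Fraction of each node's memory actually in use (node_exporter)
1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes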

If requests sum to 95% of a node’s memory but node_exporter shows it’s only 40% used, you’re over-requesting and wasting half the box — the scheduler thinks it’s full while the metal is bored. That reconciliation is pure money, and it needs both exporters.
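The reconciliation is two ratios set side by side: requested memory over capacity, and 1 − MemAvailable/MemTotal for real usage. A minimal sketch with a hypothetical 64 GiB node reproduces the 95% vs 40% example. The 0.3 gap used to call a node over-requested is an arbitrary cut-off chosen for illustration, not a standard.

```python
# Minimal sketch of the node reconciliation. The node sizes and the 0.3 gap
# threshold are hypothetical.

GiB = 1024 ** 3


def reconcile(
    requests_bytes: float, mem_total: float, mem_available: float, gap: float = 0.3
) -> str:
    requested = requests_bytes / mem_total  # what the scheduler has promised
    used = 1 - mem_available / mem_total  # what the kernel reports in use
    verdict = "over-requested" if requested - used > gap else "ok"
    return f"requested {requested:.0%}, used {used:.0%}: {verdict}"


# Hypothetical 64 GiB node: pods request 60.8 GiB, 38.4 GiB still available.
print(reconcile(60.8 * GiB, 64 * GiB, 38.4 * GiB))
# requested 95%, used 40%: over-requested
```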

Don’t forget the other API objects

KSM quietly monitors things people forget have a lifecycle:

# Jobs created by a CronJob that have at least one failed pod
kube_job_status_failed > 0
  and on (namespace, job_name)
kube_job_owner{owner_kind="CronJob"}

# PersistentVolumeClaims stuck unbound
kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1

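# CertificateSigningRequests by approval state. This is not expiry:
# KSM does not expose certificate expiry, which needs another source
# (e.g. blackbox_exporter's probe_ssl_earliest_cert_expiry)
kube_certificatesigningrequest_condition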

A failing CronJob or an unbound PVC is the kind of thing that’s invisible until someone asks why last night’s batch didn’t run. KSM catches it for free.
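The `{phase="Pending"} == 1` pattern works because KSM encodes phases as one series per possible phase, valued 1 for the current phase and 0 for the rest. A minimal sketch of that encoding for PVCs (whose phases are Pending, Bound and Lost), with hypothetical claim names:

```python
# Minimal sketch of KSM-style one-hot phase encoding and the
# {phase="Pending"} == 1 selection. PVC names are hypothetical.

PVC_PHASES = ("Pending", "Bound", "Lost")


def one_hot(pvc: str, current: str) -> list[tuple[dict[str, str], int]]:
    """Emit (labels, value) pairs the way a phase gauge does."""
    return [
        ({"persistentvolumeclaim": pvc, "phase": p}, int(p == current))
        for p in PVC_PHASES
    ]


series = one_hot("data-db-0", "Bound") + one_hot("cache-0", "Pending")

pending = [
    labels["persistentvolumeclaim"]
    for labels, value in series
    if labels["phase"] == "Pending" and value == 1
]
print(pending)  # ['cache-0']
```

The `== 1` filter matters: without it, every PVC matches `{phase="Pending"}` because every claim has a Pending series, usually valued 0.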

Scaling KSM itself

On large clusters KSM can become a memory hog and a slow scrape, because it holds the state of every object. Two levers:

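# shard across replicas: every replica uses the same --total-shards
# and its own --shard index (replica 0 shown; replica 1 uses --shard=1)
--shard=0 --total-shards=2

# keep only the metric families you query (entries are regexes, not globs)
--metric-allowlist=kube_pod_.*,kube_deployment_.*,kube_node_.*,kube_job_.*,kube_persistentvolumeclaim_.*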

The allowlist matters more than people think. KSM emits metrics for object kinds most teams never query, and each of those kinds adds series you pay to store for your whole retention window. Trimming it is the same cardinality hygiene covered across the Prometheus & Monitoring category.
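Sharding works because each replica keeps only the objects whose hash lands on its own index, so every object is exported exactly once. Below is a hypothetical sketch of that idea. It assumes objects are assigned by hashing their UID with 64-bit FNV-1a and then a jump consistent hash. Treat the hash choice as an assumption and check the kube-state-metrics source for the scheme your version actually uses.

```python
# Hypothetical sketch of hash-based sharding: each object is owned by exactly
# one shard, chosen from a hash of its UID. FNV-1a (64-bit) plus jump
# consistent hashing is assumed here for illustration.

MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & MASK64  # FNV 64-bit prime, kept to 64 bits
    return h


def jump_hash(key: int, num_buckets: int) -> int:
    """Lamping & Veach jump consistent hash: maps key into [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def keep(uid: str, shard: int, total_shards: int) -> bool:
    """True if the replica started with --shard=shard should export this object."""
    return jump_hash(fnv1a_64(uid.encode()), total_shards) == shard


uids = [f"uid-{i}" for i in range(1000)]
owners = [[s for s in range(2) if keep(u, s, 2)] for u in uids]
assert all(len(o) == 1 for o in owners)  # every object has exactly one owner
print(sum(1 for o in owners if o == [0]), "objects on shard 0")
```

A consistent hash like this has a useful property when you change --total-shards. Objects only move to the newly added shard, so the existing replicas do not reshuffle their entire state.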

A practical alerting split

Map each exporter to the alerts it’s actually qualified to raise:

From KSM: deployments under-replicated (desired vs available), pods pending, restart storms, jobs failing, PVCs unbound.
From cAdvisor: memory near limit, CPU throttling, container OOM events.
From node_exporter: node disk/memory/CPU saturation, the physical layer.

Wire those into sane alert routing and you get alerts that point at a cause instead of a symptom.

The takeaway

node_exporter is the kernel’s testimony; kube-state-metrics is the API server’s intent; cAdvisor is the actual consumption. The best Kubernetes monitoring isn’t more dashboards — it’s the joins between these sources: requests vs usage, desired vs available, allocated vs real. Master those three joins and your dashboards stop lying to you.

Metric names and labels vary across exporter and Kubernetes versions. Validate every query against your own cluster before relying on it for alerts.
