node_exporter Deep Dive: The Host Metrics That Actually Matter

The Prometheus node_exporter is the first thing most teams install and the last thing they understand. It emits hundreds to thousands of time series per host, depending on core count, disks, and mounts, and on a bad day someone copies a dashboard from the internet that graphs forty of them and explains none. After years of staring at these metrics during real incidents, I reach for maybe twenty. Here’s what they are, the PromQL to turn raw counters into answers, and which collectors to switch off.

CPU: stop reading the wrong number

node_cpu_seconds_total is a counter per CPU per mode. The number you actually want is “what fraction of CPU is not idle,” computed as a rate:

# CPU utilization per instance, 0-1
1 - avg by (instance) (
  rate(node_cpu_seconds_total{mode="idle"}[5m])
)

The mistake I see constantly is graphing mode="user" or mode="system" in isolation. Utilization is 1 - idle, which deliberately counts everything else (user, system, iowait, steal, interrupts) as busy. Watch mode="iowait" separately, though: a box that’s 90% “busy” in iowait is starved on disk, not CPU, and the fix is completely different.

# Time stalled waiting on I/O
avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m]))
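To make the arithmetic behind the utilization query concrete, here is a minimal Python sketch of what rate() followed by 1 - avg computes from two scrapes of the idle counter. The function name and sample values are invented for illustration, and the sketch ignores counter resets and window extrapolation, both of which the real rate() handles.

```python
def cpu_utilization(idle_start, idle_end, window_seconds):
    """Return the non-idle CPU fraction (0-1), averaged across CPUs.

    idle_start / idle_end map a cpu label to its idle-seconds counter
    at the start and end of the window. Assumes no counter resets.
    """
    idle_rates = [
        (idle_end[cpu] - idle_start[cpu]) / window_seconds
        for cpu in idle_start
    ]
    return 1 - sum(idle_rates) / len(idle_rates)


# Hypothetical samples taken 300 seconds apart on a 2-CPU host.
start = {"0": 1000.0, "1": 2000.0}
end = {"0": 1150.0, "1": 2270.0}
utilization = cpu_utilization(start, end, 300)
print(f"{utilization:.2f}")
```

CPU 0 was idle for half the window and CPU 1 for 90% of it, so the host as a whole was about 30% busy. Graphing either CPU, or either mode, alone would tell a different story.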

Memory: “free” is a lie, use available

Linux uses free RAM for cache, so node_memory_MemFree_bytes always looks alarmingly low and means nothing. The kernel computes MemAvailable specifically to answer “how much can I allocate without swapping” — use it:

# Available memory fraction
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

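Alert when that drops below, say, 10%. Pair it with paging activity, because a box that’s swapping is already in pain even if “available” hasn’t hit zero. Major page faults are a reasonable proxy (node_vmstat_pswpin and node_vmstat_pswpout count swap traffic directly), but a cold file read also causes them, so treat the > 0 below as a dashboard signal and alert only on a sustained rate: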

rate(node_vmstat_pgmajfault[5m]) > 0
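To see why MemFree misleads, it helps to look at where the numbers come from. The sketch below parses /proc/meminfo-style text, which is what node_exporter reads on Linux, and compares the two fractions. The snapshot is a made-up example of a host with a large page cache, and parse_meminfo is a hypothetical helper, not part of any library.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        # Most fields are reported in kB; convert those to bytes.
        multiplier = 1024 if len(fields) > 1 and fields[1] == "kB" else 1
        values[name.strip()] = int(fields[0]) * multiplier
    return values


# Hypothetical snapshot of a 16 GiB host with a large page cache.
SAMPLE = """MemTotal:       16777216 kB
MemFree:          524288 kB
MemAvailable:    8388608 kB
"""

mem = parse_meminfo(SAMPLE)
free_fraction = mem["MemFree"] / mem["MemTotal"]
available_fraction = mem["MemAvailable"] / mem["MemTotal"]
print(f"free={free_fraction:.3f} available={available_fraction:.3f}")
```

In this snapshot only about 3% of RAM is free, yet half of it is available, because most of the "used" memory is reclaimable cache. An alert on the free figure would fire on a perfectly healthy host.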

Disk space: predict, don’t react

Static thresholds on disk fullness page you at 3am for a slow leak that wasn’t urgent. predict_linear forecasts when you’ll actually run out, which is the question you care about:

# Will this filesystem fill within 4 hours?
predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[1h], 4*3600) < 0

This fires only when the trend is dangerous, which is far closer to “wake a human” than “85% full.” Exclude tmpfs and overlay or you’ll page on ephemeral container filesystems.
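Under the hood, predict_linear fits a straight line through the samples in the range by least squares and extrapolates it forward. The following Python sketch reproduces that idea under simplifying assumptions: the function is my own illustration rather than the Prometheus implementation, and the samples describe a hypothetical filesystem losing 1 GiB every ten minutes.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Least-squares forecast, in the spirit of PromQL's predict_linear.

    samples is a list of (unix_timestamp, value) pairs with at least two
    distinct timestamps. Returns the value of the fitted line at
    eval_time + seconds_ahead.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    # Shift timestamps so eval_time is zero; keeps the numbers small.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead


# Hypothetical filesystem: 11 GiB free an hour ago, 5 GiB free now.
GIB = 1024 ** 3
now = 1_718_150_400
samples = [(now - 3600 + i * 600, (11 - i) * GIB) for i in range(7)]
forecast = predict_linear(samples, now, 4 * 3600)
print(forecast < 0)
```

With 5 GiB left and a burn rate of 6 GiB per hour, the fitted line crosses zero in under an hour, so the forecast four hours out is negative and the alert condition holds. A filesystem sitting steadily at 5 GiB free would produce a flat line and never fire.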

Disk I/O and inodes: the silent killers

Running out of inodes looks exactly like running out of space to the application and nothing like it on a space dashboard. Watch both:

# Inode exhaustion
node_filesystem_files_free / node_filesystem_files < 0.10

# Disk saturation: device busy fraction
rate(node_disk_io_time_seconds_total[5m])

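It’s the rate, not the raw counter, that matters: rate(node_disk_io_time_seconds_total[5m]) approaching 1.0 means the device had I/O in flight for nearly the whole window. That’s your “disk is the bottleneck” signal. In USE-method terms it measures utilization rather than saturation; rate(node_disk_io_time_weighted_seconds_total[5m]) approximates the average queue depth and is the closer match for the saturation column. On devices that serve requests in parallel, such as NVMe drives and RAID arrays, a busy fraction near 1.0 does not by itself prove the device is at its limit.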

Network: errors before bandwidth

Bandwidth graphs are pretty and rarely the problem. Drops and errors are:

rate(node_network_receive_errs_total[5m]) > 0
rate(node_network_receive_drop_total[5m]) > 0

A handful of receive drops under load is normal; a steadily climbing rate is a NIC, driver, or buffer problem worth chasing.

Load average in context

Load average is meaningless without a core count. Normalize it:

node_load5 / count by (instance) (
  node_cpu_seconds_total{mode="idle"}
)

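A value above 1.0 means more tasks wanted to run than there were CPUs, averaged over roughly the last five minutes. On Linux, load also counts tasks in uninterruptible sleep (usually waiting on disk), so check iowait before blaming the CPU. That ratio is comparable across machines of different sizes, which the raw number never is.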
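The normalization is simple enough to sanity-check by hand. This small Python sketch applies the same division to two hypothetical hosts and then, where the platform supports it, to the local machine; normalized_load is an illustrative helper name.

```python
import os


def normalized_load(load, cpu_count):
    """Return a load average divided by the logical CPU count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load / cpu_count


# Same raw load, very different meaning on differently sized hosts.
small = normalized_load(8.0, 4)    # 4-CPU host: oversubscribed
large = normalized_load(8.0, 64)   # 64-CPU host: mostly idle
print(small, large)

try:
    # Index 1 is the 5-minute average, the counterpart of node_load5.
    local = normalized_load(os.getloadavg()[1], os.cpu_count() or 1)
    print(f"local normalized load: {local:.2f}")
except (AttributeError, OSError):
    print("load average not available on this platform")
```

A raw load of 8 is a problem on the four-CPU box and background noise on the 64-CPU one, which is exactly the distinction the PromQL ratio preserves.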

Turn off the collectors you don’t use

By default node_exporter enables a pile of collectors, many of which you’ll never query. Each one adds series you scrape and store forever. Trim them at startup (which collectors are on by default varies by version, so check node_exporter --help first):

node_exporter \
  --no-collector.wifi \
  --no-collector.infiniband \
  --no-collector.nfs \
  --no-collector.zfs \
  --collector.filesystem.mount-points-exclude='^/(dev|proc|sys|run)($|/)'

The filesystem mount-point exclusion alone often cuts your series count noticeably on container hosts, where /run, overlay mounts, and per-container bind mounts otherwise multiply. Fewer useless series is a real, ongoing cardinality win — the same discipline covered in taming Prometheus metric cardinality.
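Exclusion regexes are easy to get subtly wrong, so it is worth testing one against a list of mount points before deploying it. The sketch below does that in Python; node_exporter itself uses Go’s regexp package, but the two engines agree on an expression this simple. The mount list is hypothetical.

```python
import re

# Same pattern as passed to --collector.filesystem.mount-points-exclude.
EXCLUDE = re.compile(r"^/(dev|proc|sys|run)($|/)")

# Hypothetical mount points on a container host.
mounts = [
    "/",
    "/dev",
    "/dev/shm",
    "/run/user/1000",
    "/data",
    "/device",
    "/var/run",
]
kept = [m for m in mounts if not EXCLUDE.search(m)]
print(kept)
```

The trailing ($|/) is what keeps /device from being excluded along with /dev, and the leading ^ is why /var/run survives. If you want nested paths such as /var/run gone too, the pattern has to name them explicitly.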

The textfile collector: your escape hatch

When you need a metric node_exporter doesn’t ship — a backup’s age, a cert’s expiry, a custom hardware check — don’t write a new exporter. Drop a file:

# /var/lib/node_exporter/textfile/backup.prom
backup_last_success_timestamp_seconds 1718150400

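Point the exporter at that directory with --collector.textfile.directory and have a cron job write the file. The collector only reads files ending in .prom, so write to a temporary name and rename it into place; that keeps the exporter from scraping a half-written file. It’s the lowest-effort way to get host-local facts into Prometheus, and it composes with everything else.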
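The cron job can be a few lines of Python. This sketch writes a single gauge in the Prometheus text format and uses a write-then-rename step so a scrape never sees a partial file. The helper name, HELP text, and demo directory are assumptions for illustration; the metric name matches the example above.

```python
import os
import tempfile
import time


def write_textfile_metric(directory, filename, metric, value, help_text):
    """Atomically write one gauge in Prometheus text exposition format."""
    body = (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )
    # Write to a temp file in the same directory, then rename it over the
    # target. The .tmp suffix means the collector ignores it meanwhile.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        # mkstemp creates the file owner-only; the exporter must read it.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(directory, filename)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


# Demo against a throwaway directory. In production this would be the
# path passed to --collector.textfile.directory.
demo_dir = tempfile.mkdtemp()
path = write_textfile_metric(
    demo_dir,
    "backup.prom",
    "backup_last_success_timestamp_seconds",
    int(time.time()),
    "Unix time of the last successful backup.",
)
print(open(path).read())
```

Run it as the last step of the backup script, and only on success. A query along the lines of time() - backup_last_success_timestamp_seconds > 86400 then flags a backup that has not succeeded in a day; pick the threshold to match your own schedule.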

A minimal, trustworthy host dashboard

If I had to fit a host’s health on one screen, it would be these:

CPU non-idle and iowait
Memory available fraction and major page faults
Per-filesystem predict_linear to empty, plus inode free
Disk io_time saturation
Network errors/drops
Normalized load

That’s the set I actually look at during an incident. Everything else is for forensics after the fact. Wire those into your monitoring alert routing so the page is meaningful, and skip the forty-panel dashboard you copied off the internet.

PromQL examples assume standard node_exporter label names, which can vary by version and relabeling. Validate queries against your own metrics before alerting on them.
