Capacity Planning With Prometheus Queries That Predict

Capacity planning has a reputation as a spreadsheet exercise done quarterly by someone in a finance-adjacent role. In practice the data you need is already in Prometheus — you’re just not querying it forward. The same metrics that page you when a disk fills can tell you when it will fill, days in advance, so you provision on a Tuesday afternoon instead of getting paged on a Saturday night. Here are the PromQL patterns I use to turn reactive monitoring into proactive planning.

predict_linear: the workhorse

The single most useful function for capacity work is predict_linear. It fits a least-squares line to a range of samples and extrapolates it forward by however many seconds you specify. The canonical disk-fill prediction:

# will this filesystem hit zero in the next 4 hours (14400s)?
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4 * 3600) < 0

Read it as: take the last 6 hours of available bytes, fit a trend, and project 4 hours ahead. If the projection goes below zero, the disk is on track to fill within 4 hours. Alert on that and you get a warning while you can still act, not a page when it’s already full.

The window choice matters. A 6h fit smooths out short spikes; a 1h fit reacts faster but is noisier. For slow-moving resources like disk, longer windows give steadier predictions.
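To make the fit-and-extrapolate step concrete, here is a minimal Python sketch of the arithmetic: an ordinary least-squares line through (timestamp, value) samples, projected forward. It is an illustration, not Prometheus’s implementation. The function name mirrors the PromQL one for readability, the sample data is made up, and the sketch anchors the projection at the last sample, whereas Prometheus anchors it at the evaluation time.

```python
def predict_linear(samples, horizon_seconds):
    """Project a value horizon_seconds past the last sample.

    samples is a list of (unix_seconds, value) pairs, oldest first.
    Illustrative only: anchored at the last sample, not at query time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    t_last = samples[-1][0]
    # Shift time so the last sample sits at x = 0; keeps the numbers small.
    xs = [t - t_last for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must span more than one timestamp")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x                # units per second
    intercept = mean_y - slope * mean_x   # fitted value at the last sample
    return intercept + slope * horizon_seconds


GIB = 2 ** 30
# Hypothetical data: 9 GiB free six hours ago, losing 1 GiB per hour since.
samples = [(hour * 3600, (9 - hour) * GIB) for hour in range(7)]
projected = predict_linear(samples, 4 * 3600)
will_fill = projected < 0
```

With 3 GiB left and a steady loss of 1 GiB per hour, the 4-hour projection lands about 1 GiB below zero, which is exactly the condition the alert expression tests for.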

Turning prediction into a useful alert

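The bare prediction only answers yes or no: the expression returns a series when the projection crosses zero and nothing otherwise. A useful alert adds a current-state check and names the time window: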

- alert: DiskWillFillSoon
  expr: |
    predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4*3600) < 0
    and
    node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.2
  for: 30m
  labels:
    severity: ticket
  annotations:
    summary: "{{ $labels.instance }} root fs projected to fill within 4h"

The second clause (< 0.2) guards against false alarms when a disk briefly dips but has tons of headroom. You only care about the prediction once you’re already below 20% — combining a trend with a current threshold is what keeps predictive alerts from flapping.
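The two-clause logic is easy to check by hand. The sketch below restates it in Python with made-up numbers; the function name and the sample values are hypothetical, and it ignores the for: 30m hold that the real rule applies on top.

```python
def disk_will_fill_soon(projected_avail, avail, size, max_free_fraction=0.2):
    """Fire only when the trend AND the current state both look bad."""
    trending_to_full = projected_avail < 0
    already_low = avail / size < max_free_fraction
    return trending_to_full and already_low


TB = 10 ** 12
# A burst of writes on a disk that is still 80% free: trend is bad, state is fine.
burst_on_roomy_disk = disk_will_fill_soon(-50e9, 0.8 * TB, TB)
# The same projection on a disk that is already down to 10% free.
steady_fill_on_tight_disk = disk_will_fill_soon(-50e9, 0.1 * TB, TB)
```

The first case stays quiet and the second fires, which is the behaviour the combined expression is meant to give.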

Headroom queries: how much is left, right now

Before forecasting, know your current headroom. These are the queries I keep on a capacity dashboard:

# CPU headroom per node (fraction of cores idle)
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
# (the complement: fraction of CPU actually used)
avg by (instance) (1 - rate(node_cpu_seconds_total{mode="idle"}[5m]))

# memory headroom
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

# disk headroom
node_filesystem_avail_bytes / node_filesystem_size_bytes

Headroom is a fraction; multiply by 100 if you want a percentage in the panel. The value of expressing it as a fraction is that you can alert on < 0.15 uniformly across resource types.

A snapshot of headroom doesn’t tell you direction. The deriv function gives you the per-second rate of change of a gauge, which is the slope you actually care about:

# bytes-per-second growth of disk usage over the last day
# (deriv takes one range vector, so the subtraction goes inside a subquery)
deriv((node_filesystem_size_bytes{mountpoint="/data"}
       - node_filesystem_avail_bytes{mountpoint="/data"})[1d:5m])

# how many days until a metric hits a ceiling, computed inline
(8e9 - container_memory_working_set_bytes{pod="cache-0"})
  / deriv(container_memory_working_set_bytes{pod="cache-0"}[6h])
  / 86400

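That second query is a hand-rolled “days remaining” estimate: remaining headroom divided by the growth slope, converted to days. It’s rougher than predict_linear but it gives you a number a human can reason about: “we have 11 days” lands better in a planning meeting than a boolean. It only means something while the slope is positive; a flat or shrinking series produces an infinite or negative result, so filter those out before putting the number on a panel.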
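The arithmetic behind the days-remaining query, as a small Python sketch. The function name and the numbers are hypothetical; returning infinity for a non-positive slope is a choice made for this sketch, not something PromQL does for you.

```python
import math


def days_remaining(current, ceiling, slope_per_second):
    """Days until `current` reaches `ceiling` at a constant growth rate."""
    if slope_per_second <= 0:
        # Not growing: there is no meaningful deadline.
        return math.inf
    return (ceiling - current) / slope_per_second / 86400


# Hypothetical pod: 5.8 GB working set, 8 GB ceiling, growing 2000 bytes/s.
estimate = days_remaining(5.8e9, 8e9, 2000.0)
shrinking = days_remaining(5.8e9, 8e9, -500.0)
```

Here 2.2 GB of headroom at 2000 bytes per second works out to a little under 13 days.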

Kubernetes capacity: requests vs allocatable

In Kubernetes, capacity planning is about scheduling pressure, not just raw utilization. The question is whether the cluster can fit more pods, which is governed by requests against allocatable:

# CPU requested as a fraction of allocatable, per node
sum by (node) (kube_pod_container_resource_requests{resource="cpu"})
  / sum by (node) (kube_node_status_allocatable{resource="cpu"})

# cluster-wide memory commitment
sum(kube_pod_container_resource_requests{resource="memory"})
  / sum(kube_node_status_allocatable{resource="memory"})

A cluster can be at 30% CPU utilization but 95% CPU requested: fully committed and unable to schedule new pods despite idle cores. That gap is the most common capacity surprise in Kubernetes, and these queries surface it before new pods start getting stuck in Pending.
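A worked example of that gap, in Python with invented numbers. The node names, core counts, and the fits_somewhere helper are all hypothetical, and the check is a deliberate simplification: the real scheduler weighs far more than CPU requests.

```python
# Three hypothetical 16-core nodes: lightly used, heavily requested.
nodes = {
    "node-a": {"allocatable": 16.0, "requested": 15.2, "used": 4.8},
    "node-b": {"allocatable": 16.0, "requested": 15.4, "used": 5.0},
    "node-c": {"allocatable": 16.0, "requested": 15.0, "used": 4.6},
}


def cluster_fraction(nodes, key):
    """Sum of `key` across nodes as a fraction of total allocatable CPU."""
    total = sum(n["allocatable"] for n in nodes.values())
    return sum(n[key] for n in nodes.values()) / total


def fits_somewhere(nodes, cpu_request):
    """True if any single node has enough unrequested CPU for the pod."""
    return any(
        n["allocatable"] - n["requested"] >= cpu_request
        for n in nodes.values()
    )


utilization = cluster_fraction(nodes, "used")       # about 0.30
commitment = cluster_fraction(nodes, "requested")   # about 0.95
can_place_two_cores = fits_somewhere(nodes, 2.0)
can_place_half_core = fits_somewhere(nodes, 0.5)
```

Utilization says the cluster is two-thirds idle, yet a pod requesting 2 cores has nowhere to go because no node has more than 1 core unrequested.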

Peak-aware planning with quantiles over time

Provisioning to average load gets you paged at peak. Plan to a high quantile of your historical load instead:

# 95th percentile of request rate over the last week, hourly
quantile_over_time(0.95,
  sum(rate(http_requests_total[5m]))[7d:1h])

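The subquery [7d:1h] evaluates the inner rate once an hour across a week, then quantile_over_time takes the 95th percentile of those values. That’s your realistic peak: provision for it, not for the misleadingly comfortable mean. One caveat: each hourly evaluation only sees the preceding 5 minutes of traffic, so a short spike between evaluations goes unseen; tighten the step (for example [7d:5m]) if brief peaks matter.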
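To see how far the mean and the 95th percentile can diverge, here is a small Python sketch over a made-up day of hourly request rates. The quantile helper interpolates linearly between neighbouring sorted samples, which is my understanding of how quantile_over_time behaves; treat that detail as an assumption.

```python
import math


def quantile(q, values):
    """Linear-interpolated quantile over a list of samples, 0 <= q <= 1."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = int(math.floor(rank))
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Hypothetical day: 20 quiet hours at 100 req/s, 4 busy hours at 400 req/s.
hourly_rates = [100.0] * 20 + [400.0] * 4
mean_rate = sum(hourly_rates) / len(hourly_rates)
p95_rate = quantile(0.95, hourly_rates)
```

The mean comes out at 150 req/s while the 95th percentile sits at the 400 req/s peak, so capacity sized to the mean would be short by more than half during the busy hours.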

Putting it into a cadence

The workflow I run: a capacity dashboard with headroom gauges and predict_linear panels, a weekly review of the deriv-based “days remaining” numbers, and predictive ticket-severity alerts (never pages) that open a few days before a resource crosses a threshold. The whole point is to move capacity from the 3am pager onto a calendar. Prometheus already holds the data; these queries just ask it about the future instead of the present.

For the host and Kubernetes metrics these forecasts depend on, see our node_exporter and kube-state-metrics guides in the Prometheus and monitoring category. And to turn these predictions into well-tuned alerts, our monitoring alert assistant reviews the rules so they warn early without flapping.

Linear extrapolation assumes the recent trend continues; it’s blind to step changes and seasonality. Treat predictions as early warnings, not guarantees.