AI for Prometheus & Monitoring · By James Joyner IV · 10 min read

PromQL Subqueries and _over_time: Trend Analysis Without the Guesswork

A practical guide to PromQL subqueries and the _over_time family for spotting trends, slow leaks, and daily peaks, plus why recording rules often win.

  • #prometheus
  • #promql
  • #subqueries
  • #trends
  • #sre

The first time I tried to answer “what was the worst this service got over the last 24 hours?” I wrote a query that returned the current rate, not the peak. It looked right in the panel. It was wrong in the worst way: confidently wrong. PromQL’s instant-vector functions only see one moment, and trend questions are about a window of moments. That gap is exactly what the _over_time family and subqueries close. Once I understood the syntax, a whole class of investigations — slow memory leaks, creeping latency, weekly capacity drift — became a single expression instead of an afternoon of squinting at graphs.

The shape of an _over_time aggregation

The _over_time functions take a range vector and collapse each series down to one number per series, summarizing the samples inside that window. The family is small and worth memorizing:

max_over_time(node_memory_MemAvailable_bytes[1h])
min_over_time(node_memory_MemAvailable_bytes[1h])
avg_over_time(node_memory_MemAvailable_bytes[1h])
quantile_over_time(0.95, http_request_duration_seconds[30m])
stddev_over_time(process_resident_memory_bytes[6h])

Read max_over_time(metric[1h]) as “for each series, the highest sample seen in the last hour.” This works directly on a raw gauge because a gauge already has samples at every scrape. The subtlety arrives when you want the max of something that isn’t stored — like a rate.
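As a small illustration of a window summary on a raw gauge, the sketch below turns the hourly low-water mark into a fraction of total memory. It assumes node_exporter's node_memory_MemTotal_bytes is scraped alongside node_memory_MemAvailable_bytes with identical labels; that pairing is an assumption for the example, not something the text above depends on.

```promql
# Lowest available memory seen in the last hour, as a fraction of total memory.
# Both sides carry the same instance/job labels, so they match one-to-one.
min_over_time(node_memory_MemAvailable_bytes[1h])
  /
node_memory_MemTotal_bytes
```

A value near 0 means the host came close to exhausting memory at some point in the hour, even if it looks healthy right now.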

Why you need subqueries

rate() produces an instant vector. You cannot feed an instant vector into max_over_time, which demands a range vector. So how do you ask “what was the peak request rate over the last day”? You can’t write max_over_time(rate(...)[1d]): a plain range selector like [1d] attaches only to a metric selector, not to the result of an expression, so the query is rejected before it ever runs.

A subquery fixes this. It evaluates an inner expression repeatedly across a window and hands the results back as a synthetic range vector you can then aggregate:

max_over_time(
  rate(http_requests_total[5m])[1d:1m]
)

The [1d:1m] is the subquery part. The first number is the range (look back 1 day); the second is the resolution (re-evaluate every 1 minute). So Prometheus computes rate(http_requests_total[5m]) at one-minute steps across the past day, then max_over_time picks the largest result. That gives you the genuine peak rate, not the rate happening right now.
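One thing the query above leaves implicit is that it reports a peak per series. If the question is about a whole service, it may be more useful to aggregate inside the subquery first. A sketch, assuming the series carry a job label:

```promql
# Peak of the summed request rate per job over the last day.
# The sum runs at each 1-minute step, then max_over_time picks the largest total.
max_over_time(
  sum by (job) (rate(http_requests_total[5m]))[1d:1m]
)
```

Summing the per-series peaks instead would tend to overstate the total, because individual series rarely peak at the same moment.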

The resolution is optional — [1d:] defaults to the global evaluation interval — but I always set it explicitly. An implicit resolution is a silent performance and accuracy decision, and silent decisions are the ones that bite.

Pro Tip: The inner range ([5m] here) and the subquery resolution (1m) are independent knobs. Keep the inner range at least twice your scrape interval, and preferably around four times, so each rate() still sees at least two samples when a scrape is missed. Set the resolution fine enough to catch the spikes you care about, but no finer: the number of inner evaluations is the window divided by the resolution, so halving the resolution doubles the cost.
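To make the two knobs concrete, here is one way the numbers might be chosen. The 30s scrape interval is a hypothetical value for the example; check your own scrape configuration before copying the ranges.

```promql
# Assumes a 30s scrape interval (hypothetical; check your own scrape config).
# Inner range 2m covers about four scrapes, so rate() still has samples
# to work with if one scrape is missed.
# Resolution 1m over 1d means 1,440 inner evaluations per series.
max_over_time(
  rate(http_requests_total[2m])[1d:1m]
)
```

Because the 2m inner range is wider than the 1m resolution, consecutive evaluations overlap and no stretch of the day goes unexamined.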

The single most useful trend pattern I reach for is deriv over a subquery, or simply comparing a metric to itself across time. To catch a slow memory leak, look at the average slope of memory usage over several hours:

deriv(process_resident_memory_bytes[1h]) > 0

That flags any process whose memory is trending upward right now. But a leak is sneaky precisely because the instantaneous slope wobbles. Smooth it with a subquery so you see the sustained direction:

avg_over_time(
  deriv(process_resident_memory_bytes[1h])[6h:10m]
) > 1024

This says: every 10 minutes over the last 6 hours, compute the per-second memory slope fitted across the preceding hour, then average those slopes. deriv reports bytes per second here, so the threshold of 1024 means roughly 1 KiB/s. If the average slope exceeds that, something is genuinely accumulating rather than just breathing. A short spike won’t trip it; a steady climb will.
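The other pattern mentioned earlier, comparing a metric to itself across time, can be sketched with offset. This version assumes the process also exports process_start_time_seconds (standard client libraries usually expose it next to process_resident_memory_bytes) and that both series share instance and job labels.

```promql
# Resident memory growth over the last 6 hours, in bytes, per process.
(process_resident_memory_bytes - process_resident_memory_bytes offset 6h)
  # Keep only processes that have been up for the whole window, so a restart
  # does not show up as a misleading drop or jump.
  and on (instance, job)
(time() - process_start_time_seconds > 6 * 3600)
```

This is blunter than the averaged slope, since it only looks at two points in time, but it is cheap and easy to read during an investigation.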

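For latency drift, quantile_over_time is the workhorse. To see whether your p95 latency has a bad ceiling over the day rather than just at this instant, compute the p95 at each step, then summarize those values across the day. Note that rate() of the _sum series alone is not a latency (it is seconds of request time accumulated per second), so the inner expression has to produce a real latency figure. The query below assumes http_request_duration_seconds is a classic histogram that exposes _bucket series:

quantile_over_time(0.95,
  histogram_quantile(0.95,
    sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
  )[1d:5m]
)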

Pro Tip: quantile_over_time computes the quantile of samples over time for each series. It is not the same as histogram_quantile, which computes a quantile across histogram buckets at one instant. Mixing them up produces numbers that look plausible and mean nothing. When in doubt, write down in plain English which dimension you are summarizing over.
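Side by side, the two functions summarize over different dimensions. The pair below is only a sketch: it assumes http_request_duration_seconds is a classic histogram with _bucket series, and it uses node_load1 as a stand-in gauge. These are two separate queries, not one expression.

```promql
# Across buckets, at one instant: the latency that 95% of requests in the
# last 5 minutes stayed under. Assumes a classic histogram with _bucket series.
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# Across time, per series: the value that 95% of this gauge's samples in the
# last 30 minutes stayed at or under.
quantile_over_time(0.95, node_load1[30m])
```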

Comparing windows to spot drift

Trends are relative. A clean way to express “are we worse than yesterday” is to subtract a past window from a recent one. The offset modifier shifts a range backwards in time:

avg_over_time(node_load1[1h])
  -
avg_over_time(node_load1[1h] offset 1d)

Positive output means today’s hourly load average is running hotter than the same hour yesterday: a tidy day-over-day regression detector you can alert on (change the offset to 1w for a week-over-week comparison). This is also the backbone of forward-looking work; if you want to project these trends into the future, that crosses into capacity planning with predictive queries, where predict_linear takes over.
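A ratio is sometimes easier to threshold than a difference, because it does not depend on the absolute size of the host. A minimal sketch of the same comparison against last week:

```promql
# Ratio of the last hour's average load to the same hour one week ago.
# Values above 1 mean this week is running hotter; 1.2 means 20% hotter.
avg_over_time(node_load1[1h])
  /
avg_over_time(node_load1[1h] offset 1w)
```

Series that did not exist a week ago simply drop out of the result, which is worth remembering before wiring this into an alert.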

The performance cost nobody warns you about

Subqueries are expensive, and the cost is easy to underestimate because the syntax is so compact. Consider:

max_over_time(rate(http_requests_total[5m])[7d:1m])

At a 1-minute resolution over 7 days, Prometheus must evaluate the inner rate() roughly 10,080 times, each evaluation scanning its own [5m] of raw samples — and it does this for every series matching http_requests_total. On a metric with thousands of label combinations, this single query can read hundreds of millions of samples. Run it on a dashboard that refreshes every 15 seconds and you have effectively built a denial-of-service tool aimed at your own Prometheus.
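Before running a query like that, the arithmetic above can be checked against the live series count. This is a rough estimate only, not an exact accounting of what the query engine reads:

```promql
# Matching series x steps in the window (7 days x 24 hours x 60 minutes).
# The result approximates how many inner rate() evaluations one run needs.
count(http_requests_total) * (7 * 24 * 60)
```

Multiply the result by the number of samples in each inner range (about 20 for a [5m] range at a 15s scrape interval, an assumed figure) for a ballpark on samples read.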

This is why recording rules usually beat subqueries for anything you query repeatedly. A recording rule pre-computes the inner expression once per evaluation interval and stores it as a new series:

groups:
  - name: http_request_rates   # hypothetical group name
    interval: 1m               # recording rule, evaluated every 1m
    rules:
      - record: job:http_requests:rate5m
        expr: rate(http_requests_total[5m])

Then your trend query becomes cheap, because the hard part is already materialized:

max_over_time(job:http_requests:rate5m[7d])

Effectively the same answer (the recorded samples land on the rule’s evaluation timestamps rather than the subquery’s steps, and the rule must have been running for the full window), a fraction of the cost, and no subquery at all. The rule of thumb I follow: use subqueries for ad-hoc exploration and one-off investigations, then promote anything that lands on a dashboard or an alert into a recording rule. I wrote up the full pattern in recording rules that make queries fast if you want the migration playbook.

Where AI fits — and where review still matters

I treat an AI assistant like a fast, eager junior engineer. Ask Claude, ChatGPT, or Cursor to “write a PromQL query for the peak p99 latency per service over the last 24 hours” and you’ll get a syntactically valid subquery in seconds — far quicker than I’d recall the exact quantile_over_time argument order from memory. That speed is real and worth using.

What the junior engineer cannot do is feel the cost. The model will happily hand you [30d:15s] without flinching, and it has no idea your cardinality is 50,000 series. So the review checklist before shipping any AI-generated subquery is short but non-negotiable:

  • Type check. Does the inner expression return an instant vector, such as the result of rate()? If so, the subquery brackets are correct. If it is just a stored metric, a plain range selector like [1d] does the job and the subquery only adds resampling cost.
  • Resolution sanity. Multiply window ÷ resolution × series count. If that number scares you, it should.
  • Semantic check. Is quantile_over_time really what you want, or did you mean histogram_quantile?
  • Promotion path. Will this run more than once? Then it belongs in a recording rule.
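The type check in the list above comes down to one question: is the thing inside the brackets stored, or computed? A short sketch of both cases, shown as two separate queries:

```promql
# Inner expression is a stored series: a plain range selector is enough.
max_over_time(node_load1[1d])

# Inner expression is computed: subquery brackets are required.
max_over_time(rate(http_requests_total[5m])[1d:1m])
```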

Because subqueries are explainable, you can actually verify them — read the brackets aloud, predict the output, and confirm. That reviewability is the whole point: AI gives you the draft, you supply the judgment. If you’d rather skip straight to vetted alerting expressions, the free Alert Rule Generator produces deterministic, reviewable rules you can drop into Prometheus without hand-tuning the syntax. And when you inherit a query you didn’t write, untangling inherited PromQL with AI walks through reverse-engineering it safely.

Conclusion

The _over_time family answers “what happened across this window,” and subqueries extend that power to expressions Prometheus doesn’t store — like the max of a rate. They turn vague trend questions into precise, single-line answers. Just respect the cost: explore with subqueries, ship with recording rules. Let AI draft the syntax at junior-engineer speed, then read every bracket before it touches production. Fast and correct is better than fast and confidently wrong.
