quantile_over_time vs histogram_quantile: Which Percentile to Trust
Two PromQL functions compute percentiles in completely different ways, and picking the wrong one gives a confidently wrong number. Here's how to choose and verify.
Percentiles are where PromQL quietly lies. Two functions advertise themselves as the way to get a p95 or p99 — quantile_over_time and histogram_quantile — and they compute fundamentally different things from fundamentally different data. Use the wrong one and you don’t get an error. You get a number that’s plausible, that you put on a dashboard, that you cite in an SLO review, and that’s simply wrong. The disagreements that follow (“the dashboard says p99 is 200ms but users are timing out”) usually trace back to a function-and-data-shape mismatch nobody noticed.
The two functions, precisely
quantile_over_time(0.95, some_gauge[5m]) takes the 95th percentile of the sampled values of a gauge over the window. The crucial word is sampled. It only sees the snapshots taken at scrape time. If your scrape interval is 15 seconds, it sees one value every 15 seconds and computes the percentile of those points. Anything that happens between scrapes is invisible to it.
histogram_quantile(0.95, rate(some_metric_bucket[5m])) interpolates the 95th percentile from bucketed observation counts. Every single observation lands in a bucket as it happens, so it sees all of them. But its precision is bounded by where the bucket boundaries sit — it interpolates linearly within whichever bucket the quantile falls into.
# Gauge sampled over time — sees only scrape-time snapshots
quantile_over_time(0.95, queue_depth[5m])
# Histogram — sees every observation, limited by bucket layout
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
So the rule of thumb: histogram_quantile for things you instrumented as histograms (latency, sizes), quantile_over_time for the distribution of a gauge’s sampled values over time. Cross them and the result is wrong.
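To make "interpolates linearly within the bucket" concrete, here is a minimal Python sketch of the idea. It is an illustration, not Prometheus source code: the helper name `bucket_quantile`, the sample bucket counts, and the assumption that the lowest bucket starts at 0 are all hypothetical.

```python
import math

def bucket_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Hypothetical helper that mirrors the linear-interpolation idea behind
    histogram_quantile; the last pair is expected to be the +Inf bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]              # +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total                    # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0   # assumption: the lowest bucket starts at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Past the last finite boundary there is nothing to interpolate
                # toward, so report that boundary.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound

# Made-up cumulative counts for a latency histogram, in seconds.
example = [(0.1, 50), (0.25, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]
p95 = bucket_quantile(0.95, example)
```

Here the 95th observation falls in the 0.25 to 0.5 bucket, so the estimate is placed five-eighths of the way through it. The true value could be anywhere in that bucket; the bucket width is the error bar.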
The silent trap in each
quantile_over_time misses sub-scrape spikes. A service that spikes to 2 seconds of latency for 5 seconds, between two 15-second scrapes, contributes nothing — the scrapes landed on the calm moments. You get a clean p99 that hides the tail. This is dangerous precisely because it looks reassuring. And applying it to a counter is meaningless: you’d be taking the percentile of an ever-increasing number.
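The sub-scrape blind spot is easy to reproduce in a small simulation. The sketch below is illustrative only: `latency_at` is a made-up signal with a 5-second spike placed between two scrapes, and `sample_quantile` assumes linear interpolation between closest ranks, which is my understanding of how a quantile over samples is computed.

```python
def sample_quantile(q, values):
    """Quantile of a list of samples, interpolating linearly between closest ranks."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

def latency_at(second):
    """Hypothetical true latency: 0.1s normally, 2s during a spike at t=17..21."""
    return 2.0 if 17 <= second < 22 else 0.1

WINDOW_SECONDS = 300     # 5m range
SCRAPE_INTERVAL = 15     # scrapes land at t=0, 15, 30, ...

# What actually happened, second by second.
true_values = [latency_at(t) for t in range(WINDOW_SECONDS)]
# What a gauge scraped every 15s records.
scraped = [latency_at(t) for t in range(0, WINDOW_SECONDS, SCRAPE_INTERVAL)]

true_max = max(true_values)
scraped_max = max(scraped)
scraped_p99 = sample_quantile(0.99, scraped)
print(f"true max={true_max}s, scraped max={scraped_max}s, scraped p99={scraped_p99:.3f}s")
```

The spike sits entirely between the scrapes at t=15 and t=30, so all 20 recorded samples are calm and both the sampled p99 and the sampled max stay at the baseline. Shifting the spike by a few seconds would make it visible, which is the point: whether the tail shows up depends on scrape timing.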
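histogram_quantile is only as good as the buckets. If your p99 lands in the top bucket (say everything above le="1" collapses into le="+Inf"), the function has no upper boundary to interpolate toward, so it returns the upper bound of the highest finite bucket: 1 second here, however slow the real tail is. That “p99” is a floor on the true value, not a measurement. Even below the top bucket, wide buckets mean the result is mostly linear interpolation between two boundaries, so a poorly chosen layout produces a confident-looking percentile that reflects the bucket edges more than the data.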
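# p99 as reported by the histogram
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Share of requests slower than the highest finite bucket (le="1" in this example).
# If it exceeds 0.01, the p99 sits in le="+Inf" and cannot be measured: distrust it.
1 - sum(rate(http_request_duration_seconds_bucket{le="1"}[5m])) / sum(rate(http_request_duration_seconds_bucket{le="+Inf"}[5m]))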
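The same check can be reasoned through offline from a snapshot of cumulative bucket counts. This is a rough sketch: the helper names and the counts are hypothetical, and the only logic is comparing the quantile's rank against the count in the highest finite bucket.

```python
import math

def share_above_top_finite_bucket(buckets):
    """Fraction of observations beyond the highest finite bucket boundary.

    `buckets` maps upper bound -> cumulative count; math.inf is the +Inf bucket.
    """
    total = buckets[math.inf]
    if total == 0:
        return 0.0
    top_finite = max(bound for bound in buckets if not math.isinf(bound))
    return 1 - buckets[top_finite] / total

def quantile_lands_in_finite_bucket(q, buckets):
    """True when quantile q falls at or below the highest finite boundary."""
    top_finite = max(bound for bound in buckets if not math.isinf(bound))
    return buckets[top_finite] >= q * buckets[math.inf]

# Made-up snapshot: 5% of requests took longer than 1s.
snapshot = {0.25: 700, 0.5: 900, 1.0: 950, math.inf: 1000}
tail_share = share_above_top_finite_bucket(snapshot)
p99_ok = quantile_lands_in_finite_bucket(0.99, snapshot)
p90_ok = quantile_lands_in_finite_bucket(0.90, snapshot)
```

With 5% of observations past the last finite boundary, a p90 from this histogram is backed by real bucket edges while a p99 is not. The fix in that case is a bucket layout that extends further into the tail, not a different query.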
Diagnosing a disputed number with AI
When two people disagree about a percentile, an assistant is useful for figuring out which trap is biting — provided you give it the metric type and make it reason about sampling:
Someone says my p99 latency dashboard reads 180ms but customers report multi-second waits. The query is quantile_over_time(0.99, request_latency_gauge[5m]) and the scrape interval is 30s. Which function should I be using, what’s the trap, and how do I verify?
The trap is quantile_over_time on a gauge scraped every 30s: it only sees 10 samples per 5m window and misses spikes that happen between scrapes, so your real tail is invisible. A gauge can’t give accurate p99 latency at that sampling rate; instrument a histogram instead and use histogram_quantile(0.99, rate(request_latency_seconds_bucket[5m])), which sees every request. To confirm the gap today, compare your gauge p99 against max_over_time(request_latency_gauge[5m]): if max is multi-second while p99 reads 180ms, sampling is hiding the tail.
The model drafts the diagnosis and a cross-check; you run the check and confirm. Notice the most valuable move: it was willing to say the metric is the wrong shape for the question. No query fixes a sparsely-sampled gauge — the answer is to instrument a histogram. That honesty is what you want from AI-assisted PromQL, and it’s the pattern across the Prometheus and monitoring prompts.
When each is the right call
- Use histogram_quantile for latency, payload size, or anything you can instrument as a histogram and where you care about the tail. Pair it with sane bucket boundaries so the quantile doesn’t land in +Inf.
- Use quantile_over_time for the distribution of a gauge over time (for example, “what was the 95th-percentile queue depth this hour”) where the value genuinely is a point-in-time measurement and sub-scrape behavior doesn’t matter.
- When in doubt, bound the answer: a sampled p99 must sit between min_over_time and max_over_time for the same window, and the rate of the histogram’s _count series (histogram_count for native histograms) should track your event rate.
The bottom line
quantile_over_time percentiles the sampled values of a gauge and is blind between scrapes; histogram_quantile percentiles bucketed counts of every observation and is bounded by your bucket layout. Match the function to the data shape, know each one’s silent failure mode, and bound every percentile with an independent cross-check before you quote it. For help choosing and verifying, the quantile_over_time vs histogram_quantile prompt and the histogram bucket boundary design prompt keep your percentiles honest before they end up in an SLO.