PromQL rate() vs irate() vs increase(): When Each One Lies to You
A working SRE's guide to PromQL rate, irate, and increase on counters: extrapolation, lookback gotchas, when each misleads, and reviewing AI-drafted queries.
- #prometheus
- #promql
- #counters
- #sre
The first time a counter function bit me, it was 2 a.m. and a dashboard told me request throughput had dropped to “0.6 requests per second” on a service that was very obviously serving a few thousand. Nothing was actually broken. I had just written irate() over a [1m] window on a metric scraped every 30 seconds, and the panel was showing me the slope between the last two samples — which happened to land in a quiet pocket. The traffic was fine. My query was the incident.
That night taught me the thing nobody puts on the function reference page: rate(), irate(), and increase() all operate on the same counter, but they make different assumptions about time, and each one will hand you a confidently wrong number under the right conditions. Let me walk through where each one lies, with real queries, because this is the stuff that turns into 3 a.m. pages.
Counters, resets, and the shared foundation
All three functions assume the input is a counter — a value that only goes up, except when the process restarts and it falls back to zero. That reset handling is the whole reason these functions exist. You can’t just subtract two raw samples, because a restart in the middle would give you a giant negative spike.
# Wrong: raw subtraction breaks on counter resets
http_requests_total - http_requests_total offset 5m
# Right: rate() detects the reset and corrects for it
rate(http_requests_total[5m])
Every one of these functions auto-corrects resets by assuming any drop means “the counter went back to zero.” That’s usually right. It’s wrong if your “counter” can legitimately decrease — which means it isn’t a counter, and you shouldn’t be wrapping it in rate() at all. Treat that as the first review question for any AI-suggested query: is this metric actually monotonic?
rate(): the default, and the extrapolation surprise
rate() gives you the per-second average rate of increase over the whole range you give it. Over [5m], it looks at all the samples in that window, fits the increase across them, and divides by the seconds.
# Average requests/sec over the last 5 minutes, smoothed
sum by (job) (rate(http_requests_total[5m]))
The surprise hiding inside rate() is extrapolation. Prometheus rarely has a sample exactly at the start and end of your window. So it extrapolates to the window boundaries — and if the window starts very close to the counter’s “birth” (the metric just appeared, or just reset), it extrapolates toward zero rather than inventing data, but it still adjusts the endpoints. The practical effect: rate() results can be slightly higher or lower than a naive hand calculation, and over very short ranges that adjustment becomes a meaningful fraction of the answer.
Pro Tip: A rate() range that’s too short relative to your scrape interval produces noisy, gappy results. The well-worn rule of thumb is to use a range at least 4x your scrape interval — [2m] for a 30s scrape, [4m] for a 1m scrape — so every evaluation reliably contains 4+ samples.
This is exactly the kind of subtlety where an AI assistant behaves like a fast junior engineer: it will happily produce rate(http_requests_total[1m]) because it reads cleanly, without knowing your scrape interval is 30s and that [1m] gives you only two samples to work with. The draft is fine. The review is where you catch it.
irate(): instant rate, and why it’s spiky
irate() ignores everything in your range except the last two samples and computes the slope between just those. It’s designed for fast-moving graphs where you want to see sudden changes that rate() would smooth away.
# Instant rate — only uses the final two samples in the window
irate(node_network_receive_bytes_total[5m])
Notice the range is still [5m] even though irate() only uses two points — the range just has to be wide enough to contain two samples. The danger is that those two samples might both land in a lull, or both catch a burst, and your graph swings wildly. On a busy dashboard panel irate() looks like static. That’s not your traffic being erratic; that’s the function showing you instantaneous slope.
The hard rule I follow: never alert on irate(). Its spikiness means a single unlucky sample pair can trip or mask a threshold. Use it for interactive debugging graphs where a human is watching; use rate() for anything that pages someone. If an AI hands you an alert expression built on irate(), that’s an automatic rewrite — and a good reason to want the query explained back to you before it ships. A free Alert Rule Generator can scaffold the surrounding rule YAML, but you still own the choice of function.
increase(): same engine, counts instead of rates
increase() is essentially rate() multiplied by the range length — it returns the total increase of the counter over the window rather than a per-second rate.
# Total HTTP 5xx responses in the last hour
increase(http_requests_total{status=~"5.."}[1h])
The famous gotcha: increase() returns fractional, non-integer results. People expect “12 errors in the last hour” and get 11.7 or 12.3. That’s the same extrapolation from rate() showing up in absolute terms. It is not a bug, and you should not “fix” it by rounding inside the query in a way that hides resets.
# Don't be alarmed by a result like 11.7 — that's extrapolation, not a miscount
increase(http_requests_total{status=~"5.."}[1h])
Pro Tip: If you genuinely need exact event totals — billing, audit counts — a _total counter through increase() is the wrong tool because of extrapolation. Counters answer “is the rate elevated?” not “exactly how many.” For alerting, threshold on a rate or on a comfortably-above-noise increase, not on an exact integer.
Putting it together: choosing the right function
Here’s the decision I make every time, distilled:
# Dashboard, smooth trend over time -> rate()
# Live debugging, want to see sharp spikes -> irate()
# "How many happened in window X" -> increase()
# Anything that pages a human -> rate() (never irate)
And a realistic alerting rule that survives review:
groups:
- name: http-errors
rules:
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
> 0.05
for: 10m
labels:
severity: page
annotations:
summary: "5xx error ratio above 5% for 10 minutes"
Note the ratio uses rate() on both numerator and denominator over the same range, so the extrapolation effects largely cancel. That’s the kind of structural correctness an AI draft often gets almost right — matching ranges, matching label sets — and where a quick human pass pays for itself.
Where AI fits, and where the human stays
I lean on AI tools the way I’d lean on a sharp junior who’s fast but hasn’t been paged yet. Whether I’m in Claude, Cursor, or a chat assistant, the model is genuinely good at the mechanical parts: getting label matchers right, remembering histogram_quantile argument order, scaffolding a recording rule. What it does not have is your scrape interval, your cardinality, and your on-call scars.
So the workflow that actually works for me: let the assistant draft, then make it explain the query back in plain language — “this computes the per-second 5xx rate averaged over 5 minutes, then divides by total rate.” If the explanation matches my intent and the function choice fits the use case, it ships. If it can’t explain why it picked irate() over rate(), that’s the tell. The same review discipline I wrote about in untangling inherited PromQL queries with AI applies to brand-new queries too: never ship a PromQL expression you can’t narrate.
And once a query is correct but slow, that’s a different problem with a different fix — recording rules that make queries fast precompute these rates so dashboards and alerts read a cheap pre-aggregated series instead of re-deriving the rate on every evaluation.
Conclusion
rate() smooths and extrapolates. irate() shows you the last two samples and nothing else. increase() is rate() in absolute, fractional terms. None of them lie on purpose — they each answer a precise question, and the bugs come from asking the wrong one. Use rate() for alerts, irate() for live debugging, increase() for windowed totals, and keep your range at least 4x your scrape interval. Let AI write the first draft, but make it explain the draft before it guards your production traffic. For more in this vein, browse the Prometheus monitoring category.
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.