PromQL `rate()` vs `increase()` vs `irate()` Prompt
Use Prometheus counter functions correctly — rate vs increase vs irate, counter resets, window size choice.
- Target user: SREs writing PromQL
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has explained `rate()` vs `increase()` countless times to engineers. You know that getting them wrong produces dashboards that look right but are quantitatively off.

I will provide:
- The query
- Use case
- Symptom (zero values, NaN, empty result, suspicious shape)

Your job:

1. **rate()**:
   - Per-second average rate over the window
   - For counters (monotonically increasing values)
   - Handles counter resets (treats any decrease between samples as a reset and counts the post-reset value as new increase)
   - Output: per-second rate (e.g., requests/sec)
2. **increase()**:
   - Total increase over the window
   - Mathematically = `rate() * window_seconds`
   - Output: count over window (e.g., total requests in 5 min); can be fractional because the result is extrapolated to the window edges
3. **irate()**:
   - INSTANT rate from the last 2 samples in the window
   - Highly responsive but jittery
   - Use only for visualizing recent spikes
   - Doesn't aggregate well
4. **For window choice**:
   - Window > 4× scrape interval is safe
   - Smaller window = more responsive but noisier
   - 1m for ops dashboards (only when the scrape interval is 15s or less), 5m for stability
5. **For counter resets**:
   - rate() treats any decrease between samples as a reset and compensates for it
   - Reset means the counter restarted from zero (process restart)
   - Usually unobservable in output
6. **For "per-second" vs "per-minute"**:
   - rate() = per-second
   - Multiply by 60 for per-minute
7. **For combining with sum/avg**:
   - `sum(rate(...))`: sum of rates (correct order)
   - `avg(rate(...))`: mean rate
   - Never `rate(sum(...))`: rate() needs a range vector, and summing first hides per-series resets
8. **For zero/NaN/empty values**:
   - Counter never incremented in window → 0
   - Only one sample in window → empty result (no series returned), not NaN
   - Window too small for scrape interval → empty result
   - NaN → usually a ratio whose numerator and denominator are both 0

Mark DESTRUCTIVE: increase() over very long windows can mislead; rate() with a short window on a slow-scrape metric returns no data.

---

Query:
```promql
[PASTE]
```
Use case: [DESCRIBE]
Symptom: [DESCRIBE]
Why this prompt works
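Counter functions are subtle: the wrong function or window still produces a plausible-looking graph. This prompt walks the model through the differences between the three functions, window sizing, reset handling, and the usual causes of zero, empty, and NaN results.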
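To make the reset handling concrete, here is a minimal Python sketch of the arithmetic behind `rate()` and `increase()`. It is a simplification, not Prometheus's implementation: the function names are hypothetical, and it skips the extrapolation to the window boundaries that the real functions perform, so real query results will differ slightly.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def reset_adjusted_increase(samples: List[Sample]) -> float:
    """Sum the increase between consecutive samples, treating any drop as a reset."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr < prev:
            # Counter went backwards: assume it restarted from 0,
            # so the whole current value counts as new increase.
            total += curr
        else:
            total += curr - prev
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Per-second rate over the time span actually covered by the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    elapsed = samples[-1][0] - samples[0][0]
    return reset_adjusted_increase(samples) / elapsed


# Counter scraped every 15s; the process restarts between t=30 and t=45.
window = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 20.0), (60, 50.0)]

increase = reset_adjusted_increase(window)  # 30 + 30 + 20 + 30
rate = simple_rate(window)                  # increase / 60 seconds
```

The naive difference `last - first` would be negative here (50 - 100), which is why a drop has to be treated as a restart rather than as a decrease.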
How to use it
- rate() for ops graphs.
- increase() for “count over window”.
- irate() only for spike visualization.
- Window > 4× scrape interval.
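The 4× rule of thumb can be checked with simple arithmetic. The sketch below uses hypothetical helper names and assumes evenly spaced scrapes; it estimates how many samples are guaranteed to fall inside a window and whether at least two remain after some scrapes fail, since `rate()` needs two samples to return anything.

```python
import math


def guaranteed_samples(window_seconds: float, scrape_interval_seconds: float) -> int:
    """Samples certain to land in the window, whatever the scrape phase."""
    return math.floor(window_seconds / scrape_interval_seconds)


def min_safe_window(scrape_interval_seconds: float, factor: int = 4) -> float:
    """Smallest window the rule of thumb recommends for this scrape interval."""
    return factor * scrape_interval_seconds


def tolerates_missed_scrapes(
    window_seconds: float, scrape_interval_seconds: float, missed: int = 1
) -> bool:
    """True if at least two samples remain after `missed` scrapes fail."""
    return guaranteed_samples(window_seconds, scrape_interval_seconds) - missed >= 2


recommended = min_safe_window(15)               # seconds, for a 15s scrape interval
ok_at_1m = tolerates_missed_scrapes(60, 15, 2)  # 4 samples, 2 may fail
ok_at_20s = tolerates_missed_scrapes(20, 15, 0) # only 1 sample guaranteed
```

With a 15s scrape interval, a 20s window can contain a single sample even when every scrape succeeds, which is the empty-result case described in the prompt.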
Examples
rate() — requests per second
sum by (job)(rate(http_requests_total[5m]))
# Output: avg rps over last 5 min, per job
increase() — total requests in window
sum by (job)(increase(http_requests_total[1h]))
# Output: total requests in last hour, per job
# Equivalent to: rate(...) * 3600
irate() — most recent rate (jittery)
irate(http_requests_total[5m])
# Output: instantaneous rate from last 2 samples
# Useful for "what's happening RIGHT NOW"
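Why `irate()` is jittier than `rate()` is easiest to see with numbers. The following Python sketch uses hypothetical function names and ignores the boundary extrapolation that `rate()` performs; it compares a rate taken from the last two samples with one taken across the whole window.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def last_two_rate(samples: List[Sample]) -> float:
    """irate-style: per-second rate from the final two samples only."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    # A drop is treated as a reset, so the new value is the whole increase.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


def whole_window_rate(samples: List[Sample]) -> float:
    """rate-style average from first to last sample (no resets in this example)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Steady 1 request/sec for 45s, then a burst of 300 requests in the last 15s.
window = [(0, 0.0), (15, 15.0), (30, 30.0), (45, 45.0), (60, 345.0)]

spiky = last_two_rate(window)       # 300 / 15
smooth = whole_window_rate(window)  # 345 / 60
```

The last-two-samples value reacts fully to the burst, while the window average dilutes it. This is the reason the prompt steers `irate()` toward spike visualization and away from alerting.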
Per-minute rate
sum by (job)(rate(http_requests_total[5m])) * 60
# RPM derived from RPS
Error rate ratio
sum by (job)(rate(http_requests_total{code=~"5.."}[5m]))
/ sum by (job)(rate(http_requests_total[5m]))
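A ratio like the one above is the usual source of NaN on a dashboard: when a job serves no traffic in the window, both rates are 0, and PromQL's floating-point division yields NaN. The Python sketch below mimics that behavior with a hypothetical helper, because Python's own `/` raises an exception on division by zero instead; it assumes non-negative inputs, as rates of counters are.

```python
import math


def promql_style_divide(numerator: float, denominator: float) -> float:
    """IEEE 754-style division for non-negative inputs: 0/0 is NaN, x/0 is +Inf."""
    if denominator == 0.0:
        return math.nan if numerator == 0.0 else math.inf
    return numerator / denominator


def error_ratio(error_rate: float, total_rate: float) -> float:
    """Fraction of requests that failed, given two per-second rates."""
    return promql_style_divide(error_rate, total_rate)


idle = error_ratio(0.0, 0.0)   # no traffic at all in the window
busy = error_ratio(0.5, 10.0)  # 0.5 errors/sec out of 10 requests/sec
```

A NaN here means "no traffic", not "broken query", so it is worth confirming the denominator before changing the window.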
Common findings this catches
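- rate() returning 0 → counter present but not incrementing in the window.
- Empty result (no series) → window too small for the scrape interval, only one sample in the window, or the counter is not exported.
- NaN values → usually a ratio where numerator and denominator are both 0 (no traffic).
- rate() * 3600 used to show an hourly count → confusing; use increase() over a 1h window instead.
- Counter reset visible as a drop in the raw counter → expected; rate() compensates for it.
- irate() in alerts → noisy; switch to rate().
- rate(sum(...)) → wrong order; use sum(rate(...)).
- Window too long → recent changes, or a counter that has stopped updating, are smoothed out.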
When to escalate
- App not exporting counters → engage app team.
- Counter reset frequency high → investigate process stability.
- Scrape interval tuning → operational change to the Prometheus configuration.
Related prompts
- Prometheus Alert Rule Generator Prompt: Generate production-quality Prometheus alerting rules with sensible thresholds, labels, and runbook annotations.
- PromQL Query Optimization Prompt: Diagnose slow PromQL queries (cardinality explosion, range vector traps, sum vs avg pitfalls, query timeouts, recording rule opportunities).
- PromQL Recording Rules Design Prompt: Design Prometheus recording rules (naming convention, evaluation interval, when to use, retention, multi-cluster patterns).