
PromQL `rate()` vs `increase()` vs `irate()` Prompt

Use Prometheus counter functions correctly — rate vs increase vs irate, counter resets, window size choice.

Target user
SREs writing PromQL
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who has explained `rate()` vs `increase()` countless times to engineers. You know that getting them wrong produces dashboards that look right but are quantitatively off.

I will provide:
- The query
- Use case
- Symptom (zero values, NaN, suspicious shape)

Your job:

1. **rate()**:
   - Per-second average rate over the window
   - For counters (values that only increase, apart from resets to zero)
   - Handles counter resets (treats any decrease between samples as a reset and counts the post-reset value as new increase)
   - Output: per-second rate (e.g., requests/sec)
2. **increase()**:
   - Total increase over the window
   - Mathematically = `rate() * window_seconds`
   - Output: count over window (e.g., total requests in 5 min)
3. **irate()**:
   - INSTANT rate from last 2 samples
   - Highly responsive but jittery
   - Use only for visualizing recent spikes on zoomed-in graphs
   - Poor fit for alerts and long-range graphs: it ignores every sample in the window except the last two
4. **For window choice**:
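   - Window of at least 4× the scrape interval is safe (rate() needs two samples in the window, and this leaves room for a missed scrape)
   - Smaller window = more responsive but noisier
   - 1m for ops dashboards (only if the scrape interval is 15s or less), 5m for stability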
5. **For counter resets**:
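   - rate() treats any decrease between samples as a reset and counts the post-reset value as new increase
   - Reset means counter restarted (usually a process restart)
   - Usually invisible in the output; only increments between the last scrape and the restart are lost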
6. **For "per-second" vs "per-minute"**:
   - rate() = per-second
   - Multiply by 60 for per-minute
7. **For combining with sum/avg**:
   - `sum(rate(...))` — sum of rates (correct order)
   - `avg(rate(...))` — mean rate
   - Never `rate(sum(...))`: summing first hides per-series resets, and it only parses as a subquery
8. **For zero/NaN values**:
   - Counter never incremented in window → 0
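   - Fewer than two samples in window → no data (empty result), not NaN
   - Window too small for scrape interval → no data, for the same reason
   - NaN → usually 0/0 in a ratio (e.g., error ratio with no traffic)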

Mark DESTRUCTIVE: increase() over very long windows can mislead; rate() with a short window on a slow-scrape metric returns no data.

---

Query:
```promql
[PASTE]
```
Use case: [DESCRIBE]
Symptom: [DESCRIBE]

Why this prompt works

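Counter functions are subtle: a wrong choice still produces a plausible-looking graph. This prompt walks through the differences and the symptoms each mistake causes.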

How to use it

  1. rate() for ops graphs.
  2. increase() for “count over window”.
  3. irate() only for spike visualization.
  4. Window > 4× scrape interval.
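
The window rule can be sketched as follows. This is a minimal example that assumes a 30s scrape interval, which the page does not state; substitute your own interval.

```promql
# Assumption: scrape interval is 30s, so 4 x 30s = 2m is the smallest safe window.
sum by (job)(rate(http_requests_total[2m]))

# Same data, smoother line: a longer window averages over more samples.
sum by (job)(rate(http_requests_total[10m]))
```

With the assumed 30s interval, a `[30s]` window would usually contain a single sample, and rate() would return nothing for it.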

Examples

rate() — requests per second

sum by (job)(rate(http_requests_total[5m]))
# Output: avg rps over last 5 min, per job

increase() — total requests in window

sum by (job)(increase(http_requests_total[1h]))
# Output: total requests in last hour, per job
# Equivalent to: rate(...) * 3600

irate() — most recent rate (jittery)

irate(http_requests_total[5m])
# Output: instantaneous rate from last 2 samples
# Useful for "what's happening RIGHT NOW"
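
As a rough illustration of where each function fits, the sketch below pairs irate() on a dashboard panel with rate() in an alert condition. The threshold of 100 requests/sec is a placeholder chosen for the example, not a recommendation.

```promql
# Dashboard panel: most recent per-second rate for each series (jittery).
irate(http_requests_total[5m])

# Alert condition: average over the window so a single noisy scrape cannot fire it.
# The value 100 is a hypothetical threshold.
sum by (job)(rate(http_requests_total[5m])) > 100
```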

Per-minute rate

sum by (job)(rate(http_requests_total[5m])) * 60
# RPM derived from RPS

Error rate ratio

sum by (job)(rate(http_requests_total{code=~"5.."}[5m]))
  / sum by (job)(rate(http_requests_total[5m]))
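
When a job has no traffic in the window, this ratio divides 0 by 0 and returns NaN. When a job has traffic but no 5xx series at all, the ratio returns nothing for that job. One common way to handle both cases, sketched here with the same metric and labels, is to default a missing numerator to 0 and to drop zero denominators.

```promql
# Numerator: 5xx rate per job, or 0 for jobs that have traffic but no 5xx series.
# Denominator: total rate per job, keeping only jobs with non-zero traffic.
(
  sum by (job)(rate(http_requests_total{code=~"5.."}[5m]))
    or
  sum by (job)(rate(http_requests_total[5m])) * 0
)
  /
(sum by (job)(rate(http_requests_total[5m])) > 0)
```

Jobs with zero traffic are then absent from the result instead of showing NaN. Whether absent or 0 is the better answer depends on the dashboard or alert consuming the query.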

Common findings this catches

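  • rate() returning 0 → counter did not increment in the window. An empty result (no data) means the series is missing or has fewer than two samples in the window.
  • NaN values → usually 0/0 in a ratio, such as an error ratio with no traffic. A window that is too small gives no data, not NaN.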
  • rate() * 3600 = “per hour” but expressed as rate → confusion; use increase() instead.
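  • Counter reset visible as spike → not expected from rate() on a raw counter, which handles resets; check for rate() applied after sum() or a metric that is not a true counter.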
  • irate() in alerts → noisy; switch to rate().
  • rate(sum(...)) → wrong order; use sum(rate(...)).
  • Window too long → recent changes are averaged away and the graph lags behind reality.
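
The aggregation-order finding is easiest to see side by side. The sketch below reuses the `http_requests_total` metric from the examples. The second query is shown only to illustrate the failure mode.

```promql
# Correct: rate() sees each series, so one instance restarting is handled as a reset.
sum by (job)(rate(http_requests_total[5m]))

# Misleading: this parses only because of the subquery syntax [5m:].
# When one instance restarts, the summed value drops, and rate() treats
# that drop as a reset of the whole sum, which shows up as a spike.
rate(sum by (job)(http_requests_total)[5m:])
```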

When to escalate

  • App not exporting counters → engage app team.
  • Counter reset frequency high → investigate process stability.
  • Scrape interval tuning → operational change, outside the scope of a query fix.
