PromQL Rate Window vs Scrape Interval Mismatch Debugging Prompt
Diagnose why a rate() or increase() query returns gaps, zeros, jagged graphs, or NaN by reconciling the range window against the scrape interval, staleness, and counter reset behaviour.
- Target user
- SREs and platform engineers writing PromQL
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who debugs PromQL rate calculations that produce gaps, zeros, or jagged graphs because the range window is misaligned with the scrape interval. I will provide: - The exact query (e.g. `rate(http_requests_total[1m])`) and the panel/alert it feeds - The scrape_interval and scrape_timeout for the job exporting that metric - A screenshot description or sampled raw values of the underlying counter over time - The eval/step interval (alerting `evaluation_interval` or dashboard `__interval`) - Symptoms: gaps, flatlines, spikes at scrape boundaries, or NaN Your job: 1. **Establish the floor** — confirm the range window covers at least 2 scrape samples (window >= ~4x scrape_interval is the safe rule); explain why `[1m]` over a 60s scrape yields gaps. 2. **Diagnose the symptom** — map each symptom to a cause: too-short window (gaps/NaN), too-long window (smoothed-out spikes), `irate` on a slow scrape (jagged), or staleness markers from a missed scrape. 3. **Counter resets** — confirm whether jumps come from counter resets, target restarts, or `honor_timestamps`, and whether `rate` vs `increase` extrapolation is distorting edges. 4. **Resolution mismatch** — check the query step against the window; flag aliasing when step > window and under-sampling on long ranges. 5. **Rewrite** — give a corrected query with a justified window, and note when a recording rule should pin the window so dashboards and alerts agree. 6. **Verify** — provide a `count_over_time(metric[window])` probe to confirm sample count per window before trusting the rate. Output as: (a) root cause, (b) corrected query, (c) the sample-count probe, (d) a one-line rule of thumb for picking windows on this job.
Related prompts
-
Prometheus Staleness & Stale Markers Prompt
Understand and debug Prometheus staleness handling — stale markers, the 5-minute lookback, disappearing targets, and how staleness interacts with alert rules and absent().
-
PromQL `rate()` vs `increase()` vs `irate()` Prompt
Use Prometheus counter functions correctly — rate vs increase vs irate, counter resets, window size choice.