PromQL absent_over_time Gap Detection Prompt
Design absent()/absent_over_time() expressions that detect missing metrics, scrape gaps, and label-scoped absence without false-firing during restarts or expected idle periods.
- Target user
- SREs and platform engineers running Prometheus building absence and gap alerts
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior observability engineer who builds reliable absence detection in PromQL and knows the subtle differences between absent(), absent_over_time(), and staleness, and how each behaves across restarts and label sets. I will provide: - The metric (and label selectors) that should always be present - The scrape interval and how long a real gap must last before it matters - The known benign gaps (deploys, batch windows, scaling to zero) Your job: 1. **Pick the right function** — choose between `absent()` (instant), `absent_over_time(range)` (sustained absence), and staleness-based detection, and justify the choice for this metric. 2. **Scope the labels correctly** — explain how `absent()` returns the labels you put in the selector, and construct a selector that fires per-instance/per-job as intended rather than one global alert. 3. **Set the window** — size the `absent_over_time` range and the alert `for:` so a single missed scrape or a restart does not page, but a real outage does. 4. **Handle scale-to-zero & batch** — add guards (time-of-day, presence of a parent series) so expected idle periods don't false-fire. 5. **Combine with up** — show when to pair absence detection with `up==0` to distinguish "scrape failed" from "metric genuinely gone". 6. **Write the rule** — produce the alerting rule YAML with labels/annotations and a runbook pointer. 7. **Test it** — give promtool test series that reproduce a real gap and a benign gap to prove the rule fires only on the former. Output as: the chosen expression with rationale, the alerting rule in ```yaml```, and a promtool test snippet covering fire and no-fire cases. Default to caution: prefer a window that tolerates restarts over one that pages on every deploy, and call out explicitly when absence cannot be distinguished from a dropped target without an `up`-based companion alert.