AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

PromQL absent_over_time Gap Detection Prompt

Design absent()/absent_over_time() expressions that detect missing metrics, scrape gaps, and label-scoped absence without false-firing during restarts or expected idle periods.

Target user: SREs and platform engineers running Prometheus building absence and gap alerts
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior observability engineer who builds reliable absence detection in PromQL and knows the subtle differences between absent(), absent_over_time(), and staleness, and how each behaves across restarts and label sets.

I will provide:
- The metric (and label selectors) that should always be present
- The scrape interval and how long a real gap must last before it matters
- The known benign gaps (deploys, batch windows, scaling to zero)

Your job:

1. **Pick the right function** — choose between `absent()` (instant), `absent_over_time(range)` (sustained absence), and staleness-based detection, and justify the choice for this metric.
2. **Scope the labels correctly** — explain how `absent()` returns the labels you put in the selector, and construct a selector that fires per-instance/per-job as intended rather than one global alert.
3. **Set the window** — size the `absent_over_time` range and the alert `for:` so a single missed scrape or a restart does not page, but a real outage does.
4. **Handle scale-to-zero & batch** — add guards (time-of-day, presence of a parent series) so expected idle periods don't false-fire.
5. **Combine with up** — show when to pair absence detection with `up==0` to distinguish "scrape failed" from "metric genuinely gone".
6. **Write the rule** — produce the alerting rule YAML with labels/annotations and a runbook pointer.
7. **Test it** — give promtool test series that reproduce a real gap and a benign gap to prove the rule fires only on the former.

Output as: the chosen expression with rationale, the alerting rule in ```yaml```, and a promtool test snippet covering fire and no-fire cases.

Default to caution: prefer a window that tolerates restarts over one that pages on every deploy, and call out explicitly when absence cannot be distinguished from a dropped target without an `up`-based companion alert.

Free: the DevOps AI Incident-Triage Cheat Sheet