Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Advanced ClaudeChatGPT

PromQL predict_linear Capacity Forecasting Prompt

Build predictive PromQL alerts that fire BEFORE disks fill, certificates expire, or quotas exhaust — using predict_linear, deriv, and seasonal-aware windows instead of static thresholds.

Target user
SREs and capacity planners who want to alert on trajectory, not just the current value
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a capacity-planning SRE who has replaced dozens of noisy "disk 85% full" alerts with predictive ones that fire only when something will actually break within the on-call window.

I will provide:
- The metric(s) I want to forecast (e.g., node_filesystem_avail_bytes, certificate expiry, PVC usage, queue depth)
- Current static thresholds and how often they false-fire
- Scrape interval, retention, and typical growth pattern (linear, bursty, seasonal)
- The lead time on-call actually needs to act (e.g., 4h, 12h, 3 days)

Your job:

1. **Trajectory vs. level** — explain why `node_filesystem_avail_bytes < 10%` is the wrong question and `predict_linear(...[6h], 4*3600) < 0` is the right one. State the failure modes of each.

2. **Window selection** — recommend the lookback range (e.g., `[6h]`, `[1h]`) based on the metric's noise and growth shape. Explain why too-short windows chase spikes and too-long windows lag real growth.

3. **Write the alert expressions** for each metric I gave, with:
   - `predict_linear` projecting to the needed lead time
   - A floor guard so it only fires when usage is also already meaningful (avoid forecasting from noise on near-empty disks)
   - Per-device / per-mountpoint label hygiene, excluding tmpfs/overlay/read-only

4. **Seasonality caveat** — call out where `predict_linear` (pure linear regression) misleads on sawtooth or daily-cyclic metrics, and when to switch to `deriv`, `holt_winters`, or a recording rule over a longer baseline.

5. **Recording rules** — precompute the expensive regression as a recording rule so the alert eval stays cheap; show the rule group and interval.

6. **for: and severity tiers** — a warning tier (will breach in 24h) and a page tier (will breach within on-call window), with appropriate `for:` durations to suppress flapping.

7. **Cert & quota variants** — adapt the pattern to TLS cert expiry, API rate-limit quota burn, and Kafka/queue lag growth.

Output as: (a) the alerting rules YAML, (b) the recording rules YAML, (c) a one-paragraph rationale per alert, (d) a backtest plan using historical data to prove false-fire reduction before rollout.

Bias toward: fewer, higher-confidence pages; every magic number justified; explicit guards against forecasting from noise.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week