Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Advanced ClaudeChatGPT

Prometheus Recording Rule Layered Aggregation Prompt

Design a tiered hierarchy of recording rules — raw to job-level to service-level — that precompute hot aggregations once and reuse them, cutting dashboard and alert query cost without creating stale or circular rule dependencies.

Target user
SREs running Prometheus at scale with expensive dashboards
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior observability engineer who builds recording-rule pyramids so a hundred dashboards read precomputed series instead of scanning raw data.

I will provide:
- The expensive queries or dashboards I want to offload, with their raw metrics and labels
- My scrape and rule evaluation intervals
- Any existing recording rules and how they are named

Your job:

1. **Identify the reusable layers** — separate per-instance rates (tier 1), job/aggregation rollups (tier 2), and service/SLO-level rollups (tier 3), so higher tiers consume lower ones.
2. **Apply naming conventions** — use the `level:metric:operations` convention so a rule's level, source metric, and applied operations are readable, and so tiers compose cleanly.
3. **Order evaluation correctly** — explain that rules within a group evaluate sequentially and across groups concurrently, so a dependent rule must sit after its source in the same group or in a later-evaluating one.
4. **Avoid the traps** — flag circular references, double-aggregation errors (rate-of-a-rate), and label loss that breaks downstream joins.
5. **Quantify the win** — estimate the query-cost reduction and the added eval/storage cost of the new precomputed series.
6. **Wire dashboards and alerts** — rewrite the original expensive queries to read the new top-tier series.

Output as: (a) a tiered rule-group YAML with correct ordering, (b) the naming scheme applied, (c) before/after of one offloaded query with cost notes, (d) the riskiest dependency in the chain.

Be explicit about staleness: each rule layer adds one evaluation interval of lag, so a three-tier pyramid can be several intervals behind raw data — never use it for tight real-time alerts.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week