DevOps AI ToolKit
AI for Prometheus & Monitoring

Prometheus Recording Rule Hierarchy Design and Naming Prompt

Design a layered recording-rule hierarchy that precomputes expensive aggregations once, follows the level:metric:operations naming convention, and feeds dashboards, SLOs, and alerts from cheap series.

Target user: SREs and platform engineers managing Prometheus rule files
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who designs Prometheus recording-rule hierarchies that precompute heavy aggregations and keep query cost predictable.

I will provide:
- The expensive raw queries currently run live by dashboards and alerts (with their typical eval cost or timeouts)
- The dimensions teams slice by (service, route, cluster, region) and the base metrics involved
- The current rule_files layout, `evaluation_interval`, and any existing recording rules
- Whether the same aggregation is duplicated across multiple panels/alerts

Your job:

1. **Find the reuse** — identify aggregations computed repeatedly across dashboards/alerts that are worth precomputing once.
2. **Name correctly** — apply the `level:metric:operations` convention (e.g. `instance:node_cpu:rate5m`, `service:requests:rate5m`) so the name encodes aggregation level and operation.
3. **Layer the rules** — build base rules (rate/sum over instances) that higher-level rules aggregate further, avoiding re-deriving rates from raw counters at every level.
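4. **Group and order** — place dependent rules in the same `group`, listed in dependency order (rules within a group run sequentially, while separate groups run independently with no ordering guarantee between them), and size each group so its evaluation finishes within the group's `interval`, which defaults to the global `evaluation_interval`.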
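5. **Repoint consumers** — rewrite the live dashboard/alert queries to read the new recording-rule series, and note the migration order to avoid gaps: deploy the rules first, let the recorded series accumulate enough history to cover the longest range consumers query (or backfill it), and only then switch the consumers.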
6. **Validate** — supply a `promtool check rules` syntax check, a `promtool test rules` unit test, and a query to confirm the recorded series matches the original within tolerance.
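
As an illustration of steps 2 to 4, a three-level hierarchy might look roughly like the sketch below. It assumes a raw counter named `http_requests_total` that carries `service`, `route`, and `instance` labels; the group name, the interval, and the metric itself are hypothetical placeholders to adapt.

```yaml
groups:
  - name: http_requests_hierarchy   # hypothetical group name
    interval: 30s                   # assumed; falls back to the global evaluation_interval if omitted
    rules:
      # Level 1: the only rule that reads the raw counter.
      # rate() is applied before sum() so counter resets are handled per series.
      - record: service_route_instance:http_requests:rate5m
        expr: sum by (service, route, instance) (rate(http_requests_total[5m]))

      # Level 2: aggregate away the instance label by reading level 1.
      - record: service_route:http_requests:rate5m
        expr: sum by (service, route) (service_route_instance:http_requests:rate5m)

      # Level 3: aggregate away the route label by reading level 2.
      - record: service:http_requests:rate5m
        expr: sum by (service) (service_route:http_requests:rate5m)
```

Only the first rule touches the raw counter, so the expensive `rate()` is computed once per evaluation. Keeping all three rules in one group, in this order, relies on sequential evaluation within a group so that each level can read the sample the previous rule just recorded.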

Output as: (a) the recording rules YAML with correct names and groups, (b) before/after consumer queries, (c) the dependency/eval-order notes, (d) the validation step.
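
For output item (d), a unit test for `promtool test rules` could be sketched as follows. It assumes a rule file, hypothetically named `service_rules.yml`, that records `service:http_requests:rate5m` as the per-service sum of `rate(http_requests_total[5m])`; the file names, label values, and synthetic input series are illustrative only.

```yaml
# Hypothetical test file, e.g. service_rules_test.yml
rule_files:
  - service_rules.yml        # assumed name of the rule file under test

evaluation_interval: 1m

tests:
  - interval: 1m             # spacing between the synthetic samples below
    input_series:
      # Counter growing by 60 per minute, i.e. 1 request/second.
      - series: 'http_requests_total{service="api", route="/x", instance="a"}'
        values: '0+60x10'
      # Counter growing by 120 per minute, i.e. 2 requests/second.
      - series: 'http_requests_total{service="api", route="/x", instance="b"}'
        values: '0+120x10'
    promql_expr_test:
      # The recorded per-service series should equal the sum of both rates.
      - expr: service:http_requests:rate5m
        eval_time: 10m
        exp_samples:
          - labels: 'service:http_requests:rate5m{service="api"}'
            value: 3
```

Running `promtool check rules service_rules.yml` first catches syntax errors, and `promtool test rules service_rules_test.yml` then evaluates the rules against the synthetic series. Steady, evenly spaced counters are chosen here so the expected rate is easy to reason about by hand.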
