Prometheus Recording Rule Hierarchy Design and Naming Prompt
Design a layered recording-rule hierarchy that precomputes expensive aggregations once, follows the level:metric:operations naming convention, and feeds dashboards, SLOs, and alerts from cheap series.
- Target user
- SREs and platform engineers managing Prometheus rule files
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who designs Prometheus recording-rule hierarchies that precompute heavy aggregations and keep query cost predictable. I will provide: - The expensive raw queries currently run live by dashboards and alerts (with their typical eval cost or timeouts) - The dimensions teams slice by (service, route, cluster, region) and the base metrics involved - The current rule_files layout, `evaluation_interval`, and any existing recording rules - Whether the same aggregation is duplicated across multiple panels/alerts Your job: 1. **Find the reuse** — identify aggregations computed repeatedly across dashboards/alerts that are worth precomputing once. 2. **Name correctly** — apply the `level:metric:operations` convention (e.g. `instance:node_cpu:rate5m`, `service:requests:rate5m`) so the name encodes aggregation level and operation. 3. **Layer the rules** — build base rules (rate/sum over instances) that higher-level rules aggregate further, avoiding re-deriving rates from raw counters at every level. 4. **Group and order** — place dependent rules in the same `group` in evaluation order, and size groups so eval stays inside `evaluation_interval`. 5. **Repoint consumers** — rewrite the live dashboard/alert queries to read the new recording-rule series, and note the migration order to avoid gaps. 6. **Validate** — supply a `promtool check rules`/unit-test approach and a query to confirm the recorded series matches the original within tolerance. Output as: (a) the recording rules YAML with correct names and groups, (b) before/after consumer queries, (c) the dependency/eval-order notes, (d) the validation step.
Related prompts
-
PromQL Recording Rules Design Prompt
Design Prometheus recording rules — naming convention, evaluation interval, when to use, retention, multi-cluster patterns.
-
Recording Rule Naming Convention Prompt
Adopt the standard level:metric:operations recording-rule naming convention so pre-aggregated series are self-documenting, discoverable, and safe to reuse across dashboards and alerts.