AI for Prometheus & Monitoring

Prometheus Active Series Cardinality Reduction Triage Prompt

Triage a TSDB active-series and head-memory blowup: find the offending metric and label, choose between drop relabeling, label aggregation, and an instrumentation fix, and confirm the result with a measurable before/after series count.

Target user: SREs and platform engineers operating Prometheus at scale
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who reduces Prometheus active-series cardinality without losing the signals teams depend on.

I will provide:
- Current active series count and head memory (from `prometheus_tsdb_head_series` and process RSS, e.g. `process_resident_memory_bytes`)
- Top offenders from `topk(20, count by (__name__)({__name__=~".+"}))` and `count by (<label>)(<metric>)`
- The exporters/jobs involved and which labels are required for alerting vs nice-to-have
- My target series budget per job and any hard memory ceiling

Your job:

1. **Rank offenders** — turn the topk output into a ranked list of metric+label pairs by series contribution, and estimate each label's multiplier.
2. **Classify each label** — separate required (used in alerts/SLOs), aggregatable (collapse via recording rule), and pure noise (high-cardinality IDs, URLs, request_id).
3. **Choose the lever** — for each offender pick the cheapest correct fix: `metric_relabel_configs` drop/labeldrop, `keep` allowlisting, aggregation recording rule, histogram bucket trim, or upstream instrumentation change.
4. **Write the config** — produce the exact `metric_relabel_configs` and any recording rules, ordered so that `drop`/`keep` rules run before any `labeldrop` or `replace` rule that removes or rewrites a label they match on.
5. **Protect against regressions** — recommend `sample_limit`/`label_limit` guardrails so a future bad exporter can't blow the budget again.
6. **Measure** — give the queries to confirm series-count delta and head-memory reduction after rollout.

Output as: (a) ranked offender table, (b) per-offender fix decision, (c) the config + recording rules, (d) the before/after verification queries.
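
For orientation, a response to this prompt might include config along the following lines. This is a minimal sketch, not a recommended policy: the job name `checkout-api`, the target address, the metric names (`http_client_debug_.*`, `http_request_duration_seconds_bucket`, `http_requests_total`), the `request_id` and `route` labels, and all limit values are hypothetical placeholders. The two YAML documents belong in separate files (the main Prometheus config and a rules file).

```yaml
# --- prometheus.yml (fragment); every name and number here is an assumed placeholder ---
scrape_configs:
  - job_name: checkout-api
    # Guardrails: the whole scrape is treated as failed if these are exceeded
    # after metric relabeling, so set them with headroom above the budget.
    sample_limit: 50000
    label_limit: 30
    static_configs:
      - targets: ["checkout-api:9100"]
    metric_relabel_configs:
      # 1. Drop entire metric families nobody alerts on.
      - source_labels: [__name__]
        regex: 'http_client_debug_.*'
        action: drop
      # 2. Trim histogram buckets. source_labels are joined with ";" and the
      #    regex is fully anchored. Only drop buckets no quantile query needs.
      - source_labels: [__name__, le]
        regex: 'http_request_duration_seconds_bucket;(0\.005|0\.025|0\.25)'
        action: drop
      # 3. Remove a noisy label last, after the rules that match on labels.
      #    Safe only if the remaining labels still identify each series
      #    uniquely; otherwise the scrape produces duplicate series.
      - regex: 'request_id'
        action: labeldrop
---
# --- rules/cardinality.yml (separate file); rule name follows level:metric:operations ---
groups:
  - name: cardinality_aggregation
    rules:
      # Aggregated view for dashboards and alerts that do not need per-instance detail.
      - record: job_route:http_requests:rate5m
        expr: sum by (job, route) (rate(http_requests_total[5m]))
```

Note that a recording rule adds series of its own; it lowers the stored footprint only where the raw series are dropped further downstream or never ingested. To verify the rollout, compare `prometheus_tsdb_head_series`, `count({job="checkout-api"})`, and `scrape_samples_post_metric_relabeling{job="checkout-api"}` before and after the change.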

Related prompts

Prometheus Metric Cardinality Control Prompt

Prometheus metric_relabel_configs Drop-List Cardinality Audit Prompt
