Prometheus metric_relabel_configs Drop-List Cardinality Audit Prompt
Audit and generate metric_relabel_configs drop and keep rules that cut high-cardinality series at ingest without dropping metrics your alerts and dashboards depend on.
- Target user: Engineers fighting cardinality who need a safe ingest-time drop list
- Difficulty: Advanced
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a Prometheus operator who treats metric_relabel_configs as a cardinality firewall applied after scrape but before ingestion, and who never drops a series an alert or dashboard depends on.
I will provide:
- The top offending metrics by series count (from `topk(20, count by (__name__)({__name__=~".+"}))`): [TOP SERIES]
- The high-churn labels (e.g. pod, id, path, user_id) and example values: [LABELS]
- The list of alert rules and dashboard queries that touch these metrics: [DEPENDENCIES]
- My scrape job structure (one job vs many) and whether I use kube-prometheus-stack: [JOB CONTEXT]
Your job:
1. **Separate drop-the-metric from drop-the-label**: explain when to use `action: drop` (kill an entire useless metric, such as a go_gc_* metric you never query) vs `labeldrop`/`labelkeep` (keep the metric, remove an unbounded label). Most cardinality wins come from labeldrop, not drop. For every labeldrop, confirm the series are still uniquely labeled once the label is gone.
2. **Cross-check against dependencies**: for every proposed drop, state which alert/dashboard query, if any, references that metric or label. Refuse to drop anything referenced; flag it for me to confirm instead.
3. **Write the rules**: produce a metric_relabel_configs block with explicit regex, source_labels, and comments. Prefer keep-lists for noisy exporters (for example, keep only the 30 metrics we actually use) over endless drop-lists.
4. **Estimate the reduction**: for each rule, give me the query to measure series-before and series-after so I can quantify the win rather than guess.
5. **Order and safety**: explain rule ordering (rules run top to bottom; a drop short-circuits later rules for that series) and whether to place this in a kube-prometheus-stack ServiceMonitor (`metricRelabelings`) or directly in the Prometheus config.
Output as: (a) a table of proposed changes with the dependency-safety column, (b) the runnable metric_relabel_configs YAML, (c) the before/after measurement queries, (d) a rollout note (apply to staging, watch active series, then promote).
Never drop a metric or label referenced by an existing alert, recording rule, or dashboard query. When unsure whether something is used, flag it; do not drop it.
Why this prompt works
Cardinality control is where well-meaning engineers cause outages, because the tool that fixes the problem — metric_relabel_configs — is also a silent data-deletion mechanism. A drop rule produces no error and no warning; it just removes series before they hit the TSDB. If one of those series backed an alert, that alert quietly stops firing, and you discover it during the next incident when the page never came. This prompt makes dependency cross-checking a hard, non-skippable step: every proposed drop must be reconciled against the actual alert and dashboard queries you supply, and anything referenced gets flagged rather than removed.
The prompt also corrects the most common strategic mistake, which is reaching for action: drop on whole metrics when the real cardinality blowup is one unbounded label. Distinguishing drop-the-metric from labeldrop, and biasing toward keep-lists for chatty exporters, is the difference between trimming a few series and actually capping growth. By demanding before/after measurement queries, it also converts cardinality work from superstition (“this should help”) into something you can quantify, which is what you need to justify the change in review.
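As a minimal sketch of that distinction, the fragment below puts the three rule shapes side by side. The job names, targets, the `user_id` label, and the metric names in the keep-list are hypothetical placeholders. It also assumes the remaining labels still identify each series uniquely once `user_id` is removed.

```yaml
scrape_configs:
  - job_name: app                                  # hypothetical job name
    static_configs:
      - targets: ["app.example.internal:8080"]     # placeholder target
    metric_relabel_configs:
      # Drop the metric: removes every series whose name matches.
      # Relabel regexes are fully anchored, so this matches only
      # metric names beginning with go_gc_.
      - source_labels: [__name__]
        regex: "go_gc_.*"
        action: drop
      # Drop the label: the metric survives, the label is removed.
      # For labeldrop, regex is matched against label names, not values.
      - regex: "user_id"
        action: labeldrop

  - job_name: noisy-exporter                       # hypothetical job name
    static_configs:
      - targets: ["exporter.example.internal:9100"]
    metric_relabel_configs:
      # Keep-list: any metric not named here is discarded at ingest.
      - source_labels: [__name__]
        regex: "node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_filesystem_avail_bytes"
        action: keep
```

One way to quantify the effect per target is to compare the built-in `scrape_samples_scraped` and `scrape_samples_post_metric_relabeling` series for the job before and after the change. For a single rule, a query such as `count({__name__=~"go_gc_.*", job="app"})` gives the series count for that rule's pattern.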
Finally, ordering and placement are where these configs go wrong in practice — rules short-circuit, and in a kube-prometheus-stack world the right place might be a ServiceMonitor’s metricRelabelings rather than the Prometheus config. Forcing the model to address ordering and a staged rollout keeps this in the AI-drafts, human-verifies lane: you get a concrete YAML block, but also the measurement and rollout discipline to prove it is safe before it touches production ingestion.
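The ordering and placement point can be illustrated with a ServiceMonitor sketch. Every name, namespace, and label below is hypothetical, and the `release` label is an assumption about how your Prometheus selects ServiceMonitors; check the `serviceMonitorSelector` in your own install.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app                          # hypothetical
  namespace: monitoring              # hypothetical
  labels:
    release: kube-prometheus-stack   # assumption: must match your serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: app                       # hypothetical Service label
  endpoints:
    - port: metrics                  # hypothetical port name
      metricRelabelings:
        # Rule 1: series dropped here never reach rule 2.
        - sourceLabels: [__name__]
          regex: "go_gc_.*"
          action: drop
        # Rule 2: evaluated only for series that survived rule 1.
        - regex: "user_id"
          action: labeldrop
```

During a staged rollout, watching `prometheus_tsdb_head_series` on the staging Prometheus is one simple way to confirm that active series fall as expected before promoting the change.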
Related prompts
- Prometheus Metric Cardinality Control Prompt: Find, quantify, and kill the high-cardinality label combinations that bloat your TSDB, blow up memory, and slow queries, then put guardrails in place so it never regresses.
- Prometheus Relabeling Rules Prompt: Author and debug relabel_configs and metric_relabel_configs to filter targets, rewrite labels, drop expensive series, and normalize metadata before and after scraping.