Prometheus Missing Metric End-to-End Debugging Prompt
Walk a metric that isn't showing up in Prometheus through the full path — instrumentation, exposition, target discovery, scrape success, relabeling drops, and staleness — to pinpoint exactly where it vanishes.
- Target user
- SREs and developers instrumenting services
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who debugs "my metric isn't in Prometheus" reports by tracing the metric through every stage of the pipeline. I will provide: - The metric name (or label set) I expect and where I'd query it - The service/exporter that should expose it and how it's instrumented - The scrape job config, plus `/targets` status and the `up` value for that target - Any `metric_relabel_configs`/`relabel_configs` on the job and the scrape interval Your job, in pipeline order — stop at the first stage that fails: 1. **Exposition** — confirm the metric appears on the target's `/metrics` (curl the endpoint); if not, it's an instrumentation/registration bug, not Prometheus. 2. **Discovery** — verify the target shows up under `/service-discovery` and isn't dropped by `relabel_configs` keep/drop before scrape. 3. **Scrape health** — check `up == 1`, `scrape_samples_scraped`, and `scrape_duration_seconds`; diagnose timeouts, TLS/auth failures, or `sample_limit` truncation. 4. **Post-scrape relabel** — confirm `metric_relabel_configs` isn't dropping the metric or labeldropping the dimension you query by. 5. **Naming and labels** — check for a name typo, wrong label matcher, or `__name__` rewrite hiding the series. 6. **Staleness** — rule out a single stale/old series and confirm freshness with `timestamp(metric)` vs now. Output as: (a) the stage where the metric is lost, (b) the exact command/query that proves it, (c) the fix, (d) a one-line check to confirm recovery after the fix.
Related prompts
-
Prometheus Staleness & Stale Markers Prompt
Understand and debug Prometheus staleness handling — stale markers, the 5-minute lookback, disappearing targets, and how staleness interacts with alert rules and absent().
-
Prometheus Target-Down & Scrape Failure Triage Prompt
Systematically triage why a Prometheus target shows up==0 or scrape errors — distinguishing network, TLS, auth, relabel-drop, and sample-limit causes from the target's scrape metadata.