AI for Prometheus & Monitoring

Prometheus http_sd Dynamic Target Discovery Prompt

Design and debug an http_sd_config integration so Prometheus pulls its scrape targets from a custom HTTP discovery endpoint, with correct refresh, labeling, and failure handling.

Target user: Platform engineer wiring Prometheus to a bespoke service registry or CMDB via http_sd
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior observability engineer who has integrated Prometheus with half a dozen in-house service registries through http_sd.

I will provide:
- My current http_sd_config block (url, refresh_interval, auth)
- A sample JSON response from my discovery endpoint
- The symptom (no targets appearing, stale targets lingering, missing labels, auth failures, or refresh too slow/fast)
- Any meta-labels I need to map into final target labels

Your job:

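1. **Validate the response contract**: confirm the endpoint answers with HTTP 200, `Content-Type: application/json`, and a UTF-8 body holding a JSON list of `{ "targets": [...], "labels": { ... } }` objects, where `targets` lists the addresses to scrape (usually `host:port`) and the optional `labels` maps label names (often `__meta_*`) to string values. An empty result must still be HTTP 200 with `[]`. Flag any schema deviations in my sample.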

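2. **Get discovery working**: produce a correct http_sd_config with `url`, `refresh_interval`, and the right `authorization` (or `basic_auth`/`oauth2`) and `tls_config`, and explain how Prometheus polls the URL with an HTTP GET once per `refresh_interval` (default 60s), so that this one setting determines both how quickly target changes are picked up and how much load discovery puts on the registry.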

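3. **Map meta-labels**: write `relabel_configs` that promote `__meta_*` labels (and `__address__`) into stable target labels such as `job`, `instance`, and `env`, plus `keep`/`drop` rules for targets that should not be scraped. Keep in mind that every `__meta_*` label is discarded after relabeling unless it is copied into a regular label.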

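4. **Handle failures gracefully**: explain what happens when the endpoint is down, returns a non-200 status, or serves malformed JSON (the refresh fails and Prometheus keeps using the last good target set), and how to alert on `prometheus_sd_http_failures_total` so that sustained failures, which silently freeze the target list in a stale state, are caught.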

5. **Tune refresh**: recommend a `refresh_interval` that balances how quickly new targets must be picked up against load on the registry, and say whether file_sd or DNS-SD would fit better.
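Step 3 is easiest to reason about one target at a time. The sketch below is a hypothetical, deliberately simplified model of Prometheus relabeling (`apply_relabel`, `_expand`, and the rule dicts are illustrative names, not a Prometheus API). It covers only the `replace`, `keep`, and `drop` actions with their documented defaults (source values joined with `;`, a fully anchored regex defaulting to `(.*)`, replacement `$1`), assumes `job` was already set from `job_name`, and then mimics the final steps: `instance` falls back to `__address__`, and every `__`-prefixed label is removed. It uses Python's `re` instead of Go's RE2, so unusual regex syntax may behave differently.

```python
import re
from typing import Optional

# Matches Go-style group references in a replacement: $1, ${1}, $name, ${name}.
_GROUP_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def _expand(template: str, match: re.Match) -> str:
    """Expand group references; unknown groups expand to "" as in Go's regexp."""
    def substitute(ref: re.Match) -> str:
        name = ref.group(1) or ref.group(2)
        key = int(name) if name.isdigit() else name
        try:
            return match.group(key) or ""
        except IndexError:
            return ""
    return _GROUP_REF.sub(substitute, template)


def apply_relabel(labels: dict, rules: list) -> Optional[dict]:
    """Apply simplified relabel rules; return final labels, or None if the target is dropped."""
    lb = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        value = rule.get("separator", ";").join(
            lb.get(name, "") for name in rule.get("source_labels", [])
        )
        # Prometheus anchors the regex at both ends, so use fullmatch.
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if action == "keep" and match is None:
            return None
        if action == "drop" and match is not None:
            return None
        if action == "replace" and match is not None:
            result = _expand(rule.get("replacement", "$1"), match)
            if result:
                lb[rule["target_label"]] = result
            else:
                lb.pop(rule["target_label"], None)  # an empty result deletes the label
        # Other actions (labelmap, hashmod, ...) are ignored in this sketch.
    # If relabeling did not set instance, it defaults to __address__.
    if "instance" not in lb and "__address__" in lb:
        lb["instance"] = lb["__address__"]
    # Labels starting with "__" (including every __meta_* label) are discarded.
    return {k: v for k, v in lb.items() if not k.startswith("__")}


if __name__ == "__main__":
    discovered = {
        "__address__": "10.0.10.2:9100",
        "__meta_env": "prod",
        "__meta_prometheus_job": "node",
        "job": "http-sd",  # set from job_name before relabeling
    }
    rules = [
        {"source_labels": ["__meta_env"], "regex": "decommissioned", "action": "drop"},
        {"source_labels": ["__meta_prometheus_job"], "target_label": "job"},
        {"source_labels": ["__meta_env"], "target_label": "env"},
    ]
    print(apply_relabel(discovered, rules))
    # {'job': 'node', 'env': 'prod', 'instance': '10.0.10.2:9100'}
```

The rule dicts mirror the YAML keys of `relabel_configs` (`source_labels`, `separator`, `regex`, `target_label`, `replacement`, `action`), so a draft rule list can be checked against one group from the sample discovery response before it goes into the real config.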

Output as: (a) any corrections needed to the endpoint's JSON, (b) the complete corrected http_sd_config + relabel_configs YAML, (c) one alerting expression on SD failures, (d) the most likely cause of the reported symptom.
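For part (a), checking a sample response locally against the http_sd contract can catch schema problems before anything reaches Prometheus. This is a minimal sketch that assumes the response has already been fetched (status code, `Content-Type` header, raw body); `check_http_sd_response` is a hypothetical helper, and the accepted `Content-Type` pattern (plain `application/json`, optionally with a UTF-8 charset) is an assumption based on current Prometheus behavior. It also flags a group without a `targets` list, since such a group contributes no targets and is usually a bug.

```python
import json
import re

# Content-Type accepted for http_sd responses: application/json, optionally
# with a UTF-8 charset parameter (compared case-insensitively).
_JSON_CONTENT_TYPE = re.compile(r'application/json(;\s*charset=("utf-8"|utf-8))?', re.IGNORECASE)


def check_http_sd_response(status: int, content_type: str, body: bytes) -> list:
    """Return contract problems found in an http_sd response; [] means it looks valid."""
    problems = []
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    if not _JSON_CONTENT_TYPE.fullmatch(content_type.strip()):
        problems.append(f"Content-Type is {content_type!r}, expected application/json")
    try:
        payload = json.loads(body.decode("utf-8"))
    except UnicodeDecodeError:
        return problems + ["body is not valid UTF-8"]
    except json.JSONDecodeError as err:
        return problems + [f"body is not valid JSON: {err}"]
    if not isinstance(payload, list):
        return problems + ["top level must be a JSON list of target groups ([] for no targets)"]
    for i, group in enumerate(payload):
        if not isinstance(group, dict):
            problems.append(f"group {i}: expected an object")
            continue
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            problems.append(f"group {i}: 'targets' must be a list of address strings")
        labels = group.get("labels")
        if labels is not None and (
            not isinstance(labels, dict) or not all(isinstance(v, str) for v in labels.values())
        ):
            problems.append(f"group {i}: 'labels' must map label names to string values")
    return problems


if __name__ == "__main__":
    # A numeric label value is a common mistake when a registry exports ports as integers.
    sample = b'[{"targets": ["10.0.10.2:9100"], "labels": {"__meta_env": "prod", "__meta_port": 9100}}]'
    for problem in check_http_sd_response(200, "application/json", sample):
        print(problem)
```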

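Do not embed long-lived credentials inline; reference a file instead (for example `credentials_file` under `authorization` or `password_file` under `basic_auth`), pointing at a secret injected by the environment.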
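The same credential rule applies on the endpoint side. The following is a minimal, hypothetical sketch of an http_sd endpoint using only Python's standard library: it reads the expected bearer token from a file whose path comes from an environment variable (`SD_TOKEN_FILE` is an assumed name), groups a stand-in in-memory registry (`REGISTRY`, also hypothetical) into target groups with `__meta_*` labels, and always answers HTTP 200 with `application/json`, returning `[]` when there is nothing to scrape. A real endpoint would query the registry or CMDB on each request instead.

```python
import hmac
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the real registry or CMDB query.
REGISTRY = [
    {"address": "10.0.10.2:9100", "service": "node", "env": "prod"},
    {"address": "10.0.10.3:9100", "service": "node", "env": "prod"},
    {"address": "10.0.20.7:8080", "service": "api", "env": "staging"},
]


def build_target_groups(instances):
    """Group instances sharing (service, env) into http_sd target groups."""
    groups = {}
    for inst in instances:
        groups.setdefault((inst["service"], inst["env"]), []).append(inst["address"])
    return [
        {"targets": sorted(addresses), "labels": {"__meta_service": service, "__meta_env": env}}
        for (service, env), addresses in sorted(groups.items())
    ]


def load_token(path):
    """Read the shared bearer token from a file rather than hard-coding it."""
    with open(path, encoding="utf-8") as f:
        token = f.read().strip()
    if not token:
        raise ValueError(f"token file {path} is empty")
    return token


class SDHandler(BaseHTTPRequestHandler):
    token = ""  # populated by serve() from the token file

    def do_GET(self):
        supplied = self.headers.get("Authorization", "").encode("utf-8")
        expected = f"Bearer {self.token}".encode("utf-8")
        # Constant-time comparison; reject everything if no token was loaded.
        if not self.token or not hmac.compare_digest(supplied, expected):
            self.send_response(401)
            self.end_headers()
            return
        # Always answer 200 with a JSON list, even when it is empty ([]).
        body = json.dumps(build_target_groups(REGISTRY)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the endpoint; SD_TOKEN_FILE (hypothetical name) must point at the token file."""
    SDHandler.token = load_token(os.environ["SD_TOKEN_FILE"])
    HTTPServer(("0.0.0.0", port), SDHandler).serve_forever()
```

With `SD_TOKEN_FILE` set, calling `serve()` listens on port 8000. On the Prometheus side, the http_sd_config's `authorization` block would use `credentials_file` pointing at a copy of the same token (`type` defaults to `Bearer`), so neither side stores the secret in its config.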
