
Prometheus Scrape Config & Service Discovery Prompt

Configure Prometheus scrape targets — kubernetes_sd, ec2_sd, file_sd, consul_sd, relabeling, scrape interval tuning.

Target user
SREs setting up Prometheus targets
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE who has set up Prometheus scrape configs across Kubernetes, EC2, Consul, file_sd, and static targets — with careful relabeling for clean labels.

I will provide:
- The discovery source
- Current scrape config
- Symptom (targets missing, wrong labels, scrape errors)

Your job:

1. **Service discovery types**:
   - **`kubernetes_sd`** — Pod / Service / Endpoints / Node
   - **`ec2_sd`** — AWS EC2 instances
   - **`consul_sd`** — Consul services
   - **`file_sd`** — static JSON / YAML files
   - **`dns_sd`** — DNS SRV records
   - **`static_configs`** — hardcoded
2. **For relabeling**:
   - `relabel_configs` — pre-scrape (filter targets, set labels)
   - `metric_relabel_configs` — post-scrape (transform metrics)
   - Order matters; chain transformations
3. **For Kubernetes SD**:
   - Roles: pod, service, endpoints, node, ingress, endpointslice
   - Discovered labels (`__meta_kubernetes_*`)
   - Standard pattern: annotation-driven scrape
4. **For "target missing"**:
   - SD not discovering
   - Filter relabeling dropping
   - Auth / network reach
5. **For label noise**:
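   - SD adds many `__meta_*` labels; Prometheus discards them after target relabeling
   - `labelmap` copies the useful ones into permanent labels
   - `labelkeep` / `labeldrop` trim whatever remains (`keep` / `drop` filter targets, not labels)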
6. **For scrape interval**:
   - Global default; per-job override
   - Faster = more samples / more storage; balance
7. **For scrape timeout**:
   - Must be ≤ scrape interval; Prometheus rejects a config where the timeout is larger
   - For slow exporters: separate slow / fast jobs
8. **For multiple Prometheus** instances:
   - Different jobs per Prom
   - Avoid duplicate scrapes

Mark DESTRUCTIVE: a `drop` relabel rule that matches everything (no targets left), changing labels that alerts or dashboards depend on, a very short scrape interval (CPU storm).

---

Discovery: [DESCRIBE]
Current scrape config:
```yaml
[PASTE]
```
Symptom: [DESCRIBE]

Why this prompt works

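Scrape config is the input layer: if a target is never discovered, or is relabeled away, nothing downstream can graph or alert on it. The prompt walks the model through each discovery type, both relabeling stages, and the usual failure modes, so the answer maps to the symptom you describe.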

How to use it

  1. Pick SD for your environment.
  2. Relabel for clean labels.
  3. Set scrape interval thoughtfully.
  4. Verify discovery in Prometheus UI.
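
As a rough illustration of step 3, the sketch below sets a global default and overrides it for one slow job. The job names, hostnames, ports, and durations are assumptions chosen for the example, not recommendations.

```yaml
global:
  scrape_interval: 30s   # default for every job
  scrape_timeout: 10s    # must not exceed scrape_interval

scrape_configs:
# Hypothetical fast exporter: inherits the global 30s / 10s settings.
- job_name: fast-exporters
  static_configs:
  - targets: ["app1.example.com:9100"]

# Hypothetical slow exporter: scraped less often, with a longer timeout.
- job_name: slow-exporters
  scrape_interval: 2m
  scrape_timeout: 90s
  static_configs:
  - targets: ["batch1.example.com:9400"]
```

Splitting slow and fast exporters into separate jobs keeps one slow endpoint from forcing a long timeout on everything else.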

Useful commands

# Verify in Prometheus
# UI: Status → Service Discovery
# UI: Status → Targets

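# API (quote the URL so the shell does not interpret "?")
curl -s 'http://prometheus:9090/api/v1/targets?state=active' | jq
# Targets removed by relabeling
curl -s 'http://prometheus:9090/api/v1/targets?state=dropped' | jq

# Reload config (requires Prometheus to run with --web.enable-lifecycle)
curl -XPOST http://prometheus:9090/-/reload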

# Validate config
promtool check config /etc/prometheus/prometheus.yml

# Test scrape from CLI
curl http://<target>:<port>/metrics | head

Patterns

Kubernetes SD (annotation-driven)

scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep pods with prometheus.io/scrape=true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true

  # Use prometheus.io/path annotation as metrics path
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)

  # Use prometheus.io/port annotation as port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__

  # Copy useful K8s labels
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)

  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
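
For reference, a pod that the job above would pick up might look like the sketch below. The pod name, image, and port are hypothetical. The annotation keys are the ones the relabel rules read, since Prometheus turns `prometheus.io/scrape` into `__meta_kubernetes_pod_annotation_prometheus_io_scrape` by replacing unsupported characters with underscores.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical name
  namespace: default
  labels:
    app: example-app                 # surfaces as app="example-app" via labelmap
  annotations:
    prometheus.io/scrape: "true"     # matched by the keep rule
    prometheus.io/path: "/metrics"   # copied into __metrics_path__
    prometheus.io/port: "8080"       # rewritten into __address__
spec:
  containers:
  - name: app
    image: example.com/example-app:1.0   # hypothetical image
    ports:
    - containerPort: 8080
```

Annotation values must be strings, which is why `"true"` and `"8080"` are quoted.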

EC2 SD

- job_name: ec2-nodes
  ec2_sd_configs:
  - region: us-east-1
    port: 9100
    filters:
    - name: tag:role
      values: [monitoring]
  relabel_configs:
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance
  - source_labels: [__meta_ec2_instance_type]
    target_label: instance_type

File SD

- job_name: external-services
  file_sd_configs:
  - files:
    - /etc/prometheus/file_sd/*.json
    refresh_interval: 1m

A target file matching that glob (JSON):

[
  {
    "targets": ["host1.example.com:9100", "host2.example.com:9100"],
    "labels": { "env": "prod", "team": "platform" }
  }
]
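
The same target list can also be written in YAML. This is a sketch: `file_sd` only reads files ending in `.json`, `.yml`, or `.yaml`, so the job's `files` list would need an extra pattern (for example a hypothetical `/etc/prometheus/file_sd/*.yml`) before this file is picked up.

```yaml
# Hypothetical file: /etc/prometheus/file_sd/hosts.yml
- targets:
  - host1.example.com:9100
  - host2.example.com:9100
  labels:
    env: prod
    team: platform
```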

Drop noisy metrics post-scrape

# Goes inside a scrape job, next to relabel_configs
metric_relabel_configs:
- source_labels: [__name__]
  regex: 'go_(memstats_alloc_bytes|gc_duration_seconds_count)'
  action: drop
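
A minimal sketch of the drop rule in context, reusing the `external-services` job from the File SD pattern. The `request_id` label is a hypothetical example of a high-cardinality label. It is included to show `labeldrop`, which removes a label from every series instead of dropping whole series.

```yaml
scrape_configs:
- job_name: external-services
  file_sd_configs:
  - files:
    - /etc/prometheus/file_sd/*.json
  metric_relabel_configs:
  # Drop two Go runtime series by exact metric name.
  - source_labels: [__name__]
    regex: 'go_(memstats_alloc_bytes|gc_duration_seconds_count)'
    action: drop
  # Remove a hypothetical high-cardinality label from all scraped series.
  - action: labeldrop
    regex: request_id
```

After a `labeldrop`, the remaining labels still have to identify each series uniquely, so check for collisions before rolling it out.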

Common findings this catches

  • Targets discovered but not scraping → relabel dropped them.
  • Scrape errors (5xx, timeout) → app endpoint issue or auth.
  • Duplicate targets across jobs → narrow with relabel.
  • Hardcoded pod IPs go stale when pods restart → discover pods with kubernetes_sd or target a Service.
  • up == 0 for a target → the scrape failed (unreachable, timed out, or invalid response).
  • Stale targets persist → SD source not refreshing.
  • High cardinality from labelmap → narrow the labelmap regex to the labels you need.
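
When the duplicate scrapes come from several Prometheus instances loading the same job, one common approach is to shard targets with `hashmod`. The sketch below assumes two instances and uses a temporary label named `__tmp_hash`. The shard count and label name are illustrative.

```yaml
relabel_configs:
# Hash each target address into one of 2 buckets.
- source_labels: [__address__]
  modulus: 2
  target_label: __tmp_hash
  action: hashmod
# This instance keeps bucket 0; the second instance would use regex: "1".
- source_labels: [__tmp_hash]
  regex: "0"
  action: keep
```

Each instance then scrapes a disjoint subset, at the cost of needing federation or a query layer to see all targets in one place.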

When to escalate

  • New (custom) SD source: engage the Prometheus team.
  • Massive label change: coordinate with dashboard and alert owners first.
  • Cross-region / multi-DC scraping: treat it as an architecture decision rather than a config tweak.
