You are a senior SRE who has set up Prometheus scrape configs across Kubernetes, EC2, Consul, file_sd, and static targets, with careful relabeling for clean labels.

I will provide:
- The discovery source
- Current scrape config
- Symptom (targets missing, wrong labels, scrape errors)

Your job:

1. **Service discovery types**:
   - **`kubernetes_sd`**: Pod / Service / Endpoints / Node
   - **`ec2_sd`**: AWS EC2 instances
   - **`consul_sd`**: Consul services
   - **`file_sd`**: static JSON / YAML files
   - **`dns_sd`**: DNS SRV records
   - **`static_configs`**: hardcoded targets

2. **For relabeling**:
   - `relabel_configs`: pre-scrape (filter targets, set labels)
   - `metric_relabel_configs`: post-scrape (transform or drop metrics)
   - Order matters; rules run top to bottom and each sees the output of the previous one

3. **For Kubernetes SD**:
   - Roles: pod, service, endpoints, node, ingress, endpointslice
   - Discovered labels (`__meta_kubernetes_*`)
   - Standard pattern: annotation-driven scrape

4. **For "target missing"**:
   - SD not discovering the target
   - A `keep` / `drop` relabel rule filtering it out
   - Auth or network reachability

5. **For label noise**:
   - SD attaches many `__meta_*` labels; they are discarded after relabeling unless copied to a regular label
   - `keep` / `drop` filter targets; `labelkeep` / `labeldrop` filter labels
   - `labelmap` copies the useful ones

6. **For scrape interval**:
   - Global default; per-job override
   - Faster = more samples and more storage; balance the two

7. **For scrape timeout**:
   - Must not exceed the scrape interval
   - For slow exporters: separate slow / fast jobs

8. **For multiple Prometheus instances**:
   - Different jobs per Prometheus
   - Avoid duplicate scrapes

Mark DESTRUCTIVE: a drop relabel that matches everything (no targets left), changing labels that alerts or dashboards depend on, a very short scrape interval (CPU storm).

---

Discovery: [DESCRIBE]

Current scrape config:
```yaml
[PASTE]
```

Symptom: [DESCRIBE]

Why this prompt works

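The scrape config is Prometheus's input layer: a target that is never discovered, or that is relabeled away, produces no data for dashboards or alerts. This prompt walks the model through the usual failure points (service discovery, relabeling, interval, and timeout) so that it checks each one against your config and symptom.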

How to use it

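Pick the service discovery mechanism that matches your environment (kubernetes_sd, ec2_sd, consul_sd, file_sd, dns_sd, or static_configs).
Relabel so that only the targets and labels you need survive.
Set the scrape interval and timeout deliberately: shorter intervals mean more samples and more storage.
Verify discovery in the Prometheus UI (Status → Service Discovery and Status → Targets).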
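
As a minimal sketch of the interval and timeout advice above: a global default with a per-job override for slow exporters. The job names, hostnames, ports, and durations are hypothetical; the only firm constraint shown is that each job's scrape_timeout stays at or below its scrape_interval.

```yaml
global:
  scrape_interval: 30s   # default for every job that does not override it
  scrape_timeout: 10s    # must not exceed scrape_interval

scrape_configs:
  # Fast exporters inherit the global interval and timeout.
  - job_name: fast-exporters          # hypothetical job name
    static_configs:
      - targets: ["app1.example.com:9100"]

  # Slow exporters get their own job so a long timeout does not
  # apply to everything else.
  - job_name: slow-exporters          # hypothetical job name
    scrape_interval: 2m
    scrape_timeout: 90s
    static_configs:
      - targets: ["batch1.example.com:9101"]
```

Splitting by job keeps the slow exporter's long timeout from applying to healthy targets. Run promtool check config after a change like this to confirm it parses.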

Useful commands

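# Verify in Prometheus
# UI: Status → Service Discovery
# UI: Status → Targets

# API (quote the URL so the shell does not interpret "?")
curl -s 'http://prometheus:9090/api/v1/targets?state=active' | jq .
# Targets dropped by relabeling, with their discovered labels
curl -s 'http://prometheus:9090/api/v1/targets?state=dropped' | jq .

# Reload config (requires Prometheus to run with --web.enable-lifecycle)
curl -X POST http://prometheus:9090/-/reload

# Validate config
promtool check config /etc/prometheus/prometheus.yml

# Test scrape from CLI (replace <target> and <port>)
curl -s "http://<target>:<port>/metrics" | head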

Patterns

Kubernetes SD (annotation-driven)

scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep pods with prometheus.io/scrape=true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true

  # Use prometheus.io/path annotation as metrics path
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)

  # Use prometheus.io/port annotation as port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__

  # Copy useful K8s labels
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)

  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod

EC2 SD

- job_name: ec2-nodes
  ec2_sd_configs:
  - region: us-east-1
    port: 9100
    filters:
    - name: tag:role
      values: [monitoring]
  relabel_configs:
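  # Name tags must be unique per instance; otherwise series from
  # different hosts end up with the same instance label.
  - source_labels: [__meta_ec2_tag_Name]
    target_label: instance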
  - source_labels: [__meta_ec2_instance_type]
    target_label: instance_type

File SD

- job_name: external-services
  file_sd_configs:
  - files:
    - /etc/prometheus/file_sd/*.json
    refresh_interval: 1m

Example target file matched by the glob above:

[
  {
    "targets": ["host1.example.com:9100", "host2.example.com:9100"],
    "labels": { "env": "prod", "team": "platform" }
  }
]
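
File SD also accepts YAML target files. A sketch of the same target group in YAML, assuming the files list is extended with a pattern such as /etc/prometheus/file_sd/*.yml (the JSON-only glob above would not pick it up); the file name is hypothetical.

```yaml
# /etc/prometheus/file_sd/hosts.yml (hypothetical file name)
- targets:
    - host1.example.com:9100
    - host2.example.com:9100
  labels:
    env: prod
    team: platform
```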

Drop noisy metrics post-scrape

metric_relabel_configs:
- source_labels: [__name__]
  regex: 'go_(memstats_alloc_bytes|gc_duration_seconds_count)'
  action: drop
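
The fragment above belongs inside a scrape job. As a sketch of where it sits, and of the opposite approach (keeping an allowlist instead of dropping a denylist), assuming a hypothetical job, target, and metric names:

```yaml
scrape_configs:
  - job_name: example-app              # hypothetical job
    static_configs:
      - targets: ["app1.example.com:8080"]
    metric_relabel_configs:
      # Keep only the series whose name matches the allowlist;
      # every other scraped series is discarded before storage.
      - source_labels: [__name__]
        regex: '(http_requests_total|http_request_duration_seconds_.*|process_.*)'
        action: keep
```

A keep rule is the stricter choice: a new metric added by the application stays invisible until the allowlist is updated, which may or may not be what you want.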

Common findings this catches

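Targets discovered but not scraped → a relabel rule dropped them; compare discovered and target labels under Status → Service Discovery.
Scrape errors (5xx, 401/403, timeout) → the application endpoint is failing, credentials are missing or wrong, or the exporter is slower than scrape_timeout.
Duplicate targets across jobs → narrow each job with keep / drop relabel rules so a target matches only one.
Hardcoded pod IPs go stale when pods are rescheduled → use kubernetes_sd, typically the endpoints role behind a Service, instead of fixed addresses.
up == 0 → the last scrape failed; the target is unreachable or returned an error.
Stale targets persist → the SD source is not refreshing; check refresh_interval and the source itself.
High cardinality from labelmap → tighten the labelmap regex to the labels you need.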
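
For the labelmap finding, one possible fix is to replace the catch-all regex with an explicit allowlist. A sketch, where app, team, and version are hypothetical pod label names:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy only the listed pod labels instead of every
      # __meta_kubernetes_pod_label_* label.
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(app|team|version)
```

Removing labels that existing alerts or dashboards select on is the kind of change the prompt marks DESTRUCTIVE, so check those queries first.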

When to escalate

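New custom SD source: engage the team that owns Prometheus.
Large label changes: coordinate with dashboard and alert owners first, because queries that match on the old labels stop returning data.
Cross-region / multi-DC scraping: treat it as an architecture decision rather than a config change.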

Prometheus Scrape Config & Service Discovery Prompt

Related prompts

Prometheus ServiceMonitor & PodMonitor Configuration Prompt

Prometheus Storage, Retention & TSDB Prompt

PromQL Query Optimization Prompt
