Prometheus Scrape Config & Service Discovery Prompt
Configure Prometheus scrape targets — kubernetes_sd, ec2_sd, file_sd, consul_sd, relabeling, scrape interval tuning.
- Target user: SREs setting up Prometheus targets
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has set up Prometheus scrape configs across Kubernetes, EC2, Consul, file_sd, and static targets, with careful relabeling for clean labels.

I will provide:
- The discovery source
- Current scrape config
- Symptom (targets missing, wrong labels, scrape errors)

Your job:

1. **Service discovery types**:
   - **`kubernetes_sd`**: Pod / Service / Endpoints / Node
   - **`ec2_sd`**: AWS EC2 instances
   - **`consul_sd`**: Consul services
   - **`file_sd`**: static JSON / YAML files
   - **`dns_sd`**: DNS SRV records
   - **`static_configs`**: hardcoded targets
2. **For relabeling**:
   - `relabel_configs`: pre-scrape (filter targets, set target labels)
   - `metric_relabel_configs`: post-scrape (drop or rewrite ingested series)
   - Order matters: rules run top to bottom, each on the output of the previous one
3. **For Kubernetes SD**:
   - Roles: pod, service, endpoints, endpointslice, node, ingress
   - Discovered labels (`__meta_kubernetes_*`)
   - Standard pattern: annotation-driven scrape
4. **For "target missing"**:
   - SD not discovering the target
   - A relabel rule dropping it
   - Auth / network reachability
5. **For label noise**:
   - SD attaches many `__meta_*` labels; they are discarded after relabeling unless copied
   - `labelmap` copies the useful ones
   - `labelkeep` / `labeldrop` prune labels; `keep` / `drop` filter whole targets
6. **For scrape interval**:
   - Global default; per-job override
   - Shorter interval = more samples, storage, and CPU; balance against the resolution you need
7. **For scrape timeout**:
   - Must not exceed the scrape interval (Prometheus rejects the config otherwise)
   - For slow exporters: split slow and fast targets into separate jobs
8. **For multiple Prometheus instances**:
   - Assign different jobs to each instance
   - Avoid duplicate scrapes of the same target

Mark DESTRUCTIVE: a drop relabel that matches everything (no targets left), changing labels that alerts or dashboards depend on, a very short scrape interval (CPU storm).

---

Discovery: [DESCRIBE]

Current scrape config:

```yaml
[PASTE]
```

Symptom: [DESCRIBE]
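For reference, here is a minimal `consul_sd` job. Consul is one of the discovery types the prompt lists, but it has no pattern below. This is a sketch that assumes a Consul agent reachable at `consul.example.com:8500` and hypothetical services `api` and `worker` that opt in to scraping with a `metrics` tag.

```yaml
# Entry under scrape_configs: (sketch; server, service, and tag names are hypothetical)
- job_name: consul-services
  consul_sd_configs:
    - server: 'consul.example.com:8500'
      services: ['api', 'worker']   # omit to discover every service
  relabel_configs:
    # Consul tags arrive joined by the tag separator (default ","),
    # with the separator at both ends, e.g. ",prod,metrics,".
    - source_labels: [__meta_consul_tags]
      regex: '.*,metrics,.*'
      action: keep
    # Promote the Consul service name to a regular label.
    - source_labels: [__meta_consul_service]
      target_label: service
```

The keep rule matches on the wrapping commas. That way `metrics` matches only as a whole tag and not as part of a tag like `metrics-v2`.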
Why this prompt works
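Scrape config is the input layer: every dashboard, alert, and recording rule depends on which targets get scraped and how they are labeled. This prompt walks through the discovery and relabeling patterns behind common missing-target and bad-label problems.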
How to use it
- Pick SD for your environment.
- Relabel for clean labels.
- Set scrape interval thoughtfully.
- Verify discovery in the Prometheus UI.
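For the scrape-interval step in the list above, the sketch below sets a global default and adds a per-job override that isolates a slow exporter. Job names, hosts, and ports are hypothetical. The constraint it illustrates is that each job's `scrape_timeout` must not exceed its `scrape_interval`.

```yaml
global:
  scrape_interval: 30s   # default for every job
  scrape_timeout: 10s    # default; must not exceed scrape_interval

scrape_configs:
  # Fast exporter: inherits the global 30s interval / 10s timeout.
  - job_name: node
    static_configs:
      - targets: ['node1.example.com:9100']

  # Slow exporter (hypothetical) split into its own job so its long
  # scrapes do not force a longer timeout onto the fast targets.
  - job_name: slow-exporter
    scrape_interval: 2m
    scrape_timeout: 90s   # still below the 2m interval
    static_configs:
      - targets: ['db1.example.com:9187']
```

`promtool check config` rejects a timeout larger than the interval, so it catches the mistake before a reload.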
Useful commands
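```bash
# Verify in Prometheus
# UI: Status → Service Discovery
# UI: Status → Targets

# API: active targets and their scrape health
curl -s 'http://prometheus:9090/api/v1/targets?state=active' | jq
# API: targets dropped by relabeling, with their discovered labels
curl -s 'http://prometheus:9090/api/v1/targets?state=dropped' | jq '.data.droppedTargets'

# Reload config (requires --web.enable-lifecycle; otherwise send SIGHUP)
curl -XPOST http://prometheus:9090/-/reload

# Validate config
promtool check config /etc/prometheus/prometheus.yml

# Test scrape from CLI
curl -s http://<target>:<port>/metrics | head
```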
Patterns
Kubernetes SD (annotation-driven)
```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep pods with prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use prometheus.io/path annotation as metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use prometheus.io/port annotation as port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy useful K8s labels
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```
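The relabel rules above read Pod annotations. A Pod that opts in might carry them as shown below. The pod name, image, port, and path are placeholders. Annotation values must be strings, so `"true"` and the port number are quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                 # hypothetical
  labels:
    app: example-app                # copied to `app` by the labelmap rule
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule (regex: true)
    prometheus.io/path: "/metrics"  # copied into __metrics_path__
    prometheus.io/port: "8080"      # replaces the port in __address__
spec:
  containers:
    - name: app
      image: example.com/example-app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
```

Kubernetes SD exposes each annotation as `__meta_kubernetes_pod_annotation_<name>`. Characters that are invalid in label names, such as `.` and `/`, become underscores, which is how `prometheus.io/scrape` turns into `__meta_kubernetes_pod_annotation_prometheus_io_scrape`. In a Deployment, the annotations go under `spec.template.metadata` so every Pod inherits them.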
EC2 SD
```yaml
- job_name: ec2-nodes
  ec2_sd_configs:
    - region: us-east-1
      port: 9100
      filters:
        - name: tag:role
          values: [monitoring]
  relabel_configs:
    - source_labels: [__meta_ec2_tag_Name]
      target_label: instance
    - source_labels: [__meta_ec2_instance_type]
      target_label: instance_type
```
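By default `ec2_sd` sets `__address__` to the instance's private IP plus the configured `port`. If Prometheus can only reach instances over public IPs, one option is to rewrite the address, as in the sketch below. The job name is hypothetical. The extra `instance-state-name` filter is a standard EC2 DescribeInstances filter, used here to skip stopped instances.

```yaml
- job_name: ec2-nodes-public          # hypothetical variant of ec2-nodes
  ec2_sd_configs:
    - region: us-east-1
      port: 9100
      filters:
        - name: tag:role
          values: [monitoring]
        - name: instance-state-name
          values: [running]
  relabel_configs:
    # Rewrite <private_ip>:9100 to <public_ip>:9100. Instances without a
    # public IP don't match (.+), so they keep the private address.
    - source_labels: [__meta_ec2_public_ip]
      regex: (.+)
      replacement: '${1}:9100'
      target_label: __address__
    - source_labels: [__meta_ec2_tag_Name]
      target_label: instance
```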
File SD
```yaml
- job_name: external-services
  file_sd_configs:
    - files:
        - /etc/prometheus/file_sd/*.json
      refresh_interval: 1m
```

A matching target file:

```json
[
  {
    "targets": ["host1.example.com:9100", "host2.example.com:9100"],
    "labels": { "env": "prod", "team": "platform" }
  }
]
```
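`file_sd` also reads YAML target files. The sketch below is the same target group in YAML. The filename is hypothetical. The job's `files` glob above only matches `*.json`, so it would need an extra entry such as `/etc/prometheus/file_sd/*.yml` to pick this file up.

```yaml
# /etc/prometheus/file_sd/platform.yml (hypothetical filename)
- targets:
    - host1.example.com:9100
    - host2.example.com:9100
  labels:
    env: prod
    team: platform
```

Prometheus watches the matched files for changes and also re-reads them every `refresh_interval`, so editing a target file does not require a reload.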
Drop noisy metrics post-scrape
```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'go_(memstats_alloc_bytes|gc_duration_seconds_count)'
    action: drop
```
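The `drop` rule above is a deny-list. The inverse is an allow-list with `action: keep`, which ingests only the series you name and drops everything else the exporter exposes. The sketch below uses a few node_exporter metric names chosen for illustration. Like the rule above, it sits inside a job entry.

```yaml
metric_relabel_configs:
  # Allow-list: keep only these metrics and drop every other
  # series from this job after the scrape.
  - source_labels: [__name__]
    regex: 'node_(cpu_seconds_total|memory_MemAvailable_bytes|filesystem_avail_bytes)'
    action: keep
```

The trade-off: a metric you later want is silently dropped until someone adds it to the regex.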
Common findings this catches
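- Targets discovered but not scraped → a relabel rule dropped them (they show up as dropped targets).
- Scrape errors (5xx, timeout) → app endpoint issue or auth.
- Duplicate targets across jobs → narrow with relabel.
- Pod IPs change on reschedule, breaking configs that assume stable pod addresses → target the Service (or its endpoints) instead.
- `up` metric shows 0 → scrape failing (target unreachable, timed out, or returning errors).
- Stale targets persist → SD source not refreshing.
- High cardinality from labelmap → narrow the labelmap regex or drop labels.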
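For the last finding, high cardinality from `labelmap`, the fix is either to copy only the pod labels you query or to prune labels that churn. The sketch below replaces the broad `labelmap` rule in the Kubernetes pattern. `app` and `version` are assumed label names. `pod-template-hash` and `controller-revision-hash` are labels that Kubernetes controllers add to Pods, and they arrive with dashes converted to underscores.

```yaml
relabel_configs:
  # (keep / path / port rules from the Kubernetes pattern go here)

  # Option A: copy only selected pod labels instead of every one.
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(app|version)

  # Option B: keep the broad labelmap but prune labels that change on
  # every rollout. Use one option or the other, not both.
  # - action: labelmap
  #   regex: __meta_kubernetes_pod_label_(.+)
  # - action: labeldrop
  #   regex: pod_template_hash|controller_revision_hash
```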
When to escalate
- New SD source (custom): engage the Prometheus team.
- Massive label change: coordinate dashboards and alerts first.
- Cross-region / multi-DC scraping: a strategic architecture decision, not a per-job config change.
Related prompts
- Prometheus ServiceMonitor & PodMonitor Configuration Prompt: configure Prometheus Operator scraping (ServiceMonitor, PodMonitor, target discovery, label rewriting, missing-metrics debugging).
- Prometheus Storage, Retention & TSDB Prompt: configure the Prometheus TSDB (retention, block size, compaction, WAL, disk sizing, troubleshooting OOM / disk-full).
- PromQL Query Optimization Prompt: diagnose slow PromQL queries (cardinality explosion, range vector traps, sum vs avg pitfalls, query timeout, recording rule opportunities).