Blackbox Exporter Probe Configuration Prompt
Configure blackbox_exporter for HTTP, TCP, ICMP, DNS probes — uptime monitoring, certificate expiry, response validation.
- Target user
- SREs monitoring external endpoints
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior SRE who has set up blackbox_exporter for thousands of probes — HTTP health checks, TCP connectivity, certificate expiry, DNS validation. I will provide: - The probe use case - Current blackbox.yaml config - Symptom (probe failing, false positive, slow) Your job: 1. **Probe modules**: - **http_2xx** — HTTP GET, expect 2xx - **tcp_connect** — TCP handshake - **icmp** — ping - **dns** — DNS query 2. **For HTTP probes**: - URL, method, headers, body - Expected status codes - Expected/forbidden body content (regex) - TLS verification - Follow redirects - Time-to-first-byte vs total time 3. **For certificate expiry**: - `probe_ssl_earliest_cert_expiry` metric - Alert when < 30d - Tracks the soonest-expiring cert in chain 4. **For TCP probe**: - Connection only (no application-layer) - Useful for ports without HTTP 5. **For ICMP**: - Often requires CAP_NET_RAW on exporter - Useful for basic reachability 6. **For DNS**: - Query type, expected answers - Validate name resolution 7. **For probe target list**: - Static or `file_sd` - With Prometheus relabeling 8. **For multi-stage** (newer): - HTTP probe with multiple steps Mark DESTRUCTIVE: probe target list with massive cardinality (resource), HTTP body containing secrets (logged), removing health probe of critical endpoint. --- Use case: [DESCRIBE] blackbox.yaml: ```yaml [PASTE] ``` Symptom: [DESCRIBE]
Why this prompt works
Blackbox monitoring is foundational. This prompt walks configurations.
How to use it
- Pick module per endpoint.
- Validate response, not just connection.
- Cert expiry alerts for renewals.
- Test config before deploy.
Useful commands
# Test probe manually
curl 'http://blackbox-exporter:9115/probe?target=https://example.com&module=http_2xx'
# Metrics from probe
# probe_success # 1 = OK, 0 = fail
# probe_http_status_code # response code
# probe_duration_seconds # total
# probe_http_duration_seconds # per-phase (resolve, connect, tls, processing, transfer)
# probe_ssl_earliest_cert_expiry # cert expiry timestamp
# Cert expiring in 30 days alert (PromQL)
probe_ssl_earliest_cert_expiry - time() < 30 * 86400
blackbox.yaml patterns
HTTP probe
modules:
http_2xx:
prober: http
timeout: 5s
http:
method: GET
valid_http_versions: [HTTP/1.1, HTTP/2.0]
valid_status_codes: [200, 204]
no_follow_redirects: false
fail_if_ssl: false
fail_if_not_ssl: true
fail_if_body_matches_regexp:
- "error"
fail_if_body_not_matches_regexp:
- "healthy"
tls_config:
insecure_skip_verify: false
preferred_ip_protocol: ip4
http_post_2xx:
prober: http
http:
method: POST
headers:
Content-Type: application/json
body: '{"test": "data"}'
valid_status_codes: [200, 201]
tcp_connect:
prober: tcp
timeout: 5s
icmp:
prober: icmp
timeout: 5s
dns:
prober: dns
timeout: 5s
dns:
query_name: example.com
query_type: A
valid_rcodes: [NOERROR]
Prometheus scrape config
scrape_configs:
- job_name: blackbox-http
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- https://example.com
- https://api.example.com
- https://www.example.com
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter:9115
File SD for targets
file_sd_configs:
- files:
- /etc/prometheus/blackbox-targets/*.json
[
{
"targets": ["https://example.com", "https://www.example.com"],
"labels": { "team": "web", "severity": "critical" }
}
]
Alert on probe failure
- alert: EndpointDown
expr: probe_success == 0
for: 5m
labels: { severity: critical }
annotations:
summary: "{{ $labels.instance }} is down"
- alert: CertExpiringSoon
expr: probe_ssl_earliest_cert_expiry - time() < 30 * 86400
for: 1h
labels: { severity: warning }
annotations:
summary: "{{ $labels.instance }} cert expires in less than 30 days"
Common findings this catches
- Probe always failing on HTTPS → cert validation; insecure_skip_verify=true OR fix cert.
- ICMP fails despite host reachable → CAP_NET_RAW missing on exporter.
- HTTP probe returns 503 but probe_success=1 → status codes config too broad.
- Body regex never matches → check actual response body.
- Cert expiry alerts but renewals manual → automate.
- DNS probe always passes → checking local DNS, not upstream.
- Many targets compounding load → split jobs or scale.
When to escalate
- Cert renewal automation — coordinated.
- Probe target list growth — capacity.
- Network policy blocking probes — networking.
Related prompts
-
Alert Fatigue Reduction Strategy Prompt
Reduce alert fatigue — SLO-based alerts vs symptom-based, severity tiers, runbook integration, deprecating noisy alerts.
-
Kubernetes CoreDNS Debugging Prompt
Diagnose Kubernetes DNS issues — CoreDNS not resolving, ndots traps, search domain explosion, NXDOMAIN floods, conntrack DNS races.
-
Prometheus Scrape Config & Service Discovery Prompt
Configure Prometheus scrape targets — kubernetes_sd, ec2_sd, file_sd, consul_sd, relabeling, scrape interval tuning.