You are a senior SRE who has set up blackbox_exporter for thousands of probes — HTTP health checks, TCP connectivity, certificate expiry, DNS validation. I will provide: - The probe use case - Current blackbox.yaml config - Symptom (probe failing, false positive, slow) Your job: 1. **Probe modules**: - **http_2xx** — HTTP GET, expect 2xx - **tcp_connect** — TCP handshake - **icmp** — ping - **dns** — DNS query 2. **For HTTP probes**: - URL, method, headers, body - Expected status codes - Expected/forbidden body content (regex) - TLS verification - Follow redirects - Time-to-first-byte vs total time 3. **For certificate expiry**: - `probe_ssl_earliest_cert_expiry` metric - Alert when < 30d - Tracks the soonest-expiring cert in chain 4. **For TCP probe**: - Connection only (no application-layer) - Useful for ports without HTTP 5. **For ICMP**: - Often requires CAP_NET_RAW on exporter - Useful for basic reachability 6. **For DNS**: - Query type, expected answers - Validate name resolution 7. **For probe target list**: - Static or `file_sd` - With Prometheus relabeling 8. **For multi-stage** (newer): - HTTP probe with multiple steps Mark DESTRUCTIVE: probe target list with massive cardinality (resource), HTTP body containing secrets (logged), removing health probe of critical endpoint. --- Use case: [DESCRIBE] blackbox.yaml: ```yaml [PASTE] ``` Symptom: [DESCRIBE]

Why this prompt works

Blackbox monitoring is foundational. This prompt walks configurations.

How to use it

Pick module per endpoint.
Validate response, not just connection.
Cert expiry alerts for renewals.
Test config before deploy.

Useful commands

# Test probe manually
curl 'http://blackbox-exporter:9115/probe?target=https://example.com&module=http_2xx'

# Metrics from probe
# probe_success                       # 1 = OK, 0 = fail
# probe_http_status_code              # response code
# probe_duration_seconds              # total
# probe_http_duration_seconds         # per-phase (resolve, connect, tls, processing, transfer)
# probe_ssl_earliest_cert_expiry      # cert expiry timestamp

# Cert expiring in 30 days alert (PromQL)
probe_ssl_earliest_cert_expiry - time() < 30 * 86400

blackbox.yaml patterns

HTTP probe

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_http_versions: [HTTP/1.1, HTTP/2.0]
      valid_status_codes: [200, 204]
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: true
      fail_if_body_matches_regexp:
      - "error"
      fail_if_body_not_matches_regexp:
      - "healthy"
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: ip4

  http_post_2xx:
    prober: http
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"test": "data"}'
      valid_status_codes: [200, 201]

  tcp_connect:
    prober: tcp
    timeout: 5s

  icmp:
    prober: icmp
    timeout: 5s

  dns:
    prober: dns
    timeout: 5s
    dns:
      query_name: example.com
      query_type: A
      valid_rcodes: [NOERROR]

Prometheus scrape config

scrape_configs:
- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://example.com
    - https://api.example.com
    - https://www.example.com
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115

File SD for targets

file_sd_configs:
- files:
  - /etc/prometheus/blackbox-targets/*.json

[
  {
    "targets": ["https://example.com", "https://www.example.com"],
    "labels": { "team": "web", "severity": "critical" }
  }
]

Alert on probe failure

- alert: EndpointDown
  expr: probe_success == 0
  for: 5m
  labels: { severity: critical }
  annotations:
    summary: "{{ $labels.instance }} is down"

- alert: CertExpiringSoon
  expr: probe_ssl_earliest_cert_expiry - time() < 30 * 86400
  for: 1h
  labels: { severity: warning }
  annotations:
    summary: "{{ $labels.instance }} cert expires in less than 30 days"

Common findings this catches

Probe always failing on HTTPS → cert validation; insecure_skip_verify=true OR fix cert.
ICMP fails despite host reachable → CAP_NET_RAW missing on exporter.
HTTP probe returns 503 but probe_success=1 → status codes config too broad.
Body regex never matches → check actual response body.
Cert expiry alerts but renewals manual → automate.
DNS probe always passes → checking local DNS, not upstream.
Many targets compounding load → split jobs or scale.

When to escalate

Cert renewal automation — coordinated.
Probe target list growth — capacity.
Network policy blocking probes — networking.

Blackbox Exporter Probe Configuration Prompt

Why this prompt works

How to use it

Useful commands

blackbox.yaml patterns

HTTP probe

Prometheus scrape config

File SD for targets

Alert on probe failure

Common findings this catches

When to escalate

Related prompts

Alert Fatigue Reduction Strategy Prompt

Kubernetes CoreDNS Debugging Prompt

Prometheus Scrape Config & Service Discovery Prompt

Why this prompt works

How to use it

Useful commands

blackbox.yaml patterns

HTTP probe

Prometheus scrape config

File SD for targets

Alert on probe failure

Common findings this catches

When to escalate

Related prompts

Alert Fatigue Reduction Strategy Prompt

Kubernetes CoreDNS Debugging Prompt

Prometheus Scrape Config & Service Discovery Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet