
Blackbox Exporter Probe Configuration Prompt

Configure blackbox_exporter for HTTP, TCP, ICMP, DNS probes — uptime monitoring, certificate expiry, response validation.

Target user: SREs monitoring external endpoints
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who has set up blackbox_exporter for thousands of probes — HTTP health checks, TCP connectivity, certificate expiry, DNS validation.

I will provide:
- The probe use case
- Current blackbox.yaml config
- Symptom (probe failing, false positive, slow)

Your job:

1. **Probe modules**:
   - **http_2xx** — HTTP GET, expect 2xx
   - **tcp_connect** — TCP handshake
   - **icmp** — ping
   - **dns** — DNS query
2. **For HTTP probes**:
   - URL, method, headers, body
   - Expected status codes
   - Expected/forbidden body content (regex)
   - TLS verification
   - Follow redirects
   - Time-to-first-byte vs total time
3. **For certificate expiry**:
   - `probe_ssl_earliest_cert_expiry` metric
   - Alert when < 30d
   - Tracks the soonest-expiring cert in chain
4. **For TCP probe**:
   - Connection only (no application-layer)
   - Useful for ports without HTTP
5. **For ICMP**:
   - Often requires CAP_NET_RAW on exporter
   - Useful for basic reachability
6. **For DNS**:
   - Query type, expected answers
   - Validate name resolution
7. **For probe target list**:
   - Static or `file_sd`
   - With Prometheus relabeling
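8. **For multi-step checks**:
   - The TCP prober supports scripted `query_response` exchanges (send/expect)
   - An HTTP module issues a single request and can follow redirects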

Mark as DESTRUCTIVE: target lists large enough to cause a cardinality or resource blow-up, HTTP request bodies or headers containing secrets (they can end up in logs), and removing the probe for a critical endpoint.

---

Use case: [DESCRIBE]
blackbox.yaml:
```yaml
[PASTE]
```
Symptom: [DESCRIBE]

Why this prompt works

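Blackbox probing checks an endpoint from the outside, the way a client reaches it, so it catches failures that a service's own metrics can miss. This prompt walks the model through each prober type, target discovery, and the settings that most often cause false results.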

How to use it

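  1. Pick a module for each endpoint.
  2. Validate the response, not just the connection.
  3. Alert on certificate expiry early enough to leave time for renewal.
  4. Test the config before deploying (`blackbox_exporter --config.check`) and run a manual probe.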

Useful commands

```bash
# Test a probe manually
curl 'http://blackbox-exporter:9115/probe?target=https://example.com&module=http_2xx'

# Key metrics returned by a probe
# probe_success                       # 1 = OK, 0 = fail
# probe_http_status_code              # response code
# probe_duration_seconds              # total probe time
# probe_http_duration_seconds         # per-phase (resolve, connect, tls, processing, transfer)
# probe_ssl_earliest_cert_expiry      # cert expiry as a Unix timestamp
```

```promql
# Cert expiring within 30 days
probe_ssl_earliest_cert_expiry - time() < 30 * 86400
```

blackbox.yaml patterns

HTTP probe

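```yaml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 204]    # defaults to any 2xx when omitted
      follow_redirects: true            # replaces the deprecated no_follow_redirects
      fail_if_ssl: false
      fail_if_not_ssl: true             # fail if the final response was not served over TLS
      fail_if_body_matches_regexp:      # fail when the body matches
      - "error"
      fail_if_body_not_matches_regexp:  # fail when the body does not match
      - "healthy"
      tls_config:
        insecure_skip_verify: false     # keep certificate verification on
      preferred_ip_protocol: ip4

  http_post_2xx:
    prober: http
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"test": "data"}'
      valid_status_codes: [200, 201]

  tcp_connect:
    prober: tcp
    timeout: 5s

  icmp:
    prober: icmp
    timeout: 5s

  dns:
    prober: dns
    timeout: 5s
    dns:
      # The probe target is the DNS server; query_name is the name it is asked to resolve
      query_name: example.com
      query_type: A
      valid_rcodes: [NOERROR]
```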
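
The `dns` module above only checks the response code, and `tcp_connect` only checks that a port accepts connections. A minimal sketch of stricter variants follows, assuming a reasonably recent blackbox_exporter. The module names `dns_answer_check`, `tcp_tls`, and `smtp_banner` are made up for illustration, and the regexes are placeholders to adapt to the records and banners you actually expect.

```yaml
modules:
  dns_answer_check:            # hypothetical module name
    prober: dns
    timeout: 5s
    dns:
      query_name: example.com
      query_type: A
      valid_rcodes: [NOERROR]
      validate_answer_rrs:
        # Fail if any answer record does not look like an A record for example.com
        fail_if_not_matches_regexp:
        - 'example\.com\.\s+\d+\s+IN\s+A\s+.+'

  tcp_tls:                     # hypothetical module name
    prober: tcp
    timeout: 5s
    tcp:
      tls: true                # TLS handshake on a non-HTTP port; also exposes probe_ssl_earliest_cert_expiry

  smtp_banner:                 # hypothetical module name
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
      - expect: "^220 "        # wait for the SMTP greeting before reporting success
```

The `tcp_tls` module is one way to get certificate-expiry coverage for TLS services that do not speak HTTP, since the same expiry metric is reported after the handshake.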

Prometheus scrape config

```yaml
scrape_configs:
- job_name: blackbox-http
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets:
    - https://example.com
    - https://api.example.com
    - https://www.example.com
  relabel_configs:
  # Pass the listed URL to the exporter as ?target=...
  - source_labels: [__address__]
    target_label: __param_target
  # Keep the probed URL as the instance label
  - source_labels: [__param_target]
    target_label: instance
  # Scrape the exporter itself, not the probed endpoint
  - target_label: __address__
    replacement: blackbox-exporter:9115
```
File SD for targets

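```yaml
# Goes inside a scrape job, in place of static_configs
file_sd_configs:
- files:
  - /etc/prometheus/blackbox-targets/*.json
```

A matching target file (JSON):

```json
[
  {
    "targets": ["https://example.com", "https://www.example.com"],
    "labels": { "team": "web", "severity": "critical" }
  }
]
```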
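
Put together, a file-based job could look roughly like the sketch below. The job name `blackbox-http-file-sd` is illustrative and `refresh_interval` is optional. Labels set in the target file (such as `team` and `severity`) are attached to the resulting probe series.

```yaml
scrape_configs:
- job_name: blackbox-http-file-sd   # hypothetical job name
  metrics_path: /probe
  params:
    module: [http_2xx]
  file_sd_configs:
  - files:
    - /etc/prometheus/blackbox-targets/*.json
    refresh_interval: 5m            # how often the files are re-read, in addition to change detection
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-exporter:9115
```

Keeping targets in files means the list can grow without touching the Prometheus config, which is also why it is worth watching how large those files get.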

Alert on probe failure

```yaml
# Entries for the rules list of a group in a Prometheus rules file
- alert: EndpointDown
  expr: probe_success == 0
  for: 5m
  labels: { severity: critical }
  annotations:
    summary: "{{ $labels.instance }} is down"

- alert: CertExpiringSoon
  expr: probe_ssl_earliest_cert_expiry - time() < 30 * 86400
  for: 1h
  labels: { severity: warning }
  annotations:
    summary: "{{ $labels.instance }} cert expires in less than 30 days"
```
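
The prompt also lists "slow" as a symptom. A possible pair of latency rules is sketched here; the alert names and the 2s and 1s thresholds are assumptions to tune per endpoint, and the total-time threshold only makes sense if it sits below the module timeout.

```yaml
# Entries for the rules list of a group in a Prometheus rules file
- alert: EndpointSlow                 # hypothetical alert name
  expr: avg_over_time(probe_duration_seconds[5m]) > 2
  for: 10m
  labels: { severity: warning }
  annotations:
    summary: "{{ $labels.instance }} probes are averaging over 2s"

- alert: EndpointSlowServerResponse   # hypothetical alert name
  # The processing phase approximates the wait for the server's first response byte
  expr: avg_over_time(probe_http_duration_seconds{phase="processing"}[5m]) > 1
  for: 10m
  labels: { severity: warning }
  annotations:
    summary: "{{ $labels.instance }} is slow to start responding"
```

Splitting total time from the `processing` phase helps tell a slow backend apart from slow DNS, connect, or TLS phases.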

Common findings this catches

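  • Probe always failing on HTTPS → usually certificate validation; fix the certificate or trust its CA via tls_config. Setting insecure_skip_verify: true only hides the problem and should be a last resort.
  • ICMP fails although the host is reachable → CAP_NET_RAW missing on the exporter.
  • HTTP probe returns 503 but probe_success=1 → valid_status_codes is too broad.
  • Body regex never matches → compare the regex against the actual response body.
  • Cert expiry alerts fire but renewals are manual → automate renewal.
  • DNS probe always passes → the probe target is a local resolver, not the upstream or authoritative server.
  • Many targets compounding load → split into several jobs or scale the exporter.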
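
For the ICMP finding, one common fix when the exporter runs in Kubernetes is to grant the capability in the container spec. This is a sketch of a Deployment fragment, not a complete manifest; the container name is illustrative and the image tag is omitted on purpose. If the container runs as a non-root user, adding the capability alone may not be enough.

```yaml
# Fragment of a pod template spec (spec.template.spec in a Deployment)
containers:
- name: blackbox-exporter            # illustrative container name
  image: prom/blackbox-exporter      # pin a specific tag in practice
  args:
  - --config.file=/etc/blackbox_exporter/config.yml
  securityContext:
    capabilities:
      add: ["NET_RAW"]               # lets the icmp prober open raw sockets
```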

When to escalate

  • Cert renewal automation: needs coordination with whoever owns the certificates.
  • Probe target list growth: a capacity-planning question.
  • Network policy blocking probes: an issue for the networking team.
