Generating Blackbox Exporter Probe Configs With AI Safely

The blackbox exporter is one of those tools where the concept is trivial — probe an endpoint, report whether it’s up — and the configuration is a thicket of modules, TLS settings, expected status codes, and the multi-target scrape pattern that confuses everyone the first time. It’s exactly the kind of fiddly, well-documented YAML that AI generates in seconds, which is wonderful right up until it generates a probe that reports green while the endpoint is actually broken. A false-green synthetic check is worse than no check, because it manufactures confidence. So I let AI draft these constantly, but I review them against a specific failure model. Here’s how.

Why the multi-target pattern trips everyone

Prometheus doesn’t scrape your endpoint directly, and it doesn’t scrape the blackbox exporter in the usual way either. It scrapes the exporter’s /probe path, passes the real target as a URL parameter, and relies on relabeling to rewire the address. AI knows this pattern but assembles it wrong about a third of the time, producing a config that scrapes the exporter’s own metrics instead of probing your target. The tell is that probe_success never appears for the job. I always verify the relabeling chain that makes the indirection work:

scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://api.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

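The relabel_configs block is where AI most often gets the order wrong. The first rule copies the target URL from __address__ into __param_target, the second copies it into the instance label so the series are named after the real target, and only then does the third overwrite __address__ with the exporter’s host and port. If the model omits this rewire or overwrites __address__ first, Prometheus either tries to scrape the target directly or asks the exporter to probe itself, and neither tells you anything useful.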
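The mis-wired case can also be caught automatically by alerting when probe_success is missing for the job. The rule below is a minimal sketch of a Prometheus rule file, not a definitive setup: the group name, alert name, severity label, and 10m window are illustrative assumptions, and the job label is taken from the scrape config above.

```yaml
groups:
  - name: blackbox-sanity                   # hypothetical group name
    rules:
      - alert: BlackboxProbeMetricMissing   # hypothetical alert name
        # absent() returns a value only when no matching series exists,
        # which is what you see when the job scrapes the exporter's own
        # metrics instead of running a probe.
        expr: absent(probe_success{job="blackbox-http"})
        for: 10m                            # assumed window; tune to your scrape interval
        labels:
          severity: warning
        annotations:
          summary: "No probe_success series for job blackbox-http"
```

This only proves the probe is running at all. It says nothing about whether the probe checks the right thing.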

The false-green failure: status codes

The default http_2xx module accepts any 2xx response, but plenty of broken services return 200 OK with an error page. The reverse also happens: a health endpoint may deliberately return 503 during maintenance, so “any 2xx” doesn’t describe what you actually expect in either direction. A probe that only checks “did I get a 2xx?” is the classic false-green. I make the model define explicit expectations and justify them:

modules:
  http_api_health:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200]
      method: GET
      fail_if_body_not_matches_regexp:
        - '"status"\s*:\s*"ok"'

The fail_if_body_not_matches_regexp is the difference between “the server answered” and “the server is actually healthy.” AI omits body matching unless asked, because the bare http_2xx module is shorter — and shorter is exactly what produces false greens.
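For pages that have no machine-readable status field, the inverse check can cover the “200 with an error page” case. This is a sketch under assumptions: the module name is hypothetical and the phrases are placeholders that should be replaced with text from your real error template.

```yaml
modules:
  http_page_no_error:            # hypothetical module name
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200]
      method: GET
      # Fail the probe if the body contains text that only appears on
      # error pages. (?i) makes the match case-insensitive.
      fail_if_body_matches_regexp:
        - "(?i)something went wrong"
        - "(?i)service unavailable"
```

A positive match on known-good content is usually the stronger assertion. The negative match is a fallback when the healthy body varies too much to pin down.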

TLS and certificate expiry are easy to fake-pass

A probe that follows redirects and ignores TLS errors will happily report success against a site with an expired certificate. I check that the model didn’t set insecure_skip_verify: true (it sometimes does, to “make it work”) and that certificate expiry is actually being monitored:

http:
  fail_if_not_ssl: true   # require TLS, don't silently accept HTTP
  tls_config:             # nested under http, not a sibling of it
    insecure_skip_verify: false   # the default; keep certificate verification on

Then the cert-expiry alert is a separate PromQL rule on a metric the exporter provides:

# Alert 7 days before any probed cert expires
probe_ssl_earliest_cert_expiry - time() < 7 * 24 * 3600

This is the kind of thing the model writes correctly when prompted and skips when not — a fast junior engineer who knows the metric exists but won’t volunteer the alert.
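Wrapped in a rule file, the expression above might look like the following. This is a sketch: the group name, alert name, severity, and 1h window are assumptions, while the metric and the threshold come from the expression already shown.

```yaml
groups:
  - name: blackbox-tls                # hypothetical group name
    rules:
      - alert: ProbedCertExpiringSoon # hypothetical alert name
        # Seconds until the earliest certificate in the chain expires,
        # compared against 7 days expressed in seconds.
        expr: probe_ssl_earliest_cert_expiry - time() < 7 * 24 * 3600
        for: 1h                       # assumed; avoids firing on a single odd scrape
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 7 days"
          description: "About {{ $value | humanizeDuration }} remaining."
```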

Pro Tip: After the AI generates a probe module, ask it to describe three ways the target could be genuinely broken while this probe still reports probe_success 1. If it lists “returns 200 with an error body” or “expired cert with skip-verify on,” those are your missing checks. The failure-enumeration step turns a green-by-default probe into a real one.

Timeouts and the scrape-interval relationship

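Prometheus sends its scrape timeout to the exporter with each request, and the exporter caps the probe at slightly less than that value. A module timeout longer than the scrape timeout therefore never takes effect: slow probes are cut short and reported as failures. AI sets these values independently and rarely checks their relationship. I confirm the module timeout is comfortably shorter than the scrape timeout (which itself can’t exceed the scrape interval), and that the timeout reflects the real SLA of the endpoint. A 5s timeout on a service that legitimately takes 8s under load will flap. That’s a judgment about your service that the model can’t make for you.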
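Putting the values side by side makes the relationship easy to review. The sketch below shows two separate files in one block, split by a YAML document separator. The specific durations are assumptions chosen for illustration, and the module name is reused from the earlier example.

```yaml
# prometheus.yml (scrape side)
scrape_configs:
  - job_name: blackbox-http
    scrape_interval: 30s
    scrape_timeout: 10s      # Prometheus rejects a timeout above the interval
    metrics_path: /probe
    params:
      module: [http_api_health]
    # static_configs and relabel_configs as in the multi-target example
---
# blackbox.yml (exporter side)
modules:
  http_api_health:
    prober: http
    timeout: 8s              # below scrape_timeout, so the probe gives up first
```

With these numbers the ordering is module timeout < scrape timeout < scrape interval. Whether 8s is the right ceiling depends on how the endpoint behaves under load.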

DNS and TCP probes have their own traps

For non-HTTP probes the false-green risk shifts. A TCP probe that only checks “did the port accept a connection?” passes even when the service behind it is wedged. A DNS probe without an explicit expected answer passes on any response with an acceptable return code, even if the records are wrong. I have the model add the meaningful assertion for the protocol (a query-response match for TCP where possible, an expected record for DNS) rather than accepting “something answered” as health.
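As a sketch of what those assertions can look like, the modules below check for an SMTP greeting over TCP and for a specific A record over DNS. The module names, hostname, and address are placeholders (203.0.113.10 is from a documentation range). Check the field names against the exporter version you run, since I am assuming a reasonably recent release.

```yaml
modules:
  tcp_smtp_banner:               # hypothetical module name
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        # Succeed only if the service sends an SMTP greeting, not merely
        # because the port accepted a connection.
        - expect: "^220 "
  dns_api_a_record:              # hypothetical module name
    prober: dns
    timeout: 5s
    dns:
      query_name: "api.example.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR
      validate_answer_rrs:
        # Fail unless at least one answer record ends with the expected
        # address.
        fail_if_none_matches_regexp:
          - '203\.0\.113\.10$'
```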

Treat probe configs as reviewable code

These configs live in Git alongside the rest of the monitoring stack, so they get the same treatment. Our code review dashboard catches the structural issues — broken relabeling, insecure_skip_verify: true, missing body checks — before a human reviews the intent. The alert rules built on top of probe_success and probe_duration_seconds go through the Alert Rule Generator so they arrive with proper for: windows and runbook annotations. I keep my probe-module prompts in the prompt workspace so the team generates consistent, body-checking probes by default.
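For reference, a reviewable alert rule on those two metrics might look like this. It is a sketch, not the output of any particular tool: the names, thresholds, windows, and runbook URLs are all placeholders.

```yaml
groups:
  - name: blackbox-availability       # hypothetical group name
    rules:
      - alert: EndpointProbeFailing
        expr: probe_success{job="blackbox-http"} == 0
        for: 5m                       # assumed; rides out a single failed probe
        labels:
          severity: page
        annotations:
          summary: "Probe failing for {{ $labels.instance }}"
          runbook_url: "https://runbooks.example.com/probe-failing"   # placeholder URL
      - alert: EndpointProbeSlow
        expr: probe_duration_seconds{job="blackbox-http"} > 2
        for: 10m                      # assumed; 2s threshold is illustrative
        labels:
          severity: warning
        annotations:
          summary: "Probe for {{ $labels.instance }} is taking over 2s"
          runbook_url: "https://runbooks.example.com/probe-slow"      # placeholder URL
```

Running promtool check rules against the file in CI catches syntax errors before review.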

Probe from where your users actually are

A subtle design point the model won’t raise: a probe from inside your cluster tests a different thing than a probe from outside it. An internal probe confirms the service is up; it says nothing about whether DNS resolves correctly for users, whether your load balancer is routing, or whether a CDN is serving stale errors. AI will happily generate a single internal probe and call your endpoint “monitored,” but real synthetic monitoring needs probes from the vantage points your users occupy. I ask the model to generate the same module targeted from multiple scrape sources and then reconcile the results, because a discrepancy between an internal green and an external red is itself the most valuable signal — it localizes the failure to the network path rather than the service. The decision about which vantage points matter is mine; the model just replicates the module across them.
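The internal-green, external-red discrepancy can be expressed as a rule. This sketch assumes that both sets of probe results are queryable from one place and that each series carries a vantage label with the values internal and external. That label is hypothetical; you would add it yourself, for example under labels in static_configs.

```yaml
groups:
  - name: blackbox-vantage            # hypothetical group name
    rules:
      - alert: ExternalRedInternalGreen
        # Keep external series that are failing, but only for instances
        # whose internal probe is currently succeeding.
        expr: |
          probe_success{vantage="external"} == 0
            and on (instance)
          probe_success{vantage="internal"} == 1
        for: 5m                       # assumed window
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} is up internally but failing from outside"
```

Matching on instance works only if both vantage points probe the same target URL.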

The human decides what “healthy” means

The throughline is that AI assembles blackbox YAML fast and gets the boilerplate right, but it defaults to the weakest possible definition of health: “the server answered.” Only you know what a genuinely healthy response looks like for your endpoint, what status codes are acceptable, and how long it’s allowed to take. So the model drafts, and a human makes the probe actually prove health, then confirms they can explain why each check matters before it ships. A synthetic check you can’t explain is a false sense of security with a green dot on it.

Conclusion

The blackbox exporter is fiddly enough that AI generation is a real time-saver, and dangerous enough that a naive config manufactures false confidence. Verify the multi-target relabeling, demand explicit status-code and body checks, keep TLS verification on, and enumerate the ways the probe could pass while the target is broken. A synthetic check that can’t fail when the service is down is worse than nothing. More probe and synthetic-monitoring patterns are in the monitoring guides, and reusable prompts are in the prompts library.