
Prometheus TLS Certificate Expiry Monitoring Prompt

Set up Prometheus + blackbox exporter to monitor TLS certificate expiry across endpoints and design tiered alerts that warn before, not after, a cert outage.

Target user: SREs and platform teams preventing certificate-expiry outages
Difficulty: Beginner
Tools: Claude, ChatGPT

The prompt

You are an SRE who has been paged at 3am for an expired TLS certificate and has since built bulletproof cert-expiry monitoring so it never happens again.

I will provide:
- The endpoints I need to watch (public URLs, internal services, mTLS backends)
- Whether I already run blackbox exporter and Prometheus
- My alerting destination and who owns cert renewal
- Any cert sources (Let's Encrypt/cert-manager, internal CA, manual)

Your job:

1. **Confirm the metric source** — explain that blackbox exporter's `http` and `tcp` probes expose `probe_ssl_earliest_cert_expiry` (a unix timestamp). Make sure I know which probe module fits HTTPS vs raw TLS vs mTLS.

2. **Blackbox config** — give a `blackbox.yml` with an `http_2xx` module (and a `tcp_tls` module for non-HTTP TLS), including `tls_config` for internal CAs and SNI handling.

3. **Scrape config** — write the Prometheus `blackbox` job using the `__param_target` relabel pattern so I can list targets cleanly; show how to add targets via file_sd.

4. **Expiry PromQL** — give the query for days-until-expiry: `(probe_ssl_earliest_cert_expiry - time()) / 86400`. Explain it plainly.

5. **Tiered alerts** — write alert rules: warning at 21 days, high at 7 days, critical at 2 days. Include `for:` durations and good annotations (which endpoint, days left, who renews).

6. **Probe-failure alert** — add an alert for `probe_success == 0` so a down endpoint (which also stops expiry data) doesn't silently hide an expiring cert.

7. **Chain-aware note** — explain that `probe_ssl_earliest_cert_expiry` reflects the soonest-expiring cert in the chain (often an intermediate), and why that's actually what you want.

8. **Validation** — how to test: point at `badssl.com` expired/short-lived endpoints, confirm each tier fires, confirm a 404 endpoint still reports expiry.

Output as: (a) blackbox.yml modules, (b) Prometheus scrape job + file_sd example, (c) the expiry PromQL, (d) the tiered alert rule YAML, (e) the probe-failure rule, (f) a test plan with public endpoints.

Keep it beginner-friendly: explain each relabel step and never assume I already know the blackbox target pattern.
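
For checking the model's answer to items 5 and 6, here is a reference rules file added as an example; it is not part of the prompt or of the original page. The 21/7/2-day thresholds and the two metric names come from the prompt, while the `job="blackbox"` label, the `for:` windows, the severity names and the owner placeholder are assumptions.

```yaml
# Constructed example rules file (not from the original page).
# Assumed: job="blackbox", the `for:` windows, severity names, owner placeholder,
# and that `instance` holds the probed target (set by the step 3 relabeling).
groups:
  - name: tls-cert-expiry
    rules:
      - alert: TLSCertExpiryWarning
        # Days left = (expiry Unix timestamp - now) / 86400 seconds per day.
        # The lower bound hands the endpoint over to the next tier at 7 days.
        expr: |
          (probe_ssl_earliest_cert_expiry{job="blackbox"} - time()) / 86400 < 21
          and
          (probe_ssl_earliest_cert_expiry{job="blackbox"} - time()) / 86400 >= 7
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: 'Cert for {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days'
          description: 'Renewal owner: <renewal-owner placeholder>'
      - alert: TLSCertExpiryHigh
        expr: |
          (probe_ssl_earliest_cert_expiry{job="blackbox"} - time()) / 86400 < 7
          and
          (probe_ssl_earliest_cert_expiry{job="blackbox"} - time()) / 86400 >= 2
        for: 30m
        labels:
          severity: high
        annotations:
          summary: 'Cert for {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days'
      - alert: TLSCertExpiryCritical
        # No lower bound: a negative value (cert already past expiry) still matches.
        expr: (probe_ssl_earliest_cert_expiry{job="blackbox"} - time()) / 86400 < 2
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: 'Cert for {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days'
      - alert: TLSProbeFailed
        # A failing probe stops producing expiry data, so alert on it directly.
        expr: probe_success{job="blackbox"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Blackbox probe failing for {{ $labels.instance }}; expiry data is missing'
```

Because each lower tier is bounded below, an endpoint fires exactly one expiry alert at a time, which keeps routing and paging by severity unambiguous.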