Grafana Synthetic Monitoring Checks Prompt

Design Grafana Synthetic Monitoring checks (HTTP, ping, DNS, traceroute, k6 browser) with probe locations, assertions, and SLO alerting.

Target user: SREs setting up black-box uptime and latency monitoring in Grafana Cloud
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who runs Grafana Synthetic Monitoring for black-box availability and latency SLOs.

I will provide:
- The endpoints/services to probe
- Target SLOs (availability, latency)
- Regions my users come from

Your job:

1. **Pick check types**: HTTP(S) for API/web availability, ping for L3 reachability, DNS for resolution, traceroute for path analysis, and k6 browser checks for full page loads.
2. **Choose probe locations**: select public probes near my user regions (or a private probe for internal targets) and set the check frequency (e.g. 60s).
3. **HTTP assertions**: validate status code, expected body/regex, TLS certificate expiry, and response time thresholds.
4. **Labels**: tag checks with `service`, `env`, and `team` so metrics and logs are queryable and alerts can be routed by label.
5. **Metrics produced**: `probe_success`, `probe_duration_seconds`, `probe_http_status_code`, `probe_ssl_earliest_cert_expiry` — use these for SLOs.
6. **Alerting**: alert on `probe_success` availability over a window and on cert expiry approaching.
7. **Private probes**: deploy a private probe agent for internal-only endpoints.
8. **As code**: manage checks via the Synthetic Monitoring API or the Terraform provider.

Mark as DESTRUCTIVE any recommendation that involves: pointing high-frequency checks at rate-limited endpoints, probing third-party APIs I don't own (possible ToS violation), or putting secrets in check headers or bodies.

---

Endpoints/services: [DESCRIBE]
Target SLOs: [DESCRIBE]
User regions: [DESCRIBE]

Why this prompt works

Synthetic checks are only as good as their assertions and probe placement: a check that asserts only a 200 status misses broken content, and probes far from real users report latency those users never experience. This prompt makes the model choose check types deliberately, add real assertions (body, TLS, latency), and connect the resulting probe_* metrics to SLO alerts.
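
To make the "false green" case concrete, here is a minimal Python sketch that combines status, body, and latency assertions on a simulated probe result. The function name `evaluate_check`, the 0.5 s latency budget, and the sample response bodies are hypothetical; this illustrates the idea and is not how Grafana Synthetic Monitoring evaluates checks internally.

```python
import re

# Hypothetical illustration of combining status, body, and latency assertions.
# The pattern mirrors a health endpoint that returns {"status": "ok"}.
BODY_PATTERN = re.compile(r'"status":\s*"ok"')


def evaluate_check(status_code, body, elapsed_seconds, max_latency=0.5):
    """Return (passed, reasons) for one simulated probe result."""
    reasons = []
    if status_code != 200:
        reasons.append(f"unexpected status {status_code}")
    if not BODY_PATTERN.search(body):
        reasons.append("body does not match expected pattern")
    if elapsed_seconds > max_latency:
        reasons.append(f"slow response: {elapsed_seconds:.2f}s")
    return (not reasons, reasons)


# A 200 response carrying a degraded payload would pass a status-only check,
# but the body assertion catches it.
print(evaluate_check(200, '{"status": "ok"}', 0.12))
print(evaluate_check(200, '{"status": "degraded"}', 0.12))
```

The second call returns a failure with the body-mismatch reason even though the status code is 200, which is the situation a status-only check cannot see.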

How to use it

  1. List endpoints and whether they are internal so it picks public vs private probes.
  2. State SLO targets so assertions and alert windows match.
  3. Name user regions so probe locations are relevant.
  4. Ask for Terraform or API definitions to manage checks as code.
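
If the endpoint inventory already lives in structured form, the three inputs the prompt asks for can be rendered from it. The sketch below assumes a hypothetical list of dictionaries with `url` and `internal` keys; the helper name `render_inputs` and the sample values are illustrative only.

```python
def render_inputs(endpoints, slos, regions):
    """Format the three prompt inputs as plain text lines."""
    endpoint_text = "; ".join(
        f"{e['url']} ({'internal' if e['internal'] else 'public'})"
        for e in endpoints
    )
    return (
        f"Endpoints/services: {endpoint_text}\n"
        f"Target SLOs: {slos}\n"
        f"User regions: {', '.join(regions)}"
    )


# Hypothetical sample inventory; replace with real endpoints and targets.
example = render_inputs(
    endpoints=[
        {"url": "https://api.example.com/health", "internal": False},
        {"url": "https://billing.internal.example/health", "internal": True},
    ],
    slos="99.9% availability, p95 latency under 500 ms",
    regions=["US East", "EU West"],
)
print(example)
```

Flagging each endpoint as internal or public up front gives the model what it needs to choose between public and private probes.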

Useful commands

# List existing checks via the Synthetic Monitoring API
# (the API host may differ by Grafana Cloud region; use the URL shown for your stack)
curl -s -H "Authorization: Bearer $SM_TOKEN" \
  https://synthetic-monitoring-api.grafana.net/api/v1/check/list | jq '.[].job'

# Add a check via the API
curl -X POST https://synthetic-monitoring-api.grafana.net/api/v1/check/add \
  -H "Authorization: Bearer $SM_TOKEN" \
  -H "Content-Type: application/json" \
  -d @http-check.json
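
The `http-check.json` payload referenced above is not shown, so here is a hedged Python sketch of how one might generate it. The field names (`frequency` and `timeout` in milliseconds, `labels` as name/value pairs, camelCase HTTP settings) reflect my understanding of the Synthetic Monitoring check schema and should be verified against the API reference before use. The probe IDs `1` and `2` and the helper name `build_http_check` are placeholders.

```python
import json


def build_http_check(job, target, probe_ids, labels,
                     frequency_ms=60000, timeout_ms=5000):
    """Build a check definition as a dict (field names are assumptions)."""
    return {
        "job": job,
        "target": target,
        "frequency": frequency_ms,   # assumed to be milliseconds
        "timeout": timeout_ms,       # assumed to be milliseconds
        "enabled": True,
        "probes": list(probe_ids),   # numeric probe IDs (placeholders here)
        "labels": [
            {"name": name, "value": value}
            for name, value in sorted(labels.items())
        ],
        "settings": {
            "http": {
                "method": "GET",
                "ipVersion": "V4",
                "validStatusCodes": [200],
                "failIfBodyNotMatchesRegexp": [r'"status":\s*"ok"'],
            }
        },
    }


check = build_http_check(
    job="checkout-api",
    target="https://api.example.com/health",
    probe_ids=[1, 2],
    labels={"service": "checkout", "env": "prod", "team": "payments"},
)
# Print the JSON so it can be redirected into http-check.json.
print(json.dumps(check, indent=2))
```

Redirecting the output to `http-check.json` produces a file in the shape the `curl -d @http-check.json` call expects, assuming the schema matches the API version in use.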

Example config

# Terraform: HTTP synthetic check with assertions
resource "grafana_synthetic_monitoring_check" "checkout" {
  job     = "checkout-api"
  target  = "https://api.example.com/health"
  enabled = true
  probes  = [/* us-east, eu-west probe ids */]
  labels = {
    service = "checkout"
    env     = "prod"
    team    = "payments"
  }
  settings {
    http {
      method                = "GET"
      valid_status_codes    = [200]
      fail_if_body_not_matches_regexp = ["\"status\":\\s*\"ok\""]
      tls_config { insecure_skip_verify = false }
      ip_version = "V4"
    }
  }
}
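
# PromQL: availability alert, fires when the 5m success ratio drops below 99%
avg_over_time(probe_success{job="checkout-api"}[5m]) < 0.99

# PromQL: TLS cert-expiry alert, fires when the certificate expires in under 14 days
(probe_ssl_earliest_cert_expiry{job="checkout-api"} - time()) / 86400 < 14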
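
Window length and check frequency together decide how sensitive the availability expression is. The arithmetic below is a sketch assuming one sample per check execution per probe and a 60s frequency (the example frequency from the prompt); the helper names are hypothetical.

```python
def samples_per_window(window_seconds, frequency_seconds):
    """Number of probe_success samples one series has in the window."""
    return window_seconds // frequency_seconds


def ratio_after_failures(samples, failures):
    """Average of probe_success when `failures` of `samples` are 0."""
    return (samples - failures) / samples


def min_samples_tolerating_one_failure(threshold):
    """Smallest sample count where a single failure stays at or above threshold."""
    n = 1
    while (n - 1) / n < threshold:
        n += 1
    return n


samples = samples_per_window(300, 60)      # 5m window, 60s frequency
ratio = ratio_after_failures(samples, 1)   # one failed execution
print(samples, ratio, ratio < 0.99)
print(min_samples_tolerating_one_failure(0.99))
```

With 5 samples in the window, one failed execution drops the ratio to 0.8, so a 0.99 threshold over 5m fires on any single failure per probe. Tolerating one failure at 0.99 needs 100 samples, roughly 100 minutes at a 60s frequency, which is worth weighing when matching the alert window to the SLO.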

Common findings this catches

  • False green → HTTP check validates status only, not body.
  • Wrong latency → probe region far from real users.
  • Unreachable internal target → needs a private probe.
  • Surprise TLS outage → no cert-expiry alert.
  • Throttled target → check frequency too aggressive.
  • Secret leak → token in check header/body.
  • Cost creep → too many checks inflating active series.
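
The throttling and cost findings both come down to simple arithmetic on check count, probe count, and frequency. This sketch uses hypothetical figures (20 checks, 3 probes each) and hypothetical helper names; it estimates execution volume only and makes no claim about how Grafana Cloud bills or counts active series.

```python
SECONDS_PER_DAY = 86400


def monthly_executions(checks, probes_per_check, frequency_seconds, days=30):
    """Total check executions over `days` for a fleet of identical checks."""
    runs_per_probe = (days * SECONDS_PER_DAY) // frequency_seconds
    return checks * probes_per_check * runs_per_probe


def requests_per_minute(probes, frequency_seconds):
    """Requests one target receives per minute from all of its probes."""
    return probes * 60 / frequency_seconds


# Hypothetical fleet: 20 checks, each running from 3 probes.
print(monthly_executions(20, 3, 60))   # 60s frequency
print(monthly_executions(20, 3, 10))   # 10s frequency
print(requests_per_minute(3, 10))      # load on a single target at 10s
```

Moving from a 60s to a 10s frequency multiplies executions by six (2,592,000 versus 15,552,000 per 30 days in this example) and sends 18 requests per minute to each target, which is the kind of number to compare against the target's rate limit.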

When to escalate

  • Rate-limiting or ToS concerns with external targets — coordinate with the target owner/legal.
  • Private probe network access — networking/security team.
  • SLO/error-budget policy — reliability governance.
