Grafana Synthetic Monitoring Checks Prompt
Design Grafana Synthetic Monitoring checks (HTTP, ping, DNS, traceroute, k6 browser) with probe locations, assertions, and SLO alerting.
- Target user: SREs setting up black-box uptime and latency monitoring in Grafana Cloud
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who runs Grafana Synthetic Monitoring for black-box availability and latency SLOs.

I will provide:
- The endpoints/services to probe
- Target SLOs (availability, latency)
- The regions my users come from

Your job:
1. **Pick check types**: HTTP(S) for API/web availability, ping for L3 reachability, DNS for resolution, traceroute for path analysis, and k6 browser checks for full page loads.
2. **Choose probe locations**: select public probes near my user regions (or a private probe for internal targets) and set the frequency (e.g. 60s).
3. **HTTP assertions**: validate status code, expected body/regex, TLS certificate expiry, and response time thresholds.
4. **Labels**: tag checks with `service`, `env`, `team` so metrics and logs are queryable and alerts can be routed by label.
5. **Metrics produced**: `probe_success`, `probe_duration_seconds`, `probe_http_status_code`, `probe_ssl_earliest_cert_expiry`; use these for SLOs.
6. **Alerting**: alert on `probe_success` availability over a window and on cert expiry approaching.
7. **Private probes**: deploy a private probe agent for internal-only endpoints.
8. **As code**: manage checks via the Synthetic Monitoring API or the Terraform provider.

Mark as DESTRUCTIVE: pointing high-frequency checks at rate-limited endpoints, probing third-party APIs you don't own (ToS), leaking secrets in check bodies.

---

Endpoints/services: [DESCRIBE]
Target SLOs: [DESCRIBE]
User regions: [DESCRIBE]
Why this prompt works
Synthetic checks are only as good as their assertions and probe placement: a 200-only HTTP check misses broken content, and probes far from your users report latency that those users never experience. This prompt makes the model choose check types deliberately, add real assertions (body, TLS, latency), and connect the resulting probe_* metrics to SLO alerts.
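To make the "200-only" failure mode concrete, here is a minimal Python sketch of how layered assertions change a check's verdict. It is an illustration only, not how Grafana probes are implemented; `ProbeResult` and `evaluate` are hypothetical names introduced for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Hypothetical container for what one HTTP probe observed."""
    status_code: int
    body: str
    duration_seconds: float


def evaluate(result, valid_status_codes=(200,), body_regex=None,
             max_duration_seconds=None):
    """Return the names of failed assertions; an empty list means 'pass'."""
    failures = []
    if result.status_code not in valid_status_codes:
        failures.append("status")
    if body_regex is not None and re.search(body_regex, result.body) is None:
        failures.append("body")
    if (max_duration_seconds is not None
            and result.duration_seconds > max_duration_seconds):
        failures.append("latency")
    return failures


# A degraded backend that still answers 200.
degraded = ProbeResult(200, '{"status": "degraded"}', 0.12)

print(evaluate(degraded))                                   # → []
print(evaluate(degraded, body_regex=r'"status":\s*"ok"'))   # → ['body']
```

The status-only evaluation reports a pass for the degraded response, while the same response fails once the body pattern is asserted.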
How to use it
- List endpoints and whether they are internal so it picks public vs private probes.
- State SLO targets so assertions and alert windows match.
- Name user regions so probe locations are relevant.
- Ask for Terraform or API definitions to manage checks as code.
Useful commands
# The API host depends on your Grafana Cloud region; substitute the URL for your stack.

# List existing checks via the Synthetic Monitoring API
curl -s -H "Authorization: Bearer $SM_TOKEN" \
  https://synthetic-monitoring-api.grafana.net/api/v1/check/list | jq '.[].job'

# Add a check via the API (the request body is read from http-check.json)
curl -X POST https://synthetic-monitoring-api.grafana.net/api/v1/check/add \
  -H "Authorization: Bearer $SM_TOKEN" \
  -H "Content-Type: application/json" \
  -d @http-check.json
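The second command reads its body from `http-check.json`, which is not shown here. The Python sketch below generates one plausible payload. The JSON field names (`frequency`, `ipVersion`, `validStatusCodes`, `failIfBodyNotMatchesRegexp`, and the name/value label list) are assumptions based on the camelCase form of the Terraform attributes, and the probe IDs are placeholders; verify both against the Synthetic Monitoring API documentation before use.

```python
import json


def build_http_check(job, target, probe_ids, labels,
                     frequency_ms=60_000, timeout_ms=5_000):
    """Build a check payload. Field names are ASSUMED, not confirmed."""
    return {
        "job": job,
        "target": target,
        "frequency": frequency_ms,   # assumed to be in milliseconds
        "timeout": timeout_ms,       # assumed to be in milliseconds
        "enabled": True,
        "probes": list(probe_ids),   # placeholder IDs, look up real ones
        "labels": [{"name": k, "value": v} for k, v in sorted(labels.items())],
        "settings": {
            "http": {
                "method": "GET",
                "ipVersion": "V4",
                "validStatusCodes": [200],
                "failIfBodyNotMatchesRegexp": [r'"status":\s*"ok"'],
            }
        },
    }


check = build_http_check(
    job="checkout-api",
    target="https://api.example.com/health",
    probe_ids=[1, 2],  # hypothetical probe IDs
    labels={"service": "checkout", "env": "prod", "team": "payments"},
)

# Redirect stdout to http-check.json to feed the curl command.
print(json.dumps(check, indent=2))
```

Generating the file from code keeps the labels and assertions consistent across checks and makes it harder to paste a secret into a hand-edited body.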
Example config
# Terraform: HTTP synthetic check with assertions
resource "grafana_synthetic_monitoring_check" "checkout" {
  job     = "checkout-api"
  target  = "https://api.example.com/health"
  enabled = true
  probes  = [/* us-east, eu-west probe ids */]
  labels = {
    service = "checkout"
    env     = "prod"
    team    = "payments"
  }
  settings {
    http {
      method                          = "GET"
      valid_status_codes              = [200]
      fail_if_body_not_matches_regexp = ["\"status\":\\s*\"ok\""]
      tls_config {
        insecure_skip_verify = false
      }
      ip_version = "V4"
    }
  }
}
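# Availability alert: fires when the 5m success ratio for this check drops below 99%
avg_over_time(probe_success{job="checkout-api"}[5m]) < 0.99

# TLS cert-expiry alert (a separate rule): fires when a certificate expires in under 14 days
(probe_ssl_earliest_cert_expiry - time()) / 86400 < 14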
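It helps to check how sensitive the 5m availability expression is before wiring it to a pager. The Python sketch below works through the arithmetic, assuming a 60s check frequency and that the expression is evaluated per probe series (no aggregation across probes); the helper names are hypothetical.

```python
def success_ratio(samples: int, failures: int) -> float:
    """Fraction of successful samples, i.e. avg_over_time of a 0/1 series."""
    return (samples - failures) / samples


def alert_fires(samples: int, failures: int, threshold: float = 0.99) -> bool:
    """True when the success ratio falls below the alert threshold."""
    return success_ratio(samples, failures) < threshold


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60


# A 5m window at a 60s frequency holds 5 samples per probe series.
samples = 300 // 60
print(samples, success_ratio(samples, 1), alert_fires(samples, 1))

# Error budget for a 99.9% SLO over 30 days, in minutes.
print(round(error_budget_minutes(0.999), 1))
```

With only five samples in the window, a single failed probe run drops the ratio to 0.8 and fires the alert. A longer window, or an alert tied to the error budget rather than the raw ratio, reduces paging on one-off blips.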
Common findings this catches
- False green → HTTP check validates status only, not body.
- Wrong latency → probe region far from real users.
- Unreachable internal target → needs a private probe.
- Surprise TLS outage → no cert-expiry alert.
- Throttled target → check frequency too aggressive.
- Secret leak → token in check header/body.
- Cost creep → too many checks inflating active series.
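The "throttled target" finding comes down to simple arithmetic. The Python sketch below estimates the request rate a check places on its target, assuming each selected probe runs the check once per frequency interval; the function names and the example rate limit are hypothetical.

```python
import math


def requests_per_minute(probe_count: int, frequency_seconds: int) -> float:
    """Requests per minute the target receives from one check."""
    return probe_count * 60 / frequency_seconds


def min_frequency_seconds(probe_count: int, rate_limit_per_minute: int) -> int:
    """Shortest interval (whole seconds) that stays within the rate limit."""
    return math.ceil(probe_count * 60 / rate_limit_per_minute)


# Six probes at a 10s frequency against a target limited to 30 requests/min.
load = requests_per_minute(probe_count=6, frequency_seconds=10)
safe_interval = min_frequency_seconds(probe_count=6, rate_limit_per_minute=30)
print(load, safe_interval)  # → 36.0 12
```

In this example the check alone would exceed the limit, so either the frequency should be relaxed to 12s or longer, or the number of probes reduced.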
When to escalate
- Rate-limiting or ToS concerns with external targets — coordinate with the target owner/legal.
- Private probe network access — networking/security team.
- SLO/error-budget policy — reliability governance.
Related prompts
- Grafana Business KPI Dashboard Design Prompt: Design an executive/business KPI dashboard in Grafana with stat panels, trend sparklines, thresholds, and period-over-period comparisons.
- Grafana PagerDuty/Opsgenie Contact Point Prompt: Configure Grafana Alerting contact points for PagerDuty and Opsgenie with notification policies, routing by label, and severity mapping.
- Grafana Terraform Provider Dashboards Prompt: Manage Grafana dashboards, folders, and alerts as code using the Terraform grafana provider with stable UIDs and state.