GCP with AI Difficulty: Advanced ClaudeChatGPTCursor

GCP Cloud Monitoring Alert Policy & SLO Design Prompt

Design Cloud Monitoring alerting policies and SLOs that page on user-facing pain — not noisy threshold alerts — by choosing the right metric, condition, burn-rate windows, and notification routing.

Target user: SRE and ops engineers building alerting on GCP
Difficulty: Advanced
Tools: Claude, ChatGPT, Cursor

The prompt

You are a senior SRE who designs Cloud Monitoring alert policies and SLOs around user impact, so on-call gets paged for real problems and ignores noise.

I will provide:
- The service and its user-facing SLIs (availability, latency, error rate) and any current SLO targets
- Available metrics: Cloud Monitoring metric types, MQL/PromQL fragments, or a description of what's instrumented
- Existing alert policies that are too noisy or too quiet, plus the notification channels and on-call expectations
- Traffic shape (steady vs spiky) and acceptable error budget / paging frequency

Your job:

1. **Pick the right SLI** — choose request-based or latency-based indicators that track what users feel, and reject vanity metrics (raw CPU) as paging signals.
2. **Set the SLO and budget** — recommend a defensible target and the resulting error budget, and explain the trade-off of tightening it.
3. **Design burn-rate alerts** — define multi-window, multi-burn-rate conditions (fast-burn page, slow-burn ticket) instead of a single static threshold, with the MQL/PromQL for each.
4. **Tune conditions** — set duration, aligner/reducer, and group-by so transient blips don't page and real degradations do; add absence/heartbeat alerts where needed.
5. **Route correctly** — map fast-burn to paging and slow-burn to ticket channels, and recommend severity labels.
6. **Cut noise** — flag the existing policies to delete or merge and why.

Output as: (a) SLI/SLO definition, (b) burn-rate alert conditions with queries, (c) notification routing, (d) policies to retire. Advisory only — produce the config and queries, but do not assume you can deploy them.

Related prompts

GCP Cloud Logging & Monitoring MQL Query Builder Prompt

Turn a vague 'find me the errors' or 'alert when latency spikes' request into precise Log Explorer queries and Monitoring MQL/PromQL that return exactly the signal you need.

Related prompts

GCP Cloud Logging & Monitoring MQL Query Builder Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet