
Slack SLO Burn-Rate Notification Design Prompt

Design Slack notifications for SLO burn rates — multi-window multi-burn alerts, severity gradient, error-budget displays, ack workflow, and quiet-window suppression.

Target user: SREs operating SLOs and tuning the human signal of burn alerts
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who has implemented Google-style multi-window multi-burn-rate SLO alerts and tuned how they surface in Slack to drive action without alert fatigue.

I will provide:
- SLO definitions (services, SLIs, targets — e.g. 99.9%, 99.95%)
- SLO tooling (Sloth, Nobl9, Datadog SLOs, custom Prometheus rules)
- Existing alerting channels
- Pain points (too noisy, missed slow burns, no link to action)

Your job:

1. **Multi-window multi-burn-rate primer** — short explanation for the team:
   - **Fast burn** — short window (1h), high burn rate (14.4x) → page someone
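   - **Slow burn** — longer window (6h or 24h), lower burn rate (6x or 3x, assuming a 30-day SLO period) → notify + open a ticket
   - **Short confirmation window** — each alert also requires the same burn rate over a short window (1/12 of the long one, e.g. 5m for 1h) so it resets quickly after recovery and avoids false positives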
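
The burn-rate multipliers in the primer follow from simple arithmetic: a burn rate is the fraction of budget consumed, scaled by how many alert windows fit in the SLO period. A minimal sketch, assuming a 30-day SLO period (the function names are illustrative, not from any SLO tool):

```python
def burn_rate(budget_fraction: float, window_hours: float, slo_period_days: int = 30) -> float:
    """Burn rate that consumes `budget_fraction` of the error budget in `window_hours`."""
    period_hours = slo_period_days * 24  # assumption: 30-day rolling SLO period
    return budget_fraction * period_hours / window_hours


def error_ratio_threshold(rate: float, slo_target: float) -> float:
    """Error ratio at which the SLI burns budget `rate` times faster than allowed."""
    return rate * (1.0 - slo_target)


if __name__ == "__main__":
    for fraction, hours in [(0.02, 1), (0.05, 6), (0.10, 24)]:
        rate = burn_rate(fraction, hours)
        threshold = error_ratio_threshold(rate, 0.999)
        print(f"{fraction:.0%} of budget in {hours}h: burn rate {rate:.1f}x, "
              f"error ratio above {threshold:.4f}")
```

For a 99.9% target this gives burn rates of 14.4x, 6x, and 3x for the 1h, 6h, and 24h windows respectively.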

2. **Alert rules per SLO** — generate Prometheus alert YAML (or Sloth manifest) for one example service at 99.9% target:
   ```
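   # Assumes a 30-day SLO period; both windows must exceed the same burn rate.
   # burn rate > 14.4 over 1h AND 5m (2% of budget in 1h) → critical / page
   # burn rate > 6 over 6h AND 30m (5% of budget in 6h) → high / notify
   # burn rate > 3 over 24h AND 2h (10% of budget in 24h) → ticket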
   ```
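
To make the long-window/short-window pairing concrete, here is a rough sketch that builds the alert expressions for one service. It assumes a 30-day SLO period and hypothetical recording rules named `slo:sli_error:ratio_rate<window>` with a `service` label; real rule names depend on your SLO tooling.

```python
# (long window, short window, burn rate, severity); assumes a 30-day SLO period
WINDOWS = [
    ("1h", "5m", 14.4, "critical"),
    ("6h", "30m", 6.0, "high"),
    ("24h", "2h", 3.0, "ticket"),
]

METRIC_PREFIX = "slo:sli_error:ratio_rate"  # hypothetical recording-rule prefix


def burn_expr(service: str, slo_target: float, long_w: str, short_w: str, rate: float) -> str:
    """PromQL-style expression: both windows must exceed the same burn-rate threshold."""
    budget = round(1 - slo_target, 6)  # e.g. 0.001 for a 99.9% target
    threshold = f"({rate:g} * {budget:g})"
    selector = f'{{service="{service}"}}'
    return (f"{METRIC_PREFIX}{long_w}{selector} > {threshold} and "
            f"{METRIC_PREFIX}{short_w}{selector} > {threshold}")


if __name__ == "__main__":
    for long_w, short_w, rate, severity in WINDOWS:
        print(f"{severity}: {burn_expr('checkout', 0.999, long_w, short_w, rate)}")
```

Requiring the short window as well means an alert stops firing soon after the error rate recovers, instead of lingering until the long window drains.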

3. **Slack message design** per burn type:
   - **Critical (fast burn)** — red attachment, page-the-on-call buttons, link to:
     - Live SLO dashboard with the burn-rate panel pre-selected
     - Top-error-budget-consumers query
     - Service map / dependency view
     - Suggested runbook
   - **High (slow burn)** — orange, notify channel, link to a longer-window dashboard
   - **Ticket (very slow burn)** — yellow, auto-create a ticket assigned to service owner with budget context

4. **Error-budget display** — every burn message includes a visual: "Budget remaining: 47% (was 89% 24h ago)" with a small bar (a Block Kit `section` block with `mrkdwn` text containing a Unicode bar, or an `image` block). Show the bar.
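
One possible way to render the bar is a fixed-width run of Unicode block characters inside a `section` block. This is a sketch only; the helper names and the 10-character width are assumptions.

```python
def budget_bar(remaining: float, width: int = 10) -> str:
    """Render remaining budget (0.0 to 1.0) as a fixed-width Unicode bar."""
    clamped = max(0.0, min(1.0, remaining))
    filled = round(clamped * width)
    return "█" * filled + "░" * (width - filled)


def budget_block(remaining: float, previous: float, lookback: str = "24h") -> dict:
    """Block Kit section block carrying the budget line and the bar."""
    text = (f"*Budget remaining:* {remaining:.0%} (was {previous:.0%} {lookback} ago)\n"
            f"`{budget_bar(remaining)}`")
    return {"type": "section", "text": {"type": "mrkdwn", "text": text}}


if __name__ == "__main__":
    print(budget_block(0.47, 0.89)["text"]["text"])
```

Wrapping the bar in backticks keeps it monospaced so bars in consecutive messages line up.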

5. **Anti-spam rules**:
   - Group alerts by SLO; one message per SLO per window
   - Resolve when the burn rate stays below the threshold for a sustained period
   - Don't fire fast-burn during a known maintenance window
   - Quiet hours for slow-burn (don't ticket at 3am for a 24h-window burn)
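
The maintenance-window and quiet-hours rules can be sketched as a single gate in front of the Slack sender. The severity names match the burn levels above; the 22:00 to 07:00 quiet window and the function names are assumptions for illustration.

```python
from datetime import datetime, time

# Assumed quiet hours in the team's local time; the window wraps past midnight.
QUIET_START, QUIET_END = time(22, 0), time(7, 0)


def in_quiet_hours(now: datetime) -> bool:
    t = now.time()
    return t >= QUIET_START or t < QUIET_END


def should_notify(severity: str, now: datetime, in_maintenance: bool) -> bool:
    """Decide whether a burn alert should be sent right now."""
    if severity == "critical":
        return not in_maintenance       # no fast-burn pages during known maintenance
    if severity == "ticket":
        return not in_quiet_hours(now)  # defer very slow burns until morning
    return True                         # "high" burns always reach the channel


if __name__ == "__main__":
    print(should_notify("ticket", datetime(2024, 1, 1, 3, 0), in_maintenance=False))
```

A deferred ticket should be queued and re-evaluated when quiet hours end rather than dropped.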

6. **Acknowledgement workflow** — Block Kit buttons:
   - **Acknowledge** — marks the burn as triaged; suppresses re-notifications for N min
   - **Open Investigation** — creates an incident channel
   - **Silence 1h** — creates a one-hour Alertmanager silence attributed to the user who clicked
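
As a rough sketch of the Silence button handler, the body below has the shape the Alertmanager v2 silences endpoint (`POST /api/v2/silences`) accepts. The `slo` label, the alert name, and the use of the Slack user ID as `createdBy` are assumptions; adapt the matchers to the labels your rules actually emit.

```python
from datetime import datetime, timedelta, timezone


def silence_payload(alertname: str, slo: str, slack_user_id: str,
                    now: datetime, duration: timedelta = timedelta(hours=1)) -> dict:
    """Build the JSON body for creating an Alertmanager silence."""
    return {
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False},
            {"name": "slo", "value": slo, "isRegex": False},  # hypothetical label
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + duration).isoformat(),
        "createdBy": slack_user_id,
        "comment": "Silenced from Slack burn-rate alert",
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(silence_payload("SLOFastBurn", "checkout-availability", "U123ABC", now))
```

Scoping the matchers to one alert and one SLO keeps a silence from hiding unrelated burns on the same service.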

7. **Per-team customization** — some teams want all burns paged, others only fast burns. Default + override pattern.
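
The default + override pattern can be as small as a dictionary merge. In this sketch the team names and routing actions are hypothetical placeholders:

```python
# Default routing per burn severity.
DEFAULT_ROUTING = {"critical": "page", "high": "notify", "ticket": "ticket"}

# Hypothetical per-team overrides; teams not listed here get the defaults.
TEAM_OVERRIDES = {
    "payments": {"high": "page", "ticket": "page"},  # wants every burn paged
}


def routing_for(team: str) -> dict:
    """Merge a team's overrides on top of the defaults."""
    return {**DEFAULT_ROUTING, **TEAM_OVERRIDES.get(team, {})}


if __name__ == "__main__":
    print(routing_for("payments"))
    print(routing_for("search"))
```

Teams only declare what differs, so changing a default propagates everywhere it has not been overridden.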

8. **Validation** — measure: % fast-burn alerts that result in action, MTTA / MTTR per burn type, error budget recovery rate, false-positive rate.

Output as: (a) multi-window alert YAML for one example SLO, (b) Slack message Block Kit JSON per burn level, (c) anti-spam policy, (d) acknowledgement workflow, (e) per-team config schema, (f) validation metrics dashboard.

Bias toward: distinguishing actionable burn from informational burn, and making every notification tell the on-call what to do RIGHT NOW.
