Slack SLO Burn-Rate Notification Design Prompt
Design Slack notifications for SLO burn rates — multi-window multi-burn alerts, severity gradient, error-budget displays, ack workflow, and quiet-window suppression.
- Target user: SREs operating SLOs and tuning the human signal of burn alerts
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has implemented Google-style multi-window multi-burn-rate SLO alerts and tuned how they surface in Slack to drive action without alert fatigue.
I will provide:
- SLO definitions (services, SLIs, targets — e.g. 99.9%, 99.95%)
- SLO tooling (Sloth, Nobl9, Datadog SLOs, custom Prometheus rules)
- Existing alerting channels
- Pain points (too noisy, missed slow burns, no link to action)
Your job:
1. **Multi-window multi-burn-rate primer** — short explanation for the team:
- **Fast burn** — short window (1h), high burn rate (14.4x) → page someone
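- **Slow burn** — longer window (6h or 24h), lower burn rate (6x or 3x on a 30-day SLO window) → notify + open a ticket
- **Short confirmation window** — each rule also checks a short window (about 1/12 of the long one, e.g. 5m alongside 1h) at the same burn rate, so the alert fires only while the burn is still happening and resets quickly once it stops, which avoids false positives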
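The arithmetic behind these thresholds can be sketched as follows. This is a minimal illustration that assumes a 30-day SLO period; the function names are hypothetical and not part of any SLO tool.

```python
# Sketch: relate "fraction of error budget spent in a window" to a burn rate.
# Assumption: the SLO period is 30 days unless stated otherwise.

def burn_rate(budget_fraction: float, window_hours: float, slo_period_days: int = 30) -> float:
    """Burn rate that spends `budget_fraction` of the budget within `window_hours`."""
    period_hours = slo_period_days * 24
    return budget_fraction * period_hours / window_hours


def hours_to_exhaustion(rate: float, slo_period_days: int = 30) -> float:
    """Hours until the whole budget is gone if the burn rate stays constant."""
    return slo_period_days * 24 / rate


if __name__ == "__main__":
    for fraction, window in [(0.02, 1), (0.05, 6), (0.10, 24)]:
        rate = burn_rate(fraction, window)
        print(f"{fraction:.0%} of budget in {window}h -> burn rate {rate:.1f}x, "
              f"budget gone in {hours_to_exhaustion(rate):.0f}h")
```

A burn rate of 1x spends exactly the whole budget over the SLO period, so anything above 1x exhausts it early.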
2. **Alert rules per SLO** — generate Prometheus alert YAML (or Sloth manifest) for one example service at 99.9% target:
```
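# Assumes a 30-day SLO window; the short window uses the same burn-rate threshold as the long one.
# 2% of budget burned in 1h (burn rate 14.4x), confirmed over 5m → critical / page
# 5% of budget burned in 6h (burn rate 6x), confirmed over 30m → high / notify
# 10% of budget burned in 24h (burn rate 3x), confirmed over 2h → ticket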
```
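One way to turn those tiers into alert expressions is to generate the PromQL strings from a small table. The sketch below assumes error-ratio recording rules named `slo:sli_error:ratio_rate<window>` with a `service` label already exist; that naming and the `checkout` service are assumptions for illustration, so adapt them to whatever your tooling emits.

```python
# Sketch: build multi-window burn-rate expressions for one SLO.
# Assumptions: recording rules named slo:sli_error:ratio_rate<window> exist,
# the SLO period is 30 days, and "checkout" is a made-up example service.

TIERS = [
    # (severity, long window, short window, burn-rate factor)
    ("critical", "1h", "5m", 14.4),
    ("high", "6h", "30m", 6.0),
    ("ticket", "1d", "2h", 3.0),
]


def burn_expr(service: str, target: float, long_w: str, short_w: str, factor: float) -> str:
    """Both windows must exceed factor * error budget for the alert to fire."""
    threshold = f"{factor * (1 - target):.6g}"
    selector = f'{{service="{service}"}}'
    return (
        f"slo:sli_error:ratio_rate{long_w}{selector} > {threshold}"
        f" and slo:sli_error:ratio_rate{short_w}{selector} > {threshold}"
    )


if __name__ == "__main__":
    for severity, long_w, short_w, factor in TIERS:
        print(severity, "=>", burn_expr("checkout", 0.999, long_w, short_w, factor))
```

Each generated string can be pasted into the `expr` field of a Prometheus alerting rule, with the severity carried as a label.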
3. **Slack message design** per burn type:
- **Critical (fast burn)** — red attachment, page-the-on-call buttons, link to:
- Live SLO dashboard with the burn-rate panel pre-selected
- Top-error-budget-consumers query
- Service map / dependency view
- Suggested runbook
- **High (slow burn)** — orange, notify channel, link to a longer-window dashboard
- **Ticket (very slow burn)** — yellow, auto-create a ticket assigned to service owner with budget context
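As a rough illustration of the severity gradient, a message builder might map each burn type to a color and a set of links. The hex colors, labels, and URLs below are placeholders, and the colored bar relies on Slack's `attachments` wrapper with a `color` field around Block Kit blocks.

```python
# Sketch: one Slack payload per burn severity.
# Assumptions: colors, labels, and link targets are illustrative placeholders.

SEVERITY_STYLE = {
    "critical": {"color": "#D32F2F", "label": "Fast burn"},
    "high": {"color": "#F57C00", "label": "Slow burn"},
    "ticket": {"color": "#FBC02D", "label": "Very slow burn"},
}


def burn_message(severity: str, slo_name: str, links: dict) -> dict:
    """Return a chat.postMessage-style payload with a colored attachment."""
    style = SEVERITY_STYLE[severity]
    # Slack mrkdwn link syntax is <url|title>.
    link_text = "\n".join(f"• <{url}|{title}>" for title, url in links.items())
    header = f"*{style['label']}: {slo_name}*"
    return {
        "attachments": [
            {
                "color": style["color"],
                "blocks": [
                    {
                        "type": "section",
                        "text": {"type": "mrkdwn", "text": f"{header}\n{link_text}"},
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(burn_message("critical", "checkout availability",
                       {"SLO dashboard": "https://example.com/dash"}))
```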
4. **Error-budget display** — every burn message includes a visual: "Budget remaining: 47% (was 89% 24h ago)" with a small bar chart (a Block Kit `section` block with `mrkdwn` text and a Unicode bar, or an `image` block). Show the bar.
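A Unicode bar of this kind could be rendered roughly like this. The bar width and the helper names are assumptions; the output is a standard Block Kit `section` block with `mrkdwn` text.

```python
# Sketch: render remaining error budget as a Unicode bar in a section block.
# Assumptions: 10-character bar width; helper names are hypothetical.

def budget_bar(remaining_pct: float, width: int = 10) -> str:
    """Filled cells represent remaining budget; input is clamped to 0..100."""
    clamped = max(0.0, min(100.0, remaining_pct))
    filled = round(clamped / 100 * width)
    return "█" * filled + "░" * (width - filled)


def budget_block(remaining_pct: float, previous_pct: float) -> dict:
    """Build the section block carrying the budget line and the bar."""
    text = (
        f"*Budget remaining:* {remaining_pct:.0f}% (was {previous_pct:.0f}% 24h ago)\n"
        f"`{budget_bar(remaining_pct)}`"
    )
    return {"type": "section", "text": {"type": "mrkdwn", "text": text}}


if __name__ == "__main__":
    print(budget_block(47, 89)["text"]["text"])
```

Wrapping the bar in backticks keeps the cells monospaced so bars line up across messages.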
5. **Anti-spam rules**:
- Group alerts by SLO; one message per SLO per window
- Resolve when the burn rate stays below the threshold for a sustained period
- Don't fire fast-burn during a known maintenance window
- Quiet hours for slow-burn (don't ticket at 3am for a 24h-window burn)
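The suppression rules above might reduce to a small decision function. This sketch assumes quiet hours of 22:00 to 07:00 local time, that only ticket-level burns are deferred during quiet hours, and that maintenance windows arrive as `(start, end)` datetime pairs; all three are illustrative choices.

```python
# Sketch: decide whether to send, defer, or suppress a burn notification.
# Assumptions: quiet hours 22:00-07:00, only "ticket" burns are deferred,
# maintenance windows are (start, end) datetime pairs.
from datetime import datetime, time

QUIET_START, QUIET_END = time(22, 0), time(7, 0)


def in_quiet_hours(now: datetime) -> bool:
    """The quiet period wraps past midnight, so it is an OR of two ranges."""
    t = now.time()
    return t >= QUIET_START or t < QUIET_END


def decide(severity: str, now: datetime, maintenance_windows: list) -> str:
    in_maintenance = any(start <= now < end for start, end in maintenance_windows)
    if severity == "critical" and in_maintenance:
        return "suppress"  # known maintenance: do not page
    if severity == "ticket" and in_quiet_hours(now):
        return "defer"  # hold until quiet hours end
    return "send"


if __name__ == "__main__":
    print(decide("ticket", datetime(2024, 1, 10, 3, 0), []))
```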
6. **Acknowledgement workflow** — Block Kit buttons:
- **Acknowledge** — marks the burn as triaged; suppresses re-notifications for N min
- **Open Investigation** — creates an incident channel
- **Silence 1h** — creates a 1-hour Alertmanager silence attributed to the user's ID
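The three buttons could be expressed as a Block Kit `actions` block along these lines. The `action_id` values and the `block_id` format are hypothetical; the handler that receives the interaction payload is not shown.

```python
# Sketch: Block Kit actions block for the acknowledgement workflow.
# Assumptions: action_id and block_id naming is made up for illustration.

def ack_actions(slo_id: str, ack_minutes: int = 30) -> dict:
    def button(label: str, action_id: str, style: str = "") -> dict:
        element = {
            "type": "button",
            "text": {"type": "plain_text", "text": label},
            "action_id": action_id,
            "value": slo_id,  # echoed back in the interaction payload
        }
        if style:  # Block Kit accepts "primary" or "danger"
            element["style"] = style
        return element

    return {
        "type": "actions",
        "block_id": f"burn_ack:{slo_id}",
        "elements": [
            button(f"Acknowledge ({ack_minutes}m)", "burn_acknowledge", "primary"),
            button("Open Investigation", "burn_open_investigation", "danger"),
            button("Silence 1h", "burn_silence_1h"),
        ],
    }


if __name__ == "__main__":
    print(ack_actions("checkout-availability"))
```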
7. **Per-team customization** — some teams want all burns paged, others only fast burns. Default + override pattern.
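A default-plus-override pattern can be as small as a dictionary merge. The team names and routing values here are made up for illustration.

```python
# Sketch: default routing per severity with per-team overrides.
# Assumptions: team names and routing labels are illustrative only.

DEFAULT_ROUTING = {"critical": "page", "high": "notify", "ticket": "ticket"}

TEAM_OVERRIDES = {
    "payments": {"high": "page", "ticket": "page"},  # wants every burn paged
    "internal-tools": {"high": "ticket"},  # only fast burns interrupt anyone
}


def routing_for(team: str) -> dict:
    """Team overrides win; anything not overridden falls back to the default."""
    return {**DEFAULT_ROUTING, **TEAM_OVERRIDES.get(team, {})}


if __name__ == "__main__":
    for team in ("payments", "internal-tools", "search"):
        print(team, routing_for(team))
```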
8. **Validation** — measure: % fast-burn alerts that result in action, MTTA / MTTR per burn type, error budget recovery rate, false-positive rate.
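Two of these measures, MTTA and the share of alerts that led to action, could be computed from alert records as sketched below. The record fields (`fired_at`, `acked_at`, `action_taken`) are an assumed schema, not one any particular tool exports.

```python
# Sketch: summarize alert outcomes for one burn type.
# Assumption: each record is a dict with fired_at, optional acked_at,
# and an action_taken flag; this schema is hypothetical.
from datetime import datetime, timedelta
from statistics import mean


def summarize(alerts: list) -> dict:
    acked = [a for a in alerts if a.get("acked_at")]
    ack_minutes = [(a["acked_at"] - a["fired_at"]).total_seconds() / 60 for a in acked]
    acted = sum(1 for a in alerts if a.get("action_taken"))
    return {
        "mtta_minutes": mean(ack_minutes) if ack_minutes else None,
        "action_rate": acted / len(alerts) if alerts else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 10, 9, 0)
    sample = [
        {"fired_at": t0, "acked_at": t0 + timedelta(minutes=4), "action_taken": True},
        {"fired_at": t0, "acked_at": t0 + timedelta(minutes=10), "action_taken": False},
    ]
    print(summarize(sample))
```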
Output as: (a) multi-window alert YAML for one example SLO, (b) Slack message Block Kit JSON per burn level, (c) anti-spam policy, (d) acknowledgement workflow, (e) per-team config schema, (f) validation metrics dashboard.
Bias toward: distinguishing actionable burn from informational burn, and making every notification tell the on-call what to do RIGHT NOW.