AI for Incident Response

Error Budget Policy and SLO Response Prompt

Design an error-budget policy and a tiered SLO-breach response after a service suffers repeated incidents — define burn-rate triggers, freeze rules, and the escalation path that converts budget burn into action.

Target user
SRE leads and service owners formalizing reliability policy
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are an SRE leader who has used error budgets to stop feature teams from burning reliability into the ground — and to give them freedom when the budget is healthy.

Context I will provide:
- The service, its current SLIs/SLOs (or none yet), and the measurement window
- The recent incident history (frequency, severity, budget impact)
- The org's tolerance for release freezes and who owns the service

Produce a complete error-budget policy:

1. **Set or sanity-check the SLOs**: for each user-facing journey, define an SLI (availability, latency, correctness), a target, and a rolling window (e.g., 28-day). Justify each target against actual user need and recent incident data, not a vanity 99.99%.

2. **Compute the error budget**: translate each SLO into a concrete budget (allowed bad minutes or bad requests per window). Show the arithmetic.

3. **Burn-rate alerting**: define multi-window, multi-burn-rate alert thresholds (e.g., fast burn: 14.4x over 1h; slow burn: 6x over 6h), pairing each long window with a shorter confirmation window so alerts clear quickly once the burn stops. Map each threshold to a page or a ticket.

4. **Tiered response policy**: a clear table mapping budget state to required action: budget healthy (ship freely), less than 50% of the budget remaining (extra review, prioritize reliability work), budget exhausted (feature freeze, all hands on reliability until the budget recovers). Name who can grant exceptions and how.

5. **Repeat-incident clause**: because incidents on this service keep recurring, add an explicit rule: after N SEV-x incidents in a window, trigger a reliability review and a temporary release gate regardless of remaining budget.

6. **Governance**: who reviews the budget weekly, where it is reported, and how SLOs get revised (including the rule that you do not loosen an SLO just to dodge a freeze).

7. **Adoption plan**: how to roll this out without a mutiny: socialize, run a trial window in report-only mode, then enforce.

Output the policy as a shareable document plus the alerting rules (Prometheus-style) and the response table. Be opinionated about defaults and explicit about every escape hatch.
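
To sanity-check the arithmetic the model returns for steps 2 to 4, a minimal Python sketch can help. It assumes a time-based availability SLI; the `Slo` class, the function names, and the tier labels are hypothetical and chosen only for illustration, not taken from any library or from the prompt's required output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical container for one SLO; field names are illustrative."""
    target: float       # fraction of good events, e.g. 0.999
    window_days: int    # rolling window length, e.g. 28

    @property
    def budget_fraction(self) -> float:
        # The error budget is whatever the target leaves over.
        return 1.0 - self.target

    def budget_minutes(self) -> float:
        # Allowed "bad" minutes per window for a time-based SLI.
        return self.window_days * 24 * 60 * self.budget_fraction


def burn_rate(error_ratio: float, slo: Slo) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    return error_ratio / slo.budget_fraction


def budget_consumed(rate: float, alert_window_hours: float, slo: Slo) -> float:
    """Fraction of the whole budget spent if `rate` holds for the alert window."""
    return rate * alert_window_hours / (slo.window_days * 24)


def response_tier(budget_remaining: float) -> str:
    """Map remaining budget (1.0 = untouched) to the tiers named in the prompt."""
    if budget_remaining <= 0.0:
        return "freeze"        # budget exhausted
    if budget_remaining < 0.5:
        return "extra-review"  # less than half the budget left
    return "ship-freely"       # budget healthy


if __name__ == "__main__":
    slo = Slo(target=0.999, window_days=28)
    print(f"budget: {slo.budget_minutes():.2f} bad minutes per window")
    print(f"burn rate at 1.44% errors: {burn_rate(0.0144, slo):.1f}x")
    print(f"1h at 14.4x spends {budget_consumed(14.4, 1, slo):.1%} of the budget")
    print(f"6h at 6x spends {budget_consumed(6, 6, slo):.1%} of the budget")
    print(f"tier at 30% remaining: {response_tier(0.3)}")
```

For a 99.9% target over a 28-day window, the budget is 40,320 minutes x 0.001 = 40.32 bad minutes. A 14.4x burn sustained for one hour spends roughly 2.1% of that budget, and a 6x burn sustained for six hours spends roughly 5.4%, which is why the fast-burn threshold usually pages and slower burns can be handled with less urgency.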