
Error-Budget Policy Enforcement Review Prompt

Design and pressure-test an error-budget policy that actually changes behavior by defining what happens when the budget is exhausted, who decides, and how feature work yields to reliability work.

Target user: SRE leads and engineering managers operationalizing SLOs and error budgets
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a staff SRE who has implemented error-budget policies that real engineering orgs respected, and who has watched many others get ignored. You know that a policy without enforcement is just a dashboard.

I will provide:
- Our SLOs and SLIs per service (with measurement windows)
- Current error-budget consumption and recent burn history
- The team structure and who owns roadmap decisions
- Any existing reliability policy, formal or informal

Your job:

1. **Sanity-check the SLOs**: confirm each SLI actually measures user-visible reliability and that each target is achievable; flag vanity or unmeasurable SLOs before building policy on them.

2. **Define budget states**: establish thresholds (healthy, warning, exhausted, deep overspend) tied to both remaining budget and burn rate, not to a single pass/fail line.

3. **Specify consequences per state**: for each state, define the concrete, pre-agreed action. For example, warning triggers a reliability review; exhaustion triggers a feature freeze and a mandatory reliability sprint; deep overspend escalates to leadership.

4. **Assign decision rights**: name who declares each state, who can grant an exception, and what an exception requires (it must be costly and visible, not routine).

5. **Burn-rate governance**: define fast-burn alerts that page during incidents and slow-burn alerts that prompt a review, and tie each to the right response.

6. **Reset and renewal**: specify how budgets reset, whether unspent or overspent budget carries into the next window, and how SLO targets get recalibrated after major changes.

7. **Failure modes**: anticipate how teams might game or ignore the policy and add guardrails against each.

Output as: (a) the error-budget policy as a one-page table of state / threshold / required action / decision owner, (b) the burn-rate alert spec, (c) the exception process, (d) a rollout plan to get buy-in without it becoming theater.

Make every consequence specific and pre-committed; "we'll decide later" defeats the policy.
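
Before pasting in your numbers, it can help to check the arithmetic behind steps 2 and 5 yourself. The sketch below is a minimal illustration, not part of the prompt: it computes the remaining budget fraction, the burn rate (observed error rate divided by the error rate the SLO allows), and a budget state. The function names and the threshold defaults (25% remaining for warning, 50% overspend for deep overspend, burn rate 2.0 for fast burn) are assumptions chosen for the example; your policy should set its own.

```python
def _check_slo(slo_target: float) -> None:
    # An SLO of exactly 1.0 leaves no budget, so the math below would divide by zero.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget is being spent exactly on pace for the window.
    """
    _check_slo(slo_target)
    if total_events == 0:
        return 0.0
    return (bad_events / total_events) / (1.0 - slo_target)


def remaining_budget_fraction(
    bad_events: int, expected_window_events: int, slo_target: float
) -> float:
    """Fraction of the window's error budget still unspent (negative = overspent).

    expected_window_events is an assumed traffic forecast for the full SLO window.
    """
    _check_slo(slo_target)
    allowed_bad = (1.0 - slo_target) * expected_window_events
    return 1.0 - bad_events / allowed_bad


def classify_state(
    remaining: float,
    recent_burn_rate: float,
    warning_remaining: float = 0.25,         # assumed threshold
    deep_overspend_remaining: float = -0.5,  # assumed threshold
    fast_burn_rate: float = 2.0,             # assumed threshold
) -> str:
    """Map remaining budget and recent burn rate to one of four budget states."""
    if remaining <= deep_overspend_remaining:
        return "deep_overspend"
    if remaining <= 0.0:
        return "exhausted"
    # A fast burn raises a warning even while plenty of budget remains.
    if remaining <= warning_remaining or recent_burn_rate >= fast_burn_rate:
        return "warning"
    return "healthy"


if __name__ == "__main__":
    # Hypothetical service: 99.9% SLO, 1,000,000 requests expected this window.
    remaining = remaining_budget_fraction(400, 1_000_000, 0.999)
    recent = burn_rate(30, 10_000, 0.999)  # last hour: 30 failures in 10,000 requests
    print(classify_state(remaining, recent))  # prints "warning"
```

In the hypothetical case shown, about 60% of the budget remains, but the last hour is burning at roughly three times the sustainable pace, so the state is warning rather than healthy. That is the reason step 2 asks for thresholds on both remaining budget and burn rate.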
