
Graceful Degradation and Degraded-Mode Playbook Prompt

Design degraded-mode playbooks that keep core functionality alive when a dependency fails — feature flags to shed, fallbacks to serve, and explicit triggers for entering and exiting reduced service.

Target user: Platform engineers and SREs building resilient incident response processes
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a resilience engineer who would rather serve a stripped-down working product than a perfect product that's down, and who writes degraded-mode playbooks for exactly that moment.

I will provide:
- The service and its critical user journeys (ranked by importance)
- Its dependencies (databases, caches, third-party APIs, internal services) and what breaks if each is gone
- Existing feature flags, circuit breakers, and fallback mechanisms
- SLAs/SLOs and what "good enough" looks like under stress

Your job: produce degraded-mode playbooks per failure scenario.

1. **Rank user journeys** — separate must-keep-alive (checkout, login, read core data) from nice-to-have (recommendations, analytics, avatars). Degradation is the art of sacrificing the latter to protect the former.

2. **Per-dependency degradation map** — for each dependency failure, specify what to shed (disable feature X), what to fall back to (serve stale cache, queue writes, static defaults), and what to fail closed vs fail open. Be explicit about correctness trade-offs (e.g., serving stale data vs erroring).

3. **Entry triggers** — the observable signals that should trigger degraded mode (error rate, latency, circuit breaker open, dependency health check). Prefer automatic triggers where safe, with a manual override flag.

4. **The degraded-mode runbook** — step-by-step: which flags to flip, in what order, how to confirm core journeys still work, and what to communicate (a status-page "operating in reduced capacity" note).

5. **Exit criteria** — how you know the dependency is healthy enough to re-enable, and a careful ramp-back (don't slam the recovered dependency with restored full load — gradual re-enable).

6. **Pre-build the levers** — flag where the system currently CAN'T degrade gracefully (no flag, no fallback, hard dependency) and recommend the resilience work to add before the next incident.

7. **Test it** — a gameday scenario to validate each degraded mode actually works before you need it in anger.

Output as: a journey-priority table, per-dependency degradation maps, the degraded-mode runbook with entry/exit triggers, and the resilience-gap backlog. Bias toward protecting core journeys and making degradation reversible and observable.
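For reference outside the prompt itself, it can help to see a per-dependency degradation map written as data instead of prose. The following is a minimal sketch under stated assumptions. Every journey, flag, dependency, and fallback label in it is hypothetical. It computes which nice-to-have flags to switch off for a set of failed dependencies. It refuses to shed any flag that belongs to a must-keep-alive journey. It also treats a dependency with no plan as a resilience gap instead of guessing.

```python
from dataclasses import dataclass

# Hypothetical ranking: journeys that must never be shed.
CORE_JOURNEYS = {"login", "checkout", "read_core_data"}

# Hypothetical mapping from each feature flag to the journey it belongs to.
FLAG_JOURNEY = {
    "recommendations": "recommendations",
    "avatars": "profile_avatars",
    "one_click_checkout": "checkout",
}


@dataclass(frozen=True)
class DegradationPlan:
    shed_flags: tuple[str, ...]  # flags to switch off when the dependency fails
    fallback: str                # what to serve instead (only a label in this sketch)
    fail_mode: str               # "open" = serve degraded, "closed" = refuse; recorded, not enforced here


# Hypothetical per-dependency degradation map.
DEGRADATION_MAP = {
    "recommendations-api": DegradationPlan(("recommendations",), "static_defaults", "open"),
    "avatar-cdn": DegradationPlan(("avatars",), "placeholder_image", "open"),
    "payments-api": DegradationPlan((), "retry_later_message", "closed"),
}


def flags_to_disable(failed_dependencies: set[str]) -> set[str]:
    """Return the flags to turn off for the given failed dependencies."""
    disabled: set[str] = set()
    for dep in failed_dependencies:
        plan = DEGRADATION_MAP.get(dep)
        if plan is None:
            # No plan means no lever: surface it as a resilience gap instead of guessing.
            raise KeyError(f"no degradation plan for {dep!r}")
        for flag in plan.shed_flags:
            if FLAG_JOURNEY.get(flag) in CORE_JOURNEYS:
                raise ValueError(f"plan for {dep!r} would shed core journey via flag {flag!r}")
            disabled.add(flag)
    return disabled


print(sorted(flags_to_disable({"recommendations-api", "avatar-cdn"})))  # ['avatars', 'recommendations']
```

Keeping the map as data means a gameday or CI check can assert two things: every dependency has a plan, and no plan touches a core journey. That check does not depend on anyone rereading the runbook.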
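The stale-data-versus-error trade-off in step 2 is easier to reason about when the tolerance is an explicit number. Here is a minimal sketch, assuming a read path that can accept data up to a fixed age. The `StaleFallbackCache` name and the 300-second bound are hypothetical. When the live fetch fails, the cache serves the last good value. Once that copy is older than the bound, the read fails closed instead.

```python
import time
from typing import Callable, Hashable


class StaleFallbackCache:
    """Serve the last good value when the live fetch fails, up to a staleness bound."""

    def __init__(self, max_staleness_s: float, clock: Callable[[], float] = time.monotonic):
        self.max_staleness_s = max_staleness_s
        self._clock = clock
        self._last_good: dict = {}  # key -> (timestamp, value) of the last successful fetch

    def get(self, key: Hashable, fetch: Callable[[Hashable], object]) -> tuple:
        """Return (value, is_stale); re-raise the fetch error if no acceptable fallback exists."""
        now = self._clock()
        try:
            value = fetch(key)
        except Exception:
            cached = self._last_good.get(key)
            if cached is not None and now - cached[0] <= self.max_staleness_s:
                return cached[1], True  # degraded: stale, but within the agreed tolerance
            raise  # fail closed: an error is preferred over data older than the bound
        self._last_good[key] = (now, value)
        return value, False


# Usage with a fake clock so the behaviour is deterministic.
t = [0.0]
cache = StaleFallbackCache(max_staleness_s=300, clock=lambda: t[0])


def healthy(key):
    return f"price-of-{key}"


def down(key):
    raise ConnectionError("dependency unavailable")


print(cache.get("sku-1", healthy))  # ('price-of-sku-1', False)
t[0] = 120.0
print(cache.get("sku-1", down))     # ('price-of-sku-1', True)
```

The `is_stale` flag is returned instead of hidden so the caller can label the response, for example by showing a "prices may be out of date" notice. That keeps the degradation observable.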
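Entry triggers (step 3) and exit criteria (step 5) pair naturally. An automatic switch enters degraded mode on a bad signal. It only exits after the signal has stayed healthy for several consecutive windows, so it does not flap. The sketch below assumes error rate per evaluation window is the trigger signal. The thresholds, window count, and `DegradedModeSwitch` name are hypothetical; a real system would read them from its flag service and metrics pipeline. The manual override always wins, matching "automatic where safe, with a manual override flag".

```python
from typing import Optional


class DegradedModeSwitch:
    """Enter degraded mode on a high error rate; exit only after sustained recovery."""

    def __init__(self, enter_above: float, exit_below: float, healthy_windows_to_exit: int):
        if exit_below >= enter_above:
            raise ValueError("exit threshold must be below entry threshold (hysteresis)")
        self.enter_above = enter_above
        self.exit_below = exit_below
        self.healthy_windows_to_exit = healthy_windows_to_exit
        self.degraded = False                        # automatic state
        self.manual_override: Optional[bool] = None  # None = automatic; True/False = forced by a human
        self._healthy_streak = 0

    def observe(self, error_rate: float) -> bool:
        """Feed one window's error rate; return whether degraded mode is in effect."""
        if not self.degraded and error_rate > self.enter_above:
            self.degraded = True
            self._healthy_streak = 0
        elif self.degraded:
            # Count consecutive windows below the exit threshold; any other window resets the streak.
            self._healthy_streak = self._healthy_streak + 1 if error_rate < self.exit_below else 0
            if self._healthy_streak >= self.healthy_windows_to_exit:
                self.degraded = False
                self._healthy_streak = 0
        return self.is_degraded()

    def is_degraded(self) -> bool:
        # The automatic state keeps updating underneath, but an override takes precedence.
        return self.degraded if self.manual_override is None else self.manual_override


switch = DegradedModeSwitch(enter_above=0.05, exit_below=0.01, healthy_windows_to_exit=3)
rates = [0.002, 0.12, 0.03, 0.004, 0.005, 0.003]
print([switch.observe(r) for r in rates])  # [False, True, True, True, True, False]
```

The gap between the entry threshold (5%) and the exit threshold (1%) is what prevents oscillation. A window at 3% neither enters nor counts toward exit.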
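The gradual re-enable in step 5 is often done by turning a shed feature back on for a growing percentage of traffic. The ramp advances only while the recovered dependency stays healthy. Here is a minimal sketch, assuming a stable hash of the user ID picks the cohort. The step percentages and the `next_ramp_percent` / `in_cohort` names are hypothetical.

```python
import hashlib

# Hypothetical ramp schedule: percent of traffic with the feature re-enabled.
RAMP_STEPS = (0, 1, 5, 25, 50, 100)


def next_ramp_percent(current: int, dependency_healthy: bool) -> int:
    """Advance one step while healthy; drop back to 0 (fully shed) on a bad check."""
    if not dependency_healthy:
        return 0
    idx = RAMP_STEPS.index(current)
    return RAMP_STEPS[min(idx + 1, len(RAMP_STEPS) - 1)]


def in_cohort(user_id: str, percent: int) -> bool:
    """Stable bucketing: a user included at a lower percent stays included as it grows."""
    bucket = int.from_bytes(hashlib.sha256(user_id.encode()).digest()[:8], "big") % 100
    return bucket < percent


percent = 0
history = []
for healthy in [True, True, False, True, True, True, True, True]:
    percent = next_ramp_percent(percent, healthy)
    history.append(percent)
print(history)  # [1, 5, 0, 1, 5, 25, 50, 100]
```

Dropping straight to 0 on the first bad check is the conservative choice. Stepping back a single level is a reasonable alternative when the dependency is known to handle partial load well.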
