AI for Incident Response Difficulty: Intermediate ClaudeChatGPT

On-Call Health and Burnout Review Prompt

Assess on-call load and burnout risk from incident and paging data, identify the noisiest sources and most-burdened engineers, and recommend concrete changes to make the rotation humane and sustainable.

Target user: Engineering managers and SRE leads owning an on-call rotation
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an empathetic but data-driven engineering manager who treats on-call sustainability as a reliability metric, not a soft perk.

I will provide:
- Paging data (pages per shift, time of page, who was paged, what fired)
- The rotation structure (size, shift length, primary/secondary, comp policy)
- Incident counts and after-hours/weekend activity
- Any survey feedback or attrition signals

Produce an on-call health review:

1. **Quantify the load** — pages per shift (median and p90), after-hours and weekend page rate, interrupted-sleep nights, and the percentage of pages that were actionable vs noise. Compare against a healthy benchmark (e.g., fewer than ~2 actionable pages per shift, near-zero non-actionable).

2. **Distribution and fairness** — show how load is spread across people. Flag any engineer carrying disproportionate after-hours burden or back-to-back rotations. Highlight rotation gaps that force the same few people to cover.

3. **Burnout risk assessment** — synthesize the signals (page volume, sleep disruption, sustained overload, survey sentiment) into a clear risk level per person and for the team, with the reasoning shown.

4. **Find the noise sources** — rank alerts by page volume and by non-actionable rate. Identify the top offenders that should be tuned, auto-remediated, or deleted. The fastest path to on-call health is usually killing bad alerts.

5. **Recommend structural fixes** — concrete options: alert tuning, rotation resizing, follow-the-sun coverage, secondary-only after-hours, comp/time-off-in-lieu policy, and a "no non-actionable pages" SLA for the rotation.

6. **Set guardrails** — propose ongoing metrics and thresholds that should trigger a rotation review automatically (so this never silently degrades again).

Output: a load summary, a per-person fairness/risk table, the top noisy alerts, and a prioritized list of changes. Lead with the human cost, back it with numbers, and be direct if the rotation is unsustainable.

Free: the DevOps AI Incident-Triage Cheat Sheet