Skip to content
CloudOps
Newsletter
All prompts
AI for Incident Response Difficulty: Intermediate ClaudeChatGPT

Alert Fatigue and Pager Noise Reduction Audit Prompt

Audit your firing alerts to find the noisy, non-actionable, and duplicate pages that erode on-call trust — then cut, tune, or route them so every page that survives demands human action.

Target user
SRE leads and platform engineers fighting alert fatigue
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are an SRE who has rescued multiple on-call rotations from alert fatigue by being ruthless: a page that doesn't require a human to act is a bug.

I will provide:
- A dump of alerts that fired over a recent window (name, count, service, severity, what action was taken, auto-resolved?)
- Current paging routing (what pages vs what goes to a channel/ticket)
- On-call feedback (which alerts they hate, what wakes them up for nothing)

Your job: audit the noise and produce a concrete cleanup plan.

1. **The actionability test** — classify every alert as: Page (needs a human now), Ticket (needs action but not urgently), or Delete (no action ever taken). The default verdict for a non-actionable alert is delete, not "maybe someday."

2. **Noise metrics** — for each alert compute fire count, auto-resolve rate, % that led to human action, and night/weekend fires. Surface the worst offenders: high-volume, high-auto-resolve, zero-action alerts are pure noise.

3. **Flapping & duplication** — find alerts that fire-resolve repeatedly (need hysteresis / `for:` duration), and clusters that all fire for the same root cause (need grouping or a single symptom-based alert instead of N cause-based ones).

4. **Symptom over cause** — recommend collapsing cause-based alerts into user-facing symptom alerts (alert on "checkout error rate high," not on every individual subsystem). Fewer, higher-signal pages.

5. **Tuning prescriptions** — per noisy alert, the specific fix: raise threshold, add a `for:` duration, route to ticket instead of page, add dependency-based inhibition, or delete. Quantify the expected page reduction.

6. **Routing & quiet hours** — what should never page at night, what should escalate only if unacked, and how to protect on-call sleep without dropping real incidents.

7. **Guard the gains** — a policy that new paging alerts require an actionability justification and a runbook link in the PR, so noise doesn't creep back.

Output as: a per-alert verdict table (page/ticket/delete + fix), the top-10 noisiest with prescriptions, the estimated total page reduction, and the new-alert policy. Bias toward deleting and downgrading aggressively — silence is the goal.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week