AI for Incident Response

Faster RCAs, postmortems, runbooks, and on-call workflows powered by AI.

Recommended tools

Claude

by Anthropic

4.8

The most cautious and context-aware AI assistant for infrastructure work.

Best for

Production troubleshooting, postmortems, IaC review

Pricing

Free tier; Pro $20/mo; Team & Enterprise tiers

Gemma

by Google DeepMind

4.4

Open-weights LLM family that runs locally — for air-gapped ops, on-prem inference, and privacy-sensitive infrastructure work.

Best for

Air-gapped incident response, on-prem log analysis, cost-controlled bulk processing

Pricing

Free — open weights under Gemma terms of use; commercial use permitted

Datadog Bits AI

by Datadog

4.2

An AI SRE inside Datadog — auto-investigates alerts, queries your telemetry in plain English, and accelerates incident triage.

Best for

Investigating alerts and incidents inside Datadog, natural-language queries across metrics/logs/traces

Pricing

Bundled with Datadog; AI features vary by plan. Datadog itself is billed per host and by usage, which is often expensive at scale.

PagerDuty SRE Agent

by PagerDuty

4.0

An agentic AI that triages incidents like an SRE — gathers context, runs diagnostics, drafts comms, and cuts on-call toil.

Best for

Automated incident triage, on-call toil reduction, and stakeholder-update drafting

Pricing

Part of PagerDuty's AI / Advance add-ons; enterprise pricing (contact sales)


Prompts

Customer-Facing Incident Comms Writer Prompt

Firing Alert Severity & Escalation Decision Prompt

First-Alert Triage & Hypothesis Ranking Prompt

Incident Status Update for Stakeholders Prompt

Log-Driven Incident Timeline Builder Prompt

On-Call Shift Handoff Summary Builder Prompt

Post-Incident Follow-Up Action Items Extractor Prompt

Structured RCA & Causal Chain Builder Prompt

Targeted Rollback Plan Generator Prompt

Cache Stampede and Thundering-Herd Mitigation Prompt

Cloud API Quota and Throttling Incident Triage Prompt

Database Failover and Replication-Lag Decision Prompt

DNS Resolution Failure Live Diagnosis Prompt

Emergency Load-Shedding and Rate-Limit Config Prompt

Expired TLS Certificate Incident Triage Prompt

Is-This-Real Page Triage Prompt

Incident Alert-to-Owning-Team Router Prompt

Internal Tooling Outage Employee Comms Prompt

Incident Conference Bridge Noise Control Prompt

Incident Data Integrity Verification After Recovery Prompt

Incident Degraded-Mode Customer Tradeoff Prompt

Incident Deputy Commander Load-Sharing Prompt

Incident Go/No-Go Mitigation Decision Prompt

Incident Mid-Incident Scope Creep Control Prompt

Incident On-Call Fatigue Handoff During Prolonged Incidents Prompt

Incident Stand-Down and All-Clear Criteria Prompt

Incident Third-Party Status Triage Prompt

Incident Comms Approval and Sign-Off Workflow Prompt

Incident First-Responder Quickstart Card Prompt

Incident Glossary and Terminology Standardization Prompt

Follow-the-Sun On-Call Overlap Coverage Design Prompt

Incident Severity Misclassification Audit Prompt

Vendor SLA Accountability Review Prompt

Incident War-Room Situation Board Design Prompt

On-Call Compensation and Pay Policy Review Prompt

Customer Incident Comms Tone and Empathy Review Prompt

Incident Decision Log Rationale Capture Prompt

Live Incident Evidence Preservation Checklist Prompt

Incident Open-Loops and Follow-Up Tracker Prompt

Incident Merge and Deduplication Triage Prompt

Incident Pre-Mortem Failure Mode Brainstorm Prompt

Incident War-Game Injects and Curveball Designer Prompt

Runbook Prerequisite and Access Audit Prompt

Alert-Storm Correlation and Triage Prompt

Error-Budget Policy Enforcement Review Prompt

Escalation Policy Gap and Single-Point-of-Failure Analysis Prompt

Game-Day Hypothesis and Abort-Criteria Design Prompt

Incident Commander Training Simulator Prompt

On-Call Runbook Authoring Standard Prompt

SEV Downgrade and Incident Closure Criteria Prompt

War-Room Scribe and Live Timeline Capture Prompt

Incident Chat-Log Auto-Summarizer Prompt

Incident Drill Scoring Rubric Prompt

Live Incident Hypothesis Tracker Prompt

On-Call Shadow and Mentorship Program Prompt

PagerDuty Escalation Policy Config Generator Prompt

Recovery Smoke-Test Suite Generator Prompt

Runbook Dry-Run Validation Prompt

SLA Breach and Service-Credit Calculator Prompt

Incident Acknowledgment SLA Compliance Audit Prompt

Incident Detection Source Effectiveness Review Prompt

Incident Recovery Verification Checklist Prompt

Incident Stakeholder Communication Map Prompt

Incident Tooling Consolidation Audit Prompt

Incident Update Cadence Planner Prompt

Observability Gap Analysis From Incidents Prompt

Runbook Freshness and Decay Audit Prompt

Disaster Recovery Gameday and RTO Validation Prompt

Feature-Flag Kill-Switch and Fast-Mitigation Design Prompt

In-Incident Severity Re-Evaluation Prompt

Live Incident Log and Telemetry Correlation Assistant Prompt

Multi-Region Failover Decision Playbook Prompt

Near-Miss and Close-Call Capture Program Prompt

On-Call Schedule Fairness and Coverage Optimizer Prompt

Post-Incident SLO and Error-Budget Recalibration Prompt

Regulatory and Contractual Breach Notification Drafting Prompt

ChatOps Incident Automation Bot Workflow Prompt

Error Budget Burn-Rate Alert Design Prompt