PagerDuty SRE Agent
by PagerDuty
4.0 / 5
An agentic AI that triages incidents like an SRE: gathers context, runs diagnostics, drafts comms, and cuts on-call toil.
- Best for: Automated incident triage, on-call toil reduction, and stakeholder-update drafting
- Pricing: Part of PagerDuty's AI / Advance add-ons; enterprise pricing (contact sales)
- Vendor: PagerDuty
Pros
- Agentic triage — gathers context and runs diagnostics on incoming incidents before a human is fully online
- Drafts incident updates and stakeholder comms, often a major time sink during a live incident
- Ties into PagerDuty Automation / Runbook Automation for safe, approval-gated remediation
- Sits on the alerting and on-call workflow many teams already run
- Reduces 3am cognitive load and repetitive triage steps
Cons
- PagerDuty-ecosystem dependent — value is bounded by the runbooks and automation you've wired in
- Any remediation needs guardrails and human approval; don't hand it the keys unscoped
- Enterprise pricing — not aimed at small teams or hobby setups
- Agentic features are still maturing; quality depends heavily on your data and integrations
- Not a diagnosis tool outside the PagerDuty/automation context
PagerDuty already owns the moment an alert becomes an incident for a lot of teams. The SRE Agent extends that from routing the incident to working it — gathering context and starting the triage an on-call engineer would, automatically.
What sets it apart
It’s agentic and it’s wired into the response workflow you already run. When an incident triggers, the agent can pull together the relevant context, run diagnostics through PagerDuty Automation / Runbook Automation, correlate the signals, and draft the first stakeholder update, so the human who joins the bridge starts with context in hand instead of from a cold start. The comms drafting alone is meaningful: writing the “what’s happening, impact, next steps” update is exactly the work that distracts a responder from actually fixing the problem.
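To make the "what's happening, impact, next steps" structure concrete, here is a minimal sketch of the shape such an update might take. It is purely illustrative: `IncidentUpdate` and `render_update` are hypothetical names, not part of any PagerDuty API, and the agent's actual output format may differ.

```python
from dataclasses import dataclass


@dataclass
class IncidentUpdate:
    """Hypothetical container for the three parts of a stakeholder update."""
    summary: str     # what's happening
    impact: str      # who or what is affected
    next_steps: str  # what responders plan to do next


def render_update(update: IncidentUpdate) -> str:
    """Render the update as plain text a human can edit before sending."""
    return (
        f"What's happening: {update.summary}\n"
        f"Impact: {update.impact}\n"
        f"Next steps: {update.next_steps}"
    )
```

A draft in this shape gives the responder three short fields to correct rather than a blank page to fill.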
Because remediation runs through PagerDuty Automation, you get the agent’s speed with the approval gates and scoping you’ve defined — rather than an LLM improvising against production.
Where it shines for DevOps
- First-response triage — context-gathering and diagnostics on incoming incidents.
- Stakeholder comms — drafting status-page and exec updates so a human just edits and sends.
- Toil reduction — the repetitive, every-incident steps an SRE shouldn’t be doing by hand.
- Runbook execution — safe, approval-gated automation rather than ad-hoc commands.
Where to be careful
- The same boundary from our incident-response guides: AI gathers, synthesizes, and drafts; a human owns the decision to mitigate or roll back. Keep remediation approval-gated and blast-radius-scoped.
- Its usefulness is a function of what you’ve built — the runbooks, the automation actions, the data integrations. A thin automation library means a thin agent.
- It’s not a general diagnosis tool. For the open-ended “what’s actually broken?” reasoning, pair it with a model like Claude or the free AI Incident Response Assistant.
How to get the most out of it
- Invest in your automation and runbooks first — the agent amplifies what’s already wired in.
- Let it draft comms and run read-only diagnostics freely; require human approval for anything that mutates state.
- Use its summaries as the seed for your postmortem, then add the human judgment that an AI can’t.
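The "read-only freely, approval for anything that mutates state" rule above can be sketched as a small policy check. This is an assumption-laden illustration, not how PagerDuty Automation implements its gates: `Action`, `is_allowed`, and the service names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical description of a step the agent wants to run."""
    name: str
    mutates_state: bool       # False for read-only diagnostics
    targets: FrozenSet[str]   # services the action would touch


def is_allowed(
    action: Action,
    approved_by: Optional[str] = None,
    allowed_targets: FrozenSet[str] = frozenset(),
) -> bool:
    """Return True if the action may run under the sketched policy."""
    # Read-only diagnostics run without approval.
    if not action.mutates_state:
        return True
    # Anything that mutates state needs a named human approver.
    if approved_by is None:
        return False
    # Even when approved, the action must stay inside the scoped blast radius.
    return action.targets <= allowed_targets
```

For example, a hypothetical `Action("restart", True, frozenset({"checkout"}))` is rejected without an approver, and still rejected with one if `checkout` is not in `allowed_targets`.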
Pricing notes
The SRE Agent is part of PagerDuty’s AI / Advance capabilities and is priced for the enterprise — there’s no self-serve cheap tier here. If you already run PagerDuty as your incident backbone and your on-call toil is real, it’s a credible way to compress triage time; if you’re a small team, the free incident tooling plus solid runbooks gets you most of the value at none of the cost.