Incident Alert-to-Owning-Team Router Prompt
Take a freshly fired alert and route it to the team that actually owns the failing component, so the right responder is paged first instead of bouncing through three on-call rotations.
- Target user: On-call engineers and incident responders triaging which team owns a misrouted page
- Difficulty: Beginner
- Tools: Claude, ChatGPT
The prompt
You are a seasoned SRE who knows the most expensive minutes of an incident are often the ones spent paging the wrong team. A page lands on the platform on-call that's really a payments-service bug; twenty minutes evaporate in handoffs before the right person even sees it.
I will paste the alert (name, service, labels, error, dashboard link) plus our ownership references if available: a service-catalog/ownership map, team boundaries, and the on-call directory.
Your job:
1. **Extract ownership signals**: pull the service name, namespace, repo, label, or component from the alert that indicate ownership, and map them against the catalog I provided.
2. **Name the most-likely owning team**: state which team owns the failing component and your confidence, with the specific signal that points there.
3. **Handle ambiguity honestly**: if the alert spans a boundary (e.g., a shared gateway in front of a team's service), say so and name the two candidate teams and how to disambiguate fast (one query or dashboard).
4. **Avoid the hot-potato trap**: if the symptom is downstream of another team's failing dependency, name the upstream owner as the true target so we don't ping-pong the page.
5. **Page recommendation**: recommend who to page first and who to pull in as secondary, and the one-line context to give them so they start with the symptom, not a cold "can you look at this."
6. **Escalation fallback**: if ownership genuinely can't be determined from the evidence, recommend the escalation path (e.g., incident commander or a catch-all rotation) rather than guessing and paging the wrong team.
Output as: (a) the owning team with confidence and the signal, (b) any ambiguity and the disambiguating check, (c) the page-first recommendation with one-line context, (d) the fallback if ownership is unknown.
Propose; the human pages. Never page a team on a low-confidence guess; when unsure, route to the documented escalation path instead and say why.
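The prompt expects the alert and any ownership references to be pasted together. As a minimal sketch of one way to assemble that paste, the helper below formats both into a single block and states plainly when a reference is missing. The function name `build_router_input`, the section titles, and every field value are assumptions made for illustration; they are not part of the prompt or of any alerting tool's schema.

```python
import json

def build_router_input(alert, ownership_map=None, on_call_directory=None):
    """Format an alert and optional ownership references as one pasteable block."""
    sections = ["ALERT", json.dumps(alert, indent=2, sort_keys=True)]
    references = [
        ("OWNERSHIP MAP", ownership_map),
        ("ON-CALL DIRECTORY", on_call_directory),
    ]
    for title, data in references:
        sections.append(title)
        # Mark missing references explicitly so the model does not assume one exists.
        sections.append(json.dumps(data, indent=2, sort_keys=True) if data else "(not provided)")
    return "\n".join(sections)

# Hypothetical alert; every value here is illustrative.
alert = {
    "name": "HighErrorRate",
    "service": "payments-service",
    "labels": {"namespace": "payments", "severity": "page"},
    "error": "HTTP 5xx ratio above threshold",
}
message = build_router_input(alert, ownership_map={"payments-service": "payments"})
print(message)
```

Labeling an absent reference as "(not provided)" gives the model a reason to fall back to the escalation path in step 6 instead of inferring an owner from the service name alone.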
Why this prompt works
The slowest part of many incidents isn’t diagnosis; it’s getting the alert in front of the person who can fix it. Pages often misroute because alerts fire on the service showing the symptom, not on the owner of the cause, and a tired on-call passes the page along instead of settling the ownership question. This prompt automates the lookup that turns “not my service” into “page the payments team, here’s why,” compressing the handoff window that quietly inflates mean time to acknowledge (MTTA) on cross-team incidents.
What makes it more than a label lookup is the hot-potato and ambiguity handling. The prompt explicitly asks the model to trace downstream symptoms back to the upstream owner and to name two candidate teams with a disambiguating check when the boundary is genuinely unclear. That reflects how real ownership disputes happen — at shared gateways and inherited services — rather than pretending every alert maps cleanly to one team.
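The upstream tracing and two-candidate handling described above can be pictured as a walk over a dependency map. The sketch below is one hypothetical way to do that by hand, assuming you already know which services are currently failing; `trace_to_root`, the `depends_on` map, and all service and team names are invented for illustration and do not describe how the model reasons internally.

```python
def trace_to_root(service, depends_on, failing):
    """Return the deepest failing services reachable from `service` through failing dependencies."""
    roots, stack, seen = [], [service], {service}
    while stack:
        current = stack.pop()
        failing_deps = [d for d in depends_on.get(current, []) if d in failing]
        if not failing_deps:
            roots.append(current)  # nothing further upstream is failing
        for dep in failing_deps:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    # If a dependency cycle leaves no root, fall back to the alerting service.
    return sorted(roots) or [service]

# Hypothetical dependency and ownership data.
depends_on = {
    "checkout-api": ["payments-service", "cart-service"],
    "payments-service": ["ledger-db"],
}
owners = {
    "checkout-api": "storefront",
    "payments-service": "payments",
    "cart-service": "storefront",
    "ledger-db": "data-platform",
}

# One failing chain: the page belongs with the owner of the deepest failing service.
single = trace_to_root("checkout-api", depends_on, {"checkout-api", "payments-service", "ledger-db"})

# Two failing branches: report both candidates and disambiguate before paging.
split = trace_to_root("checkout-api", depends_on, {"checkout-api", "payments-service", "cart-service"})

print([owners[s] for s in single], [owners[s] for s in split])
```

A single root corresponds to the hot-potato case, where the upstream owner is the true target. More than one root corresponds to the ambiguity case, where the prompt asks for both candidates plus one check that separates them.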
The guardrails keep the routing honest. A confidently wrong team assignment wastes more time than an honest “I can’t tell, escalate,” so the prompt forbids low-confidence pages and routes ambiguity to the documented escalation path. The AI proposes the route and the human pulls the trigger, which is exactly right for a step where being wrong is costly and being fast is only valuable if you’re also correct.
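If the model's answer is parsed into structured fields, the same guardrail can be enforced outside the model. The sketch below assumes a numeric confidence and a cutoff of 0.7, neither of which the prompt specifies; `Proposal`, `recommend`, and the `incident-commander` path name are hypothetical. The function only returns a recommendation string, so the decision to page stays with the human.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.7  # assumed cutoff; tune to your own tolerance for misroutes

@dataclass
class Proposal:
    team: str
    confidence: float  # assumed to be a number between 0 and 1
    signal: str

def recommend(proposal, escalation_path="incident-commander"):
    """Turn a routing proposal into a recommendation; never triggers a page itself."""
    if proposal.confidence < CONFIDENCE_FLOOR:
        return (
            f"Escalate to {escalation_path}: confidence {proposal.confidence:.2f} "
            f"is below {CONFIDENCE_FLOOR:.2f} (signal: {proposal.signal})"
        )
    return (
        f"Propose paging {proposal.team}: {proposal.signal} "
        f"(confidence {proposal.confidence:.2f})"
    )

strong = recommend(Proposal("payments", 0.9, "service label matches catalog entry"))
weak = recommend(Proposal("platform", 0.4, "namespace only, no catalog match"))
print(strong)
print(weak)
```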
Related prompts
- Escalation Policy Gap and Single-Point-of-Failure Analysis Prompt: Audit your existing escalation policies and on-call schedules to find coverage gaps, dead-ends, and single points of failure where a page could go unanswered during a real incident.
- Service Dependency and Blast Radius Mapping Prompt: Map a service's upstream and downstream dependencies, identify single points of failure and shared-fate risks, and estimate the blast radius of each failure so the team can prioritize resilience work.