Blast-Radius and Dependency Mapping Prompt
Turn a failing component plus its dependency graph into a scoped blast-radius map — what is affected, what is merely downstream, and what is safe — so the team scopes the incident in minutes instead of guessing at impact.
- Target user: On-call SREs and incident commanders scoping impact
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior SRE who scopes incidents by reasoning about dependency graphs: what fails when X fails, what is merely correlated, and what is genuinely insulated. Help me scope this incident fast. Map impact; do not diagnose root cause and do not propose fixes.

Inputs:
- The failing or suspect component: [SERVICE / DB / QUEUE / REGION]
- Its dependencies and dependents: [PASTE GRAPH / SERVICE LIST / ARCHITECTURE NOTES]
- Failure mode observed: [HARD DOWN / DEGRADED / ELEVATED LATENCY / PARTIAL]
- Known facts: [WHICH SERVICES ARE ALREADY CONFIRMED AFFECTED]

Produce a blast-radius map:
1. **Direct blast radius**: list services that call the failing component synchronously and would degrade or fail directly. For each, the likely user-facing symptom.
2. **Indirect blast radius**: services one or more hops downstream that could degrade via timeouts, retries, or queue backpressure. Mark these as "likely, unconfirmed."
3. **Insulated**: services that depend on the component only asynchronously, behind a cache, or with a working fallback, and so are probably unaffected. State the assumption that makes each safe.
4. **Containment levers**: read-only checks to confirm which tier a service is actually in (a health endpoint, a dependency dashboard), and any existing circuit-breaker or fallback that may already be limiting spread.
5. **Scope statement**: one sentence the IC can paste into the channel: what is in scope, what is being ruled out, and what is still unconfirmed.

Output format: three tiers (direct / indirect / insulated) as tables with columns service | predicted impact | confidence | confirm-with check. Then the one-line scope statement.

Propose only; mark every unconfirmed impact as such; all checks read-only. The human confirms scope and decides containment.
Why this prompt works
Scoping is an underrated MTTR sink. Before a team can fix anything, it has to answer “who is affected and how bad” — and when that answer is fuzzy, responders either over-escalate a contained issue or miss services quietly failing two hops downstream. This prompt turns the dependency graph into a tiered blast-radius map so the incident commander can state scope with confidence in the opening minutes instead of assembling it from scattered dashboards.
The three-tier structure mirrors how experienced SREs actually reason about impact: synchronous callers fail directly, downstream services fail via timeouts and backpressure, and asynchronous or cached consumers are usually insulated. Forcing the model to place every dependent in one tier and state the assumption that justifies it makes the reasoning auditable. The team isn’t handed a verdict; it’s handed a structured map with each claim labeled by confidence and paired with a read-only check that confirms which tier the service truly occupies.
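The tiering rule described above can be sketched as a small graph traversal. This is a minimal illustration, not part of the prompt: the `Edge` fields (`sync`, `has_fallback`), the `classify` function, and the example service names are all hypothetical, and the sketch assumes the dependency graph is accurate and complete, which real graphs often are not. It treats a dependent as exposed only when every call on the path back to the failing component is synchronous with no cache or fallback.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    caller: str
    callee: str
    sync: bool = True           # synchronous call (False = async / queue-based)
    has_fallback: bool = False  # cache or working fallback in front of the callee

# Higher rank = worse tier; a service reached by several paths keeps its worst tier.
RANK = {"insulated": 0, "indirect": 1, "direct": 2}

def classify(edges, failing):
    """Place every transitive dependent of `failing` in exactly one tier."""
    # Index edges by callee so we can walk from the failing component to its callers.
    callers = {}
    for e in edges:
        callers.setdefault(e.callee, []).append(e)

    tiers = {}
    seen = set()
    # Each queue entry: (node, exposed, hops). `exposed` means every edge on the
    # path so far is synchronous with no fallback.
    queue = deque([(failing, True, 0)])
    while queue:
        node, exposed, hops = queue.popleft()
        if (node, exposed) in seen:
            continue
        seen.add((node, exposed))
        for e in callers.get(node, []):
            still_exposed = exposed and e.sync and not e.has_fallback
            if not still_exposed:
                tier = "insulated"   # an assumption to confirm, not a verdict
            elif hops == 0:
                tier = "direct"      # calls the failing component itself
            else:
                tier = "indirect"    # one or more hops downstream
            if e.caller != failing and RANK[tier] > RANK.get(tiers.get(e.caller), -1):
                tiers[e.caller] = tier
            queue.append((e.caller, still_exposed, hops + 1))
    return tiers

# Hypothetical example graph: payments-db is the failing component.
EXAMPLE_EDGES = [
    Edge("checkout", "payments-db"),
    Edge("web", "checkout"),
    Edge("mobile", "web"),
    Edge("reports", "payments-db", sync=False),
    Edge("catalog", "payments-db", has_fallback=True),
]
```

Calling `classify(EXAMPLE_EDGES, "payments-db")` places `checkout` in the direct tier, `web` and `mobile` in the indirect tier, and `reports` and `catalog` in the insulated tier. The output is a prediction from the graph only; each placement still needs the read-only check that the prompt asks for.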
The guardrails defend against the two failure modes of impact prediction: over-scoping and false safety. A worst-case graph traversal can declare half the platform down when fallbacks are working, so confidence labels keep the IC grounded in evidence. Conversely, dynamic or undocumented dependencies can hide a real victim in the “insulated” tier, so the prompt treats that tier as “confirm before ruling out.” The model proposes the map; the human confirms scope and owns the containment call.
Related prompts
- First-5-Minutes Triage Prompt: From the alert alone, decide severity, estimate blast radius, and route to the right owner in the opening minutes, so the incident lands with the people who can fix it instead of bouncing, cutting time-to-triage.
- Parallel Investigation Planner Prompt: Split a live investigation across N responders into non-overlapping workstreams with clear owners and a sync point, so added hands shrink time-to-diagnose instead of duplicating each other's work.