Deploy Correlation: Find the Suspect Change Prompt
Cross-reference incident onset against the deploy, config, and feature-flag changes in the window to produce a ranked list of suspect changes with a fast confirm-or-clear check for each — shrinking the time spent asking 'what changed?'
- Target user
- On-call SREs and release engineers mid-incident
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior SRE who treats "what changed right before this started" as the first lead in most incidents. Help me correlate onset with changes. Rank suspect changes and give a check for each; do not declare the cause and do not roll anything back. Inputs: - Incident onset: [EARLIEST CONFIRMED DEVIATION TIMESTAMP] - Affected scope: [SERVICES / REGIONS / TENANTS] - Symptom: [ERROR TYPE / LATENCY / SATURATION] - Change log in the window: [DEPLOYS / CONFIG PUSHES / FLAG FLIPS / INFRA CHANGES, WITH TIMESTAMPS AND OWNERS] Produce a change-correlation analysis: 1. **Time-align changes to onset** — list changes whose timestamp plausibly precedes the deviation, closest-first. Drop changes that landed clearly after onset and say why each was dropped. 2. **Score plausibility** — for each surviving change, rate how plausibly it could produce THIS symptom in THIS scope (a frontend flag is unlikely to spike database CPU). Tie the score to the mechanism, not just to recency. 3. **Rank suspects** — order by combined timing + mechanism plausibility, with a one-line rationale each. 4. **Confirm-or-clear check** — for the top suspects, the single read-only check that ties the change to the symptom (diff the rollout, check the flag's exposure %, compare error onset to canary start). State what result clears the change. 5. **Note the blind spot** — call out any change category your inputs probably don't include (silent config drift, dependency's deploy) that could still be the culprit. Output format: ranked table — change | owner | time vs. onset | mechanism plausibility | confirm-or-clear check. Then "investigate this change first." Propose and rank only; correlation is not causation; do not recommend a rollback. The human confirms and decides.
Why this prompt works
“What changed?” is the highest-yield question in incident response, because most incidents are self-inflicted by a recent change — yet the time spent answering it is often wasted on manual cross-referencing of deploy logs, flag dashboards, and chat history. This prompt compresses that cross-reference into a ranked list of suspect changes, each time-aligned to the confirmed onset and scored for whether it could plausibly cause this specific symptom, so the team points its first verification at the likeliest culprit.
The key discipline is refusing to rank on recency alone. The change that landed thirty seconds before the alert is the obvious suspect and is frequently innocent — a coincidental config push, an unrelated deploy. By forcing a mechanism-plausibility score that asks whether this change could produce this symptom in this scope, the prompt filters out blameless changes that merely happened to be recent, which is exactly where naive change correlation wastes time and triggers needless rollbacks.
The guardrails hold the line between correlation and causation. The prompt attaches a confirm-or-clear check to each suspect so the team validates the link before acting, refuses to recommend a rollback, and explicitly names the change categories the input probably omits — a dependency team’s deploy, silent drift, a vendor change. That last step matters: an empty or weak suspect list is framed as “you may be missing changes,” not “nothing changed.” The model proposes the ranking; the human confirms the link and owns the rollback decision.
Related prompts
-
Diagnosis Accelerator: Verify-First Hypotheses Prompt
Turn the opening burst of telemetry into a short, ranked list of diagnoses — each paired with a single command to confirm or kill it — so the team tests the likeliest cause first and shortens time-to-diagnose.
-
Have We Seen This Before? Symptom-Match Prompt
Match the current symptom signature against your own past incidents and their fixes — fast — so a recurrence is resolved from prior knowledge instead of diagnosed from scratch, collapsing time-to-diagnose.