Deploy Correlation: Find the Suspect Change Prompt

Cross-reference incident onset against the deploy, config, and feature-flag changes in the window to produce a ranked list of suspect changes, each with a fast confirm-or-clear check, shrinking the time spent asking "what changed?"

Target user

On-call SREs and release engineers mid-incident

Difficulty

Intermediate

Tools

Claude, ChatGPT, Cursor

You are a senior SRE who treats "what changed right before this started" as the first lead in most incidents. Help me correlate onset with changes. Rank suspect changes and give a check for each; do not declare the cause and do not roll anything back.

Inputs:
- Incident onset: [EARLIEST CONFIRMED DEVIATION TIMESTAMP]
- Affected scope: [SERVICES / REGIONS / TENANTS]
- Symptom: [ERROR TYPE / LATENCY / SATURATION]
- Change log in the window: [DEPLOYS / CONFIG PUSHES / FLAG FLIPS / INFRA CHANGES, WITH TIMESTAMPS AND OWNERS]

Produce a change-correlation analysis:
1. **Time-align changes to onset**: list changes whose timestamp plausibly precedes the deviation, closest-first. Drop changes that landed clearly after onset and say why each was dropped.
2. **Score plausibility**: for each surviving change, rate how plausibly it could produce THIS symptom in THIS scope (a frontend flag is unlikely to spike database CPU). Tie the score to the mechanism, not just to recency.
3. **Rank suspects**: order by combined timing + mechanism plausibility, with a one-line rationale each.
4. **Confirm-or-clear check**: for the top suspects, the single read-only check that ties the change to the symptom (diff the rollout, check the flag's exposure %, compare error onset to canary start). State what result clears the change.
5. **Note the blind spot**: call out any change category your inputs probably don't include (silent config drift, dependency's deploy) that could still be the culprit.

Output format: ranked table with columns change | owner | time vs. onset | mechanism plausibility | confirm-or-clear check. Then "investigate this change first."

Propose and rank only; correlation is not causation; do not recommend a rollback. The human confirms and decides.

Why this prompt works

“What changed?” is the highest-yield question in incident response, because most incidents are self-inflicted by a recent change — yet the time spent answering it is often wasted on manual cross-referencing of deploy logs, flag dashboards, and chat history. This prompt compresses that cross-reference into a ranked list of suspect changes, each time-aligned to the confirmed onset and scored for whether it could plausibly cause this specific symptom, so the team points its first verification at the likeliest culprit.
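The time-alignment step can be sketched in a few lines of Python. This is a minimal illustration, not part of the prompt: the `Change` record, the two-hour `lookback`, and the two-minute `grace` allowance for clock skew are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    name: str
    owner: str
    kind: str      # e.g. "deploy", "config", "flag", "infra"
    at: datetime   # when the change landed


def time_align(changes, onset,
               lookback=timedelta(hours=2),
               grace=timedelta(minutes=2)):
    """Split changes into (kept, dropped).

    kept    -> changes that plausibly precede onset, closest-first
    dropped -> (change, reason) pairs, so every exclusion is explained
    The lookback and grace values are illustrative assumptions.
    """
    kept, dropped = [], []
    for c in changes:
        if c.at > onset + grace:
            dropped.append((c, "landed after onset"))
        elif c.at < onset - lookback:
            dropped.append((c, "outside lookback window"))
        else:
            kept.append(c)
    # Closest to onset first; this is ordering only, not a verdict.
    kept.sort(key=lambda c: abs(onset - c.at))
    return kept, dropped
```

Keeping a reason next to each dropped change mirrors the prompt's instruction to say why a change was excluded, which makes a wrong exclusion easy to spot later.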

The key discipline is refusing to rank on recency alone. The change that landed thirty seconds before the alert is the obvious suspect and is frequently innocent — a coincidental config push, an unrelated deploy. By forcing a mechanism-plausibility score that asks whether this change could produce this symptom in this scope, the prompt filters out blameless changes that merely happened to be recent, which is exactly where naive change correlation wastes time and triggers needless rollbacks.
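A small sketch shows why the combined score matters. The scoring formula here is an assumption made for illustration (a timing weight that halves every 30 minutes, multiplied by a 0 to 1 mechanism score supplied by a human or the model); the prompt itself does not prescribe any formula.

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    change: str
    minutes_before_onset: float
    mechanism: float  # 0.0-1.0: could this change cause THIS symptom?


def timing_score(minutes_before_onset, half_life=30.0):
    """Weight that halves every `half_life` minutes (hypothetical choice)."""
    return 0.5 ** (minutes_before_onset / half_life)


def combined_score(s):
    # Multiplying means a change needs BOTH proximity and a plausible mechanism.
    return timing_score(s.minutes_before_onset) * s.mechanism


def rank(suspects):
    """Order suspects by combined timing and mechanism plausibility."""
    return sorted(suspects, key=combined_score, reverse=True)


# Symptom in this example: database CPU spike.
suspects = [
    Suspect("frontend banner flag", minutes_before_onset=0.5, mechanism=0.1),
    Suspect("query-path deploy", minutes_before_onset=20, mechanism=0.9),
]
ranked = rank(suspects)
by_recency = sorted(suspects, key=lambda s: s.minutes_before_onset)
```

In this example `by_recency` puts the frontend flag first because it landed thirty seconds before onset, while `ranked` puts the query-path deploy first because it has a plausible path to the symptom.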

The guardrails hold the line between correlation and causation. The prompt attaches a confirm-or-clear check to each suspect so the team validates the link before acting, refuses to recommend a rollback, and explicitly names the change categories the input probably omits — a dependency team’s deploy, silent drift, a vendor change. That last step matters: an empty or weak suspect list is framed as “you may be missing changes,” not “nothing changed.” The model proposes the ranking; the human confirms the link and owns the rollback decision.
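Two of these guardrails can be sketched as code. The function names, the category set, and the wording of the returned strings are hypothetical; the sketch only illustrates the logic of a clearing result and of the "you may be missing changes" framing.

```python
from datetime import datetime

# Categories a complete change log would ideally cover (assumed list).
EXPECTED_CATEGORIES = {"deploy", "config", "flag", "infra", "dependency", "vendor"}


def canary_check(error_onset, canary_start):
    """Read-only check: compare error onset to canary start."""
    if error_onset < canary_start:
        return "cleared: errors began before the rollout took any traffic"
    # Onset after canary start is consistent with the change, not proof of it.
    return "not cleared: onset follows canary start; verify before acting"


def blind_spots(categories_in_log):
    """Change categories the supplied log does not cover."""
    return sorted(EXPECTED_CATEGORIES - set(categories_in_log))


def headline(suspects, categories_in_log):
    """First line of the report; never claims that nothing changed."""
    missing = ", ".join(blind_spots(categories_in_log)) or "sources not listed"
    if not suspects:
        return f"No suspect in the supplied log; you may be missing changes: {missing}"
    return f"Investigate this change first: {suspects[0]} (not in log: {missing})"
```

Note that neither function recommends an action: the check reports whether the change is cleared, and the rollback decision stays with the human.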

Related prompts

Diagnosis Accelerator: Verify-First Hypotheses Prompt

Have We Seen This Before? Symptom-Match Prompt
