Faster Diagnosis: Ranked, Verify-First Hypotheses With AI

Twenty minutes into a SEV2, the channel had quietly agreed it was “the database again.” Nobody had checked. Someone said it early, it sounded plausible, and the whole team started chasing connection pools while the actual cause — a misconfigured retry storm hammering an upstream — sat untouched. We lost half an hour to a guess that anchored everyone. Diagnosis is usually the fattest slice of MTTR, and the biggest reason it drags isn’t a lack of ideas. It’s that the first idea, spoken with confidence, freezes everyone onto one track.

AI can help here, but only if you use it to broaden the hypothesis space instead of narrowing it prematurely. Used wrong, it just adds an authoritative-sounding voice to the anchor. Used right, it hands you a ranked set of candidates, each tied to a check you can run.

Anchoring is the diagnosis killer

The cognitive trap is well known: the first plausible explanation becomes the lens everyone looks through, and contradicting evidence gets explained away. Under incident pressure, with five people watching, this happens fast and hard. The fix isn’t “think harder.” It’s structural: force multiple hypotheses onto the table at once, before anyone commits, so no single guess gets to dominate. This is a recurring theme across the MTTR funnel — speed comes from the right structure, not heroics.

That’s a job AI is genuinely good at. A model reading your symptoms doesn’t get emotionally invested in being right the way a human does. Ask it for several ranked candidates and you’ve replaced one anchored guess with a portfolio.

Ask for a ranked differential, not an answer

The framing of the prompt is everything. If you ask “what’s wrong?”, you get one confident answer and you’ve recreated the anchoring problem with extra steps. Ask for a differential diagnosis — ranked candidates, each with the single check that would confirm or kill it.

You are assisting incident diagnosis. Given these symptoms — error-rate trend, latency percentiles, recent deploys, dependency health, and the relevant log excerpt — produce a ranked list of 4–6 candidate root causes. For each: a one-line hypothesis, your confidence (high/med/low), the single fastest check to confirm or rule it out (as a command or query), and what evidence would kill it. Rank by likelihood given the data. Do not pick a winner. Do not recommend a fix. Flag any hypothesis that the data can neither support nor refute.

Notice what that demands: a check for every hypothesis, and an explicit kill condition. That turns a vibe into an experiment. The output looks like this:

1. Upstream retry storm (high). Check: sum(rate(http_client_retries_total[1m])) by (target). Killed if retry rate is flat.
2. DB connection exhaustion (med). Check: pg_stat_activity count vs max_connections. Killed if active connections are well under the cap.
3. Bad deploy (med). Check: kubectl rollout history; correlate error onset to rollout time. Killed if errors predate the deploy.
4. Noisy-neighbor saturation (low). Check: node CPU throttling on the affected pods. Killed if throttle metric is zero.

Now the team has four threads to pull, ranked, each with a falsifiable check — and crucially, “the database” is now #2 with a one-command kill test instead of an unexamined consensus.
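The prompt is only as good as the symptoms you hand it, so it helps to assemble them the same way every time. Here is a minimal sketch, assuming you have already dumped each symptom into a text file. The function name build_prompt, the file names, and the sample error-rate line are all hypothetical, not part of any tool. Missing inputs are marked explicitly so the model can flag hypotheses the data can neither support nor refute.

```shell
# build_prompt INSTRUCTION_FILE SYMPTOM_DIR
# Prints the instruction text followed by one labeled section per symptom.
# File names (error_rate.txt, latency.txt, ...) are assumptions for this sketch.
build_prompt() {
  instruction_file=$1
  symptom_dir=$2
  cat "$instruction_file"
  for name in error_rate latency deploys dependencies logs; do
    printf '\n## %s\n' "$name"
    if [ -s "$symptom_dir/$name.txt" ]; then
      cat "$symptom_dir/$name.txt"
    else
      # Say so explicitly rather than letting the model guess at absent data.
      echo "(not collected)"
    fi
  done
}

# Example run against a scratch directory with only one symptom collected.
# The instruction and error-rate text are illustrative placeholders.
work=$(mktemp -d)
echo "Produce a ranked list of 4-6 candidate root causes." > "$work/instruction.txt"
echo "5xx rate climbing since the last deploy window" > "$work/error_rate.txt"
build_prompt "$work/instruction.txt" "$work"
rm -r "$work"
```

Pipe the output into whatever model client you use. The point of the fixed section order is that two responders building the prompt at different times give the model the same shape of input.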

Run the cheap kills first

The discipline that makes this fast: run the cheapest, most decisive checks first, regardless of rank. A low-confidence hypothesis you can kill in one command is worth checking before a high-confidence one that takes ten minutes to investigate — eliminating it shrinks the space immediately.
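One way to make that ordering mechanical is to write each hypothesis down with a rough cost estimate and sort on the cost. The sketch below assumes a tab-separated worklist of rank, hypothesis, and estimated seconds to run the check. The function name order_checks and the cost figures are illustrative guesses, not measurements from the incident. It sorts on cost alone and uses the model's rank only as a tie-breaker, which is a simplification: it does not capture how decisive a check is.

```shell
# Worklist columns: model rank <TAB> hypothesis <TAB> estimated seconds to run the check.
TAB=$(printf '\t')

# order_checks: read the worklist on stdin, cheapest check first.
# Ties on cost fall back to the model's rank.
order_checks() {
  sort -t "$TAB" -k3,3n -k1,1n
}

# Cost estimates below are hypothetical.
printf '%s\t%s\t%s\n' \
  1 "upstream retry storm"      30 \
  2 "db connection exhaustion"  20 \
  3 "bad deploy"                15 \
  4 "noisy-neighbor saturation" 600 \
  | order_checks
```

With these made-up costs, the bad-deploy check runs first even though it is ranked third, and the ten-minute throttling investigation goes last.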

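# Kill check for the retry-storm hypothesis (#1)
# -G with --data-urlencode lets curl encode the PromQL; a literal [1m] in the URL trips curl's globbing.
curl -sG "http://prom:9090/api/v1/query" \
  --data-urlencode 'query=sum by (target) (rate(http_client_retries_total[1m]))' \
  | jq -r '.data.result[] | "\(.metric.target): \(.value[1])"'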

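# Kill check for the connection-exhaustion hypothesis (#2)
# Assumes psql and its connection settings are available inside the payments container.
kubectl exec -n payments deploy/payments -- \
  psql -tc "select count(*), current_setting('max_connections') from pg_stat_activity;"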

# Kill check for the bad-deploy hypothesis (#3)
kubectl rollout history deploy/payments -n payments | tail -4

In the incident that opened this post, the retry-storm check came back screaming within ninety seconds: auth-service retry rate was up 40x. Had we asked for the ranked differential at minute two instead of accepting the anchored guess, we would have saved roughly twenty-five minutes.
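A kill condition like "killed if retry rate is flat" only works if "flat" is pinned down before you look at the number. Here is a small sketch of that comparison, assuming you already have the current rate and a baseline from before the incident. The function name verdict and the default factor of 2 are my assumptions; pick a threshold that fits your own noise level.

```shell
# verdict CURRENT BASELINE [FACTOR]
# Prints "alive" if CURRENT is at least FACTOR times BASELINE, otherwise "killed".
# FACTOR defaults to 2, an arbitrary illustrative threshold.
verdict() {
  awk -v cur="$1" -v base="$2" -v factor="${3:-2}" 'BEGIN {
    if (base <= 0) {
      # No baseline traffic: any retries at all keep the hypothesis alive.
      print (cur > 0 ? "alive" : "killed")
      exit
    }
    print (cur / base >= factor ? "alive" : "killed")
  }'
}

verdict 48.0 1.2   # 40x the baseline, prints: alive
verdict 1.3 1.2    # essentially flat, prints: killed
```

Agreeing on the factor up front keeps the check falsifiable. Otherwise a modest bump gets read as confirmation by whoever already believes the hypothesis.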

Verify-first means the AI never wins by assertion

The non-negotiable rule: a hypothesis is not “the cause” until a human runs its check and sees the evidence. The model’s ranking is a search order, not a verdict. I’ve seen the #1 candidate be wrong and the #4 be right — the value was never in the ranking being correct, it was in having four falsifiable threads instead of one anchored one.

This is what “verify-first” buys you against the signature failure mode of AI in incidents: confident-sounding wrong answers. Because every hypothesis ships with a kill condition, a wrong AI guess gets eliminated by your own query in seconds rather than sending the team down a rabbit hole. The structure is self-correcting precisely because nothing is taken on the model’s word.

A few rules I hold to:

No fix suggestions during diagnosis. Keep the model on hypotheses and checks. Mixing in “and here’s how to fix it” reintroduces anchoring on a cause you haven’t confirmed.
Re-run the differential when a check surprises you. New evidence should regenerate the list, not get bolted onto the old one.
Keep the kept and killed hypotheses visible in the channel. The list of what you’ve ruled out is as valuable as the live threads, and it survives handoffs.
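For the last rule, even a throwaway append-only file is enough to keep the ruled-out list from living in someone's head. The sketch below assumes a tab-separated ledger. The file location, the function names record and summary, and the status words are hypothetical conventions, not features of any incident tool.

```shell
# Ledger location is an assumption; point LEDGER wherever your team can see it.
LEDGER="${LEDGER:-${TMPDIR:-/tmp}/incident-hypotheses.tsv}"

# record STATUS HYPOTHESIS EVIDENCE
# Appends one timestamped line; earlier lines are never rewritten.
record() {
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%H:%M:%SZ)" "$1" "$2" "$3" >> "$LEDGER"
}

# summary: latest status per hypothesis, in the order each first appeared.
# Killed hypotheses stay in the output on purpose.
summary() {
  awk -F '\t' '
    { status[$3] = $2; why[$3] = $4
      if (!($3 in seen)) { seen[$3] = 1; order[++n] = $3 } }
    END { for (i = 1; i <= n; i++)
            printf "%-8s %s (%s)\n", status[order[i]], order[i], why[order[i]] }
  ' "$LEDGER"
}
```

Call record each time a check comes back, for example record killed "db connection exhaustion" "connections well under cap", and paste the output of summary into the channel at handoff.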

You can practice this loop on the free incident assistant — paste real symptoms and ask for the ranked differential, then notice how having checks attached changes how you investigate. The prompt library has a hardened version of the differential prompt with the kill-condition framing baked in.

The fastest path through diagnosis isn’t a smarter guess. It’s refusing to make one guess. AI is uniquely suited to laying several candidates on the table at once, each with a way to be proven wrong — and that’s how you keep the team moving without anchoring it on the first thing that sounded right.
