Ansible Verbose Run Root-Cause Ranking Prompt
Turn a noisy -vvv failure dump into a ranked, evidence-backed list of probable root causes with the next diagnostic command for each.
- Target user: On-call engineers and Ansible authors debugging a failed playbook run
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior automation engineer who has triaged thousands of failed Ansible runs. You read `-vvv` output the way a doctor reads a chart: you separate the failing task from the framework noise, and you never mistake a symptom (a red FAILED line) for a cause (the underlying module/return data).
I will paste the tail of a verbose Ansible run plus context. Your job is to produce a RANKED list of probable root causes, most likely first, each with concrete supporting evidence and a verification step.
Work through these steps:
1. **Isolate the failing task**: identify the exact task name, host, module, and the first line where `fatal:` or `"failed": true` appears. Quote the `msg`, `rc`, `stderr`, `stdout`, and `module_stdout` fields verbatim.
2. **Classify the failure family**: connection/auth (SSH, become), module argument error, templating/undefined variable, idempotency/state mismatch, dependency missing on target, or remote command non-zero exit.
3. **Rank causes**: list 2-5 candidate root causes ordered by probability. For each, cite the specific line(s) of evidence from the output, and rate confidence High/Medium/Low.
4. **Distinguish cause vs cascade**: note where one failure (e.g. a missing fact) caused later tasks to fail, so I fix the head of the chain, not the tail.
5. **Next diagnostic per cause**: give the single most useful command to confirm or rule out each candidate (e.g. an ad-hoc `ansible <host> -m setup`, an `ssh` reachability test, a `--check --diff` rerun limited to the host).
Fill in:
- Verbose output (tail): [PASTE -vvv FROM THE FAILING TASK ONWARD]
- Playbook/task snippet that failed: [PASTE]
- Inventory/host context: [HOST, GROUP, CONNECTION TYPE]
- What changed since the last good run: [DESCRIBE OR "unknown"]
Output format: a markdown table with columns Rank | Candidate cause | Evidence (quoted line) | Confidence | Next command. Below the table, name the single most likely cause and the one command I should run first.
Do not propose editing or rerunning the playbook with changes until I have run the diagnostic commands and confirmed the cause. Do not run any destructive or state-changing command as a "test" without flagging it for my review first.
Why this prompt works
Verbose Ansible output buries the one line that matters under callback formatting, gather_facts chatter, and JSON blobs. The prompt forces a separation most engineers skip under pressure: isolate the failing task first, quote the raw return fields, and only then reason about causes. By requiring verbatim quotes as evidence, it keeps the model honest and stops it inventing a plausible-but-wrong story that doesn’t match your actual stderr.
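If you want to check the model's quoted evidence against the raw log yourself, the "isolate, then quote the return fields" step can be approximated in a few lines of Python. This is a minimal sketch, not part of the prompt: it assumes the default stdout callback, where a failure appears as `fatal: [host]: FAILED! => {...}` followed by a JSON result, and it assumes the log has no ANSI colour codes. The function name `first_failure` and the sample log are made up for illustration.

```python
import json
import re

# Assumed format (default callback): fatal: [host]: FAILED! => {json result}
FATAL_RE = re.compile(
    r"^fatal: \[(?P<host>[^\]]+)\]: (?P<status>FAILED|UNREACHABLE)! => ",
    re.MULTILINE,
)
# The return fields the prompt asks the model to quote verbatim.
FIELDS = ("msg", "rc", "stderr", "stdout", "module_stdout")


def first_failure(log: str):
    """Return host, status, and selected fields for the first fatal line, or None."""
    match = FATAL_RE.search(log)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores trailing text, so pretty-printed multi-line results work.
        result, _ = json.JSONDecoder().raw_decode(log, match.end())
    except json.JSONDecodeError:
        return None
    if not isinstance(result, dict):
        return None
    found = {key: result[key] for key in FIELDS if key in result}
    return {"host": match["host"], "status": match["status"], **found}


# Invented sample log, for illustration only.
SAMPLE_LOG = """TASK [Install nginx] ****
fatal: [web1]: FAILED! => {
    "changed": false,
    "msg": "No package matching 'nginx' is available",
    "rc": 1
}
"""

if __name__ == "__main__":
    print(first_failure(SAMPLE_LOG))
```

Any field the model quotes that does not appear in this extracted dictionary (or elsewhere in the pasted log) is a signal that the evidence was invented.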
Ranking by probability with explicit confidence ratings is what makes the output actionable on call. You get a “try this first” answer instead of a flat list, and the cause-versus-cascade step stops you from chasing the tenth red line when the real problem was an undefined variable in task three.
To get the most from it, paste from the failing task onward rather than the whole run, include what changed since the last good run, and run the suggested diagnostic before accepting any fix. The closing guardrail keeps the loop human-controlled: the model proposes no playbook changes and flags any state-changing command for review, so you confirm the cause first and then decide the change yourself.
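Trimming the log to "the failing task onward" can be done by hand, but a small helper makes it repeatable. The sketch below is one possible approach under stated assumptions: it assumes the default callback's `TASK [...]` and `RUNNING HANDLER [...]` headers and a `fatal: [` failure prefix, and the name `tail_from_failing_task` and the sample run are hypothetical.

```python
def tail_from_failing_task(log: str) -> str:
    """Return the log from the header of the first failing task to the end.

    If no fatal line is found, the log is returned unchanged.
    """
    lines = log.splitlines()
    fatal_idx = next(
        (i for i, line in enumerate(lines) if line.startswith("fatal: [")),
        None,
    )
    if fatal_idx is None:
        return log
    # Walk backwards to the nearest task or handler header; fall back to the
    # fatal line itself if no header precedes it.
    start = fatal_idx
    for i in range(fatal_idx, -1, -1):
        if lines[i].startswith(("TASK [", "RUNNING HANDLER [")):
            start = i
            break
    return "\n".join(lines[start:])


# Invented sample run, for illustration only.
SAMPLE_RUN = "\n".join([
    "TASK [Gathering Facts] ****",
    "ok: [web1]",
    "TASK [Install nginx] ****",
    "task path: /srv/site.yml:12",
    "fatal: [web1]: FAILED! => {",
    '    "msg": "example failure"',
    "}",
    "PLAY RECAP ****",
])

if __name__ == "__main__":
    print(tail_from_failing_task(SAMPLE_RUN))
```

Starting at the task header rather than the `fatal:` line keeps the task name and `task path` in the paste, which the prompt's first step needs.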