Reviewing Ansible Check and Diff Dry Runs With AI Before Prod

Last quarter I almost rebooted forty hypervisors because I trusted a green --check run. The playbook reported changed=0 against the OpenStack compute nodes, so I cleared the change ticket and moved on. What I missed was a command: task buried in a role that templated out a new sysctl config and then ran sysctl -p — neither of which --check had any idea about, because command and shell are black boxes to check mode. The dry run lied, and it lied confidently. The only reason prod survived is that a colleague eyeballed the actual diff on the templated file and caught it before the maintenance window.

That is the whole problem with dry runs: ansible-playbook --check --diff is the single most useful pre-prod safety tool Ansible gives you, and it is also full of quiet exceptions that will burn you if you read the summary line instead of the diff. AI is genuinely good at the part humans skim — decoding a wall of --diff output and telling you in plain English what will change. But it only helps if you understand what the dry run can and cannot see. Let me walk through how I actually run this now.

Run the dry run, then actually read it

The baseline command is boring and you should run it every single time before a prod apply:

ansible-playbook -i inventories/prod site.yml \
  --check --diff \
  --limit compute-nodes \
  --tags sysctl,kernel

--check puts Ansible in “predict, don’t change” mode. --diff makes modules that support it (template, copy, lineinfile and the like) show you the before/after. The PLAY RECAP at the bottom, one line per host such as changed=3 unreachable=0 failed=0, is the part everyone reads and the part you should trust the least. A changed count tells you how many tasks on that host think they would change something. It tells you nothing about whether that prediction is accurate, nothing about the blast radius of any single change, and nothing about the tasks that were skipped without predicting anything at all.
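If you are going to glance at the recap anyway, at least put skipped= next to changed=, because a host with changed=0 and skipped=4 has four tasks you know nothing about. A minimal sketch, assuming the run's output was saved to a file (for example with tee) in the default callback's PLAY RECAP format; recap_counts is a hypothetical helper name, not something Ansible ships:

```shell
# recap_counts (hypothetical helper): read a saved --check log and print, per
# host, the changed= and skipped= numbers from the PLAY RECAP block.
# Assumes the default callback's recap format, e.g.
#   compute-07   : ok=12   changed=0   unreachable=0   failed=0   skipped=2 ...
recap_counts() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }      # everything after this is the recap
    in_recap && / : ok=/ {
      host = $1; changed = "?"; skipped = "?"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^changed=/) changed = substr($i, 9)   # text after "changed="
        if ($i ~ /^skipped=/) skipped = substr($i, 9)   # text after "skipped="
      }
      # A skipped task made no prediction at all, so flag the host.
      note = (skipped + 0 > 0) ? "  <- skipped tasks: blind spots" : ""
      printf "%s changed=%s skipped=%s%s\n", host, changed, skipped, note
    }
  ' "$@"
}
```

Run it as recap_counts /tmp/dryrun.log. It does not make the summary trustworthy; it only stops a green changed=0 from hiding the skips.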

Read the diffs. All of them. A real one looks like this:

TASK [hardening : deploy sshd config] ******************************
--- before: /etc/ssh/sshd_config
+++ after: /etc/ssh/sshd_config
@@ -22,7 +22,7 @@
-PermitRootLogin yes
+PermitRootLogin no
-PasswordAuthentication yes
+PasswordAuthentication no
changed: [compute-07]

That is the good case: template and copy fully support check mode and produce an honest diff. If your whole playbook is templates, files, and well-behaved modules, the dry run is trustworthy. The trouble starts when it isn’t.

Which modules tell the truth, and which lie

Here is the distinction that matters more than any other in this post. Modules fall into three buckets:

Honest in check mode. template, copy, file, lineinfile, blockinfile, user, group, most package modules. They simulate the change and report accurately. The apt/dnf/yum modules will tell you what they’d install given current cache state.
Skipped in check mode. Many modules just refuse to run and report skipped. That’s safe but it means you get no prediction at all — a silent blind spot.
Liars. command and shell. By default they do not execute under --check, so they report skipped and contribute nothing. But the moment someone adds check_mode: false to force them to run, they execute for real — including in your dry run.

That last bucket is the one that bites. Here is what a plain command task looks like in a check run:

- name: Regenerate the GRUB config
  ansible.builtin.command: grub2-mkconfig -o /boot/grub2/grub.cfg
  # In --check this is SKIPPED. You see nothing. It is not a guarantee
  # of "no change" — it's a guarantee of "I have no idea."

A skipped line on a command task is not reassurance. It is the absence of information. When I see one in a check run, I treat it as a question I have to answer some other way — by reading the script it calls, or by knowing what the command does. The dry run will not do it for me.
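One way to make sure none of those questions slip past in a long log is to pull the skipped tasks out mechanically instead of scrolling for them. A minimal sketch, assuming default-callback output saved to a file; skipped_tasks is a hypothetical helper. It lists every task that skipped on at least one host, including ordinary when: skips, so treat it as a superset of the command/shell blind spots you actually have to chase:

```shell
# skipped_tasks (hypothetical helper): print the name of every task that
# reported "skipping:" for at least one host in a saved --check log.
# Assumes default-callback lines such as:
#   TASK [grub : Regenerate the GRUB config] ****
#   skipping: [compute-07]
skipped_tasks() {
  awk '
    /^TASK \[/ {
      task = substr($0, 7)                          # drop the leading "TASK ["
      task = substr(task, 1, index(task, "]") - 1)  # cut at the closing bracket
      printed = 0
      next
    }
    /^skipping: \[/ && !printed {                   # report each task only once
      print task
      printed = 1
    }
  ' "$@"
}
```

Each name it prints is a line item to resolve by reading the command or script behind it.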

The check_mode: false escape hatch and why it’s dangerous

Sometimes you genuinely need a task to run even during a dry run — most often a read-only fact-gathering step whose output later tasks depend on. The canonical example:

- name: Get current kernel version (safe to run in check mode)
  ansible.builtin.command: uname -r
  check_mode: false
  changed_when: false
  register: running_kernel

That’s fine — uname -r reads, it doesn’t write, and changed_when: false keeps it from polluting your change count. The pattern is correct here.

The danger is when check_mode: false is slapped onto a task that does mutate state, usually to “fix” a playbook that wouldn’t otherwise work in check mode. When you do that, your dry run is no longer a dry run. It is a partial real apply. I grep every playbook for this before I trust a check run:

grep -rnEi 'check_mode:[[:space:]]*(false|no)' roles/ playbooks/

Every hit is something to inspect by hand. If it’s a read with changed_when: false, fine. If it writes, the --check run is misleading and I flag it loudly in the change ticket. This is exactly the kind of subtle-correctness reasoning that pairs well with making a playbook genuinely safe to re-run — see using AI to make an Ansible playbook truly idempotent for the companion problem.

Taming --diff noise

--diff is honest but verbose. Run it against thirty hosts with a few templated configs and you get hundreds of lines, most of them whitespace or a single changed setting buried in unchanged context. The noise is what makes people stop reading — and that’s exactly when the real change slips through.

A few practical knobs. Narrow the scope so you’re not diffing the world:

ansible-playbook -i inventories/prod site.yml \
  --check --diff \
  --limit compute-07 \
  --tags sshd

Pin it to one representative host and one tag, confirm the change is what you expect, then widen. For tasks that produce intentionally huge or sensitive diffs (a rendered secrets file, a giant generated config), suppress the body per task so it doesn’t drown everything else:

- name: Render the big generated inventory file
  ansible.builtin.template:
    src: hosts.j2
    dest: /etc/ansible/generated_hosts
  diff: false   # still reports changed, just doesn't dump 4000 lines

You still see that it changed in the summary; you just opt out of the wall of text. Use this sparingly — turning off diff to make output prettier defeats the purpose. I only do it for files I review another way.
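One more knob lives outside Ansible: when re-reading a saved log, it can help to strip it down to the lines that actually carry a change. A rough sketch, assuming a log in the default callback format; diff_changes_only is a hypothetical helper, and because it throws away unchanged context and @@ hunk headers, use it to find changes, not to judge them:

```shell
# diff_changes_only (hypothetical helper): keep task headers, added/removed
# lines (including the --- before / +++ after file headers), per-host
# changed/skipping results and PLAY RECAP lines; drop unchanged context,
# @@ hunk headers and everything else.
diff_changes_only() {
  grep -E '^(TASK \[|[-+]|changed: |skipping: )| : ok=' "$@"
}
```

Once a change catches your eye here, go back to the full log for its context.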

Where AI earns its keep

Here is the honest division of labor: AI does not run your dry run, does not have prod access, and does not get to approve anything. What it is genuinely good at is reading a 600-line --diff dump faster than you will and producing a change summary you can sanity-check. I capture the run and feed it the raw output:

ansible-playbook -i inventories/prod site.yml --check --diff \
  --limit compute-nodes 2>&1 | tee /tmp/dryrun.log

Then a prompt like this:

Below is ansible-playbook --check --diff output. Summarize what would change, grouped by host and by file. Call out: (1) any command/shell tasks that were skipped, since those are blind spots, (2) any task with check_mode: false, (3) changes to sshd, sudoers, firewall, or kernel/sysctl, ranked by risk. Do not reassure me — list what you cannot verify.

A good response reads back something I can actually verify against the log:

compute-07 to compute-12: /etc/ssh/sshd_config — PermitRootLogin yes→no and PasswordAuthentication yes→no. High risk: if your control path relies on password auth or root SSH, you lock yourself out.

Blind spots: Task Regenerate the GRUB config (command) was skipped in check mode — the dry run cannot tell you whether grub.cfg changes. Verify manually before the boot-affecting window.

No check_mode: false writes detected in the visible output.

That is the right shape: it decodes the diff, flags the SSH lockout risk a tired engineer skims past, and — critically — admits the GRUB task is unverifiable rather than papering over it. The AI surfaced the question; I still have to answer it.
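One cheap way to do that verification is to list every file that appears in a --- before: header of the log captured with the tee command above, then confirm the AI summary accounts for each one. A minimal sketch, assuming default-callback diff headers; diffed_files is a hypothetical helper, and its HIGH pattern is a crude path match for the sshd, sudoers, firewall and kernel files this post treats as high risk, so adjust it for your own estate:

```shell
# diffed_files (hypothetical helper): count the "--- before: <path>" headers
# in a saved --check --diff log (roughly one per host that would change the
# file) and mark paths that look high-risk.
diffed_files() {
  awk '
    /^--- before: / {
      path = substr($0, 13)            # text after "--- before: " (12 chars)
      count[path]++
    }
    END {
      for (p in count) {
        # Crude, assumed risk pattern: extend it for your own estate.
        risk = (p ~ /sshd|sudoers|firewall|sysctl|grub/) ? "HIGH" : "low"
        printf "%-4s %3d %s\n", risk, count[p], p
      }
    }
  ' "$@" | LC_ALL=C sort               # HIGH sorts before low in the C locale
}
```

Any path in this list that the summary never mentions is a reason to distrust the summary, not the log.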

The control stays with you. AI drafts the summary, decodes the diff, and ranks the risk. A human reads the actual --diff for the high-risk files, resolves every blind spot by hand, and clicks apply. I keep a few of these review prompts in my prompts collection so the framing is consistent across the team, and the rest of my Ansible workflow lives under the ansible category.

The workflow I actually use

grep the roles for check_mode: false and command/shell tasks. Know your blind spots before you run.
Run --check --diff scoped to one host and tag, then widen.
Pipe the full output to a log, feed it to AI for a grouped, risk-ranked summary.
Read the raw diff for every high-risk file myself — sshd, sudoers, firewall, kernel.
Manually resolve every skipped command/shell task. A skipped line is a question, not an answer.
Only then apply, without --check, in the window.
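The first step in that list can be sketched as a small audit, assuming your tasks live under roles/ and playbooks/ as in the grep earlier in this post. audit_check_mode is a hypothetical helper, and its patterns only catch the common spellings (check_mode: false or no, command: or shell: with or without an ansible.builtin. or ansible.legacy. prefix). It also lists diff: false tasks, since those are files you have committed to reviewing some other way:

```shell
# audit_check_mode (hypothetical helper): before a --check run, list the
# places where the dry run will be blind or will not be dry at all.
# Usage: audit_check_mode roles/ playbooks/
# "(none found)" is also printed if grep fails, e.g. on a missing path.
audit_check_mode() {
  echo "== forced to run for real during --check (check_mode: false/no) =="
  grep -rnEi 'check_mode:[[:space:]]*(false|no)([[:space:]]|$)' "$@" \
    || echo "(none found)"

  echo "== command/shell tasks (skipped under --check unless forced) =="
  grep -rnE '^[[:space:]]*(-[[:space:]]+)?(ansible\.(builtin|legacy)\.)?(command|shell):' "$@" \
    || echo "(none found)"

  echo "== diff output suppressed (diff: false/no) =="
  grep -rnEi '^[[:space:]]*diff:[[:space:]]*(false|no)([[:space:]]|$)' "$@" \
    || echo "(none found)"
}
```

A hit in the first section is not automatically wrong (the uname -r pattern is fine), but every one needs a human to confirm it only reads.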

The dry run is a powerful tool that quietly tells you the truth most of the time and lies the rest. AI makes the truthful parts faster to read and the dangerous parts harder to skim past. It does not make the dry run honest — that’s still your job.
