AI-Assisted Review of an Ansible Merge Request

A junior engineer on my team opened a merge request last week that looked clean. Twelve lines changed in a role that configures our OpenStack compute nodes. Green CI, sensible commit message, no merge conflicts. I almost rubber-stamped it on my phone between meetings. Then I noticed the change touched a task that writes a credentials file, and the no_log: true that used to guard it was gone — collateral damage from a refactor. That secret would have landed in plaintext in every Ansible run log and our central syslog. CI didn’t catch it because CI doesn’t know what a secret looks like. I caught it because I happened to be paying attention, which is a terrible thing to rely on at scale.

That near-miss is exactly the kind of thing an AI reviewer is good at. Not replacing my judgment — flagging the boring, easy-to-miss regressions so my judgment has something to focus on. Here’s how I wire AI into Ansible reviews without handing it the keys.

Feed It the Diff, Not the Repo

The single most important decision is what you give the model. Don’t paste the whole repository. Don’t paste the whole file. Paste the diff. A focused git diff keeps the model anchored to what actually changed, which is what review is about, and it keeps your token budget sane.

# What the reviewer should actually see
git diff origin/main...HEAD -- roles/ playbooks/ group_vars/

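If a hunk cuts through the middle of a task, or the change leans on handlers and variables defined nearby, widen the context with -U10 (ten lines around each change instead of Git's default three) so the model can see surrounding tasks, handlers, and variable references rather than guessing. The diff is the artifact. Everything else is noise that dilutes the review.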

What to Make It Catch

Generic “review this code” prompts produce generic mush. Ansible has a specific failure surface, and you should name it. Here’s the checklist I bake into every review prompt:

Idempotency regressions. command/shell tasks added without creates, removes, or a changed_when guard. A task that reports “changed” on every run is a lie that buries real drift in noise, and without creates or removes Ansible skips it under --check, so a dry run tells you nothing about it. This is the single most common rot in Ansible roles. I wrote a whole piece on making a playbook truly idempotent if you want the deep version.
Missing no_log. Any task templating credentials, tokens, private keys, or passwords. This is the one that bit me.
Hardcoded values that should be variables. IPs, hostnames, ports, file paths, version pins buried in tasks instead of defaults/main.yml.
become misuse. Blanket become: true at the play level when only two tasks need root, or privilege escalation missing where a task clearly writes to /etc.
Missing tags and handlers. A config change with no notify to restart the service, or a new task block with no tags so nobody can run it selectively.
Lint violations the model can reason about. Deprecated module names, bare variables, when conditions comparing to string "true".
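
The first two items leave a fingerprint in the diff text itself, so a cheap tripwire can run alongside the model. What follows is a rough sketch, assuming a unified diff on stdin. scan_diff is a hypothetical helper, not something from our repo, and it matches text rather than parsing YAML, so treat its hits as hints, not verdicts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: flag two checklist items straight from a unified diff.

scan_diff() {
  awk '
    # Removed line (skipping the "---" file header) that drops a no_log guard.
    /^-[^-]/ && /no_log: *(true|yes)/ { removed++ }
    # Added line (skipping the "+++" file header) that sets one.
    /^[+][^+]/ && /no_log: *(true|yes)/ { added++ }
    # Added command or shell task: a candidate idempotency regression.
    /^[+][^+]/ && /(command|shell):/ { cmds++ }
    END {
      # Only complain when more guards were removed than added back.
      if (removed > added) print "no_log removed: " (removed - added)
      if (cmds > 0) print "command/shell added: " cmds
    }
  '
}
```

Something like git diff origin/main...HEAD -- roles/ | scan_diff would feed it. A guard that merely moved within the diff cancels out, which is the behavior you want from a tripwire this crude.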

Structuring the Review Prompt

A good prompt is explicit about the artifact, the checklist, and the output format. Vague in, vague out. This is the template I keep in our shared prompt library so the whole team runs the same review:

You are reviewing an Ansible merge request. Below is the git diff.

Review ONLY the changed lines. For each issue, give:
- severity (blocker / warning / nit)
- the file and task name
- why it matters operationally
- a concrete fix

Check specifically for:
1. Idempotency: command/shell without creates/removes/changed_when
2. Secrets: tasks handling credentials missing no_log: true
3. Hardcoded values (IPs, paths, versions) that belong in defaults/vars
4. become scope: play-level escalation that should be task-level
5. Missing handlers (config change with no notify) or missing tags
6. Deprecated modules or anti-patterns ansible-lint would flag

Do not invent problems in unchanged code. If a hunk is fine, say so.

--- DIFF ---
<paste git diff here>

That last instruction — “do not invent problems in unchanged code” — matters more than you’d think. Without it, models love to editorialize about lines you didn’t touch, and the signal-to-noise ratio tanks.
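
Filling the template by hand gets old fast. Here is a minimal sketch of the assembly step, assuming the template above is saved as a text file whose last line is the placeholder. build_prompt and the file names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: join the shared prompt template and a diff into one prompt.

build_prompt() {
  local template="$1" diff_file="$2"
  # Print the template up to, but not including, the placeholder line.
  # Anything after the placeholder is dropped, so keep it as the last line.
  sed '/^<paste git diff here>$/,$d' "$template"
  # Append the diff verbatim so its +/- markers survive untouched.
  cat "$diff_file"
}
```

Usage would look like build_prompt prompts/ansible-review.txt /tmp/mr.diff > /tmp/prompt.txt. Keeping the template in a file rather than inline in a script is what lets the whole team edit the checklist in one place.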

A Real Diff and a Real Review

Here’s the kind of hunk that shows up in an actual MR. The engineer refactored the task that writes a token file and added a step after it:

@@ roles/compute/tasks/main.yml @@
-    - name: Write API token
-      ansible.builtin.template:
-        src: token.j2
-        dest: /etc/nova/api_token
-        mode: "0600"
-      no_log: true
+    - name: Write API token
+      ansible.builtin.template:
+        src: token.j2
+        dest: /etc/nova/api_token
+        mode: "0600"
+
+    - name: Refresh nova metadata cache
+      ansible.builtin.shell: nova-manage cell_v2 discover_hosts
+      become: true

Two problems hiding in one small hunk. Feed it the prompt above and you get back something like this:

blocker (roles/compute/tasks/main.yml, “Write API token”): the no_log: true guard was dropped in this change. This task renders a credential into /etc/nova/api_token; without no_log, the templated content is exposed in verbose output and run logs. Restore no_log: true.

warning (roles/compute/tasks/main.yml, “Refresh nova metadata cache”): this shell task has no changed_when or creates, so it reports changed on every run and hides real drift. Add changed_when: false if it’s read-only, or derive changed_when from the command’s output. Consider adding a tag so it can be run in isolation.

The file mode (0600) on the token file and the task-level become: true on the new task look appropriate.

That’s a useful review. It names the file, the task, the operational consequence, and the fix — and it confirms the parts that are fine so I’m not left wondering whether it just gave up.

Let the Linter Do the Mechanical Part

AI is for judgment calls; ansible-lint is for rules. Run both. The linter is deterministic, fast, and free, so there’s no reason to spend model tokens on things a rule engine nails every time.

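# Lint only what changed (deleted files excluded), fail the review on real violations
changed=$(git diff --name-only --diff-filter=d origin/main...HEAD -- '*.yml' '*.yaml')
# With no file arguments ansible-lint scans the whole project, so skip when nothing changed
[ -z "$changed" ] || ansible-lint --offline --strict $changed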

Let the linter own the mechanical violations and let the model own the contextual ones — “this value should be a variable,” “this task probably wants a handler.” They cover different ground.

Light CI Integration

I run the AI review as a non-blocking CI job that posts a comment on the MR. Advisory, not a gate. The linter and a syntax check are the hard gates; the AI is a second set of eyes that shows up before mine.

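# .gitlab-ci.yml (sketch)
ai-review:
  stage: review   # assumes "review" is listed under stages:
  variables:
    GIT_DEPTH: 0   # full history so the three-dot diff can find a merge base
  script:
    # MR pipelines do not always have the target branch locally, so fetch it
    - git fetch origin "+refs/heads/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME:refs/remotes/origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME"
    - git diff "origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME...HEAD" > /tmp/mr.diff
    - ./scripts/ai-review.sh /tmp/mr.diff > review.md
    - ./scripts/post-mr-comment.sh review.md
  allow_failure: true   # never block the merge on the AI
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"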

allow_failure: true is deliberate. The day the model is down, hallucinates, or rate-limits, your delivery pipeline must not care.
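
The same fail-soft idea can live inside the review script, so a model outage produces a short notice in the MR instead of a red job. This is only a sketch of what ./scripts/ai-review.sh could look like, and all of it is assumption: MODEL_CMD is a hypothetical command that reads the diff on stdin, wraps it in the prompt, calls your provider, and writes the review to stdout.

```shell
#!/usr/bin/env bash
# Sketch of a fail-soft review wrapper: always prints something, always returns 0.

ai_review() {
  local diff_file="$1" review
  # An empty or missing diff means there is nothing to review; skip the model call.
  if [ ! -s "$diff_file" ]; then
    echo "AI review skipped: empty diff."
    return 0
  fi
  # MODEL_CMD is left unquoted on purpose so it can carry arguments.
  # A non-zero exit or empty output (outage, rate limit) degrades to a notice.
  if review=$(${MODEL_CMD:-false} < "$diff_file") && [ -n "$review" ]; then
    printf '%s\n' "$review"
  else
    echo "AI review unavailable for this pipeline; review manually."
  fi
  return 0
}
```

Ending the file with ai_review "$1" turns it into the script the CI job calls. It does nothing about a hallucinated review, and it does not guard against a model call that hangs; that part still lands on the job timeout and on the human who approves.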

Why the Human Still Approves

The model doesn’t know that the hardcoded IP is our one legacy load balancer that genuinely can’t move to a variable yet. It doesn’t know we’re mid-migration and that “missing handler” is intentional because the service gets restarted by a separate maintenance window. It has no stake in the 2 a.m. page when a compute node won’t rejoin the cluster. AI drafts the review, decodes the diff, and flags the patterns. I verify against reality and click approve. That ordering is the whole point.

Wire it up, keep it advisory, and keep your name on the approval. The reviews on our Ansible work got faster and the silly regressions stopped reaching production — not because a model started making decisions, but because it stopped letting me skim past the ones that matter.