Using AI to Document an Undocumented Ansible Codebase

There’s a particular dread to inheriting an Ansible repo with no documentation. Hundreds of roles, group_vars files with cryptic keys, playbooks that import each other in ways nobody alive can fully explain, and a README.md that says “see Confluence” and links to a page that 404s. You can’t safely change what you can’t understand, and reverse-engineering it by hand takes weeks. Documentation generation is genuinely one of the best uses of AI for infrastructure, because the model can read the whole repo faster than any human, and the output is text a human verifies, not config that runs.

The framing still holds: AI is a fast junior engineer that produces draft documentation quickly, but I verify every factual claim it makes against the actual code, because confidently-wrong docs are worse than no docs.

Start with a structural map, not prose

Before generating any READMEs, I get the lay of the land. I ask AI to produce a dependency map:

“Read this Ansible repo. List every playbook and the roles it references, whether through roles:, import_role, or include_role. For each role, list the roles it depends on via the dependencies key in meta/main.yml. Flag any role that’s referenced but never defined, and any role defined but never used.”

This immediately surfaces the structural truth: which roles are entry points, which are leaf utilities, and — critically — which are dead code. In my last inheritance, AI found eleven roles that nothing referenced. That’s not documentation, that’s a cleanup backlog, and it came out of the mapping step for free.
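Because the map drives everything after it, I like to cross-check the AI's missing and unused lists with a few lines of code. The sketch below is a minimal version of that check, assuming the role names, playbook references, and meta/main.yml dependencies have already been collected into plain Python structures. The function name `audit_role_graph` and the sample role names are hypothetical.

```python
from typing import Dict, Iterable, Set


def audit_role_graph(
    defined_roles: Iterable[str],
    playbook_roles: Dict[str, Iterable[str]],
    role_deps: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Cross-check a role dependency map.

    defined_roles:  role directory names found under roles/
    playbook_roles: playbook filename -> roles it references directly
    role_deps:      role name -> dependencies listed in its meta/main.yml
    """
    defined = set(defined_roles)

    # Start from the roles that playbooks reference, then follow dependencies.
    reachable: Set[str] = set()
    stack = [role for roles in playbook_roles.values() for role in roles]
    while stack:
        role = stack.pop()
        if role in reachable:
            continue
        reachable.add(role)
        stack.extend(role_deps.get(role, ()))

    return {
        "missing": reachable - defined,  # referenced but never defined
        "unused": defined - reachable,   # defined but never referenced
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real repo.
    report = audit_role_graph(
        defined_roles=["common", "app_deploy", "legacy_cron"],
        playbook_roles={"deploy.yml": ["app_deploy", "monitoring"]},
        role_deps={"app_deploy": ["common"]},
    )
    print(report)
```

With the sample data, `monitoring` comes back as missing and `legacy_cron` as unused. A "missing" role is not always a bug: it may be installed from Galaxy or a collection rather than living in the repo, so I treat the output as a list of things to check, not a verdict.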

Generate role READMEs, then verify against the code

Galaxy-style roles are supposed to have a README.md documenting their variables and behavior. Most inherited roles don’t. AI writes a solid first draft from the role’s defaults/main.yml and tasks/:

# roles/app_deploy/defaults/main.yml
app_version: "latest"
app_port: 8080
app_replicas: 2
app_health_path: "/healthz"

The AI-drafted README turns that into a documented variable table — but here’s the discipline: I verify each described variable against where it’s actually used in tasks/, not just where it’s defined. AI sometimes documents a default’s name accurately but invents its purpose. The fix is to make it cite:

“For each variable, quote the exact task line where it’s used. If you can’t find a usage, mark it ‘unused — verify’.”

Forcing citations turns “the AI says this controls replicas” into “this variable appears on line 14 of tasks/deploy.yml, here’s the line”, which I can check in seconds.
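Checking a citation is mechanical enough to script. The sketch below assumes the AI's citations have been collected as (line number, quoted line, variable name) and that the task file's text is already loaded. The function name `check_citation` and the sample task are hypothetical.

```python
import re


def check_citation(task_text: str, line_no: int, quoted: str, var: str) -> str:
    """Return 'ok', 'line-mismatch', or 'var-not-on-line' for one AI citation."""
    lines = task_text.splitlines()
    if not 1 <= line_no <= len(lines):
        return "line-mismatch"

    actual = lines[line_no - 1].strip()
    if actual != quoted.strip():
        return "line-mismatch"

    # Match the variable as a whole word so app_port does not match app_port_tls.
    if not re.search(rf"\b{re.escape(var)}\b", actual):
        return "var-not-on-line"
    return "ok"


if __name__ == "__main__":
    # Hypothetical task file content for illustration.
    sample = "- name: Start app\n  command: run-app --port {{ app_port }}\n"
    print(check_citation(sample, 2, "command: run-app --port {{ app_port }}", "app_port"))
```

This only proves that the quote is real and mentions the variable. Whether the AI's description of what the task does with it is right still needs a human read.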

Pro Tip: Never let AI document a variable’s behavior without quoting the task that consumes it. The variable name suggests intent; the consuming task proves it. Documentation built on the name alone is documentation built on a guess.

Document the variables that hurt people

Some variables are dangerous — flip the wrong one and you take down prod. I ask AI specifically to flag these:

“Which variables in this role, if set incorrectly, would cause downtime or data loss? Explain the failure mode for each.”

AI is good at spotting these because they follow patterns — anything controlling state, replica counts, deletion behavior, or ports. A variable like app_purge_data: false is exactly the kind of footgun I want loudly documented at the top of the README, not buried in a table.
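Since these variables follow naming patterns, a crude name-based filter makes a useful second opinion on the AI's list. The sketch below is a heuristic only. The keyword list is my own assumption, not an Ansible convention, and it will both miss real footguns and flag harmless variables.

```python
import re

# Hypothetical name fragments that often signal destructive or
# availability-affecting settings. Tune this list for your own repo.
RISKY_NAME = re.compile(
    r"(purge|delete|destroy|wipe|drop|force|replicas|state|port)",
    re.IGNORECASE,
)


def risky_variables(names):
    """Return the variable names that match the heuristic, in input order."""
    return [name for name in names if RISKY_NAME.search(name)]


if __name__ == "__main__":
    candidates = [
        "app_version",
        "app_port",
        "app_replicas",
        "app_health_path",
        "app_purge_data",
    ]
    print(risky_variables(candidates))
```

Anything the filter flags that the AI did not mention, or the reverse, is worth a closer look at the consuming task.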

Generate the onboarding doc you wish existed

The single most valuable artifact is the “how does a deploy actually work here” walkthrough. I ask AI to trace one path end to end:

“Trace what happens when someone runs ansible-playbook deploy.yml -e env=prod. List the order of roles applied, the key variables resolved at each step, and the hosts targeted. Write it as an onboarding doc for a new engineer.”

This produces the mental model that otherwise takes a new hire a month to build. I read it carefully against the actual play order, because a wrong trace is dangerous — but a verified one is gold.
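One way to get the actual play order to compare against is Ansible's own listing mode, which prints plays, tasks, and targeted hosts without executing anything. The sketch below wraps that call. It assumes `ansible-playbook` is on the PATH and the repo's inventory is configured, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict, List


def listing_command(playbook: str, extra_vars: Dict[str, str]) -> List[str]:
    """Build an ansible-playbook call that lists tasks and hosts without running them."""
    cmd = ["ansible-playbook", playbook, "--list-tasks", "--list-hosts"]
    for key, value in extra_vars.items():
        cmd += ["-e", f"{key}={value}"]
    return cmd


def run_listing(playbook: str, extra_vars: Dict[str, str]) -> str:
    """Run the listing and return its stdout. Requires Ansible and a usable inventory."""
    result = subprocess.run(
        listing_command(playbook, extra_vars),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command only; call run_listing() from inside the repo to execute it.
    print(" ".join(listing_command("deploy.yml", {"env": "prod"})))
```

The listing is static, so tasks pulled in dynamically at runtime (for example through include_role or include_tasks) may not appear in it. Those parts of the AI's trace still need to be read against the code by hand.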

Don’t let it document secrets

Documenting a codebase means reading group_vars, and group_vars often contains references to vault-encrypted values. AI sees variable names like vault_db_password and should document that the variable exists and what it’s for — never its value. I never decrypt vault content to “help it document,” and I make sure the generated docs reference secret variables by name and purpose only:

| Variable | Purpose | Source |
|----------|---------|--------|
| `vault_db_password` | Postgres password for the app DB | ansible-vault (do not commit plaintext) |

That’s the right level of documentation: it tells a new engineer the secret exists and where it lives, without ever exposing the value.
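Before committing generated docs, a quick scan for anything that looks like a secret value is cheap insurance. The sketch below flags two things: the header that ansible-vault writes at the top of encrypted content, and a value assigned directly to a variable with a `vault_` prefix. The prefix convention and the function name `secret_leaks` are assumptions, so adjust both to how your repo names secrets.

```python
import re

# Encrypted vault content starts with a header such as $ANSIBLE_VAULT;1.1;AES256.
VAULT_HEADER = "$ANSIBLE_VAULT;"

# Assumed convention: secret variables are prefixed with vault_.
# Matches "vault_name: value" or "vault_name = value", not a bare mention.
ASSIGNMENT = re.compile(r"\bvault_\w+\s*[:=]\s*\S")


def secret_leaks(doc_text: str):
    """Return (line_number, reason) pairs for lines that look like exposed secrets."""
    findings = []
    for number, line in enumerate(doc_text.splitlines(), start=1):
        if VAULT_HEADER in line:
            findings.append((number, "vault ciphertext"))
        elif ASSIGNMENT.search(line):
            findings.append((number, "value assigned to vault variable"))
    return findings


if __name__ == "__main__":
    # Hypothetical README fragment: a safe table row followed by a leaked value.
    doc = (
        "| `vault_db_password` | Postgres password for the app DB | ansible-vault |\n"
        "vault_db_password: hunter2\n"
    )
    for number, reason in secret_leaks(doc):
        print(f"line {number}: {reason}")
```

A table row that only names the variable passes, while a line that assigns it a value does not. This is a pattern check, not a secret scanner, so it complements review and does not replace it.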

Verify, then commit the docs as code

Generated docs go through the same review as any change. I read them against the code, fix the AI’s inaccuracies, and commit the READMEs into the repo next to the roles they describe so they live and die with the code. Docs that live in a wiki rot; docs that live next to the role at least have a chance of staying honest, especially if you regenerate them when the role changes.

The verification step is non-negotiable. A confidently-wrong doc that says “setting app_purge_data: true is safe” is far more dangerous than an honest blank page, because someone will trust it. So every claim gets checked against the code that backs it.
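Part of that check can run automatically on every change. The sketch below compares a role's defaults/main.yml against its README and reports any variable the README never mentions. It assumes flat, top-level `key: value` defaults and reads keys with a regular expression, not a YAML parser, so nested structures are out of scope. The function name is hypothetical.

```python
import re

# A top-level YAML key: starts at column 0, then a colon. Comments and "---" do not match.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_]\w*)\s*:")


def undocumented_defaults(defaults_text: str, readme_text: str):
    """Return top-level keys from defaults/main.yml that the README never mentions."""
    keys = [
        match.group(1)
        for line in defaults_text.splitlines()
        if (match := TOP_LEVEL_KEY.match(line))
    ]
    # Whole-word match so app_port in the README does not cover app_port_tls.
    return [
        key for key in keys
        if not re.search(rf"\b{re.escape(key)}\b", readme_text)
    ]


if __name__ == "__main__":
    defaults = (
        "# roles/app_deploy/defaults/main.yml\n"
        'app_version: "latest"\n'
        "app_port: 8080\n"
        "app_replicas: 2\n"
        'app_health_path: "/healthz"\n'
    )
    # Hypothetical README that has drifted and omits one variable.
    readme = "| `app_version` | ... |\n| `app_port` | ... |\n| `app_replicas` | ... |\n"
    print(undocumented_defaults(defaults, readme))
```

It catches drift in coverage, not in accuracy: a variable that is mentioned but described wrongly still gets through, which is why the human read stays in the loop.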

Make documentation a continuous habit

The best time to document is right after you understand something. I keep my documentation prompts in the prompt workspace so generating a role README is a two-minute habit, not a quarterly project. For the broader question of structuring an Ansible repo so it’s documentable in the first place, the Ansible roles and inventory structure guide is the companion to this one.

Inheriting an undocumented Ansible codebase used to mean weeks of archaeology. AI compresses that into days: it reads the whole repo, maps the dependencies, drafts the READMEs, and traces the deploy paths faster than any human could. But what it produces is a draft, not a source of truth. Make it cite the code, verify every claim, keep secrets to their names, and commit the result. The rest of this series is in the IaC category, and Claude handles whole-repo reading particularly well for this kind of work.

Map it, document it, verify it, commit it. Then the next person to inherit it won’t dread it.