Using AI to Document an Undocumented Ansible Codebase
You inherited a 300-role Ansible repo with no docs. Here's how I use AI to map it, generate role READMEs, and document variables without trusting it blindly.
- #iac
- #ansible
- #ai
- #documentation
- #onboarding
There’s a particular dread to inheriting an Ansible repo with no documentation: hundreds of roles, group_vars files with cryptic keys, playbooks that import each other in ways nobody alive can fully explain, and a README.md that says “see Confluence” and links to a page that 404s. You can’t safely change what you can’t understand, and reverse-engineering it by hand takes weeks. Documentation generation is one of the best uses of AI for infrastructure, because the model can read the whole repo faster than any human and the output is text that a human verifies, not config that runs.
The framing still holds: AI is a fast junior engineer that produces draft documentation quickly, but I verify every factual claim it makes against the actual code, because confidently-wrong docs are worse than no docs.
Start with a structural map, not prose
Before generating any READMEs, I get the lay of the land. I ask AI to produce a dependency map:
“Read this Ansible repo. List every playbook and the roles it imports. For each role, list the roles it depends on via meta/main.yml. Flag any role that’s imported but never defined, and any role defined but never used.”
This immediately surfaces the structural truth: which roles are entry points, which are leaf utilities, and — critically — which are dead code. In my last inheritance, AI found eleven roles that nothing referenced. That’s not documentation, that’s a cleanup backlog, and it came out of the mapping step for free.
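To cross-check the AI's map, I find it useful to have a rough independent count. The sketch below is one way to do that with only the Python standard library. It makes several assumptions that may not hold in your repo: playbooks sit at the repo root as `*.yml`, roles live under `roles/`, and roles are referenced as `- role: name` or through `import_role`/`include_role` with a `name:` line. Bare `- name` list items and templated role names are not handled, and the function names are hypothetical.

```python
import re
from pathlib import Path

# `- role: name` and `- { role: name }` list items (assumed reference style).
ROLE_ITEM = re.compile(r"^\s*-\s*(?:\{\s*)?role:\s*['\"]?([\w.\-]+)", re.MULTILINE)
# import_role / include_role tasks with `name:` on the following line.
ROLE_TASK = re.compile(r"(?:import_role|include_role):\s*\n\s+name:\s*['\"]?([\w.\-]+)")


def referenced_roles(yaml_text):
    """Return the role names referenced in a playbook, task file, or meta file."""
    return set(ROLE_ITEM.findall(yaml_text)) | set(ROLE_TASK.findall(yaml_text))


def map_repo(repo):
    """Build a coarse playbook -> role and role -> role reference map."""
    repo = Path(repo)
    defined = {p.name for p in (repo / "roles").iterdir() if p.is_dir()}
    playbooks = {
        p.name: referenced_roles(p.read_text()) for p in sorted(repo.glob("*.yml"))
    }
    role_deps = {}
    for role in sorted(defined):
        refs = set()
        # Only meta/ and tasks/ are scanned, to avoid matching unrelated `role:` keys in vars.
        for sub in ("meta", "tasks"):
            folder = repo / "roles" / role / sub
            if folder.is_dir():
                for f in folder.glob("*.yml"):
                    refs |= referenced_roles(f.read_text())
        role_deps[role] = refs - {role}
    used = set().union(*playbooks.values(), *role_deps.values())
    return {
        "playbooks": playbooks,
        "role_deps": role_deps,
        "undefined": used - defined,  # referenced but no directory under roles/
        "unused": defined - used,     # directory exists but nothing references it
    }
```

Where this disagrees with the AI's list of dead roles, one of the two is wrong, and that disagreement is exactly where I look first. A regex pass like this will miss things a real YAML parser would catch, so treat its "unused" set as a list of candidates, not a deletion list.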
Generate role READMEs, then verify against the code
Galaxy-style roles are supposed to have a README.md documenting their variables and behavior. Most inherited roles don’t. AI writes a solid first draft from the role’s defaults/main.yml and tasks/:
# roles/app_deploy/defaults/main.yml
app_version: "latest"
app_port: 8080
app_replicas: 2
app_health_path: "/healthz"
The AI-drafted README turns that into a documented variable table — but here’s the discipline: I verify each described variable against where it’s actually used in tasks/, not just where it’s defined. AI sometimes documents a default’s name accurately but invents its purpose. The fix is to make it cite:
“For each variable, quote the exact task line where it’s used. If you can’t find a usage, mark it ‘unused — verify’.”
Forcing citations turns “the AI says this controls replicas” into “this variable appears on line 14 of deploy.yml, here’s the line” — which I can check in seconds.
Pro Tip: Never let AI document a variable’s behavior without quoting the task that consumes it. The variable name suggests intent; the consuming task proves it. Documentation built on the name alone is documentation built on a guess.
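Checking citations by hand gets tedious across hundreds of roles, so a small script can produce the ground truth to compare against. This is a minimal sketch under assumptions: variables are the unindented top-level keys of `defaults/main.yml`, and consumers live in `tasks/*.yml` or `templates/*.j2`. The function names `find_usages` and `report` are hypothetical, and the matching is plain text search, so variables built dynamically will be missed.

```python
import re
from pathlib import Path


def default_names(defaults_text):
    """Top-level keys of a defaults file (unindented `key:` lines only)."""
    return re.findall(r"^([A-Za-z_]\w*):", defaults_text, re.MULTILINE)


def find_usages(role_dir):
    """Map each default variable to the (file, line number, text) where it appears."""
    role_dir = Path(role_dir)
    names = default_names((role_dir / "defaults" / "main.yml").read_text())
    usages = {name: [] for name in names}
    files = sorted(role_dir.glob("tasks/*.yml")) + sorted(role_dir.glob("templates/*.j2"))
    for path in files:
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            for name in names:
                # Word boundaries keep app_port from matching app_port_extra.
                if re.search(rf"\b{re.escape(name)}\b", line):
                    rel = path.relative_to(role_dir).as_posix()
                    usages[name].append((rel, lineno, line.strip()))
    return usages


def report(usages):
    """Render one line per usage, or an explicit marker when none was found."""
    lines = []
    for name, hits in usages.items():
        if not hits:
            lines.append(f"{name}: unused (verify)")
        for path, lineno, text in hits:
            lines.append(f"{name}: {path}:{lineno}: {text}")
    return lines
```

If the AI quotes a line for a variable and this output has no matching file and line number, the citation was invented and the description built on it needs rewriting.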
Document the variables that hurt people
Some variables are dangerous — flip the wrong one and you take down prod. I ask AI specifically to flag these:
“Which variables in this role, if set incorrectly, would cause downtime or data loss? Explain the failure mode for each.”
AI is good at spotting these because they follow patterns — anything controlling state, replica counts, deletion behavior, or ports. A variable like app_purge_data: false is exactly the kind of footgun I want loudly documented at the top of the README, not buried in a table.
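The same patterns can be turned into a crude pre-filter, so I know which variables to ask the AI about first. The keyword lists below are my own assumptions based on the categories above, not an established taxonomy, and `flag_risky` is a hypothetical helper. Name matching produces false positives (a variable containing `report` matches `port`), so the output is a shortlist for review and nothing more.

```python
import re

# Assumed keyword groups; tune these to the naming habits of your own repo.
RISK_PATTERNS = {
    "deletion": re.compile(r"purge|delete|remove|wipe|drop|force"),
    "state": re.compile(r"state|enabled|absent"),
    "scale": re.compile(r"replicas|count|instances"),
    "network": re.compile(r"port|listen|bind"),
}


def flag_risky(names):
    """Return {variable name: [risk categories]} for names matching any pattern."""
    flagged = {}
    for name in names:
        categories = [
            category
            for category, pattern in RISK_PATTERNS.items()
            if pattern.search(name.lower())
        ]
        if categories:
            flagged[name] = categories
    return flagged


if __name__ == "__main__":
    variables = ["app_version", "app_port", "app_replicas", "app_health_path", "app_purge_data"]
    for name, categories in flag_risky(variables).items():
        print(f"{name}: {', '.join(categories)}")
```

A name match says nothing about the actual failure mode. That explanation still has to come from reading the task that consumes the variable.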
Generate the onboarding doc you wish existed
The single most valuable artifact is the “how does a deploy actually work here” walkthrough. I ask AI to trace one path end to end:
“Trace what happens when someone runs ansible-playbook deploy.yml -e env=prod. List the order of roles applied, the key variables resolved at each step, and the hosts targeted. Write it as an onboarding doc for a new engineer.”
This produces the mental model that otherwise takes a new hire a month to build. I read it carefully against the actual play order, because a wrong trace is dangerous — but a verified one is gold.
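For the "actual play order" to check against, `ansible-playbook` can list hosts and tasks without running anything, via its `--list-hosts` and `--list-tasks` flags. The wrapper below is a small sketch of how I might capture both outputs. The helper names are hypothetical, and it assumes `ansible-playbook` is on the PATH and that the inventory is picked up from your usual configuration.

```python
import subprocess


def trace_commands(playbook, extra_vars):
    """Build the two read-only listing commands for a playbook run."""
    base = ["ansible-playbook", playbook]
    for key, value in extra_vars.items():
        base += ["-e", f"{key}={value}"]
    return {
        "hosts": base + ["--list-hosts"],
        "tasks": base + ["--list-tasks"],
    }


def run_trace(playbook, extra_vars):
    """Run both listings and return their stdout; no tasks are executed on hosts."""
    return {
        label: subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for label, cmd in trace_commands(playbook, extra_vars).items()
    }


if __name__ == "__main__":
    for label, cmd in trace_commands("deploy.yml", {"env": "prod"}).items():
        print(label, " ".join(cmd))
```

One caveat: tasks pulled in dynamically (for example with `include_role` or `include_tasks`) are resolved at runtime and will not all show up in the static task listing, so the AI's trace and this output can legitimately differ there. Those differences are worth reading the code for.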
Don’t let it document secrets
Documenting a codebase means reading group_vars, and group_vars often contains references to vault-encrypted values. AI sees variable names like vault_db_password and should document that the variable exists and what it’s for — never its value. I never decrypt vault content to “help it document,” and I make sure the generated docs reference secret variables by name and purpose only:
| Variable | Purpose | Source |
|----------|---------|--------|
| `vault_db_password` | Postgres password for the app DB | ansible-vault (do not commit plaintext) |
That’s the right level of documentation: it tells a new engineer the secret exists and where it lives, without ever exposing the value.
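Before committing generated docs, a quick scan can catch the obvious ways a secret slips in. This sketch looks for two things: the `$ANSIBLE_VAULT;` header that marks pasted ciphertext, and a `vault_*` name followed directly by `:` or `=` and a value. The `vault_` prefix is an assumed naming convention taken from the example above, and `secret_leaks` is a hypothetical helper. It will not catch a secret that appears without its variable name.

```python
import re

# Header that ansible-vault writes at the top of encrypted content.
VAULT_HEADER = re.compile(r"\$ANSIBLE_VAULT;")
# A vault_* name followed by `:` or `=` and some value on the same line.
VAULT_ASSIGNMENT = re.compile(r"\b(vault_\w+)`?\s*[:=]\s*\S+")


def secret_leaks(doc_text):
    """Return (line number, reason) pairs for lines that look like leaked secrets."""
    findings = []
    for lineno, line in enumerate(doc_text.splitlines(), start=1):
        if VAULT_HEADER.search(line):
            findings.append((lineno, "vault ciphertext pasted into docs"))
        match = VAULT_ASSIGNMENT.search(line)
        if match:
            findings.append((lineno, f"value assigned to {match.group(1)}"))
    return findings
```

A table row that names `vault_db_password` and describes its purpose passes cleanly, while a line such as `vault_db_password: <anything>` is reported. An empty result means only that these two patterns were absent, so the docs still get a human read before merge.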
Verify, then commit the docs as code
Generated docs go through the same review as any change. I read them against the code, fix the AI’s inaccuracies, and commit the READMEs into the repo next to the roles they describe so they live and die with the code. Docs that live in a wiki rot; docs that live next to the role at least have a chance of staying honest, especially if you regenerate them when the role changes.
The verification step is non-negotiable. A confidently-wrong doc that says “setting app_purge_data: true is safe” is far more dangerous than an honest blank page, because someone will trust it. So every claim gets checked against the code that backs it.
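Part of keeping committed docs honest can be automated in review. The sketch below is a hypothetical check that could run in CI: it reports roles with no README.md, and roles whose README never mentions a variable defined in `defaults/main.yml`. It assumes the `roles/<name>/defaults/main.yml` layout and checks presence only. Whether the description is correct is still a human's job.

```python
import re
from pathlib import Path


def undocumented_variables(role_dir):
    """Names defined in defaults/main.yml that never appear in the role's README.md."""
    role_dir = Path(role_dir)
    defaults = role_dir / "defaults" / "main.yml"
    if not defaults.is_file():
        return []
    names = re.findall(r"^([A-Za-z_]\w*):", defaults.read_text(), re.MULTILINE)
    readme_text = (role_dir / "README.md").read_text()
    return [n for n in names if not re.search(rf"\b{re.escape(n)}\b", readme_text)]


def check_roles(repo):
    """Return {role name: [problems]} for roles whose docs are missing or incomplete."""
    problems = {}
    for role_dir in sorted((Path(repo) / "roles").iterdir()):
        if not role_dir.is_dir():
            continue
        if not (role_dir / "README.md").is_file():
            problems[role_dir.name] = ["README.md missing"]
            continue
        missing = undocumented_variables(role_dir)
        if missing:
            problems[role_dir.name] = [f"undocumented: {name}" for name in missing]
    return problems
```

Failing the pipeline when `check_roles` returns anything is one option. A gentler one is to post the result as a review comment, so a new default at least prompts someone to regenerate and re-verify the README.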
Make documentation a continuous habit
The best time to document is right after you understand something. I keep my documentation prompts in the prompt workspace so generating a role README is a two-minute habit, not a quarterly project. For the broader question of structuring an Ansible repo so it’s documentable in the first place, the Ansible roles and inventory structure guide is the companion to this one.
Inheriting an undocumented Ansible codebase used to mean weeks of archaeology. AI compresses that into days — it reads the whole repo, maps the dependencies, drafts the READMEs, and traces the deploy paths faster than any human could. But it’s a draft, not a source of truth. Make it cite the code, verify every claim, keep secrets to their names, and commit the result. The rest of this series is in the IaC category, and Claude handles whole-repo reading particularly well for this.
Map it, document it, verify it, commit it. Then the next person to inherit it won’t dread it.