Writing Custom Ansible Lookup Plugins in Python With AI
Learn to write custom Ansible lookup plugins in Python, using AI to draft them safely with caching, error handling, and secret hygiene you actually verify.
- #ansible
- #ai
- #plugins
- #python
- #lookup
The first time I needed data from somewhere Ansible didn’t natively reach, I did the wrong thing. I shelled out to a command task, registered the output, and parsed it with three chained Jinja2 filters that I prayed nobody would ever have to read. It worked, technically, until the day the external tool changed its output format and a dozen playbooks broke at once. That mess is exactly what lookup plugins exist to prevent. A lookup runs on the controller during templating, returns data into your variables, and lives in one reviewable Python file instead of scattered across every playbook that needs it.
Lookups are also where I’m most comfortable letting AI do the first draft, because the plugin contract is small and well-defined. The risk isn’t the AI inventing a wild API; it’s the AI producing something that looks right, skips caching, swallows errors, and leaks a secret into a log. So I let it draft, and then I verify the parts that actually matter. Here’s how that goes.
What a lookup plugin actually is
A lookup plugin is a Python class that subclasses LookupBase and implements one method, run. Ansible calls it during templating whenever you use the lookup() function or the with_<name> loop form. The classic built-ins — file, env, template, password — are all lookups. Yours will be too.
The minimal shape looks like this:
# lookups/my_lookup.py
from ansible.plugins.lookup import LookupBase
from ansible.errors import AnsibleError
DOCUMENTATION = """
name: my_lookup
short_description: fetch a value from our internal config service
description: Returns the value for a given key from the internal config API.
options:
_terms:
description: keys to look up
required: True
"""
class LookupModule(LookupBase):
def run(self, terms, variables=None, **kwargs):
results = []
for term in terms:
results.append(self._fetch(term))
return results
def _fetch(self, key):
# real implementation goes here
raise NotImplementedError
The contract is genuinely that simple. run receives terms (whatever you passed to lookup()), and it must return a list — one entry per term. Forgetting that it returns a list is the single most common mistake, and it’s the first thing I check in any AI-generated draft.
The AI prompt that produces a usable draft
A vague request gets you a vague plugin. I give the model the contract, the source, and the failure behavior I want, and I make it justify its choices. Here’s the kind of prompt I use:
You are a senior Ansible engineer. Write a lookup plugin named
config_valuethat fetches keys from our internal config REST API at a base URL given as an option. Requirements: subclassLookupBase;runreturns a list with one result per term; declare options in aDOCUMENTATIONblock and read them withself.get_option; cache results per run so the same key isn’t fetched twice; on a missing key, raiseAnsibleErrorwith a clear message rather than returning an empty string; never log the returned value because some keys are secrets. Include a sample playbook usage and a note on where the plugin file goes.
The output I get back is usually 80% correct. The remaining 20% — the part I never trust blindly — is the caching key, the error path, and whether any secret value sneaks into a debug line. Those are exactly the things a plausible-looking draft gets subtly wrong.
Caching: the difference between fast and embarrassing
A lookup runs every time Ansible templates the expression. If you reference lookup('config_value', 'db_host') in a loop over a hundred hosts, a naive plugin makes a hundred API calls. The fix is a per-run cache keyed on the meaningful inputs:
class LookupModule(LookupBase):
def run(self, terms, variables=None, **kwargs):
self.set_options(var_options=variables, direct=kwargs)
cache = {}
results = []
for term in terms:
if term not in cache:
cache[term] = self._fetch(term)
results.append(cache[term])
return results
This is a within-call cache. For caching across the whole run you can lean on Ansible’s lookup caching options, but I find a local cache plus a sensibly small fetch surface covers most real cases without surprising anyone with stale data. The point is to never hammer an external source just because templating happens to reference the same key many times.
Error handling that fails loudly
The worst lookup bug is the quiet one. If your plugin returns an empty string when a key is missing, that empty string flows into a config template and produces a broken-but-valid file that nobody notices until production misbehaves. Lookups should fail loudly:
# Good: a missing key stops the play with a clear message
TASK [render config] ***********************************************************
fatal: [web01]: FAILED! => {"msg": "config_value: key 'db_host' not found in config service"}
I make the AI raise AnsibleError on missing or ambiguous data, and I test that path on purpose by looking up a key I know doesn’t exist. A plugin that only works on the happy path isn’t done.
Verifying the draft before you trust it
AI drafts the plugin; you verify it. My checklist is short and non-negotiable:
# 1. Does it return a list and resolve a known key?
ansible -m debug -a "msg={{ lookup('config_value', 'db_host') }}" localhost
# 2. Does a missing key fail loudly instead of returning ''?
ansible -m debug -a "msg={{ lookup('config_value', 'nope') }}" localhost
# 3. Does -vvv show the secret value anywhere it shouldn't?
ansible -vvv -m debug -a "msg=ok" localhost 2>&1 | grep -i secret
That third check is the one people skip. Because lookups run on the controller and often pull credentials, a careless display.v() call or an exception that includes the value will splash a secret into the run output. I read the verbose output specifically looking for leaked values before I let the plugin anywhere near a real playbook.
Pro Tip: Keep custom lookups in a lookup_plugins/ directory next to your playbook (or in a collection) and commit them with tests. A lookup that lives on one laptop is a templating failure waiting to happen for everyone else.
When not to write one
Not every external-data need justifies a plugin. If the data is static, put it in vars. If a built-in lookup already covers it — file, env, url, aws_secret, the community ones — use that instead of reinventing it. A custom lookup is worth it when the source is genuinely yours and you’d otherwise be parsing command output with Jinja2, which is the trap I started this piece with.
Lookup plugins are one of the highest-leverage, lowest-risk places to let AI help: the contract is small, the failure modes are knowable, and verification takes three commands. Let the model draft, then check the list return, the cache, the error path, and the logs yourself.
For more on the safe-drafting pattern, see writing custom Ansible modules in Python with AI and the broader AI for Ansible category. When you’re ready to scaffold one, the Ansible prompts collection has framing you can reuse.
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.