Writing Custom Ansible Lookup Plugins in Python With AI

The first time I needed data from somewhere Ansible didn’t natively reach, I did the wrong thing. I shelled out to a command task, registered the output, and parsed it with three chained Jinja2 filters that I prayed nobody would ever have to read. It worked, technically, until the day the external tool changed its output format and a dozen playbooks broke at once. That mess is exactly what lookup plugins exist to prevent. A lookup runs on the controller during templating, returns data into your variables, and lives in one reviewable Python file instead of scattered across every playbook that needs it.

Lookups are also where I’m most comfortable letting AI do the first draft, because the plugin contract is small and well-defined. The risk isn’t the AI inventing a wild API; it’s the AI producing something that looks right, skips caching, swallows errors, and leaks a secret into a log. So I let it draft, and then I verify the parts that actually matter. Here’s how that goes.

What a lookup plugin actually is

A lookup plugin is a Python class that subclasses LookupBase and implements one method, run. Ansible calls it during templating whenever you use the lookup() function or the with_<name> loop form. The classic built-ins — file, env, template, password — are all lookups. Yours will be too.

The minimal shape looks like this:

# lookups/my_lookup.py
from ansible.plugins.lookup import LookupBase
from ansible.errors import AnsibleError

DOCUMENTATION = """
  name: my_lookup
  short_description: fetch a value from our internal config service
  description: Returns the value for a given key from the internal config API.
  options:
    _terms:
      description: keys to look up
      required: True
"""


class LookupModule(LookupBase):
    def run(self, terms, variables=None, **kwargs):
        results = []
        for term in terms:
            results.append(self._fetch(term))
        return results

    def _fetch(self, key):
        # real implementation goes here
        raise NotImplementedError

The contract is genuinely that simple. run receives terms (whatever you passed to lookup()), and it must return a list — one entry per term. Forgetting that it returns a list is the single most common mistake, and it’s the first thing I check in any AI-generated draft.

The AI prompt that produces a usable draft

A vague request gets you a vague plugin. I give the model the contract, the source, and the failure behavior I want, and I make it justify its choices. Here’s the kind of prompt I use:

You are a senior Ansible engineer. Write a lookup plugin named config_value that fetches keys from our internal config REST API at a base URL given as an option. Requirements: subclass LookupBase; run returns a list with one result per term; declare options in a DOCUMENTATION block and read them with self.get_option; cache results per run so the same key isn’t fetched twice; on a missing key, raise AnsibleError with a clear message rather than returning an empty string; never log the returned value because some keys are secrets. Include a sample playbook usage and a note on where the plugin file goes.

The output I get back is usually 80% correct. The remaining 20% — the part I never trust blindly — is the caching key, the error path, and whether any secret value sneaks into a debug line. Those are exactly the things a plausible-looking draft gets subtly wrong.

Caching: the difference between fast and embarrassing

A lookup runs every time Ansible templates the expression. If you reference lookup('config_value', 'db_host') in a loop over a hundred hosts, a naive plugin makes a hundred API calls. The fix is a per-run cache keyed on the meaningful inputs:

class LookupModule(LookupBase):
    def run(self, terms, variables=None, **kwargs):
        self.set_options(var_options=variables, direct=kwargs)
        cache = {}
        results = []
        for term in terms:
            if term not in cache:
                cache[term] = self._fetch(term)
            results.append(cache[term])
        return results

This is a within-call cache. For caching across the whole run you can lean on Ansible’s lookup caching options, but I find a local cache plus a sensibly small fetch surface covers most real cases without surprising anyone with stale data. The point is to never hammer an external source just because templating happens to reference the same key many times.

Error handling that fails loudly

The worst lookup bug is the quiet one. If your plugin returns an empty string when a key is missing, that empty string flows into a config template and produces a broken-but-valid file that nobody notices until production misbehaves. Lookups should fail loudly:

# Good: a missing key stops the play with a clear message
TASK [render config] ***********************************************************
fatal: [web01]: FAILED! => {"msg": "config_value: key 'db_host' not found in config service"}

I make the AI raise AnsibleError on missing or ambiguous data, and I test that path on purpose by looking up a key I know doesn’t exist. A plugin that only works on the happy path isn’t done.

Verifying the draft before you trust it

AI drafts the plugin; you verify it. My checklist is short and non-negotiable:

# 1. Does it return a list and resolve a known key?
ansible -m debug -a "msg={{ lookup('config_value', 'db_host') }}" localhost

# 2. Does a missing key fail loudly instead of returning ''?
ansible -m debug -a "msg={{ lookup('config_value', 'nope') }}" localhost

# 3. Does -vvv show the secret value anywhere it shouldn't?
ansible -vvv -m debug -a "msg=ok" localhost 2>&1 | grep -i secret

That third check is the one people skip. Because lookups run on the controller and often pull credentials, a careless display.v() call or an exception that includes the value will splash a secret into the run output. I read the verbose output specifically looking for leaked values before I let the plugin anywhere near a real playbook.

Pro Tip: Keep custom lookups in a lookup_plugins/ directory next to your playbook (or in a collection) and commit them with tests. A lookup that lives on one laptop is a templating failure waiting to happen for everyone else.

When not to write one

Not every external-data need justifies a plugin. If the data is static, put it in vars. If a built-in lookup already covers it — file, env, url, aws_secret, the community ones — use that instead of reinventing it. A custom lookup is worth it when the source is genuinely yours and you’d otherwise be parsing command output with Jinja2, which is the trap I started this piece with.

Lookup plugins are one of the highest-leverage, lowest-risk places to let AI help: the contract is small, the failure modes are knowable, and verification takes three commands. Let the model draft, then check the list return, the cache, the error path, and the logs yourself.

For more on the safe-drafting pattern, see writing custom Ansible modules in Python with AI and the broader AI for Ansible category. When you’re ready to scaffold one, the Ansible prompts collection has framing you can reuse.