Configuring Ansible Fact Caching With Redis and jsonfile Using AI
Set up Ansible fact caching with the jsonfile and redis backends, using AI to reason about staleness, gather_subset, and TTLs so reruns are fast and correct.
- #ansible
- #ai
- #performance
- #facts
- #caching
Fact gathering is the silent tax on most playbook runs. Before your first real task executes, Ansible connects to every host and runs the setup module to collect facts — interfaces, OS details, memory, mounts, the lot. On a big fleet that’s a noticeable chunk of every run, and you pay it again the next run, and the run after that, even though almost none of those facts changed. Fact caching is the fix: gather once, store the results, and reuse them across runs. The catch, and the reason this needs careful thought rather than a copy-paste config, is that cached facts can go stale and feed wrong values into a play that trusted them.
I use AI here to reason through the staleness tradeoffs and the backend choice, then I verify the cache behaves the way I expect. The config itself is a few lines; the judgment about what’s safe to cache and for how long is the real work.
How fact caching works
When caching is on, Ansible stores gathered facts in a backend keyed by host. On the next run, if a host’s facts are in the cache and not expired, and gathering is set to smart, Ansible skips the setup step for that host and loads the cached values instead. The two backends most teams use are jsonfile (facts written to files on the controller) and redis (facts in a shared Redis instance).
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /var/cache/ansible/facts
fact_caching_timeout = 7200
gathering = smart is the key companion setting. It tells Ansible to gather facts only for hosts that don’t already have valid cached facts — which is the whole point. Without it you can end up caching facts but still gathering them every run, getting the staleness risk without the speedup.
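A quick way to catch the "caching without smart gathering" mistake is to lint the config before rolling it out. The sketch below is a minimal example using Python's standard configparser; `check_fact_cache_config` is a hypothetical helper name, and it only inspects the `[defaults]` section of the text you pass in. It does not account for environment variables such as ANSIBLE_GATHERING, which can override the file.

```python
import configparser


def check_fact_cache_config(cfg_text: str) -> list[str]:
    """Return warnings about the fact-caching settings in ansible.cfg text."""
    # interpolation=None so literal '%' characters in values are left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    if not parser.has_section("defaults"):
        return ["no [defaults] section found"]

    defaults = parser["defaults"]
    warnings = []

    # Without smart gathering, facts can be cached and still gathered every run
    if defaults.get("gathering", "implicit") != "smart":
        warnings.append("gathering is not 'smart'; facts may be re-gathered every run")

    # The in-memory cache does not persist between runs
    if defaults.get("fact_caching", "memory") == "memory":
        warnings.append("fact_caching is unset or 'memory'; nothing persists between runs")

    # An unset timeout means Ansible's built-in default applies
    if "fact_caching_timeout" not in defaults:
        warnings.append("fact_caching_timeout is unset; Ansible's default TTL applies")

    return warnings
```

Calling it with the contents of your ansible.cfg (for example `check_fact_cache_config(open("ansible.cfg").read())`) returns an empty list when all three settings are present and sensible.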
jsonfile vs redis
The backend choice comes down to how many controllers you have. jsonfile stores facts as files in a directory on the controller. It’s dead simple, needs no extra infrastructure, and is perfect when one controller (or one CI runner) does all your runs.
[defaults]
fact_caching = jsonfile
fact_caching_connection = /var/cache/ansible/facts
fact_caching_timeout = 86400
redis stores facts in a shared Redis instance, which matters the moment more than one controller runs playbooks: a team of engineers, or several CI runners. With jsonfile, each controller has its own cache and they don’t share; with redis, everyone reads the same cached facts. The redis backend also needs the redis Python package installed on every controller that uses it.
[defaults]
fact_caching = redis
fact_caching_connection = redis-host:6379:0
fact_caching_timeout = 86400
Here’s the kind of question I hand to AI when I’m deciding:
We run Ansible from three CI runners and occasionally from engineer laptops. We want fact caching to cut per-run gather time. Walk me through whether jsonfile or redis fits, what gathering = smart changes, and what TTL makes sense if host hardware rarely changes but IPs occasionally do.
The answer I’m after isn’t just “use redis.” It’s the reasoning: shared controllers favor redis, the TTL should be short enough that an IP change can’t poison a deploy for a day, and smart gathering is what makes the cache actually save time.
The staleness problem is the real risk
This is where caching earns its reputation for surprises. Suppose a host gets a new IP address. Until the cache entry expires or you flush it, every run reads the old IP from the cache and may target the wrong place or render a config with a stale address. The cache faithfully served you yesterday’s truth.
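To make the stale-IP scenario concrete, here is a minimal sketch that compares what the jsonfile cache holds against addresses you consider authoritative. It assumes the jsonfile backend writes one JSON file per host, named after the inventory hostname, with the default address under `ansible_default_ipv4.address`; check a real cache file before relying on that layout. `find_stale_addresses` and the `expected` mapping are hypothetical names for illustration.

```python
import json
from pathlib import Path


def find_stale_addresses(cache_dir: str, expected: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map host -> (cached_ip, expected_ip) for hosts whose cached IP differs."""
    stale = {}
    for host, expected_ip in expected.items():
        # Assumption: one cache file per host, named by inventory hostname
        cache_file = Path(cache_dir) / host
        if not cache_file.is_file():
            continue  # no cache entry, so nothing stale can be served

        facts = json.loads(cache_file.read_text())
        # Assumption: the default IPv4 address lives under this key
        cached_ip = facts.get("ansible_default_ipv4", {}).get("address")

        if cached_ip is not None and cached_ip != expected_ip:
            stale[host] = (cached_ip, expected_ip)
    return stale
```

The `expected` mapping would come from whatever you treat as the source of truth (DNS, a CMDB, cloud inventory). Any host it returns is one where a cached run would render yesterday’s address.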
The defenses are straightforward once you name the risk:
# Force a fresh gather, ignoring the cache, for one run
ansible-playbook site.yml --flush-cache
# Or clear the jsonfile cache directory directly
rm -rf /var/cache/ansible/facts/*
I set fact_caching_timeout based on how volatile the cached facts actually are. Hardware details that never change can have a long TTL. Anything that can shift under you — IPs, mounts, package versions you key decisions on — gets a shorter one, or I make sure the playbook re-gathers when it needs current data. A play that has to know the current state of a host should not blindly trust a day-old cache.
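When choosing a TTL it helps to see how old the current entries actually are. The following sketch lists jsonfile cache entries older than a given timeout. It assumes expiry is judged by each file's modification time and that a timeout of 0 means "never expire"; both match my understanding of the jsonfile backend, but treat them as assumptions to confirm against your Ansible version. `expired_entries` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import Optional


def expired_entries(cache_dir: str, timeout_seconds: int, now: Optional[float] = None) -> list[str]:
    """Return names of cache files older than timeout_seconds, sorted by name."""
    now = time.time() if now is None else now
    expired = []
    for entry in sorted(Path(cache_dir).iterdir()):
        if not entry.is_file():
            continue
        # Assumption: age is measured from the file's last modification time
        age = now - entry.stat().st_mtime
        # Assumption: a timeout of 0 disables expiry entirely
        if timeout_seconds > 0 and age > timeout_seconds:
            expired.append(entry.name)
    return expired
```

Running `expired_entries("/var/cache/ansible/facts", 7200)` before and after tightening the timeout shows which hosts would be re-gathered on the next run.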
Pro Tip: If a specific play depends on fresh facts, run an explicit ansible.builtin.setup task at the start of that play rather than fighting the global cache. A task always executes, whereas with gathering = smart a play-level gather_facts: true can still be satisfied from a valid cache entry. Cache for speed on the runs that can tolerate it; force-gather on the runs that can’t.
Gathering less, not just caching more
Caching pairs well with gathering fewer facts in the first place. The setup module collects a lot you’ll never use. gather_subset trims it:
- hosts: all
  gather_facts: true
  # gather_subset is a play keyword, not a variable
  gather_subset:
    - "!all"
    - "!min"
    - network
# Or globally in ansible.cfg
[defaults]
gather_subset = network,virtual
Gathering only the subsets you actually use makes both the initial gather and the cached payload smaller. I ask AI to map which fact subsets a given playbook genuinely references, then trim to those. It is a quick audit that shrinks both gather time and cache size, and a fact you never collected can’t go stale. The one caveat is that the audit has to be complete: a task that references a fact outside the gathered subsets will fail on an undefined variable.
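The audit can start with a plain text scan before any AI is involved. This is a rough sketch that pulls fact names out of playbook or template text. It only recognises the `ansible_facts.name` and `ansible_facts['name']` forms, so references through injected `ansible_*` variables are missed, and mapping each fact name to its subset is still a manual step. `referenced_facts` is a hypothetical helper.

```python
import re

# Matches ansible_facts.name or ansible_facts['name'] / ansible_facts["name"]
FACT_REF = re.compile(
    r"ansible_facts(?:\.([A-Za-z_]\w*)|\[\s*['\"]([^'\"]+)['\"]\s*\])"
)


def referenced_facts(text: str) -> set[str]:
    """Return the top-level fact names referenced in the given text."""
    return {m.group(1) or m.group(2) for m in FACT_REF.finditer(text)}
```

Feeding it the concatenated contents of your playbooks, roles, and templates gives a first list of fact names to check against the subsets you plan to keep.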
Verifying the cache does what you think
A cache config you didn’t verify is a config you’re hoping about. I confirm two things on a canary host: that the second run actually skips gathering, and that a flush forces a fresh gather.
# First run populates the cache
ansible-playbook site.yml --limit canary
# Second run should hit the cache: expect no "Gathering Facts" task, so grep prints nothing
ansible-playbook site.yml --limit canary -v | grep -i "Gathering Facts"
# Flush and confirm it gathers again: the "Gathering Facts" task line should reappear
ansible-playbook site.yml --limit canary --flush-cache -v | grep -i "Gathering Facts"
If the second run still gathers facts, something’s off — usually gathering isn’t set to smart, or the timeout is too short. If --flush-cache doesn’t force a re-gather, the backend isn’t wired correctly. Both are worth catching on one host before the whole fleet runs on a cache you don’t trust.
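If you want this check in CI rather than eyeballing grep output, the same logic fits in a few lines. The sketch below assumes the default stdout callback, which prints a `TASK [Gathering Facts]` banner whenever facts are gathered; a custom callback would need a different pattern. `gathered_facts` and `cache_problems` are hypothetical helpers that operate on captured playbook output.

```python
import re

# Assumption: the default stdout callback prints this banner when gathering runs
GATHER_BANNER = re.compile(r"^TASK \[Gathering Facts\]", re.MULTILINE)


def gathered_facts(playbook_output: str) -> bool:
    """True if the captured output shows a fact-gathering task."""
    return bool(GATHER_BANNER.search(playbook_output))


def cache_problems(second_run: str, flushed_run: str) -> list[str]:
    """Compare the cached run and the flushed run; return any problems found."""
    problems = []
    if gathered_facts(second_run):
        problems.append("second run still gathered facts; check gathering and the timeout")
    if not gathered_facts(flushed_run):
        problems.append("--flush-cache run did not gather facts; check the backend wiring")
    return problems
```

The two strings can be captured with `subprocess.run([...], capture_output=True, text=True).stdout` around the same ansible-playbook commands shown above. An empty list means the cache hit and the flush both behaved as expected on the canary.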
Fact caching is a real, easily-won speedup, but it trades a little freshness for that speed. Let AI help you pick the backend and reason about TTLs, then verify the cache hits and flushes the way you expect — and never let a stale fact make a decision that needed the current one.
For the broader performance picture, see tuning Ansible performance with forks, pipelining, and fact caching and the AI for Ansible category. For a reusable caching-design prompt, browse the Ansible prompts.