Tuning Ansible Performance: Forks, Pipelining, and Fact Caching

The first time I really felt Ansible’s defaults bite, I was watching a deploy playbook crawl across 120 hosts at 4:50pm on a Friday. Forty minutes. I timed it. Most of that wall-clock was spent doing absolutely nothing useful: re-gathering facts on machines whose facts hadn’t changed in months, opening a fresh SSH connection for every single task, and processing hosts five at a time because nobody had ever touched forks. The CPU on the control node was idle. The network was idle. Ansible was just… waiting, politely, in a long line of its own making.

That playbook now runs in just under four minutes. I didn’t rewrite it. I tuned it. Below is the playbook-agnostic checklist I wish someone had handed me that Friday afternoon.

Start by Measuring, Not Guessing

Before you change a single line, find out where the time actually goes. Ansible ships with a callback plugin called profile_tasks that prints a per-task timing table at the end of a run. Enable it in ansible.cfg:

[defaults]
callbacks_enabled = profile_tasks

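(On ansible-core releases before 2.11, the key is callback_whitelist instead of callbacks_enabled, so check your version. The plugin itself lives in the ansible.posix collection, which the full ansible package bundles; with ansible-core alone, install that collection first.)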

Run the playbook and you’ll get something like this at the end:

Wednesday 17 June 2026  09:12:44 +0000 (0:00:31.402) =====================
gather_facts ----------------------------------------------------- 31.40s
deploy : install application packages ---------------------------- 22.18s
deploy : template 40 nginx vhosts in a loop ---------------------- 19.07s
common : copy ssl certificates ------------------------------------ 4.51s
deploy : restart application --------------------------------------- 1.20s

That table is the whole game. Thirty-one seconds gathering facts before any real work happens, nineteen seconds in a templating loop. Now you know exactly what to attack. Optimizing anything that isn’t in the top three lines is a waste of your afternoon.

Pro Tip: Commit your profile_tasks output to the PR description before and after a tuning change. “It feels faster” is not a benchmark; a 31s → 2s line in a timing table is.

Turn Up the Forks

Ansible’s default forks = 5 means it talks to five hosts at a time. On a fleet of 120, that’s 24 sequential waves. Raising it is the single highest-leverage change for any inventory bigger than a handful of machines.

[defaults]
forks = 50

The ceiling is your control node’s resources (each fork is a Python process and an SSH connection) and how much concurrent load your targets can tolerate. I usually start at forks = 25, watch the control node’s load average and memory, and climb from there. Going from 5 to 50 on that Friday playbook alone roughly halved the wall-clock — the fact-gathering and package steps stopped queueing.
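
If one task is the reason your targets can’t tolerate a high forks value, one option is to cap concurrency on just that task with the throttle keyword rather than lowering forks globally. A minimal sketch, assuming a hypothetical db group, script path, and service name:

```yaml
# Sketch only: the host group, script path, and service name are hypothetical.
- name: "Reindex with limited concurrency"
  hosts: db
  tasks:
    - name: "Run the expensive reindex on at most 5 hosts at a time"
      ansible.builtin.command: /opt/app/reindex.sh
      throttle: 5   # caps this one task even when forks is much higher

    - name: "Cheap follow-up task still runs at the full forks width"
      ansible.builtin.service:
        name: app
        state: started
```

The rest of the play keeps the parallelism that forks allows; only the throttled task is held back.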

Pipelining: Fewer SSH Round Trips per Task

By default, each task copies a module file to the remote host, executes it, then cleans up — multiple SSH operations per task. Pipelining collapses that into a single connection by piping the module straight to the remote Python interpreter over the existing SSH session.

[ssh_connection]
pipelining = true

This is one of the biggest wins for playbooks with many small tasks, and it costs nothing. The main caveat applies when you use become with sudo: pipelining needs requiretty disabled in the remote /etc/sudoers (it’s off by default on most modern distros, but locked-down or older images sometimes turn it on). If you see sudo errors right after enabling pipelining, that’s your culprit. Either remove Defaults requiretty from sudoers or leave pipelining off for those hosts.
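
Leaving pipelining off for only the affected hosts can be done with an inventory variable instead of a second ansible.cfg. A minimal sketch, assuming a hypothetical inventory group named legacy_requiretty:

```yaml
# group_vars/legacy_requiretty.yml
# Sketch only: "legacy_requiretty" is a hypothetical inventory group for
# hosts whose sudoers still sets Defaults requiretty.
ansible_ssh_pipelining: false   # overrides pipelining = true from ansible.cfg
```

Every other host keeps the global setting, so the fleet-wide win stays in place.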

Reuse SSH Connections with ControlPersist

Even with pipelining, you don’t want to renegotiate an SSH connection for every task. OpenSSH’s ControlMaster/ControlPersist keeps a connection open and multiplexes subsequent tasks over it:

[ssh_connection]
ssh_args = "-o ControlMaster=auto -o ControlPersist=60s"
control_path = "%(directory)s/%%h-%%r"

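ControlPersist=60s keeps the master socket alive for 60 seconds after the last task, so the next task, or the next playbook you run within the minute, skips the entire TCP and SSH handshake. On high-latency links (think cross-region, or a VPN to a far-away data center) this is enormous, because handshake latency is paid once instead of per task. Ansible’s stock ssh_args already include these two options, so spelling them out matters most when ssh_args has been overridden elsewhere or when you want a longer persist window.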
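
For a distant group of hosts, a longer persist window can be set per group rather than globally. A minimal sketch, assuming a hypothetical remote_region group and an illustrative 300-second window:

```yaml
# group_vars/remote_region.yml
# Sketch only: "remote_region" is a hypothetical group of high-latency hosts,
# and 300s is an illustrative value, not a recommendation.
# ansible_ssh_args replaces the ssh_args setting for these hosts.
ansible_ssh_args: "-o ControlMaster=auto -o ControlPersist=300s"
```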

Stop Re-Gathering Facts You Already Have

Fact gathering was the worst offender in my timing table, and it’s the easiest to fix because most of the data doesn’t change between runs. Turn on fact caching so facts persist across runs, and switch gathering to smart so Ansible only re-gathers when the cache is stale.

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = "/tmp/ansible_facts"
fact_caching_timeout = 7200

jsonfile writes facts to local disk — zero extra infrastructure, perfect for getting started. For a team or CI where the cache should be shared, point it at Redis instead:

[defaults]
gathering = smart
fact_caching = redis
fact_caching_connection = "localhost:6379:0"
fact_caching_timeout = 7200

Two more levers. First, if a play doesn’t reference any facts, just skip gathering entirely:

- name: "Roll out static config"
  hosts: web
  gather_facts: false
  tasks:
    - name: "Deploy nginx config"
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf

Second, when you do need facts but only a slice of them, narrow the subset. The full network and hardware enumeration is what makes gathering slow; if all you need is the OS family, ask for min:

- name: "Patch hosts"
  hosts: all
  gather_subset:
    - "!all"   # drop the slow hardware, network, and virtual collectors
    - "min"    # keep the minimal set, which includes the OS family
  tasks:
    - name: "Apply security updates"
      ansible.builtin.package:
        name: "*"
        state: latest

That gather_facts step went from 31 seconds to under 2 once caching was warm and the subset was trimmed.
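
If only one part of a play needs facts, another option is to skip automatic gathering and call the setup module explicitly at that point. A minimal sketch, assuming a hypothetical web group and package name:

```yaml
# Sketch only: the host group and package name are hypothetical.
- name: "Gather only where needed"
  hosts: web
  gather_facts: false
  tasks:
    - name: "Collect the minimal fact set right before it is used"
      ansible.builtin.setup:
        gather_subset:
          - "!all"
          - "min"

    - name: "Install the web server on Debian-family hosts"
      ansible.builtin.package:
        name: nginx
        state: present
      when: ansible_facts['os_family'] == "Debian"
```

Tasks that run before the setup call cannot reference facts, so place it ahead of the first task that does.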

Fix the Slow Loop, Not the Whole Play

The nineteen-second templating loop was a classic anti-pattern: a with_items loop that ran a separate module invocation per vhost, each with its own connection overhead. Many modules, package managers in particular, accept a list directly, which turns N round trips into one. And for genuinely long-running, independent work, fire it asynchronously with async/poll so the rest of the play doesn’t sit blocked behind it:

- name: "Run the long database migration without blocking"
  ansible.builtin.command: /opt/app/migrate.sh
  async: 1800
  poll: 0
  register: migration

- name: "Do other useful work while migration runs"
  ansible.builtin.service:
    name: app-cache
    state: restarted

- name: "Wait for the migration to finish"
  ansible.builtin.async_status:
    jid: "{{ migration.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 30

poll: 0 kicks the task off and moves on immediately; the async_status check reaps it later. For a 25-minute migration that would otherwise hold the connection open and stall the rest of the play, this is the difference between parallel and serial.
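
The list-instead-of-loop point from earlier in this section can be sketched with a package task. The package names are hypothetical. Note that this does not carry over to template, which renders one src/dest pair per call.

```yaml
# Sketch only: the package names are hypothetical.

# Slower: one module invocation per item.
- name: "Install packages one at a time"
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - git
    - curl

# Faster: a single invocation that receives the whole list.
- name: "Install packages in one call"
  ansible.builtin.package:
    name:
      - nginx
      - git
      - curl
    state: present
```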

Strategy and the Mitogen Option

Ansible’s default execution strategy is linear: every host must finish a task before any host starts the next one. One slow box holds up the entire fleet. The free strategy lets each host race ahead through the play as fast as it can:

- name: "Independent per-host provisioning"
  hosts: all
  strategy: free

Use free when tasks are independent per host. Keep linear when ordering across hosts matters — for example, a rolling deploy with serial where you genuinely want batches to complete in lockstep.
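
A rolling deploy of the kind described above might look like the following. This is a minimal sketch; the host group, batch size, failure threshold, and service name are all hypothetical.

```yaml
# Sketch only: host group, batch size, threshold, and service name are hypothetical.
- name: "Rolling restart in lockstep batches"
  hosts: web
  strategy: linear          # the default, stated explicitly for clarity
  serial: 10                # finish the play on 10 hosts before starting the next 10
  max_fail_percentage: 20   # stop the rollout if more than 20% of a batch fails
  tasks:
    - name: "Restart the application"
      ansible.builtin.service:
        name: app
        state: restarted
```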

For a heavier hammer, the Mitogen strategy plugin rewrites how Ansible ships and executes code, often cutting run times dramatically by avoiding repeated interpreter startup. It’s a third-party plugin, it’s version-sensitive, and it occasionally breaks on edge-case modules — so pin it, test it against your real playbooks, and treat a Mitogen upgrade like any other dependency bump.

Where AI Fits — and Where It Doesn’t

Here’s where this gets genuinely faster to iterate on. A profile_tasks table is exactly the kind of structured output an AI assistant reads well. Paste it into a tool like Claude or Cursor, describe your inventory size and latency, and ask it to propose ansible.cfg changes ranked by expected impact. It’s good at spotting that your 31-second fact-gathering line means you haven’t enabled caching, or that a per-item loop should become a single list call. Our prompt library and the IaC prompt packs have ready-made prompts for exactly this “read my profile output and recommend tuning” workflow.

But treat the AI as a fast junior engineer, not an oracle. It will confidently suggest forks = 100 without knowing your control node has 2GB of RAM. It can’t feel the latency to your hosts or know that one subnet has requiretty locked on. So the human stays in the loop on every change:

Benchmark, don’t trust. Apply one change, re-run profile_tasks, compare the numbers. AI’s “this should be faster” is a hypothesis, not a result.
Always dry-run first. Run new config and tasks under --check (and --diff) before they touch production. Check-mode catches the change that would have restarted the wrong service.
Never hand AI your vault keys. Paste timing tables and sanitized config, never decrypted ansible-vault content or the file your vault_password_file setting points to. Secrets stay on your machine, full stop.
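
The dry-run habit can also be written into a play, so a preview never depends on someone remembering the flags. A minimal sketch, assuming a hypothetical web group and reusing the nginx template from earlier:

```yaml
# Sketch only: the host group is hypothetical.
- name: "Preview a tuned nginx config without applying it"
  hosts: web
  check_mode: true   # same effect as running this play with --check
  diff: true         # same effect as --diff
  tasks:
    - name: "Show what the template would change"
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
```

Remove the two keywords, or keep this as a separate preview playbook, once the diff looks right.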

If you’re rolling tuning changes into a PR, route the diff through a code review pass so a second set of eyes — human or assisted — catches the forks value that’s too aggressive for your fleet before it ships.

Wrapping Up

None of this required rewriting the playbook. Raise forks, enable pipelining and ControlPersist, cache your facts and gather only what you need, async the long tasks, and pick the right strategy. Measure with profile_tasks before and after every change so you know which lever actually moved the needle. The forty-minute Friday playbook didn’t get smarter — it just stopped waiting in line. Let AI accelerate the diagnosis, but keep your own hand on the benchmark.
