AI for Ansible · By James Joyner IV · 10 min read

Running Async Ansible Tasks With async and poll Using AI

Master Ansible async and poll for long-running and parallel tasks, with AI help choosing keep-alive vs fire-and-forget and reaping jobs with async_status.

  • #ansible
  • #ai
  • #async
  • #orchestration
  • #performance

There’s a moment every Ansible user hits: a task that takes too long. A multi-gigabyte download, a slow package build, a database dump, a host reboot. The connection times out, the task fails with something about a closed channel, and the playbook dies on work that was actually proceeding fine. The fix is async, and almost everyone who reaches for it uses it slightly wrong, because async quietly solves two very different problems with two very different idioms — and conflating them is the source of nearly every async bug.

I use AI to help classify which problem I’m actually solving and to wire up the reaping correctly, because the failure mode of getting it wrong is the worst kind: a play that reports success while the real work failed silently. Let’s separate the two cases cleanly.

Two problems, two idioms

The first problem is “this task is legitimately long and I don’t want the connection to time out.” Here you want Ansible to keep waiting, but with periodic check-ins on the job instead of one connection held open for the whole run. That’s the keep-alive idiom, async with a non-zero poll:

- name: download the big artifact (keep-alive, wait for it)
  ansible.builtin.get_url:
    url: https://artifacts.internal/release-8gb.tar.gz
    dest: /opt/release.tar.gz
  async: 1800      # allow up to 30 minutes
  poll: 15         # check every 15 seconds, keeping the connection alive

Ansible starts the task, then polls it every 15 seconds until it finishes or hits the 1800-second ceiling. The play waits, but the connection never goes idle long enough to time out. This is the common case and the safe one.

The second problem is “I want to fire this off and let the play continue, then collect the result later.” That’s async with poll: 0 — fire-and-forget:

- name: kick off a long batch job and move on
  ansible.builtin.command: /opt/run-batch.sh
  async: 3600
  poll: 0
  register: batch_job

With poll: 0, Ansible launches the task and immediately moves on without waiting. This is how you run several independent long jobs in parallel. It’s also where the dangerous bug lives, because that job is now running unsupervised and nobody is checking whether it succeeded.

The reaping trap

Here is the mistake I see most often, and the one I make AI guard against explicitly: a poll: 0 task that’s never reaped. You fired the job, the play moved on, the play finished green — and the job failed twenty minutes later with nobody watching. The play looked successful because the only thing it checked was that the job started.

A fire-and-forget task is only half a pattern. The other half is async_status, which goes back and collects the real result:

- name: kick off the batch job
  ansible.builtin.command: /opt/run-batch.sh
  async: 3600
  poll: 0
  register: batch_job

# ... other work happens here in parallel ...

- name: wait for the batch job and capture its real result
  ansible.builtin.async_status:
    jid: "{{ batch_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 120     # 120 x 30s = 3600s, so the reap waits as long as the async ceiling
  delay: 30

That async_status block polls the job ID until it’s finished, and crucially, if the job failed, this is what surfaces the failure and stops the play. Without it, the failure vanishes. My rule, and the rule I bake into any AI prompt about async, is simple: every poll: 0 task must have a matching async_status reap, or it doesn’t exist.
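One follow-up worth sketching: as far as I know, a poll: 0 job leaves its status file behind on the host (by default under ~/.ansible_async/), because Ansible only cleans it up automatically when it does the polling itself. Assuming the batch_job variable registered above, a cleanup task placed after the reap might look like this:

```yaml
# Assumes batch_job was registered by the poll: 0 task and has already been reaped.
- name: remove the async status file once the result is captured
  ansible.builtin.async_status:
    jid: "{{ batch_job.ansible_job_id }}"
    mode: cleanup
```

Run it only after the reap has finished, since the status file is what the reap reads.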

When I ask AI to design async work, I’m explicit:

I have three independent long-running tasks (a backup, a cache warm, a report build) that each take 10-20 minutes. I want them running in parallel, not sequentially. Show me the fire-and-forget pattern with poll: 0, and the async_status reaping for each so a failure in any of them fails the play. Make the async timeouts safely above the real runtimes.
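A reasonable answer to that prompt might look like the sketch below. The script paths and job names are hypothetical placeholders, and the 2400-second ceiling is my assumption for jobs that top out around 20 minutes. The jobs are started from one looped task and reaped from another, so every poll: 0 launch has a matching async_status:

```yaml
# Hypothetical script paths; substitute your own commands.
- name: start the three independent jobs without waiting
  ansible.builtin.command: "{{ item.cmd }}"
  async: 2400          # 40 minutes, double the expected 20-minute worst case
  poll: 0
  loop:
    - { name: backup, cmd: /opt/jobs/backup.sh }
    - { name: cache-warm, cmd: /opt/jobs/warm-cache.sh }
    - { name: report, cmd: /opt/jobs/build-report.sh }
  loop_control:
    label: "{{ item.name }}"
  register: started_jobs

# ... other work can happen here while the jobs run ...

- name: reap every job so a failure in any of them fails the play
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ started_jobs.results }}"
  loop_control:
    label: "{{ item.item.name }}"
  register: reaped
  until: reaped.finished
  retries: 80          # 80 x 30s = 2400s per job, matching the async ceiling
  delay: 30
```

The reap loop checks the jobs one after another, but the jobs themselves are already running side by side, so by the time the first one is finished the others are usually done or close to it.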

Get the timeout right

The async value is the maximum time the task is allowed to run. Set it too close to the real runtime and Ansible gives up on a job that’s still legitimately working: once the ceiling passes, the async wrapper kills the process on the host, so you can be left with half-finished work as well as a failed task. Always leave generous headroom:

async: 1800   # for a task you expect to take ~10 minutes

That gap between expected and allowed runtime is your safety margin against a slow day. I ask AI to set async well above the realistic worst case, not the average, because the average isn’t what bites you.
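One way to keep the margin honest is to derive both the async ceiling and the reap budget from a single worst-case estimate, so the two can't drift apart. This is a minimal sketch; the variable names, the 1.5 multiplier, and the script path are my own assumptions, not anything Ansible prescribes:

```yaml
- name: derive async and reap budgets from one worst-case estimate
  hosts: all
  gather_facts: false
  vars:
    worst_case_s: 1200        # hypothetical: the slowest run you have seen, in seconds
    async_s: "{{ (worst_case_s * 1.5) | int }}"      # 50% headroom gives 1800
    reap_delay_s: 30
    # enough retries to cover the whole async window, plus two for slack
    reap_retries: "{{ ((async_s | int) / reap_delay_s) | round(0, 'ceil') | int + 2 }}"
  tasks:
    - name: kick off the batch job
      ansible.builtin.command: /opt/run-batch.sh
      async: "{{ async_s }}"
      poll: 0
      register: batch_job

    - name: reap it with a budget that covers the full async window
      ansible.builtin.async_status:
        jid: "{{ batch_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: "{{ reap_retries }}"
      delay: "{{ reap_delay_s }}"
```

Changing worst_case_s is then the only edit needed when the job gets slower.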

Pro Tip: Async tasks do not play well with --check, whatever the poll value, so a dry run will not validate this pattern, and a poll: 0 job detaches from the play as soon as it starts. Test async timeouts on one host with --limit first, watch a full run complete, and confirm the async_status reap catches a deliberately-failed job before you trust the pattern on the fleet.
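The "deliberately-failed job" check can be a tiny throwaway playbook. The sketch below assumes a POSIX shell on the target; the short sleep and the exit code are arbitrary choices of mine, picked to make the job fail after the play has already moved on:

```yaml
# Run against a single host, for example with --limit.
- name: prove the reap catches a failed async job
  hosts: all
  gather_facts: false
  tasks:
    - name: start a job that fails a few seconds after launch
      ansible.builtin.shell: sleep 5 && exit 3
      async: 60
      poll: 0
      register: doomed_job

    - name: reap it (this task is expected to fail)
      ansible.builtin.async_status:
        jid: "{{ doomed_job.ansible_job_id }}"
      register: doomed_result
      until: doomed_result.finished
      retries: 12
      delay: 5
      ignore_errors: true     # only so the assert below can inspect the result

    - name: confirm the failure was surfaced by the reap
      ansible.builtin.assert:
        that:
          - doomed_result is failed
```

If the assert passes, the reap is doing its job. Drop ignore_errors in real playbooks so the failure stops the play.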

Reboots are their own case

Rebooting a host is a special flavor of long-running task, because the connection genuinely goes away mid-operation. You can hand-roll this with async plus wait_for_connection, but the purpose-built reboot module is almost always the better choice — it handles the disconnect, the wait, and the reconnection for you:

- name: reboot and wait for the host to come back
  ansible.builtin.reboot:
    reboot_timeout: 600

I reach for raw async + wait_for_connection only when I need behavior the reboot module doesn’t cover. For an ordinary reboot, the module is the safe, readable default, and I let AI steer me to it rather than reinventing the wait loop.
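For comparison, the hand-rolled version usually looks something like the sketch below. The delays and timeouts are illustrative values I picked, not recommendations, and the shutdown command assumes a typical Linux host:

```yaml
- name: trigger the reboot without waiting on the dying connection
  ansible.builtin.shell: sleep 2 && shutdown -r now "Ansible-triggered reboot"
  async: 30
  poll: 0
  become: true

- name: wait for the host to come back
  ansible.builtin.wait_for_connection:
    delay: 30        # give the host time to actually go down first
    timeout: 600     # give up if it is not reachable again within 10 minutes
```

This is the one place where a poll: 0 task has no async_status reap: the host restarting takes the job with it, so wait_for_connection plays the role of the follow-up check.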

Don’t let speed hide failures

The whole appeal of async is doing more in less wall-clock time, and that’s a real win when you use it right. But the appeal is also the trap: a fire-and-forget job that nobody reaps makes the play look fast and successful while quietly dropping failures on the floor. The discipline that keeps async honest is small — classify the need, reap every poll: 0 job, leave timeout headroom — and it’s exactly the kind of structure AI is good at scaffolding once you tell it the rules. Verify the reaping catches a failed job on one host, then scale out.

For making intermittent tasks reliable rather than just parallel, see “Making flaky Ansible tasks reliable with AI: retries, until, and wait_for” and the AI for Ansible category. For a reusable async-design prompt, browse the Ansible prompts.
