DevOps AI ToolKit

Ansible async and poll Orchestration Design Prompt

Design fire-and-forget and long-running Ansible tasks with async, poll, and async_status so jobs don't time out and results are collected reliably.

- Target user: Engineers running long tasks (large downloads, reboots, batch jobs) that hit connection timeouts or block the play
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor

The prompt

You are a senior Ansible engineer who knows `async` is for two distinct problems: keeping a long task from hitting the SSH/connection timeout, and firing tasks off to run in parallel while the play continues. You know `async` + `poll: 0` (fire-and-forget) and `async` + `poll: N` (wait-with-keepalive) behave very differently, and that fire-and-forget needs `async_status` to collect results.

I will give you tasks that are timing out or blocking. Redesign them with the right async pattern.

Steps:

1. **Classify the need**: for each task, decide whether the goal is "don't time out a long task" (use `poll: N` with a generous `async`) or "fire and continue" (use `poll: 0` and reap later).
2. **Set async and poll correctly**: choose an `async` timeout safely above the expected runtime, and a `poll` interval that's frequent enough to catch completion without flooding the connection.
3. **Reap fire-and-forget jobs**: for any `poll: 0` task, register the job id and add an `async_status` loop with `retries`/`delay` and an `until` on `finished` to collect the real result — never leave a fired job unchecked.
4. **Handle reboots specially**: if the task reboots the host, note whether the `reboot` module or `async` + `wait_for_connection` is the better fit.
5. **Failure visibility**: ensure a failed async job actually fails the play; a fire-and-forget task that's never reaped hides failures.
6. **Resource guardrail**: flag designs that fire too many parallel jobs and could overload the target or controller.

Fill in:
- Tasks that time out or block: [PASTE]
- Expected runtime of each: [DURATIONS]
- Goal per task: [keep-alive long task / fire-and-forget parallel]
- Does any task reboot the host: [yes/no]

Output format: the redesigned tasks YAML showing async/poll values and any `async_status` reaping loop, a table of task -> pattern -> async/poll -> how results are collected, and a note on verifying with one host before scaling out.

Do not run unbounded fire-and-forget jobs. Recommend testing async timeouts on one host first and always reaping `poll: 0` jobs so failures surface instead of silently disappearing.

Why this prompt works

`async` is one of those features people enable to silence a timeout without realizing that it solves two different problems with two different idioms. Setting `poll: N` keeps a genuinely long task (a big download, a slow package build) running past the connection timeout while the play waits for it. Setting `poll: 0` fires the task off and lets the play continue, which is how you parallelize independent long jobs. Conflating the two is the root of most async bugs, so this prompt makes the very first step a classification: is the goal “don’t time out” or “fire and continue”? Everything downstream follows from that answer.
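The two idioms can be sketched side by side. This is a minimal illustration, not a recommended configuration: the script paths, the `rebuild_job` variable, and the timeout values are all assumptions chosen for the example.

```yaml
- name: Two async idioms side by side (illustrative only)
  hosts: all
  tasks:
    # Idiom 1: keep a long task from timing out. The play still waits here,
    # but Ansible checks in periodically instead of holding one connection open.
    - name: Fetch a large dataset and wait for it
      ansible.builtin.command: /usr/local/bin/fetch-dataset.sh   # hypothetical script
      async: 3600   # upper bound on runtime, in seconds
      poll: 30      # check for completion every 30 seconds

    # Idiom 2: fire and continue. The task returns at once with a job id,
    # and the result must be collected later with async_status.
    - name: Start an index rebuild and move on
      ansible.builtin.command: /usr/local/bin/rebuild-index.sh   # hypothetical script
      async: 1800
      poll: 0
      register: rebuild_job
```

In the first task the play blocks until the script finishes or the `async` limit is reached. In the second, the registered `rebuild_job` only says the job was started, not that it succeeded.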

The reaping step is the one that prevents the most painful failure mode. A `poll: 0` task that nobody collects with `async_status` will let the play finish green while the actual work failed on the host, because Ansible never went back to check. The prompt treats an unreaped fire-and-forget job as a defect, requiring an `async_status` loop with an `until` condition on the job's `finished` state so the real result surfaces and a failure actually fails the play. That single discipline turns async from a way to hide problems into a way to manage real concurrency.
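A reaping loop along these lines is one way to apply that discipline. It is a sketch under stated assumptions: the script path, the `rebuild_job` and `rebuild_result` names, and the retry budget are hypothetical and should be sized to the real job.

```yaml
# Task-list fragment; script path and variable names are hypothetical.
- name: Start the index rebuild without waiting
  ansible.builtin.command: /usr/local/bin/rebuild-index.sh
  async: 1800        # the job may run for up to 30 minutes
  poll: 0            # return immediately with a job id
  register: rebuild_job

- name: Other work the play can do in the meantime
  ansible.builtin.debug:
    msg: "Rebuild running as job {{ rebuild_job.ansible_job_id }}"

- name: Reap the job so a failure on the host fails the play
  ansible.builtin.async_status:
    jid: "{{ rebuild_job.ansible_job_id }}"
  register: rebuild_result
  until: rebuild_result is finished
  retries: 60        # 60 checks x 30 s covers the 1800 s async budget
  delay: 30
```

The retry budget (`retries` times `delay`) is deliberately sized to match the `async` limit; a smaller budget could give up on a job that is still allowed to run.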

The timeout and resource guardrails keep the design grounded. An `async` value set too close to the real runtime makes Ansible declare failure on a job that is still legitimately running, so the prompt insists on generous headroom. And firing hundreds of parallel jobs can overwhelm the target or the controller, so the design gets a resource check and a one-host test before scaling out. It is the same careful, verify-first posture that keeps any concurrency change from becoming an outage.
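One possible shape for the headroom and one-host test is a canary batch, sketched below. The `app_servers` group, the script path, and the assumed 10-minute runtime are hypothetical; the 2x headroom is an illustrative choice, not a rule from Ansible.

```yaml
- name: Prove the async settings on one host before the rest (illustrative)
  hosts: app_servers          # hypothetical inventory group
  serial:
    - 1                       # first batch: a single canary host
    - "100%"                  # second batch: all remaining hosts
  tasks:
    - name: Run the nightly batch with timeout headroom
      ansible.builtin.command: /usr/local/bin/nightly-batch.sh   # hypothetical script
      async: 1200             # 2x an assumed 600 s runtime
      poll: 15
```

If the canary host fails, the play stops before the second batch starts. Restricting a run to a single host with `ansible-playbook --limit` is another way to do the same check without editing the play.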
