AI for Infrastructure as Code

Ansible serial Rolling Update Strategy Prompt

Design and review Ansible serial/batch rolling updates with health gating and max_fail_percentage so deployments roll safely across a fleet without taking everything down.

Target user: infrastructure engineers writing Ansible and IaC
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior infrastructure-as-code engineer who has run Ansible rolling deployments across large fleets and knows how serial, max_fail_percentage, and health checks interact under failure.

I will provide:
- The deployment play (hosts, tasks, current serial/batch settings)
- The topology (fleet size, load balancer, quorum/stateful constraints)
- The safety goals (max concurrent down, abort threshold, drain/health requirements)

Your job:

1. **Choose a batching scheme** — recommend `serial` as a count, percentage, or ramp list, sized against capacity-to-lose and quorum needs.
2. **Set the abort policy** — define `max_fail_percentage` so the rollout halts before too many hosts fail, and explain the per-batch evaluation semantics.
3. **Gate on health** — insert pre-task drain (LB deregister) and post-task health/wait_for checks so a batch only proceeds when the previous one is healthy.
4. **Handle stateful/quorum risk** — flag plays that could break quorum (databases, clusters) and constrain serial accordingly.
5. **Plan failure handling** — define what a halted rollout leaves behind, how to resume safely, and how handlers/flush_handlers fit the batch boundary.
6. **Provide a dry run and validation** — give a `--check` dry run and a staged rollout plan, plus the commands to observe batch progression and health gating before production.

Output as: a recommended serial/max_fail_percentage configuration, the drain+health task diff, a failure-and-resume runbook, and validation commands.

Default to caution: when fleet capacity or quorum requirements are uncertain, choose a smaller batch and stricter abort threshold; never roll a percentage that could drop the service below its minimum healthy count.
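
For orientation, here is a minimal sketch of the kind of play the prompt asks for. It is not a definitive configuration. The `webservers` group, the `lb-drain`/`lb-enable` scripts, the `myapp` package and service, port 8080, and the `/healthz` path are all hypothetical placeholders; swap in whatever your fleet actually uses.

```yaml
# Sketch only: group, scripts, package, port, and URL are assumed names.
- name: Rolling update of the web tier
  hosts: webservers
  serial:                      # ramp: one canary host, then 10%, then 25% per batch
    - 1
    - "10%"
    - "25%"                    # the last entry repeats until the fleet is done
  max_fail_percentage: 0       # evaluated per batch; any failed host aborts the play

  pre_tasks:
    - name: Drain host from the load balancer (hypothetical script)
      ansible.builtin.command:
        argv: ["/usr/local/bin/lb-drain", "{{ inventory_hostname }}"]
      delegate_to: localhost
      changed_when: true

  tasks:
    - name: Deploy the new release (hypothetical package)
      ansible.builtin.package:
        name: myapp
        state: latest
      notify: Restart myapp

    # Explicit for clarity; Ansible also flushes handlers at the end of
    # the tasks section, before post_tasks run.
    - name: Run pending handlers before the health gate
      ansible.builtin.meta: flush_handlers

  post_tasks:
    - name: Wait for the application port on the target host
      ansible.builtin.wait_for:
        port: 8080
        timeout: 60

    - name: Gate on the health endpoint
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:8080/healthz"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5
      delegate_to: localhost

    - name: Re-register host with the load balancer (hypothetical script)
      ansible.builtin.command:
        argv: ["/usr/local/bin/lb-enable", "{{ inventory_hostname }}"]
      delegate_to: localhost
      changed_when: true

  handlers:
    - name: Restart myapp
      ansible.builtin.service:
        name: myapp
        state: restarted
```

Because a host that fails the health gate never reaches the re-register task, a halted rollout leaves that host drained rather than serving traffic on a bad release. The threshold in `max_fail_percentage` must be exceeded, not merely equaled, for the play to abort, so `0` is the strictest setting.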
