Safer Targeted Ansible Runs With Tags and --limit

Last month I needed to push one tiny change: bump a single client_max_body_size line in an nginx vhost. Thirty seconds of edit. The problem was that the only path to apply it was site.yml — a 30-minute monster that touches every host in the fleet, restarts databases, rotates certs, and reconfigures half a dozen services I had absolutely no business poking that afternoon. To change one line of nginx config, the playbook wanted to redeploy the world.

That is the exact moment I learned to stop tolerating un-targetable playbooks. Ansible already ships everything you need to run a surgical subset of tasks against a surgical subset of hosts. You just have to wire it in: tags on the task side, --limit on the host side, and --check plus a couple of preview flags so you never fire blind. Here is the workflow I run now, and where I let an AI assistant help (and where I absolutely do not).

The blast radius problem

The default behavior of ansible-playbook site.yml is “do all the things, everywhere.” For a greenfield deploy that is correct. For a Tuesday-afternoon config tweak it is a loaded foot-gun. You want two independent dials:

What tasks run — controlled by tags.
What hosts they run on — controlled by --limit and host patterns.

Master both and “push one nginx line to one box” becomes a thirty-second, fully-previewed operation instead of a thirty-minute act of faith.

Tagging tasks, blocks, and roles

Tags are just labels you hang on tasks. Add them where the work happens:

- name: "Render nginx vhost"
  ansible.builtin.template:
    src: "vhost.conf.j2"
    dest: "/etc/nginx/conf.d/app.conf"
  notify: "reload nginx"
  tags:
    - "nginx"
    - "config"

- name: "Install base packages"
  ansible.builtin.apt:
    name: "{{ base_packages }}"
    state: "present"
  tags:
    - "packages"

Tags also apply to a whole block, and every task inside inherits them:

- name: "TLS certificate management"
  tags:
    - "tls"
  block:
    - name: "Issue certificate"
      ansible.builtin.command: "certbot certonly --nginx -d {{ domain }}"
    - name: "Verify cert chain"
      ansible.builtin.command: "openssl verify {{ cert_path }}"

And you can tag a role import so a whole role only fires when asked:

- hosts: "web"
  roles:
    - role: "nginx"
      tags: ["nginx"]
    - role: "app"
      tags: ["app"]

Now --tags nginx runs only the nginx-flavored work, wherever it lives across roles and includes.

Running and skipping with --tags and --skip-tags

Two flags do the heavy lifting:

# Run only nginx-tagged tasks
ansible-playbook site.yml --tags "nginx"

# Run config work but skip the slow package installs
ansible-playbook site.yml --tags "config" --skip-tags "packages"

# Run everything EXCEPT the database tasks
ansible-playbook site.yml --skip-tags "database"

--tags is allow-listing; --skip-tags is deny-listing. They compose, and --skip-tags wins on conflict. My rule of thumb: reach for --tags when you know the small thing you want, and --skip-tags when you know the dangerous thing you want to avoid.

Pro Tip: untagged tasks do not run when you pass --tags. That is a feature, not a bug — but it bites people who expect “run nginx plus all the usual setup.” If a task must always execute, tag it always (next section) rather than hoping someone remembers to include it.

The special tags: always and never

Ansible reserves two magic tags. A task tagged always runs on every invocation unless you explicitly --skip-tags always. A task tagged never is skipped on every invocation unless you explicitly ask for it by one of its other tags.

- name: "Assert we are not on production by accident"
  ansible.builtin.assert:
    that:
      - "env != 'prod' or (confirm_prod | default(false) | bool)"
  tags:
    - "always"

- name: "Wipe and re-bootstrap the node"
  ansible.builtin.command: "/usr/local/bin/rebuild-node.sh"
  tags:
    - "never"
    - "destroy"

That always-tagged safety assertion fires no matter which --tags someone passes, so your guardrails cannot be tag-skipped away by accident. The never-tagged destructive task stays dormant until someone deliberately runs --tags destroy — turning the scariest operations into explicit, opt-in actions instead of latent landmines.
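
To see these tag rules for yourself, a throwaway playbook helps. The sketch below is my own illustration, not part of site.yml: the write_tag_demo function, the task names, and the /tmp path are all hypothetical, and it assumes ansible-playbook is on your PATH. It writes four debug tasks (one always, one nginx, one untagged, one never plus destroy) so --list-tasks can show which ones each tag selection keeps.

```shell
# Write a hypothetical four-task demo playbook to the path given as $1.
# Task names and file path are illustrative, not taken from a real repo.
write_tag_demo() {
  cat > "$1" <<'EOF'
- hosts: "localhost"
  connection: "local"
  gather_facts: false
  tasks:
    - name: "Guardrail (always)"
      ansible.builtin.debug:
        msg: "guardrail"
      tags: ["always"]
    - name: "Render nginx config"
      ansible.builtin.debug:
        msg: "nginx"
      tags: ["nginx"]
    - name: "Untagged setup"
      ansible.builtin.debug:
        msg: "setup"
    - name: "Destructive rebuild (never)"
      ansible.builtin.debug:
        msg: "destroy"
      tags: ["never", "destroy"]
EOF
}

# Usage, assuming ansible-playbook is installed:
#   write_tag_demo /tmp/tag-demo.yml
#   ansible-playbook /tmp/tag-demo.yml --list-tasks                  # no tag filter
#   ansible-playbook /tmp/tag-demo.yml --tags nginx --list-tasks     # allow-list one tag
#   ansible-playbook /tmp/tag-demo.yml --tags destroy --list-tasks   # opt in to the never task
```

With no tag filter the listing should show everything except the never task. With --tags nginx it should shrink to the guardrail and the nginx task, and with --tags destroy it should show the guardrail and the rebuild task.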

Targeting hosts with --limit and patterns

Tags control what. --limit controls where. It narrows the play’s host list to a subset, and it understands Ansible’s full pattern language:

# One host
ansible-playbook site.yml --tags "nginx" --limit "web01"

# Intersection: hosts in BOTH the web group AND staging
ansible-playbook site.yml --tags "nginx" --limit "web:&staging"

# Exclusion: everything in the web group except web01
ansible-playbook site.yml --limit "web:!web01"

# Re-run only the hosts that failed last time
ansible-playbook site.yml --limit "@/home/me/site.retry"

The web:&staging intersection is my favorite: it lets me say “nginx hosts, but only the staging ones” without maintaining a separate inventory group. And --limit @retry_file is the quiet hero of bad days. When a run partially fails and retry files are enabled (the retry_files_enabled setting in ansible.cfg, which is off by default in current Ansible releases), Ansible writes a .retry file listing the failed hosts, and @ feeds it straight back in so you fix only what broke.
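
Because retry files are disabled by default in current Ansible releases, the @ trick only works once you switch them on. Here is a minimal sketch of how I might wire that up per invocation with the ANSIBLE_RETRY_FILES_ENABLED and ANSIBLE_RETRY_FILES_SAVE_PATH environment variables. The function names and the ~/.ansible-retry directory are hypothetical, and the path helper assumes the playbook ends in .yml.

```shell
# Command to invoke; overridable so the functions can be exercised without Ansible.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"
# Hypothetical directory for retry files.
RETRY_DIR="${RETRY_DIR:-$HOME/.ansible-retry}"

# Retry file path: <save path>/<playbook name without extension>.retry
# (assumes the playbook file ends in .yml)
retry_file_for() {
  printf '%s/%s.retry\n' "$RETRY_DIR" "$(basename "$1" .yml)"
}

# First attempt: enable retry files for this invocation only.
run_with_retry() {
  playbook="$1"; shift
  mkdir -p "$RETRY_DIR"
  ANSIBLE_RETRY_FILES_ENABLED=True \
  ANSIBLE_RETRY_FILES_SAVE_PATH="$RETRY_DIR" \
    "$ANSIBLE_PLAYBOOK" "$playbook" "$@"
}

# Second attempt: re-run only the hosts recorded in the retry file, if there is one.
rerun_failed() {
  playbook="$1"; shift
  retry="$(retry_file_for "$playbook")"
  if [ ! -s "$retry" ]; then
    echo "no retry file at $retry; nothing to re-run" >&2
    return 1
  fi
  "$ANSIBLE_PLAYBOOK" "$playbook" --limit "@$retry" "$@"
}

# Usage:
#   run_with_retry site.yml --tags nginx
#   rerun_failed   site.yml --tags nginx
```

Before trusting a retry file, check its timestamp and contents; it is just a plain list of hostnames, and you want it to come from the run you are recovering.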

Pro Tip: --limit can only ever narrow the hosts already selected by the play’s hosts: line. It cannot add a host that the play never targeted. If --limit web01 returns “specified hosts and/or --limit does not match any hosts,” your problem is the play’s hosts: clause or your inventory, not the flag.
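
When a --limit comes back empty, I find it useful to split the question in two: does the pattern match the inventory at all, and does it survive the play's hosts: line? The sketch below is a hypothetical helper (diagnose_limit is my own name) that uses the ad-hoc ansible command's --list-hosts for the first question and ansible-playbook --list-hosts for the second. It assumes your inventory is already configured via ansible.cfg or the usual defaults.

```shell
# Commands to invoke; overridable so the function can be exercised without Ansible.
ANSIBLE="${ANSIBLE:-ansible}"
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# Usage: diagnose_limit <playbook> <host pattern>
diagnose_limit() {
  playbook="$1"; pattern="$2"

  # Step 1: which inventory hosts does the pattern match on its own?
  echo "== inventory match for '$pattern' =="
  "$ANSIBLE" "$pattern" --list-hosts

  # Step 2: which of those remain once each play's hosts: line is applied?
  echo "== after the play's hosts: line =="
  "$ANSIBLE_PLAYBOOK" "$playbook" --limit "$pattern" --list-hosts
}

# Usage:
#   diagnose_limit site.yml "web01"
#   diagnose_limit site.yml "web:&staging"
```

If step 1 lists the host and step 2 does not, the play never targeted it; if step 1 is already empty, look at the inventory or the pattern itself.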

Preview before you pull the trigger

Never run a targeted command you have not previewed. Ansible gives you listing flags that execute nothing and cost nothing:

# What tasks would the tag selection actually run?
ansible-playbook site.yml --tags "nginx,config" --list-tasks

# What tags even exist in this playbook?
ansible-playbook site.yml --list-tags

# Which hosts does my --limit resolve to?
ansible-playbook site.yml --limit "web:&staging" --list-hosts

--list-hosts is the one that has saved me from the most embarrassment. Typing it before the real run turns “I think this hits two boxes” into “this hits exactly web02 and web03, confirmed.” If the list is wrong, you find out in dry-run, not in production logs.

Then layer on --check for a real no-op rehearsal:

# Dry-run the change set against the resolved hosts
ansible-playbook site.yml \
  --tags "nginx,config" \
  --limit "web:&staging" \
  --check --diff

--check reports what would change without changing anything (modules that cannot predict their result, such as command and shell, are skipped rather than simulated), and --diff shows you the literal lines that would move in each file. Combining --tags with --check is the whole game: scope tight, then rehearse. Only when the diff looks exactly right do you drop --check and run for real.
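
The scope, rehearse, then run sequence is easy to skip under pressure, so one option is to bake it into a small wrapper. This is a sketch under my own assumptions, not an Ansible feature: preview_then_run is a hypothetical function name, and the typed "yes" prompt is simply my preference for a deliberate speed bump.

```shell
# Command to invoke; overridable so the function can be exercised without Ansible.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# Usage: preview_then_run <playbook> <tags> <limit>
preview_then_run() {
  playbook="$1"; tags="$2"; limit="$3"

  # 1. Show the resolved hosts and the selected tasks; nothing executes.
  "$ANSIBLE_PLAYBOOK" "$playbook" --tags "$tags" --limit "$limit" \
    --list-hosts --list-tasks || return 1

  # 2. Rehearse: report would-be changes and file diffs without applying them.
  "$ANSIBLE_PLAYBOOK" "$playbook" --tags "$tags" --limit "$limit" \
    --check --diff || return 1

  # 3. Only an explicit "yes" starts the real run.
  printf 'Apply for real to "%s"? Type yes to continue: ' "$limit" >&2
  read -r answer || answer=""
  if [ "$answer" != "yes" ]; then
    echo "aborted; nothing was changed" >&2
    return 1
  fi
  "$ANSIBLE_PLAYBOOK" "$playbook" --tags "$tags" --limit "$limit" --diff
}

# Usage:
#   preview_then_run site.yml "nginx,config" "web:&staging"
```

The wrapper stops at the first failing step, so a bad pattern or a failed rehearsal never reaches the prompt.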

Surgical runs with --start-at-task and --step

Sometimes even a tag is too coarse — a 40-task play died on task 28 and you just want to resume there. --start-at-task jumps straight to a named task:

ansible-playbook site.yml \
  --limit "web03" \
  --start-at-task "Reload nginx"

And --step turns the whole run into an interactive prompt — Ansible asks (N)o/(y)es/(c)ontinue before each task, so you can walk a sensitive change one step at a time and bail the instant something looks off:

ansible-playbook site.yml --tags "tls" --limit "web01" --step

I use --step for first-time runs of anything touching certs or load balancers. It is slower, and that slowness is precisely the point.
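
One practical wrinkle with --start-at-task is getting the task name right, so I like to pull names out of --list-tasks rather than type them from memory. The helper below is a hypothetical sketch (find_task_names is my own name). It assumes the --list-tasks layout of an indented task name followed by "TAGS: [...]", so treat the sed expression as an assumption to verify against your Ansible version.

```shell
# Command to invoke; overridable so the function can be exercised without Ansible.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# Usage: find_task_names <playbook> <text>
# Prints task names from --list-tasks that contain <text> (case-insensitive).
find_task_names() {
  playbook="$1"; needle="$2"
  "$ANSIBLE_PLAYBOOK" "$playbook" --list-tasks \
    | sed -n \
        -e '/^[[:space:]]*play #[0-9]/d' \
        -e 's/^[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*TAGS: \[.*\]$/\1/p' \
    | grep -i -- "$needle"
}

# Usage:
#   find_task_names site.yml nginx
#   ansible-playbook site.yml --limit "web03" --start-at-task "<name copied from above>"
```

The first sed expression drops the per-play header lines, which also carry a TAGS suffix; the second strips the indentation and the tag list so only the task name remains.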

Where AI fits — and where it does not

Retrofitting a clean tagging scheme onto a sprawling site.yml is tedious, mechanical, pattern-heavy work — exactly what an AI assistant is good at. I treat the model like a fast, eager junior engineer: I hand it the playbook and ask for a proposed tag taxonomy (nginx, tls, packages, database, config), with always on the safety asserts and never on the destructive tasks. Tools like Claude, Cursor, or GitHub Copilot will happily churn out the diff across dozens of tasks in seconds, and a reusable prompt from the prompt packs keeps the convention consistent across every playbook in the repo.

But a junior engineer does not get merge rights, and neither does the model. The non-negotiables:

A human reviews every tag the AI adds. A mislabeled database task hiding under a config tag is a future outage. Run the diff through your code review dashboard and read it line by line.
Always preview before the real run. --list-tags, --list-hosts, then --check --diff. The AI proposes; the dry-run disposes.
Never hand the AI the vault keys. Ansible Vault passwords, SSH keys, and any run that passes --limit against the prod inventory stay with the human. The model suggests structure; it does not touch secrets and it does not execute targeted runs against live hosts.
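
A cheap mechanical check can back up the human review: compare the tags the playbook actually exposes against the taxonomy you approved. The sketch below is hypothetical (the function names and the APPROVED_TAGS list are mine) and assumes --list-tags prints lines of the form "TASK TAGS: [a, b, c]". It only catches unexpected tag names; it cannot tell you that a database task is hiding under a config tag, so it supplements the line-by-line read rather than replacing it.

```shell
# Command to invoke; overridable so the functions can be exercised without Ansible.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"
# Hypothetical approved taxonomy, space-separated.
APPROVED_TAGS="${APPROVED_TAGS:-always never nginx tls packages database config app destroy}"

# Print every distinct task tag reported by --list-tags, one per line.
list_task_tags() {
  "$ANSIBLE_PLAYBOOK" "$1" --list-tags \
    | sed -n 's/^.*TASK TAGS: \[\(.*\)\]$/\1/p' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -v '^$' \
    | sort -u
}

# Report tags that are not in the approved list; return non-zero if any exist.
audit_tags() {
  unknown=""
  for tag in $(list_task_tags "$1"); do
    case " $APPROVED_TAGS " in
      *" $tag "*) ;;                     # tag is in the approved list
      *) unknown="$unknown $tag" ;;      # collect anything unexpected
    esac
  done
  if [ -n "$unknown" ]; then
    echo "unapproved tags:$unknown" >&2
    return 1
  fi
  echo "all tags approved"
}

# Usage:
#   audit_tags site.yml
```

Run it on the branch with the AI-generated diff; a typo such as ngnix or an invented tag shows up immediately, before anyone relies on --tags to scope a run.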

That division of labor is the whole philosophy: let AI absorb the grunt work of tagging, keep a human on the trigger.

Conclusion

The fix for a terrifying 30-minute playbook is not a smaller playbook; it is a targetable one. Tags slice the work, --limit slices the hosts, and --list-hosts/--list-tags/--check make sure you see exactly what you are about to do before you do it. Let an AI assistant do the tedious tagging retrofit, review every line it writes, and keep the vault keys and the live-run trigger firmly in human hands. Do that, and “push one nginx line to one box” becomes the thirty-second, fully-previewed operation it always should have been.