Ansible Execution Environments and Collections Done Right

Here’s a failure I’ve watched happen more times than I’d like to admit: a playbook runs perfectly on the engineer’s laptop, then fails in CI because the CI runner has a different version of the kubernetes Python library, or a missing collection, or an ansible-core version that removed a module the playbook relies on. The playbook didn’t change. The environment did.

Ansible’s answer to this is execution environments (EEs) — container images that bundle Ansible, your collections, and their Python dependencies into one reproducible, versioned artifact. Combined with disciplined collection management, they kill the “works on my machine” class of bug. Here’s how to do it without overcomplicating things.

Collections: the dependency layer everyone ignores

Since Ansible 2.10, the monolithic Ansible package has been split into a slim engine (ansible-base in 2.10, renamed ansible-core in 2.11) plus collections: namespaced bundles of modules, plugins, and roles such as amazon.aws, community.general, or kubernetes.core. Most teams install whatever’s lying around and never pin versions, and that is the root of a large share of reproducibility problems.

Declare collections explicitly in a requirements.yml:

# requirements.yml
collections:
  - name: amazon.aws
    version: ">=7.0.0,<8.0.0"
  - name: kubernetes.core
    version: "==3.0.1"
  - name: community.general
    version: ">=8.0.0,<9.0.0"

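Pin to ranges that allow minor and patch updates but block surprise major bumps. An exact == pin, as with kubernetes.core, is fine when you need one specific release. Then ansible-galaxy collection install -r requirements.yml installs a compatible set everywhere. Compatible is not identical, though. A range resolves to the newest matching release at install time, so two installs a month apart can differ. Also, ansible-galaxy doesn’t install the collections’ Python dependencies at all. Freezing both is exactly what EEs do.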
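To make the range semantics concrete, here is a minimal Python sketch of how a spec like ">=7.0.0,<8.0.0" filters candidate releases. It is a simplified model that assumes plain MAJOR.MINOR.PATCH versions and only the operators shown. It is not ansible-galaxy's actual resolver, and the release lists are hypothetical.

```python
"""Simplified model of how a requirements.yml version range selects a release.

Assumes plain MAJOR.MINOR.PATCH versions and only the operators
==, !=, >=, >, <=, <. This is an illustration, not ansible-galaxy's resolver.
"""
from __future__ import annotations

import operator

Version = tuple[int, int, int]

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = [
    ("==", operator.eq),
    ("!=", operator.ne),
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return (major, minor, patch)


def satisfies(version: str, spec: str) -> bool:
    """True if `version` meets every comma-separated clause in `spec`."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def resolve(spec: str, available: list[str]) -> str | None:
    """Pick the highest available version that satisfies `spec`."""
    matching = [v for v in available if satisfies(v, spec)]
    return max(matching, key=parse_version) if matching else None


if __name__ == "__main__":
    spec = ">=7.0.0,<8.0.0"
    # Hypothetical release lists for one collection, observed a few weeks apart.
    earlier = ["6.5.0", "7.0.0", "7.1.0"]
    later = ["6.5.0", "7.0.0", "7.1.0", "7.2.0", "8.0.0"]
    print(resolve(spec, earlier))  # 7.1.0
    print(resolve(spec, later))    # 7.2.0: same file, different install
```

The second call is the point. An unchanged requirements.yml can resolve differently as new releases land, which is why the resolved set belongs inside a versioned image.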

Building an execution environment

You define an EE declaratively and build it with ansible-builder, which produces a standard OCI image you can run anywhere containers run.

# execution-environment.yml
version: 3
images:
  base_image:
    name: quay.io/ansible/awx-ee:24.6.1  # pin a tag (or @sha256 digest), never :latest
dependencies:
  ansible_core:
    package_pip: ansible-core==2.16.4
  galaxy: requirements.yml
  python:
    - boto3>=1.34.0
    - kubernetes>=29.0.0
  system:
    - git
    - openssh-clients
additional_build_steps:
  append_final:
    - RUN echo "Built $(date)" > /etc/ee-build-info

Build it:

ansible-builder build \
  --tag registry.example.com/ee/cloud-ops:1.4.0 \
  --file execution-environment.yml

Now everything — ansible-core, collections, and their Python dependencies (boto3, kubernetes) — is frozen into cloud-ops:1.4.0. That tag runs identically on a laptop, in CI, and in AWX.

Running playbooks inside the EE

Use ansible-navigator, the runner built for EEs:

ansible-navigator run site.yml \
  --execution-environment-image registry.example.com/ee/cloud-ops:1.4.0 \
  --mode stdout \
  --inventory inventory/aws_ec2.yml

The playbook executes inside the container. Your laptop’s local Python, your colleague’s weird pip state, the CI runner’s base image: none of it matters. The EE is the runtime. In my experience this is the single biggest reliability win available to an Ansible team, and plenty of teams still haven’t adopted it.
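If you invoke navigator from scripts or a Makefile wrapper, building the argument list in one place keeps the image tag from drifting between callers. Below is a minimal Python sketch that uses only the flags shown above. The helper names are hypothetical, and actually running the command assumes ansible-navigator and a container runtime are installed.

```python
import shlex
import subprocess


def navigator_argv(playbook: str, image: str, inventory: str) -> list[str]:
    """Build the ansible-navigator command shown above; refuse :latest or untagged images."""
    last_segment = image.rsplit("/", 1)[-1]  # ignore any registry port like host:5000/
    if ":" not in last_segment or image.endswith(":latest"):
        raise ValueError(f"pin an explicit image tag or digest, got {image!r}")
    return [
        "ansible-navigator", "run", playbook,
        "--execution-environment-image", image,
        "--mode", "stdout",
        "--inventory", inventory,
    ]


def run_playbook(playbook: str, image: str, inventory: str) -> None:
    """Echo the command, then run it; check=True raises CalledProcessError on failure."""
    argv = navigator_argv(playbook, image, inventory)
    print("+", shlex.join(argv))
    subprocess.run(argv, check=True)
```

Here is an example call: run_playbook("site.yml", "registry.example.com/ee/cloud-ops:1.4.0", "inventory/aws_ec2.yml"). Putting the tag check in the wrapper means no script can quietly fall back to an unpinned image.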

Versioning strategy that scales

Treat EEs like any other release artifact:

Semantic tags, never latest in prod. cloud-ops:1.4.0, not cloud-ops:latest. Pin AWX job templates and CI pipelines to exact tags so a rebuild can’t silently change behavior under a running pipeline.
One EE per concern, not per team. A cloud-ops EE with AWS + Kubernetes collections, a network EE with cisco.ios and friends. Resist the urge to build one mega-image with everything.
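Rebuild on a cadence and on CVEs. A weekly scheduled rebuild picks up security fixes, as long as you bump the pinned base image to its newest patched release. Publish the result under a new tag (1.4.1, not a re-pushed 1.4.0) so pinned jobs only change when you move them. Scan the image and rebuild when something critical lands.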
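The tagging rules above are easy to enforce mechanically. Below is a minimal Python sketch that parses an image reference, rejects anything without a MAJOR.MINOR.PATCH tag (including latest), and computes the next patch tag for a scheduled or CVE-driven rebuild. The helper names are hypothetical. Bumping the patch number for a rebuild is one reasonable convention among several.

```python
import re

# repository:MAJOR.MINOR.PATCH, e.g. registry.example.com/ee/cloud-ops:1.4.0
# The greedy repo group lets a registry port (host:5000/...) stay in the repository.
_REF = re.compile(r"^(?P<repo>[^@]+):(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")


def parse_ref(ref: str) -> tuple[str, int, int, int]:
    """Split an image reference into repository and semantic version parts."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a semver-tagged image reference: {ref!r}")
    return (
        match["repo"],
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
    )


def next_rebuild_ref(ref: str) -> str:
    """Bump PATCH for a rebuild so jobs pinned to the old tag never change underneath."""
    repo, major, minor, patch = parse_ref(ref)
    return f"{repo}:{major}.{minor}.{patch + 1}"
```

For example, next_rebuild_ref("registry.example.com/ee/cloud-ops:1.4.0") returns "registry.example.com/ee/cloud-ops:1.4.1". Pipelines then move to the new tag deliberately, while anything still pinned to 1.4.0 keeps the image it was tested against.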

Wiring it into CI

The EE makes CI trivial because the runner just needs Docker and a pull:

# .gitlab-ci.yml excerpt
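lint:
  stage: test
  image: registry.example.com/ee/cloud-ops:1.4.0
  script:
    # ansible-lint is not part of ansible-core; it must be in the EE's Python deps
    - ansible-lint site.yml

deploy:
  stage: deploy   # default stage order runs this only after the test stage (lint) passes
  image: registry.example.com/ee/cloud-ops:1.4.0
  script:
    - ansible-playbook site.yml -i inventory/aws_ec2.yml
  rules:
    - if: $CI_COMMIT_BRANCH == "main"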
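The "no drift between lint and deploy" property only holds while every job names the same tag. Here is a minimal Python sketch of a guard you could run before merging. It assumes the CI file has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and that jobs use the plain-string image: form. The function names are illustrative.

```python
# Top-level GitLab keywords that are not jobs (global settings, some deprecated).
_RESERVED = {
    "default", "include", "stages", "variables", "workflow",
    "image", "services", "cache", "before_script", "after_script",
}


def job_images(ci_config: dict) -> dict[str, str]:
    """Map each job name to its image, assuming the plain-string `image:` form."""
    return {
        name: job["image"]
        for name, job in ci_config.items()
        if name not in _RESERVED and isinstance(job, dict) and "image" in job
    }


def check_single_pinned_image(ci_config: dict) -> str:
    """Return the one EE image every job shares; raise if jobs disagree or use :latest."""
    images = set(job_images(ci_config).values())
    if len(images) != 1:
        raise ValueError(f"expected one shared image, found {sorted(images)}")
    image = images.pop()
    if image.endswith(":latest"):
        raise ValueError(f"{image} is not pinned to a release tag")
    return image
```

Failing the pipeline on a mismatch is stricter than a warning. That strictness is the point: a lint job on 1.4.0 and a deploy job on 1.3.2 reintroduces exactly the drift the EE was meant to remove.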

No pip install step, no collection install step, no version drift between lint and deploy. The image is the environment.

Where AI helps

The fiddly part of EEs is getting the dependency manifest right — which Python libraries a given collection actually needs, what system packages to add, how to pin compatible ranges. I lean on an assistant to draft the execution-environment.yml and requirements.yml from a description of what the playbooks do, and to debug build failures by reading ansible-builder’s output. Keep a few reusable Ansible prompts for generating these manifests, then verify the build actually succeeds before tagging.

Common mistakes

Building on latest base images, which means your “reproducible” artifact isn’t reproducible. Pin the base image tag, or better its digest, for true determinism.
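Relying only on the Python requirements that collections declare. ansible-builder does install a collection’s declared requirements, but not every collection declares them, and the declared ranges are often loose. Pin the important ones explicitly in dependencies.python (inline or via a requirements.txt).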
Forgetting system packages. A collection that shells out to git or ssh fails cryptically if the binary isn’t in the image. The system: block fixes that.
Letting the image bloat. Use a minimal base and only the collections you need. A 4GB EE is a slow pull on every CI run.
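Several of these mistakes are visible in the manifest itself, so they can be caught before a build starts. Here is a minimal Python sketch that assumes execution-environment.yml has already been loaded into a dict (for example via yaml.safe_load). The rules and the lint_manifest name are illustrative, not an ansible-builder feature. Missing system packages still need human judgment, so the sketch does not try to detect them.

```python
import re

# A requirement line counts as constrained if it contains any version operator.
_VERSION_OPERATOR = re.compile(r"==|~=|!=|>=|<=|<|>")


def lint_manifest(manifest: dict) -> list[str]:
    """Return problems found in an ansible-builder version 3 manifest loaded as a dict."""
    problems: list[str] = []

    # 1. Base image must carry a digest or an explicit, non-latest tag.
    base = manifest.get("images", {}).get("base_image", {}).get("name", "")
    if "@sha256:" not in base:
        last_segment = base.rsplit("/", 1)[-1]
        if ":" not in last_segment or last_segment.endswith(":latest"):
            problems.append(f"base image {base!r} is not pinned to a tag or digest")

    deps = manifest.get("dependencies", {})

    # 2. ansible-core itself should be an exact pin.
    core = deps.get("ansible_core", {}).get("package_pip", "")
    if "==" not in core:
        problems.append(f"ansible_core {core!r} is not an exact pin")

    # 3. Inline Python requirements need a version constraint. (A string value
    #    means a requirements.txt path, which this sketch does not open.)
    python_deps = deps.get("python", [])
    if isinstance(python_deps, list):
        for requirement in python_deps:
            if not _VERSION_OPERATOR.search(requirement):
                problems.append(f"python dependency {requirement!r} has no version constraint")

    return problems
```

Running it as the first CI step on manifest changes turns these mistakes into a fast failure rather than a slow, successful build of a non-reproducible image.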

The bottom line

Collections give you a declared dependency set. Execution environments freeze that set — plus Ansible itself and the Python runtime — into a single versioned image that runs the same everywhere. Together they convert Ansible from “works on my machine” to a genuinely reproducible automation platform.

If you’re running AWX, EEs are mandatory anyway: that’s how AWX executes jobs. If you’re not, adopt ansible-navigator and an EE for your most-run playbook first, and watch the flaky-CI tickets disappear.

Execution environment manifests and generated dependency lists are assistive. Always build and test the image against a non-production inventory before promoting it.