AI for GitLab CI/CD · By James Joyner IV · 9 min read

GitLab CI Error Guide: 'ERROR: Failed to remove network for build' Docker Cleanup

Fix GitLab Runner's 'Failed to remove network for build' on the Docker executor: clear leaked per-build networks, endpoints, and stale containers blocking teardown.

  • #gitlab-cicd
  • #troubleshooting
  • #errors
  • #docker

Exact Error Message

A job finishes (or fails) and the runner errors while tearing down the Docker environment it created for the build:

Cleaning up project directory and file based variables
ERROR: Failed to remove network for build
ERROR: Error cleaning up containers: failed to remove network runner-abc123-project-42-concurrent-0:
  Error response from daemon: error while removing network:
  network runner-abc123 id 9f2c... has active endpoints
WARNING: Failed to process runner

You may also see it surface as a job that fails to start because the leaked network name still exists:

ERROR: Job failed (system failure): Error response from daemon:
  network with name runner-abc123-project-42-concurrent-0 already exists

What the Error Means

With the Docker executor and the FF_NETWORK_PER_BUILD feature flag enabled, GitLab Runner creates a per-build network so the job container, service containers (like postgres or dind), and the helper can talk to each other. At the end of the build it removes that network. The error means the removal failed, almost always because the network still has active endpoints: containers (or leftover veth endpoints) are still attached, so Docker refuses to delete it.

The root issue is a teardown that did not complete cleanly: a container outlived its build, the Docker daemon hiccupped, or a previous interrupted job leaked the network. The next job then either fails to clean up or collides with the stale network name.

Common Causes

  1. Leaked containers from a killed job still attached to the per-build network.
  2. A dind (docker-in-docker) service that spawned containers the runner does not track, holding endpoints open.
  3. Daemon overload or restart mid-build, leaving the network in an inconsistent state.
  4. Two runners sharing one Docker daemon with overlapping concurrent indexes colliding on network names.
  5. Stale networks accumulating because cleanup repeatedly failed, eventually exhausting the bridge address pool too.

How to Reproduce the Error

Run a job that starts a container which outlives the build (common with dind), or force-kill a job mid-run. One trigger is a detached container that the job starts and never removes:

leaky:
  image: docker:27
  services:
    - docker:27-dind
  script:
    - docker run -d --name orphan alpine:3.20 sleep 600

The teardown log then shows:

ERROR: Failed to remove network for build
  network runner-...-concurrent-0 has active endpoints

The detached orphan container stays attached to the build network and blocks its removal.
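The same "active endpoints" refusal can be reproduced by hand against a scratch Docker daemon, without a runner. This is a minimal sketch: the names demo-build-net and demo-orphan are made up for illustration, and the function is only defined here, not run.

```shell
#!/usr/bin/env bash
# Sketch: reproduce "has active endpoints" on a scratch Docker host.
# demo-build-net and demo-orphan are hypothetical names, not runner-generated ones.

reproduce_active_endpoints() {
  local net="demo-build-net" ctr="demo-orphan"

  docker network create "$net" >/dev/null || return 1
  # A detached container keeps an endpoint on the network.
  docker run -d --name "$ctr" --network "$net" alpine:3.20 sleep 600 >/dev/null || return 1

  # Removal is expected to fail while the container is attached.
  if docker network rm "$net" >/dev/null 2>&1; then
    echo "unexpected: network removed while a container was attached"
  else
    echo "network removal refused while $ctr is attached"
  fi

  # Clean up in the order that works: container first, then network.
  docker rm -f "$ctr" >/dev/null
  docker network rm "$net" >/dev/null
}
```

Calling reproduce_active_endpoints on a test host should print the "refused" line and leave nothing behind, since the final two commands remove the container before the network.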

Diagnostic Commands

All read-only — run on the host where the Docker daemon lives:

# List runner-created networks (these should be transient)
docker network ls --filter name=runner

# Inspect a stuck network to see what's still attached
docker network inspect runner-abc123-project-42-concurrent-0

# Find containers still attached to that network
docker ps -a --filter network=runner-abc123-project-42-concurrent-0

# Check daemon health and recent errors
systemctl status docker
journalctl -u docker --since "30 min ago" | grep -i "active endpoints"

Example output (abridged):

NETWORK ID     NAME                                          DRIVER
9f2c1a...      runner-abc123-project-42-concurrent-0         bridge
"Containers": { "orphan": { "Name": "orphan" ... } }

Seeing a leftover container under "Containers" in network inspect confirms why removal fails: those are the active endpoints.
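When several networks are stuck, inspecting them one at a time gets tedious. The sketch below prints every runner-prefixed network next to the containers still attached to it. The function names are made up for this example, and it assumes runner networks can be found with the same name=runner filter used above.

```shell
#!/usr/bin/env bash
# Sketch: list each runner-prefixed network with its attached containers.

attached_containers() {
  # One container name per line; no output means no endpoints are listed.
  docker network inspect --format '{{range .Containers}}{{println .Name}}{{end}}' "$1"
}

report_runner_networks() {
  local net names
  docker network ls --filter name=runner --format '{{.Name}}' |
    while IFS= read -r net; do
      # Drop blank lines, then join the names with spaces.
      names="$(attached_containers "$net" | grep . | paste -sd ' ' -)"
      echo "$net: ${names:-<none>}"
    done
}
```

A network reported with <none> that still refuses removal is a hint that the daemon's own state is stale rather than a container being left behind.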

Step-by-Step Resolution

1. Remove the blocking containers, then the network

Detach or delete the leaked containers so the network has no endpoints, then remove it:

docker rm -f orphan
docker network rm runner-abc123-project-42-concurrent-0

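For a broad cleanup (only when no jobs are running), prune stopped containers first and unused networks second. Both commands act on everything on the host, not just runner-created objects:

docker container prune -f
docker network prune -f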
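A narrower alternative to a host-wide prune is to touch only runner-prefixed networks and whatever is attached to them. This is a sketch under assumptions: DRY_RUN is a toggle invented for this example (it defaults to printing the commands instead of running them), and the runner- prefix is taken from the network names shown in the logs above.

```shell
#!/usr/bin/env bash
# Sketch: clear only runner-prefixed networks instead of pruning everything.
# DRY_RUN is a made-up toggle for this example; 1 (default) only prints commands.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

clear_runner_networks() {
  local net ctr
  for net in $(docker network ls --filter name=runner- --format '{{.Name}}'); do
    # Remove whatever is still attached, running or stopped.
    for ctr in $(docker ps -aq --filter "network=$net"); do
      run docker rm -f "$ctr"
    done
    run docker network rm "$net"
  done
}
```

Run clear_runner_networks first to review the planned commands, then DRY_RUN=0 clear_runner_networks once the list looks right and no jobs are active.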

2. Stop leaking containers from the job

If your job spawns containers (especially via dind), clean them up in an after_script so nothing outlives the build:

after_script:
  - docker ps -aq | xargs -r docker rm -f

after_script runs even when the main script fails, so leaked containers get cleared before the runner attempts network teardown.
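Removing every container is fine inside an isolated dind service, but on a daemon shared through a mounted socket it would also hit containers that belong to other jobs. A more selective approach is to label what the job starts and remove only that. In this sketch the label key ci-job and the function names are assumptions; CI_JOB_ID is the predefined GitLab CI variable.

```shell
#!/usr/bin/env bash
# Sketch: remove only the containers this job started, matched by a label.
# The label key "ci-job" is a hypothetical choice; CI_JOB_ID is set by GitLab CI.

job_label() {
  if [ -z "${CI_JOB_ID:-}" ]; then
    echo "CI_JOB_ID is not set" >&2
    return 1
  fi
  echo "ci-job=$CI_JOB_ID"
}

# For the script section: use in place of a bare "docker run".
run_labelled() {
  local label
  label="$(job_label)" || return 1
  docker run --label "$label" "$@"
}

# For after_script: removes only containers carrying this job's label.
cleanup_job_containers() {
  local label ids
  label="$(job_label)" || return 1
  ids="$(docker ps -aq --filter "label=$label")"
  # Unquoted on purpose: one argument per container ID.
  [ -z "$ids" ] || docker rm -f $ids
}
```

Because after_script runs in a separate shell from script, these functions would need to live in a file that both sections source.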

3. Avoid network-name collisions across runners

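If multiple runners share one Docker daemon, register each one separately so it has its own token, and give each a distinct name. In /etc/gitlab-runner/config.toml:

concurrent = 4        # global cap on jobs running at once across all [[runners]]
[[runners]]
  name = "runner-a"   # shows up in logs; pick one per runner
  executor = "docker"

The generated runner-<token>-... network names are built from the runner's token prefix, so separate registrations are what keep them unique; distinct names make the logs easier to attribute. Where possible, give each runner its own Docker daemon.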
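To see which runner registrations are actually creating networks on a shared daemon, the names can be grouped by their token segment. A minimal sketch, assuming names follow the runner-<token>-project-<id>-concurrent-<n> layout seen in the logs earlier in this guide; runner_tokens is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: count networks per runner token segment.
# Assumes the runner-<token>-project-<id>-concurrent-<n> layout from the logs above.

runner_tokens() {
  # Reads network names on stdin; prints "<token> <count>" per distinct token.
  sed -n 's/^runner-\(.*\)-project-[0-9][0-9]*-concurrent-[0-9][0-9]*.*$/\1/p' |
    sort | uniq -c | awk '{print $2, $1}'
}
```

Typical use would be docker network ls --format '{{.Name}}' | runner_tokens. If two runner processes share the host but only one token shows up, both are likely using the same registration.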

4. Recover a wedged daemon

If network inspect shows no containers but removal still fails, the daemon’s network state is stale. Restart it during a maintenance window, because restarting Docker stops every running container unless live-restore is enabled:

sudo systemctl restart docker
sudo systemctl restart gitlab-runner

A restart releases dangling veth endpoints the daemon failed to clean.
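Before scheduling a restart, it helps to confirm that the network really has nothing attached and still cannot be removed. The sketch below encodes that check; diagnose_stuck_network and its return codes are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a stuck network points at stale daemon state.
# Return codes (hypothetical): 0 removed, 1 containers attached, 2 not found, 3 stale.

diagnose_stuck_network() {
  local net="$1" attached
  attached="$(docker network inspect --format '{{len .Containers}}' "$net")" || {
    echo "network $net not found"
    return 2
  }
  if [ "$attached" -gt 0 ]; then
    echo "$attached container(s) still attached: remove them first"
    return 1
  fi
  if docker network rm "$net" >/dev/null 2>&1; then
    echo "network $net removed"
  else
    echo "no containers attached but removal failed: daemon state looks stale"
    return 3
  fi
}
```

Only the last outcome (return code 3) suggests that a daemon restart is the right tool; the others are fixed by removing containers.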

5. Re-run a clean job

After clearing leaked containers and stale networks, re-run the job; teardown completes silently with no Failed to remove network line.

Prevention and Best Practices

  • Always add an after_script that force-removes containers your job spawns, especially when using docker:dind.
  • Give each runner its own Docker daemon, or at least its own registration (token) and a distinct name, to avoid per-build network collisions.
  • Schedule a periodic docker network prune / docker container prune on the host (when idle) to mop up any leaks before they accumulate.
  • Watch journalctl -u docker for active endpoints warnings as an early signal that cleanup is failing.

Related Errors

  • network ... already exists: the collision counterpart, where a leaked network blocks the next job from creating its own.
  • Cannot connect to the Docker daemon: a dind connectivity failure that often coexists with leaked networks; that guide covers the socket/TLS side.
  • failed to allocate gateway / no available addresses: the bridge pool is exhausted because stale networks were never removed; prune them.

Frequently Asked Questions

Does this error fail my pipeline?

It is a cleanup failure, logged as a runner system error. The job result is usually already recorded, but the leaked network can break the next job with a name collision. Treat it as a real problem even if one job appears green.
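Since a green job can hide the problem, counting the two messages in the logs gives a quick signal of how often teardown is failing. A minimal sketch: count_teardown_errors is a made-up helper, and the patterns are taken from the error text quoted at the top of this guide.

```shell
#!/usr/bin/env bash
# Sketch: count teardown-related messages in log text read from stdin.
# Patterns come from the error messages quoted earlier in this guide.

count_teardown_errors() {
  awk '
    /has active endpoints/                { active++ }
    /network with name .* already exists/ { exists++ }
    END { printf "active_endpoints=%d name_collisions=%d\n", active, exists }
  '
}
```

For example, journalctl -u docker --since "24 hours ago" | count_teardown_errors prints one summary line that is easy to alert on.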

Why does docker-in-docker make this worse?

Containers your job starts inside dind are not tracked by the runner, so they can stay attached to the build network after the job ends, holding endpoints open. Add an after_script that removes all containers to prevent the leak.

Can I just prune networks on a schedule and ignore it?

Pruning is a good safety net, but only run it when no jobs are active — pruning a live build’s network breaks that job. Fix the leak source (after_script cleanup) so pruning is rarely needed.
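A scheduled prune can guard itself by checking for active job containers first. This is a sketch, assuming job containers carry the same runner- name prefix as the networks in the logs above; prune_if_idle is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: prune only when no runner-named containers are running.
# Assumes job containers use the "runner-" name prefix seen in the logs.

prune_if_idle() {
  local running
  # grep -c . counts non-empty lines; "|| true" keeps a zero count from failing.
  running="$(docker ps -q --filter name=runner- | grep -c . || true)"
  if [ "$running" -gt 0 ]; then
    echo "skipping prune: $running runner container(s) active"
    return 0
  fi
  # Stopped containers first, so the networks they referenced become unused.
  docker container prune -f
  docker network prune -f
}
```

The check and the prune are not atomic, so a job that starts in between can still be affected; pausing the runner during the cleanup window narrows that gap.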

Two runners share a Docker host. Is that the problem?

It can be. Overlapping concurrent indexes plus a shared daemon can collide on the generated network names. Use a separate registration (token) per runner, separate daemons, or non-overlapping concurrency to keep network names distinct.
