AI for GitLab CI/CD · By James Joyner IV · 9 min read

GitLab CI Error Guide: 'no space left on device' Runner Disk Exhaustion

Fix GitLab CI 'no space left on device' errors: prune Docker images and volumes, clear runner build cache and artifacts, free /tmp, and resolve inode exhaustion.

  • #gitlab-cicd
  • #troubleshooting
  • #errors
  • #disk

Overview

When a GitLab job fails with no space left on device, the runner host (or the Docker daemon backing it) has run out of disk. The job was executing fine until something (a clone, an npm install, a docker build, an artifact upload) tried to write and the filesystem refused. Unlike a flaky network error, this one is deterministic: every job on that runner will keep failing the same way until you free space, because the disk does not heal itself between pipelines.

The message shows up wherever the write happened, so the exact path is a clue about which step overran the disk:

ERROR: write /builds/group/app/node_modules/.cache/webpack/0.pack: no space left on device
ERROR: Job failed: exit code 1

A second, scarier variant appears once the disk is so full that the Docker daemon itself can no longer write its own state — at that point even cleanup commands start failing:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
ERROR: Job failed (system failure): Error response from daemon: failed to create task: no space left on device

The error code is fixed (no space left on device, ENOSPC); the cause varies. It is almost always an accumulation problem on a long-lived runner — old Docker layers, stale build directories, unbounded cache — not a bug in your .gitlab-ci.yml logic.

Symptoms

  • Jobs that used to pass start failing mid-run with no space left on device, often during clone, dependency install, docker build, or artifact upload.
  • Every job on one specific runner fails the same way while jobs on other runners are fine.
  • The Docker-based jobs report Cannot connect to the Docker daemon after the disk hits 100%.
  • df -h shows a filesystem (usually / or /var/lib/docker) at 100% Used, or df -i shows Inodes at 100% even though space looks free.
  • Artifact or cache upload fails with tar: Wrote only N of M bytes: No space left on device.

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        49G   49G     0 100% /
/dev/nvme1n1    197G  131G   56G  71% /var/lib/docker
tmpfs           7.8G  1.2M  7.8G   1% /run

df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/root      3276800 3276800       0  100% /
/dev/nvme1n1  13107200 4102233 9004967   32% /var/lib/docker

The df -h output shows the root filesystem full while the data partition still has room, which is a small-partition problem. The df -i output shows inodes at 100% on /: exhaustion by millions of tiny files, which can happen even when df -h still reports free bytes.

Common Root Causes

1. /var/lib/docker filled by accumulated images, containers, and build cache

A long-lived docker or docker-machine runner never garbage-collects on its own. Every pulled image, exited container, anonymous volume, and BuildKit cache layer piles up under /var/lib/docker until it consumes the whole partition.

docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          412       6         88.4GB    81.2GB (91%)
Containers      230       2         3.1GB     3.0GB (98%)
Local Volumes   148       3         24.6GB    23.9GB (97%)
Build Cache     1971      0         41.8GB    41.8GB (100%)

ERROR: failed to register layer: write /var/lib/docker/overlay2/.../diff/...: no space left on device
ERROR: Job failed: exit code 1

RECLAIMABLE near 100% across images, volumes, and build cache is the signature. Almost all of that disk is dead weight from past jobs.
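
To put a single number on that dead weight, the reclaimable column can be summed. The following is a minimal sketch, not part of Docker: sum_reclaimable_gb is a hypothetical helper name, and it assumes the tab-separated rows that docker system df --format '{{.Type}}\t{{.Reclaimable}}' prints, with Docker's decimal units (kB, MB, GB, TB).

```shell
# sum_reclaimable_gb (hypothetical helper)
# Reads "Type<TAB>Reclaimable" rows on stdin and prints the total
# reclaimable size in GB. Assumes Docker's decimal size units.
sum_reclaimable_gb() {
  awk -F '\t' '
    {
      size = $2
      sub(/ .*/, "", size)                      # drop a trailing " (91%)"
      num = size;  sub(/[A-Za-z]+$/, "", num)   # "81.2GB" -> "81.2"
      unit = size; sub(/^[0-9.]+/, "", unit)    # "81.2GB" -> "GB"
      if      (unit == "TB") total += num * 1000
      else if (unit == "GB") total += num
      else if (unit == "MB") total += num / 1000
      else if (unit == "kB") total += num / 1000000
      else if (unit == "B")  total += num / 1000000000
    }
    END { printf "%.1f\n", total }
  '
}

# Usage on a runner host:
#   docker system df --format '{{.Type}}\t{{.Reclaimable}}' | sum_reclaimable_gb
```

Fed the four rows shown above (81.2 + 3.0 + 23.9 + 41.8), it prints 149.9. Treat the figure as an upper bound, since images can share layers.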

2. The runner’s builds_dir and git clones are never cleaned between jobs

The runner clones the repo into builds_dir (default /builds inside the container, or under the runner’s working dir for the shell executor). If the executor reuses the host or the build dir is mounted from the host, stale clones, leftover node_modules, and partial checkouts from old jobs accumulate.

du -sh /home/gitlab-runner/builds/* 2>/dev/null | sort -h | tail
1.2G    /home/gitlab-runner/builds/abc123/0/group/app
3.8G    /home/gitlab-runner/builds/abc123/0/group/data-service
11.4G   /home/gitlab-runner/builds/abc123/0/group/monolith

fatal: write error: No space left on device
fatal: unable to write file builds/group/monolith/...: No space left on device
ERROR: Job failed: exit code 128

The clone step itself fails (git exit 128) because the build directory partition is full of old checkouts that were never reclaimed.
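
One way to spot reclaimable checkouts is to list clones that git has not touched recently. This is a rough sketch under stated assumptions: stale_build_dirs is a hypothetical helper, the 7-day default is arbitrary, and it uses the modification time of each checkout's .git directory as a proxy for "last used by a job".

```shell
# stale_build_dirs BUILDS_DIR [DAYS]   (hypothetical helper)
# Prints every checkout under BUILDS_DIR whose .git directory has not been
# modified for more than DAYS days. It only lists; it deletes nothing.
stale_build_dirs() {
  builds_dir="$1"
  days="${2:-7}"
  # -prune stops find from descending into each .git directory it meets.
  find "$builds_dir" -maxdepth 8 -type d -name .git -prune \
       -mtime +"$days" -print 2>/dev/null |
    while IFS= read -r gitdir; do
      dirname "$gitdir"
    done
}

# Usage on a shell-executor host:
#   stale_build_dirs /home/gitlab-runner/builds 14
```

Review the list before deleting anything, and delete only while no job is running on the runner.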

3. A single job writes huge artifacts/cache or downloads a massive dataset

One greedy job can fill the disk on its own — pulling a multi-gigabyte dataset to /tmp, generating an enormous coverage report, or declaring an artifacts:/cache: path that sweeps in node_modules or build output.

# .gitlab-ci.yml: this quietly archives the entire workspace every run
test:
  script:
    - curl -o /tmp/dataset.tar.gz https://data.example.com/full-dump.tar.gz
    - tar xzf /tmp/dataset.tar.gz -C /tmp
  artifacts:
    paths:
      - ./        # grabs node_modules, build output, everything in the workspace

Uploading artifacts...
./: found 284113 matching artifact files and directories
tar: /builds/group/app/.tmp-artifacts: Wrote only 4096 of 10240 bytes: No space left on device
ERROR: Uploading artifacts as "archive" to coordinator... failed

The step that builds the artifact archive runs out of disk because the artifact path scoops up gigabytes it never needed.
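
A job can also guard itself by measuring what it is about to archive. The sketch below is illustrative only: check_artifact_budget is a hypothetical helper (GitLab provides nothing by this name), and the budget is whatever limit you choose. It would have to be defined or sourced in the job's script before use.

```shell
# check_artifact_budget MAX_MB PATH...   (hypothetical helper)
# Returns non-zero when the combined on-disk size of the given paths
# exceeds MAX_MB, so the job fails before the upload step starts.
check_artifact_budget() {
  max_mb="$1"
  shift
  # du -sk prints one "<KiB><TAB><path>" line per argument; sum column one.
  total_kb=$(du -sk "$@" 2>/dev/null | awk '{ sum += $1 } END { printf "%.0f\n", sum }')
  total_mb=$(( (total_kb + 1023) / 1024 ))   # round up to whole MiB
  echo "artifact paths total ${total_mb} MiB (budget ${max_mb} MiB)"
  [ "$total_mb" -le "$max_mb" ]
}

# Usage as the last script line of a job (paths are examples):
#   check_artifact_budget 200 dist/ coverage/
```

Pairing a check like this with narrowly scoped artifacts paths keeps a single greedy job from filling the runner.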

4. Docker build layer cache / BuildKit cache grows unbounded

Jobs that run docker build (or docker buildx) on a persistent daemon accumulate BuildKit cache mounts and intermediate layers. Without pruning, the build cache alone can dwarf your actual images.

docker buildx du
Reclaimable:    41.8GB
Total:          43.0GB

ID           RECLAIMABLE   SIZE      LAST ACCESSED
3k2j...      true          9.1GB     6 days ago
9fa1...      true          7.4GB     4 days ago
...

ERROR: failed to solve: failed to compute cache key: write /var/lib/docker/.../snapshots/...: no space left on device
ERROR: Job failed: exit code 1

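docker system df can undercount BuildKit cache, notably cache held by a non-default buildx builder such as one using the docker-container driver; docker buildx du reports the cache of the builder in use. Reclaimable in the tens of gigabytes means an unbounded build cache.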
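
Rather than wiping the whole cache, a bounded prune keeps recent layers warm. The following is a minimal sketch: prune_build_cache and the DRY_RUN variable are hypothetical names, the 72h default is an arbitrary choice, and the only Docker features assumed are docker buildx prune with its --force flag and until filter.

```shell
# prune_build_cache [AGE]   (hypothetical helper)
# Removes BuildKit cache records not used within AGE (default 72h).
# DRY_RUN defaults to 1, so nothing is deleted unless DRY_RUN=0 is set.
prune_build_cache() {
  age="${1:-72h}"
  cmd="docker buildx prune --force --filter until=${age}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $cmd"
  else
    $cmd
  fi
}

# Usage:
#   prune_build_cache 48h             # prints the command only
#   DRY_RUN=0 prune_build_cache 48h   # actually prunes
```

Defaulting to a dry run is a deliberate choice here, so pasting the function into a shell never deletes cache by accident.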

5. Inode exhaustion — space free, but df -i is 100%

A filesystem can run out of inodes before it runs out of bytes. Projects with massive node_modules trees or generated artifacts create millions of tiny files; once inodes hit 100%, writes fail with the same ENOSPC even though df -h shows free space.

df -i /home/gitlab-runner/builds
Filesystem      Inodes   IUsed  IFree IUse% Mounted on
/dev/root      3276800 3276800     0  100% /

sudo find /home/gitlab-runner/builds -xdev -type f | wc -l
2841190

ERROR: write /builds/group/app/node_modules/.pnpm/.../index.js: no space left on device
npm ERR! nospc ENOSPC: no space left on device, write

df -h will look healthy while df -i is pegged at 100%. The fix is to reduce the number of files, not their total size: usually by deleting stale node_modules and old build dirs.
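
To see which directories hold the most files, count entries per subdirectory instead of measuring bytes. This is a rough sketch: inode_hogs is a hypothetical helper, and it counts every filesystem entry (files, directories, symlinks) under each immediate child of the directory you pass in.

```shell
# inode_hogs DIR [TOP_N]   (hypothetical helper)
# Prints "<entry count><TAB><path>" for the TOP_N immediate subdirectories
# of DIR that contain the most filesystem entries, largest first.
inode_hogs() {
  root="$1"
  top="${2:-10}"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    # -xdev keeps find on one filesystem; the count includes $dir itself.
    count=$(find "$dir" -xdev | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$count" "${dir%/}"
  done | sort -rn | head -n "$top"
}

# Usage:
#   sudo sh -c '. ./inode_hogs.sh; inode_hogs /home/gitlab-runner/builds 5'
```

The file name inode_hogs.sh in the usage line is likewise an assumption. Run the helper again inside the top result to drill down one level at a time.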

6. dind storage or a small partition fills while the data disk has room

With Docker-in-Docker (docker:dind), the inner daemon writes to its own ephemeral storage; if it is not given a roomy volume it fills fast. Equally common: the root partition (or a tiny /tmp) hits 100% while the big data partition still has space, because something wrote to the wrong path.

# run inside the dind service container
df -h /var/lib/docker /tmp /
Filesystem      Size  Used Avail Use% Mounted on
overlay          10G   10G     0 100% /var/lib/docker   # dind's tiny default
tmpfs           2.0G  2.0G     0 100% /tmp
/dev/nvme1n1    197G   38G  159G  20% /

ERROR: failed to register layer: ApplyLayer ... no space left on device
ERROR: Job failed (system failure): preparing environment: ... no space left on device

The filesystem mounted at / is only 20% used, but dind’s 10G overlay and /tmp are full. The job dies even though the host “has plenty of disk.”

Diagnostic Workflow

Step 1: Confirm which filesystem is full — bytes or inodes

df -h
df -i

df -h shows which filesystem is out of bytes; df -i catches inode exhaustion when df -h looks fine. Note which mount is at 100% (/, /var/lib/docker, /tmp, or the builds partition), because that decides the rest of the workflow.
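
Both checks can be folded into one small filter. This is a minimal sketch: full_mounts is a hypothetical helper that reads df output on stdin, and it assumes the POSIX layout produced by df -P (one line per filesystem, percentage in column five, mount point in column six, no spaces in mount paths).

```shell
# full_mounts [THRESHOLD]   (hypothetical helper)
# Reads `df -P` or `df -iP` output on stdin and prints the mount point of
# every filesystem whose use percentage is at or above THRESHOLD (default 90).
full_mounts() {
  threshold="${1:-90}"
  awk -v limit="$threshold" '
    NR == 1 { next }                 # skip the header row
    {
      pct = $5
      sub(/%$/, "", pct)             # "100%" -> "100"
      # Filesystems without inode data report "-" and are skipped.
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6
    }
  '
}

# Usage on a runner host:
#   df -P  | full_mounts 90    # filesystems running out of bytes
#   df -iP | full_mounts 90    # filesystems running out of inodes
```

A mount that appears only in the second list is the inode case: bytes are free, but there is no room for new files.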

Step 2: Find the biggest offender on the full filesystem

sudo du -xh / 2>/dev/null | sort -h | tail -20
sudo du -sh /var/lib/docker /home/gitlab-runner/builds /tmp 2>/dev/null

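du -x stays on one filesystem, so the first command does not wander into mounted volumes; a separately mounted /var/lib/docker is not counted there, which is why the second command sizes the usual suspects directly. Together they tell you whether Docker, stale build directories, or a runaway /tmp download is eating the disk.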

Step 3: Inspect Docker’s accumulated state

docker system df
docker buildx du

If RECLAIMABLE is high across images, containers, volumes, and build cache, the runner has simply never been pruned (Root Causes 1 and 4). docker buildx du reveals BuildKit cache that docker system df can underreport.

Step 4: Check runner build directories and config

du -sh /home/gitlab-runner/builds/* 2>/dev/null | sort -h | tail
grep -E 'builds_dir|cache_dir|disable_cache' /etc/gitlab-runner/config.toml

Old clones in builds_dir (Root Cause 2) and a host-mounted, never-cleared cache_dir are common culprits. Note whether [runners.docker] disable_cache is set.

Step 5: Reclaim space and re-run

# Reclaim Docker space (drop the volumes flag if you keep named volumes)
docker system prune -af --volumes
docker buildx prune -af
# Clear stale runner clones and caches
sudo rm -rf /home/gitlab-runner/builds/* /home/gitlab-runner/cache/*

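After freeing space, confirm with df -h and df -i, then retry the pipeline. Run the rm -rf only while no jobs are executing on the runner, since it deletes the working directories of any job in flight. If the failure recurs within days, the real fix is automated cleanup (see Prevention), not another manual prune.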

Example Root Cause Analysis

Every job on the docker-1 runner has started failing during docker build. Other runners are fine; this one has been live for months without a rebuild.

The job log points at the daemon’s own storage:

ERROR: failed to register layer: write /var/lib/docker/overlay2/2f.../diff/usr/lib/...: no space left on device
ERROR: Job failed: exit code 1

Check the disk on the runner host:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1    197G  197G     0 100% /var/lib/docker

df -i
Filesystem       Inodes   IUsed   IFree IUse% Mounted on
/dev/nvme1n1   13107200 4102233 9004967   32% /var/lib/docker

Bytes are at 100% but inodes are fine, so this is accumulated data, not tiny files. Ask Docker what is reclaimable:

docker system df
TYPE            TOTAL   ACTIVE   SIZE      RECLAIMABLE
Images          412     6        88.4GB    81.2GB (91%)
Build Cache     1971    0        41.8GB    41.8GB (100%)
Local Volumes   148     3        24.6GB    23.9GB (97%)

docker buildx du
Reclaimable:    41.8GB / 43.0GB

Over 140 GB of the 197 GB partition is dead images, exited containers, orphaned volumes, and BuildKit cache from months of pipelines. Nothing was ever pruned.

Fix: prune Docker state, prune the build cache explicitly, and verify the partition recovers.

docker system prune -af --volumes
docker buildx prune -af
Total reclaimed space: 143.7GB

df -h /var/lib/docker
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1    197G   54G  143G  27% /var/lib/docker

The partition drops from 100% to 27%, the next pipeline builds cleanly, and a daily prune timer (below) keeps it from creeping back to full.

Prevention Best Practices

  • Schedule a cleanup cron or systemd timer on every long-lived runner: a nightly docker system prune -af --volumes && docker buildx prune -af keeps /var/lib/docker from creeping to 100%.
  • Set a sane builds_dir/cache_dir retention story — clear stale clones between jobs, and avoid host-mounting cache_dir unless you actively manage its size.
  • Scope artifacts: and cache: paths tightly so you never tar node_modules or build output into the archive; exclude generated trees explicitly.
  • Put /var/lib/docker (and /builds) on a dedicated, generously sized partition, and give dind a roomy volume instead of its tiny default overlay so a small root or /tmp never trips the whole runner.
  • Alert on disk early — page on df -h/df -i crossing ~80% rather than waiting for the first failed job, since inode exhaustion (df -i) is easy to miss.
  • For fast triage when a runner-disk failure storm hits, the free incident assistant can read the no space left on device log and point at the likely offender. More pipeline fixes live in the GitLab CI/CD guides.
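
The scheduled cleanup from the first bullet can be made gentler by pruning only when the disk is actually filling and only removing objects past a certain age. The following is a sketch under stated assumptions, not a drop-in script: usage_pct, cleanup_if_needed, and DRY_RUN are hypothetical names, and the 80% threshold and 24h age are arbitrary. It leaves volumes alone because Docker rejects the until filter together with --volumes.

```shell
# usage_pct PATH   (hypothetical helper)
# Prints the byte usage of the filesystem holding PATH as a bare integer.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# cleanup_if_needed PATH [THRESHOLD]   (hypothetical helper)
# Prunes Docker objects older than 24h when the filesystem holding PATH is
# at or above THRESHOLD percent. DRY_RUN defaults to 1 (report only).
cleanup_if_needed() {
  target="$1"
  threshold="${2:-80}"
  used=$(usage_pct "$target")
  case "$used" in
    ''|*[!0-9]*) echo "error: could not read usage for $target" >&2; return 1 ;;
  esac
  if [ "$used" -lt "$threshold" ]; then
    echo "skip: $target at ${used}% (threshold ${threshold}%)"
    return 0
  fi
  echo "prune: $target at ${used}% (threshold ${threshold}%)"
  if [ "${DRY_RUN:-1}" != "1" ]; then
    docker system prune --all --force --filter until=24h
    docker buildx prune --force --filter until=24h
  fi
}

# Example call for a nightly cron entry or systemd timer:
#   DRY_RUN=0 cleanup_if_needed /var/lib/docker 80
```

Keeping images used in the last day means the next morning's pipelines do not have to pull everything again, at the cost of reclaiming a little less space per run.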

Quick Command Reference

# Which filesystem is full — bytes vs inodes?
df -h
df -i

# Find the biggest offender (stay on one filesystem)
sudo du -xh / 2>/dev/null | sort -h | tail -20
sudo du -sh /var/lib/docker /home/gitlab-runner/builds /tmp 2>/dev/null

# What is Docker holding that it could reclaim?
docker system df
docker buildx du

# Reclaim Docker space (images, containers, volumes, build cache)
docker system prune -af --volumes
docker buildx prune -af

# Clear stale runner clones and caches
sudo rm -rf /home/gitlab-runner/builds/* /home/gitlab-runner/cache/*

# Inspect runner build/cache config
grep -E 'builds_dir|cache_dir|disable_cache' /etc/gitlab-runner/config.toml

# Nightly cleanup timer payload (cron example)
# 0 3 * * * docker system prune -af --volumes && docker buildx prune -af

Conclusion

A no space left on device failure is the runner telling you a filesystem it needs to write to is out of space — or out of inodes. The usual root causes:

  1. /var/lib/docker filled by accumulated images, containers, volumes, and build cache on a long-lived runner.
  2. The runner’s builds_dir and old git clones never cleaned between jobs.
  3. A single job writing huge artifacts/cache or downloading a massive dataset to /tmp or the build dir.
  4. An unbounded Docker/BuildKit build-layer cache.
  5. Inode exhaustion — df -h shows space free, but df -i is at 100% from millions of small files.
  6. dind storage or a small / or /tmp partition filling while the data disk still has room.

Run df -h and df -i first to learn whether you are out of bytes or inodes, then find the offender with du and docker system df. The durable fix is almost always a scheduled prune plus tighter artifact and build-directory hygiene, so the disk never creeps back to full.
