GitLab Runner Disk Space Cleanup Job Prompt
Diagnose 'no space left on device' runner failures and design a safe cleanup job for Docker layers, build caches, and stale clones on self-hosted runners.
- Target user: Platform engineers operating self-hosted GitLab runners
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior CI/CD engineer who specializes in self-hosted runner operations and disk hygiene.

I will provide:
- The runner executor type (shell, docker, docker+machine, kubernetes) and host disk layout
- The failing job error and `df -h` / `docker system df` output if available
- How builds are cloned (`GIT_STRATEGY`, `GIT_CLEAN_FLAGS`) and cache/artifact volume usage
- Whether the runner is shared across many projects

Your job:
1. **Locate the consumer**: attribute disk usage to the right source (dangling Docker images/layers, build volumes, distributed cache, large clones, leftover artifacts).
2. **Distinguish reclaimable vs load-bearing**: separate safely prunable data from caches and volumes that active jobs depend on.
3. **Design the cleanup**: write a scheduled maintenance approach (cron on the host or a scheduled pipeline) using `docker system prune` filters, cache TTLs, and clone strategy tuning.
4. **Prevent recurrence**: recommend `GIT_CLEAN_FLAGS`, concurrent-job limits, and cache size caps so the disk stops filling.
5. **Add guardrails**: ensure cleanup never runs while jobs are executing and never prunes the active build's working tree.
6. **Monitor**: define a disk-usage alert threshold and what to capture before each cleanup.

Output as: (a) a usage attribution breakdown, (b) the cleanup script/schedule, (c) preventive config changes, (d) a monitoring/alert recommendation.

Default to dry-run/inspect before any prune, and never run a blanket `docker system prune -a` on a runner that may be mid-job.
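For illustration only, the guarded, dry-run-first cleanup that the prompt asks for might look roughly like the sketch below. It is not part of the prompt and not a definitive implementation. It assumes a Docker executor on a single host, and the 80% threshold, the 168h age filter, the `/var/lib/docker` path, and the helper names are all hypothetical. The `com.gitlab.gitlab-runner.job.id` container label is also an assumption to verify on your runner version.

```python
import shutil
import subprocess

# Assumption: the Docker executor labels job containers with this key.
JOB_LABEL = "com.gitlab.gitlab-runner.job.id"


def disk_used_percent(path="/var/lib/docker"):
    """Return the in-use percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def running_job_containers():
    """Return IDs of running containers that carry the runner job label."""
    result = subprocess.run(
        ["docker", "ps", "--quiet", "--filter", f"label={JOB_LABEL}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def plan_cleanup(used_percent, active_jobs, threshold=80.0, max_age="168h"):
    """Return the prune commands to run, or [] when cleanup should be skipped."""
    if active_jobs:
        return []  # guardrail: never prune while a job is executing
    if used_percent < threshold:
        return []  # below the (hypothetical) threshold, leave caches warm
    age = f"until={max_age}"
    # Age-filtered, targeted prunes instead of a blanket `docker system prune -a`.
    return [
        ["docker", "container", "prune", "--force", "--filter", age],
        ["docker", "image", "prune", "--all", "--force", "--filter", age],
        ["docker", "builder", "prune", "--force", "--filter", age],
    ]


def main(dry_run=True):
    """Capture the before-state, then print (or run) the planned commands."""
    subprocess.run(["docker", "system", "df"], check=False)
    plan = plan_cleanup(disk_used_percent(), running_job_containers())
    for cmd in plan:
        print(("DRY-RUN: " if dry_run else "RUN: ") + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `main(dry_run=True)` from a host cron entry only prints what would be pruned; switching to `dry_run=False` is a deliberate second step. The job check is racy, since a job can start between the `docker ps` call and the prune, so a stricter setup would pause the runner before cleaning.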