Triaging a Full Disk on Linux: df, du, inodes, and AI
When a Linux server runs out of disk, find the culprit fast. Hunt down space and inode exhaustion with df, du, and ncdu, and use AI to triage the output safely.
- #linux
- #disk
- #troubleshooting
- #monitoring
- #sysadmin
“No space left on device” at 2 a.m. is a rite of passage. The application is throwing write errors, the database won’t accept connections, maybe you can’t even log a line because the journal is full too. The pressure is to delete something fast — and that’s how people rm -rf a directory the running service still needs and turn a full disk into a full outage.
After enough of these, the move that consistently works is to triage methodically instead of frantically. AI fits this perfectly as a fast junior engineer: it reads the du output and spots the obvious offender quicker than I scan it, and it suggests what’s safe to remove. But it never gets to run the rm. On a full disk, the difference between deleting a stale log and deleting live data is the difference between a five-minute fix and a restore.
Confirm it’s actually space — or inodes
df answers the first question, but you have to check two things:
df -h # bytes free
df -i # inodes free
Inode exhaustion fools people constantly. df -h shows plenty of space, but df -i shows 100% used, usually from millions of tiny files (a runaway mail queue, session files, or a cache directory). The error is identical; the fix is different.
Pro Tip: If df -h says you have room but writes still fail with ENOSPC, check df -i immediately. In my experience the “mystery full disk” is usually inode exhaustion in /var/spool, a session store, or a Maildir.
Hand both outputs to your assistant. I keep this prompt with my other linux admin prompts:
Here’s the output of df -h and df -i. Which filesystem is the problem, is this a space or inode issue, and what directories should I investigate first given a typical Ubuntu server layout?
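If you want the space-versus-inode verdict before you even paste anything into a chat window, the two df checks can be folded into one pass. This is a minimal sketch, assuming GNU df (for --output) and mount points without spaces; classify_fs and the 90% default are hypothetical choices of mine, not a standard tool.

```shell
# classify_fs: hypothetical helper, not part of any package.
# Reads lines of "mountpoint use% iuse%" on stdin and prints a verdict per filesystem.
# $1 is the threshold in percent (90 is an arbitrary illustrative default).
classify_fs() {
  awk -v limit="${1:-90}" '
    {
      space  = ($2 + 0 >= limit + 0)   # "97%" + 0 evaluates to 97
      inodes = ($3 + 0 >= limit + 0)   # "-" (no inode count reported) evaluates to 0
      verdict = "ok"
      if (space && inodes) verdict = "BOTH"
      else if (space)      verdict = "SPACE"
      else if (inodes)     verdict = "INODES"
      print $1, verdict
    }'
}

# On a live system (tail drops the header row):
#   df --output=target,pcent,ipcent | tail -n +2 | classify_fs 90
```

Anything marked INODES or BOTH is the case where df -h alone would have misled you.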
Find the heavy directories without flailing
The classic mistake is du -sh /* from root, which crawls everything including network mounts and /proc. Be targeted:
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -20
The -x keeps it on one filesystem so you don’t wander into mounts. That one-liner gives you the top-level offenders sorted biggest-first. Drill into the worst one and repeat. For interactive hunting, ncdu is unbeatable:
sudo ncdu -x /var
It builds a navigable tree sorted by size — you walk into the biggest directory, then the biggest subdirectory, until you find the culprit. Capture the du output and let the AI summarize it: “Here’s the top 20 directories by size under /var. What’s likely safe to clean and what’s load-bearing?” The model is good at recognizing that /var/lib/docker or /var/log/journal is the offender. You still verify before deleting.
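When ncdu isn't installed, the "drill into the worst one and repeat" loop can be scripted. A rough sketch, assuming GNU du and sort; biggest_child and drill_down are names I made up for illustration, and the depth limit of 3 is arbitrary.

```shell
# biggest_child: print the largest immediate subdirectory of $1 (nothing if it has none).
biggest_child() {
  du -xk --max-depth=1 -- "$1" 2>/dev/null \
    | sort -rn \
    | TOP="$1" awk -F'\t' '$2 != ENVIRON["TOP"] { print $2; exit }'
}

# drill_down: starting at $1, keep stepping into the largest subdirectory,
# at most $2 levels deep (default 3), then print where it ended up.
drill_down() {
  dir="$1"
  depth="${2:-3}"
  i=0
  while [ "$i" -lt "$depth" ]; do
    next="$(biggest_child "$dir")"
    [ -n "$next" ] || break        # no subdirectories left
    dir="$next"
    i=$((i + 1))
  done
  printf '%s\n' "$dir"
}

# Example: sudo would be needed for unreadable directories.
#   drill_down /var 4
```

It only follows the single largest branch, so treat the result as a starting point and still look at the full sorted du output before deciding anything.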
The usual suspects
Most full-disk incidents are one of a handful of culprits, and knowing them saves time:
- Logs that never rotated: /var/log ballooning because logrotate is misconfigured or a service logs to a file logrotate doesn’t know about.
- The journal: journalctl --disk-usage, then sudo journalctl --vacuum-size=500M to trim it safely.
- Docker: docker system df, then docker system prune for dangling images and stopped containers.
- Old package caches: apt clean or dnf clean all.
- Deleted-but-open files: a process holding a deleted log; the space won’t free until you restart it.
That last one is sneaky. df shows the disk full but du can’t find the space, because a process is still holding a file you already deleted:
sudo lsof +L1 | grep deleted
Find the PID, and the fix is restarting (not killing -9) that service so it releases the handle. The incident response helper is good at turning “df and du disagree” into the lsof investigation path — feed it the symptoms and it points you at the deleted-file case.
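The raw lsof listing can be long, so it helps to total the held space per process and see which restart buys the most. A sketch under one assumption worth checking on your system: that lsof +L1 prints its default column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME). summarize_deleted is a hypothetical helper name.

```shell
# summarize_deleted: read `lsof +L1` output on stdin and print
# "bytes pid command" per process, largest first.
# Only regular files (TYPE == REG) are counted, since SIZE/OFF is a size for those.
summarize_deleted() {
  awk '
    NR > 1 && $5 == "REG" { bytes[$2 " " $1] += $7 }
    END { for (k in bytes) printf "%.0f %s\n", bytes[k], k }
  ' | sort -rn
}

# Example:
#   sudo lsof +L1 | summarize_deleted | head
```

The top line is the service to restart first; the numbers are bytes that df counts but du cannot see.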
Buy yourself headroom safely
Sometimes you just need a few hundred megabytes to get the service breathing again before you do real cleanup. The safe quick wins, in order of how confidently I’ll run them:
sudo journalctl --vacuum-size=200M # trims archived journal files to about 200M (one-time, not a persistent cap)
sudo apt clean # or: dnf clean all
docker system prune -f # dangling images, stopped containers
None of these is strictly reversible (vacuumed journal entries are gone for good), but they remove caches, archives, and unused artifacts rather than live data. What I won’t do under pressure is delete from a directory I haven’t confirmed is safe. If the AI suggests removing /var/lib/something, I check what owns it (dpkg -S or rpm -qf) and whether a running process has it open before anything happens. The model drafts the cleanup; a human confirms each deletion against reality.
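The "does a running process have it open" check can be done without extra tools by walking /proc. This is a Linux-only sketch, assuming GNU readlink; open_under is a hypothetical name, and without root it only sees your own processes, so run it with sudo for a real answer.

```shell
# open_under: print "PID path" for every open file descriptor
# that points at something under directory $1.
open_under() {
  target="$(readlink -f -- "$1")" || return 1
  for fd in /proc/[0-9]*/fd/*; do
    # Unreadable or already-closed descriptors are skipped silently.
    path="$(readlink -- "$fd" 2>/dev/null)" || continue
    case "$path" in
      "$target"/*)
        pid="${fd#/proc/}"
        printf '%s %s\n' "${pid%%/*}" "$path"
        ;;
    esac
  done
}

# Example:
#   sudo sh -c '. ./open_under.sh; open_under /var/lib/something'
```

Empty output means nothing it could see has a file open there. It says nothing about whether a service will want that data on its next start, which is why the ownership check still matters.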
Prevent the next one
Once the fire’s out, the real work is making sure it doesn’t recur. Set a disk-usage alert so you find out at 80%, not 100%:
df -h --output=pcent,target | awk 'NR>1 && $1+0 > 80'
Wire that into your monitoring, or let the monitoring alerts helper draft an alert rule for filesystem usage and inode usage both. Fix logrotate properly, cap the journal in /etc/systemd/journald.conf with SystemMaxUse=, and put a docker system df check on your Docker hosts. I keep these prevention snippets in the prompt packs and prompts library so the post-incident hardening is a checklist, not a memory test.
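For the journal cap, one option is a drop-in file instead of editing journald.conf directly, which keeps the change easy to spot and revert. A minimal sketch: write_journal_cap, the 90-disk-cap.conf file name, and the 500M default are all illustrative assumptions; the [Journal] section and SystemMaxUse= key are the real journald settings.

```shell
# write_journal_cap: write a journald drop-in that sets SystemMaxUse.
# $1: target directory (on a real host: /etc/systemd/journald.conf.d)
# $2: cap value (defaults to 500M, an illustrative number)
write_journal_cap() {
  dir="$1"
  cap="${2:-500M}"
  mkdir -p -- "$dir" || return 1
  printf '[Journal]\nSystemMaxUse=%s\n' "$cap" > "$dir/90-disk-cap.conf"
}

# Example (as root), followed by a journald restart to apply it:
#   write_journal_cap /etc/systemd/journald.conf.d 500M
#   systemctl restart systemd-journald
```

Unlike the one-off --vacuum-size cleanup, this limit persists, so the journal stops being a candidate for the next full-disk incident.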
Conclusion
A full disk is an emergency that rewards calm. Confirm space versus inodes, find the heavy directories with -x and ncdu, check for deleted-but-open files when df and du disagree, and reclaim headroom with the low-risk cleanups first. AI earns its place as a fast reader of the du output and a triage partner, but every rm stays under human control, because on a full disk the cost of deleting the wrong thing is a restore, not a retry.