Linux Disk Full / Inode Exhaustion Diagnosis Prompt
Diagnose why a Linux filesystem is full or out of inodes — including deleted-but-held files, journal bloat, reserved blocks, and hidden mount-shadowed data.
- Target user: Linux sysadmins and on-call engineers responding to disk-pressure alerts
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux sysadmin who has debugged hundreds of "disk full" pages, including the kinds where `df` and `du` disagree.

I will provide:
- Output of `df -hT` and `df -i`
- Output of `du -sh /* 2>/dev/null | sort -h` (or for a specific path)
- The OS / distro / kernel version
- Whether the alert is on `bytes used`, `inodes used`, or both
- Any application that's complaining (e.g., postgres can't write, journald can't flush)

Your job:
1. **Compare `df` vs `du`**: if they disagree by >5%, the most likely cause is **deleted-but-held files** (`lsof +L1`) or **mount shadowing** (data hidden under an active mount point).
2. **Distinguish bytes-full vs inodes-full**: many small files exhaust inodes long before bytes (mail queues, session files, image caches).
3. **Check filesystem-level overhead**:
   - ext4 reserved blocks (default 5%, reserved for root): `tune2fs -l` shows the reserved count
   - XFS metadata + log
   - btrfs unbalanced data/metadata chunks (`btrfs filesystem usage`)
   - ZFS snapshots holding "deleted" data
4. **Identify the top consumers** without recommending blind deletion. For each candidate path, explain what owns it and whether it's safe to clean.
5. **List the SAFE cleanup commands** in order: clear caches, rotate logs, trim journal, vacuum package cache. Mark anything DESTRUCTIVE.
6. **Recommend a permanent fix** (logrotate config, journal `SystemMaxUse`, monitoring, dedicated partition for `/var/log`).

Common surprises to surface:
- **`/var/log/journal` larger than expected** when `SystemMaxUse=` is not set (the default is 10% of the filesystem, capped at 4G)
- **`/var/lib/docker`** or **`/var/lib/containerd`** with unreferenced overlay layers
- **Deleted file held open by long-running process** (logrotate without `copytruncate`, or a missing SIGHUP)
- **Snap revisions** at `/var/lib/snapd/snaps/`
- **`/tmp` not being tmpfs** and filling slowly with leftover sockets/files
- **`/var/cache/apt/archives`** holding old `.deb` packages
- **Inode exhaustion** in `/var/spool/postfix` or `/var/spool/mail`
- **A bind mount or NFS mount shadowing real data** at the same path

---

Filesystem(s) under pressure: [e.g., / on /dev/nvme0n1p1]
Pressure type: [bytes / inodes / both]
Distro + kernel: [e.g., Ubuntu 22.04, 5.15.0-...]

`df -hT`:
```
[PASTE]
```

`df -i`:
```
[PASTE]
```

`du -sh /* 2>/dev/null | sort -h`:
```
[PASTE]
```

Affected application (if any): [DESCRIBE]
Why this prompt works
“Disk full” is rarely just “too much data”. It is usually one of: a held-open deleted file, an unbounded log, an orphaned container layer, or a snapshot. Deleting files under /var/log/* that a process still holds open doesn’t free space and sometimes breaks logging. This prompt forces the model to do the reconciliation work (df vs du, lsof +L1, journal sizing) before suggesting rm.
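The df-versus-du reconciliation described above can be sketched as a small shell helper. This is an illustration rather than a standard tool: the function names `gap_report` and `df_du_gap` are hypothetical, the 5% threshold is taken from the prompt, and `df --output=used` assumes GNU coreutils.

```shell
# Hypothetical helpers; names are illustrative and not part of any package.

# gap_report DF_KB DU_KB
# Pure arithmetic: prints the gap and flags it when it exceeds 5% of df's figure.
gap_report() {
    local df_kb="$1" du_kb="$2" gap_kb pct=0
    gap_kb=$(( $df_kb - $du_kb ))
    if [ "$df_kb" -gt 0 ]; then
        pct=$(( $gap_kb * 100 / $df_kb ))
    fi
    echo "df=${df_kb}K du=${du_kb}K gap=${gap_kb}K (${pct}%)"
    if [ "$pct" -gt 5 ]; then
        echo "suspect: deleted-but-held files or mount shadowing"
    fi
}

# df_du_gap [MOUNTPOINT]
# Run as root so du can read every directory on the filesystem.
df_du_gap() {
    local mount="${1:-/}" df_kb du_kb
    # Used 1K-blocks as the filesystem reports them (includes deleted-but-open files).
    df_kb=$(df -k --output=used "$mount" | tail -n 1 | tr -d ' ')
    # Space reachable by walking the tree, without crossing into other filesystems.
    du_kb=$(du -skx "$mount" 2>/dev/null | cut -f1)
    gap_report "$df_kb" "$du_kb"
}
```

Calling `df_du_gap /var` as root prints one summary line, plus a second line when the gap is large enough to justify running `lsof +L1` or checking for shadowed mount points.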
How to use it
- Always paste both `df -h` and `df -i`. Inode exhaustion looks like “plenty of space free, writes still fail.”
- Run `du` from `/` first to find the top-level offender, then drill down. Don’t paste a tree; paste sorted output.
- If `df` and `du` disagree on a mount, run `sudo lsof +L1 | head -50` and paste the output. That’s the deleted-but-held list.
- Mention what triggered the alert. PostgreSQL refusing writes is a different urgency level than Nagios warning at 85%.
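One way to gather the inputs listed above in a single paste is a small collector function. This is a sketch under assumptions: `collect_disk_report` is a hypothetical name, the section headers are arbitrary, and the `du` options assume GNU coreutils.

```shell
# collect_disk_report [PATH]
# Prints kernel, distro, df and sorted du output for one path, ready to paste.
# Run as root for complete du numbers.
collect_disk_report() {
    local path="${1:-/}"
    echo "Kernel: $(uname -sr)"
    if [ -r /etc/os-release ]; then
        # Source in a subshell so os-release variables do not leak into the caller.
        ( . /etc/os-release; echo "Distro: ${PRETTY_NAME:-unknown}" )
    fi
    echo "## df -hT"
    df -hT "$path"
    echo "## df -i"
    df -i "$path"
    echo "## du -xah --max-depth=1 $path (sorted, top 20)"
    # A single starting directory plus -x keeps du on that one filesystem.
    du -xah --max-depth=1 "$path" 2>/dev/null | sort -h | tail -n 20
}
```

On a full disk, print to the terminal or redirect the output to a different filesystem rather than the one under pressure.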
Useful commands
# Big picture
df -hT
df -i
mount | column -t
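# Find top consumers (-x with a single starting directory stays on that filesystem;
# with a /* glob, every top-level mount point would still be walked)
sudo du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -20
sudo du -xh --max-depth=1 /var 2>/dev/null | sort -h | tail -20
sudo du -xah --max-depth=1 /var/log 2>/dev/null | sort -h | tail -20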
# Deleted-but-held files
sudo lsof +L1 | head -50
# Inode hogs (count every entry: directories and symlinks use inodes too)
sudo find / -xdev 2>/dev/null | awk -F/ '{print "/"$2"/"$3}' | sort | uniq -c | sort -n | tail
# Systemd journal
sudo journalctl --disk-usage
sudo journalctl --rotate # archive active files so vacuum can remove them
sudo journalctl --vacuum-size=500M # trim to 500M
sudo journalctl --vacuum-time=7d # keep last 7 days only
# Docker / containerd
sudo docker system df
sudo docker system prune -af --volumes # DESTRUCTIVE
sudo crictl rmi --prune # for containerd
# Package caches
sudo apt-get clean # Debian/Ubuntu
sudo dnf clean all # RHEL/Fedora
# Filesystem-specific
sudo tune2fs -l /dev/sda1 | grep -i reserved # ext4
sudo btrfs filesystem usage / # btrfs
sudo zfs list -o name,used,available,refer,usedsnap # ZFS
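# Mount shadowing: look underneath mount points without unmounting anything
sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs # non-recursive bind: submounts are not carried over
sudo du -xh --max-depth=2 /mnt/rootfs/var 2>/dev/null | sort -h | tail -20
sudo umount /mnt/rootfs
# Alternative (disruptive): stop the service, umount the real mount, then check du again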
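The deleted-but-held check above relies on `lsof`. On hosts where it is not installed, roughly the same list can be built from `/proc`. The following is a minimal sketch, assuming Linux procfs and GNU `stat`; the function name `held_deleted_files` is hypothetical.

```shell
# held_deleted_files
# Prints "SIZE_BYTES<TAB>PID<TAB>TARGET" for every open file descriptor whose
# target has been unlinked, largest first. Run as root to see all processes.
held_deleted_files() {
    local fd target size pid
    for fd in /proc/[0-9]*/fd/*; do
        # Processes can exit mid-scan, and other users' fds are unreadable without root.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /memfd:*) continue ;;   # anonymous memory, not disk space
            *' (deleted)') ;;       # unlinked but still open
            *) continue ;;
        esac
        # stat -L follows the fd link to the inode that is still allocated.
        size=$(stat -Lc %s "$fd" 2>/dev/null) || continue
        pid=${fd#/proc/}
        pid=${pid%%/*}
        printf '%s\t%s\t%s\n' "$size" "$pid" "$target"
    done | sort -rn
}
```

The same file can appear more than once when several descriptors or processes hold it, so treat the sum of the sizes as an upper bound. Entries under tmpfs paths such as `/dev/shm` consume memory, not disk.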
Common findings this catches
- `du` says 12 GB, `df` says 80 GB used → `lsof +L1` lists 68 GB of deleted-but-held log files; restart (or signal) the service that still holds the rotated logs open.
- `/` 100% full, `/var/log/journal` is several GB → no `SystemMaxUse=` in `journald.conf`. Vacuum, then set it to 1G.
- `/var/lib/docker` is 60 GB → dangling images and unreferenced volumes; `docker system prune -a --volumes` (DESTRUCTIVE).
- Inodes 100% on `/var` → millions of files in `/var/spool/postfix/maildrop` or `/var/spool/exim/input` from a runaway sender.
- `/tmp` is not tmpfs and is full → user processes left dotfiles after crashes; mount tmpfs in fstab.
- ZFS pool 95% full despite recent `rm`s → snapshots holding the deleted data; `zfs list -t snapshot`.
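For the inode-exhaustion finding, a per-directory inode count usually points at the offending spool or cache directory faster than listing files. This is a minimal sketch assuming GNU coreutils 8.22 or later (for `du --inodes`); `inode_hogs` is a hypothetical name.

```shell
# inode_hogs [PATH] [TOP_N]
# Lists the directories holding the most inodes, largest first.
inode_hogs() {
    local path="${1:-/}" top="${2:-15}"
    # --inodes counts inodes instead of blocks; -S keeps each directory's count
    # separate from its subdirectories; -x stays on one filesystem.
    du --inodes -S -x "$path" 2>/dev/null | sort -rn | head -n "$top"
}
```

For example, `inode_hogs /var 10` run as root prints the ten directories under `/var` with the most entries of their own.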
Permanent fixes worth applying after recovery
# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=1G
SystemMaxFileSize=100M
SystemKeepFree=2G
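The journald limits above can also be shipped as a drop-in file instead of editing `journald.conf` directly, which keeps the change separate from the distro defaults. This is a sketch: the function name `write_journal_cap` and the file name `90-disk-cap.conf` are hypothetical, while `/etc/systemd/journald.conf.d/` is the standard drop-in directory.

```shell
# write_journal_cap [CONF_DIR]
# Writes the journal size limits as a drop-in. Run as root for the default path.
write_journal_cap() {
    local conf_dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/90-disk-cap.conf" <<'EOF'
[Journal]
SystemMaxUse=1G
SystemMaxFileSize=100M
SystemKeepFree=2G
EOF
}
```

After writing the file, `sudo systemctl restart systemd-journald` applies the limits and `journalctl --disk-usage` shows the resulting size.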
# /etc/logrotate.d/myapp — sample
/var/log/myapp/*.log {
weekly
rotate 4
compress
delaycompress
missingok
notifempty
# copytruncate: do NOT use if the app reopens its log on SIGHUP (use postrotate instead)
copytruncate
}
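For services that do reopen their log on SIGHUP, the usual alternative to `copytruncate` is a `postrotate` script. The sketch below writes such a variant of the sample above. The function name `write_sighup_rotation` and the pid file `/run/myapp.pid` are hypothetical and need to be replaced with the real service's details.

```shell
# write_sighup_rotation [TARGET_FILE]
# Writes a logrotate snippet that signals the app instead of using copytruncate.
write_sighup_rotation() {
    local target="${1:-/etc/logrotate.d/myapp}"
    cat > "$target" <<'EOF'
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # /run/myapp.pid is a hypothetical pid file; point this at the real service.
        [ -f /run/myapp.pid ] && kill -HUP "$(cat /run/myapp.pid)" || true
    endscript
}
EOF
}
```

`sudo logrotate -d /etc/logrotate.d/myapp` performs a dry run and prints what would be rotated without touching any files.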
When to escalate
- ZFS / btrfs balance operations on production storage: coordinate with the storage team.
- `xfs_growfs` / `resize2fs` after a partition resize: confirm the partition table change took effect first.
- Anything that involves an aggressive journal vacuum (e.g. `--vacuum-time=1s`) on the only log source: archive first or you’ll lose forensic data.
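The "archive first" step can be sketched as an export of the journal to compressed storage before any vacuum runs. This is an illustration under assumptions: `archive_journal` is a hypothetical name, and the destination must live on a different filesystem than the one that is full.

```shell
# archive_journal DEST.gz [SINCE]
# Exports the journal in journalctl's "export" format and compresses it.
# SINCE is passed to --since (for example "yesterday"); omit it to export everything.
archive_journal() {
    local dest="$1" since="${2:-}"
    if [ -z "$dest" ]; then
        echo "usage: archive_journal DEST.gz [SINCE]" >&2
        return 2
    fi
    if [ -n "$since" ]; then
        journalctl --since "$since" -o export
    else
        journalctl -o export
    fi | gzip > "$dest"
}
```

A possible sequence is `archive_journal /mnt/backup/journal.export.gz`, then `journalctl --rotate`, then the vacuum. Here `/mnt/backup` is a placeholder for any mount with free space.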