
Linux Disk Full / Inode Exhaustion Diagnosis Prompt

Diagnose why a Linux filesystem is full or out of inodes — including deleted-but-held files, journal bloat, reserved blocks, and hidden mount-shadowed data.

Target user: Linux sysadmins and on-call engineers responding to disk-pressure alerts
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Linux sysadmin who has debugged hundreds of "disk full" pages — including the kinds where `df` and `du` disagree.

I will provide:
- Output of `df -hT` and `df -i`
- Output of `du -sh /* 2>/dev/null | sort -h` (or for a specific path)
- The OS / distro / kernel version
- Whether the alert is on `bytes used`, `inodes used`, or both
- Any application that's complaining (e.g., postgres can't write, journald can't flush)

Your job:

1. **Compare `df` vs `du`**: if they disagree by >5%, the most likely cause is **deleted-but-held files** (`lsof +L1`) or **mount shadowing** (data hidden under an active mount point).
2. **Distinguish bytes-full vs inodes-full**: many small files exhaust inodes long before bytes (mail queues, session files, image caches).
3. **Check filesystem-level overhead**:
   - ext4 reserved blocks (default 5% of blocks, reserved for root): `tune2fs -l` shows the reserved block count
   - XFS metadata + log
   - btrfs unbalanced data/metadata chunks (`btrfs filesystem usage`)
   - ZFS snapshots holding "deleted" data
4. **Identify the top consumers** without recommending blind deletion. For each candidate path, explain what owns it and whether it's safe to clean.
5. **List the SAFE cleanup commands** in order: clear caches, rotate logs, trim journal, vacuum package cache. Mark anything DESTRUCTIVE.
6. **Recommend a permanent fix** (logrotate config, journal `SystemMaxUse`, monitoring, dedicated partition for `/var/log`).

Common surprises to surface:
- **`/var/log/journal` growing to its default cap** when `SystemMaxUse=` is not set (default 10% of the filesystem, capped at 4G)
- **`/var/lib/docker`** or **`/var/lib/containerd`** with unreferenced overlay layers
- **Deleted file held open by long-running process** (logrotate without `copytruncate`, or a missing SIGHUP)
- **Snap revisions** at `/var/lib/snapd/snaps/`
- **`/tmp` not being tmpfs** and filling slowly with leftover sockets/files
- **`/var/cache/apt/archives`** holding old `.deb` packages
- **Inode exhaustion** in `/var/spool/postfix` or `/var/spool/mail`
- **A bind mount or NFS mount shadowing real data** at the same path

---

Filesystem(s) under pressure: [e.g., / on /dev/nvme0n1p1]
Pressure type: [bytes / inodes / both]
Distro + kernel: [e.g., Ubuntu 22.04, 5.15.0-...]
`df -hT`:
```
[PASTE]
```
`df -i`:
```
[PASTE]
```
`du -sh /* 2>/dev/null | sort -h`:
```
[PASTE]
```
Affected application (if any):
[DESCRIBE]

Why this prompt works

“Disk full” is rarely just “too much data”; it is almost always a held-open deleted file, an unbounded log, an orphaned container layer, or a snapshot. Naive cleanup of /var/log/* often frees nothing, because a process that still holds a deleted log open keeps its blocks allocated, and it sometimes breaks logging. This prompt forces the model to do the reconciliation work (df vs du, lsof +L1, journal sizing) before suggesting rm.

How to use it

  1. Always paste both df -hT and df -i. Inode exhaustion looks like “plenty of space free, writes still fail.”
  2. Run du from / first to find the top-level offender, then drill down. Don’t paste a tree — paste sorted output.
  3. If df and du disagree on a mount, run sudo lsof +L1 | head -50 and paste — that’s the deleted-but-held list.
  4. Mention what triggered the alert. PostgreSQL refusing writes is a different urgency level than Nagios warning at 85%.
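
The df-versus-du comparison in step 3 can be scripted so the gap is a number rather than an impression. The following is a minimal sketch, not a definitive tool: the function names `gap_pct` and `df_du_gap` are hypothetical, and it assumes GNU coreutils (`df --output=used`) and that `du` is run as root so it can read every directory.

```shell
#!/usr/bin/env bash
# Sketch: how much of df's "used" figure can du NOT account for?

# gap_pct DF_USED DU_USED -> integer percent of df usage that du does not see
gap_pct() {
    awk -v d="$1" -v u="$2" 'BEGIN {
        if (d <= 0) { print 0; exit }
        printf "%d\n", (d - u) * 100 / d
    }'
}

# df_du_gap MOUNTPOINT -> one summary line (values in KiB)
df_du_gap() {
    local mnt=$1 df_kb du_kb
    # --output=used is a GNU df option (assumption); tail drops the header line
    df_kb=$(df -k --output=used "$mnt" | tail -n 1 | tr -d ' ')
    # -x keeps du on this filesystem; without root the count may be incomplete
    du_kb=$(du -skx "$mnt" 2>/dev/null | cut -f1)
    echo "mount=$mnt df_used=${df_kb}K du_used=${du_kb}K gap=$(gap_pct "$df_kb" "$du_kb")%"
}
```

For example, `sudo bash -c 'source ./gap.sh; df_du_gap /var'` (the file name is illustrative). A gap well above the prompt's 5% threshold is the cue to run `lsof +L1` or to look for mount shadowing.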

Useful commands

# Big picture
df -hT
df -i
mount | column -t

# Find top consumers (-x keeps du from descending into other filesystems below each argument)
sudo du -shx /* 2>/dev/null | sort -h | tail -20
sudo du -shx /var/* 2>/dev/null | sort -h | tail -20
sudo du -shx /var/log/* 2>/dev/null | sort -h | tail -20

# Deleted-but-held files
sudo lsof +L1 | head -50

# Inode hogs
sudo find / -xdev -type f 2>/dev/null | awk -F/ '{print $2"/"$3}' | sort | uniq -c | sort -n | tail

# Systemd journal
sudo journalctl --disk-usage
sudo journalctl --rotate                # archive active files; vacuum only removes archived ones
sudo journalctl --vacuum-size=500M     # trim to 500M
sudo journalctl --vacuum-time=7d        # keep last 7 days only

# Docker / containerd
sudo docker system df
sudo docker system prune -af --volumes  # DESTRUCTIVE
sudo crictl rmi --prune                  # for containerd

# Package caches
sudo apt-get clean                       # Debian/Ubuntu
sudo dnf clean all                       # RHEL/Fedora

# Filesystem-specific
sudo tune2fs -l /dev/sda1 | grep -i reserved   # ext4
sudo btrfs filesystem usage /                  # btrfs
sudo zfs list -o name,used,available,refer,usedsnap   # ZFS

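# Mount shadowing: files written to a directory before something was mounted over it
# stay on the parent filesystem but are hidden. Unmounting exposes them (DISRUPTIVE).
sudo systemctl stop <service-using-mount>
sudo umount /mnt/whatever  # then run du on the now-uncovered mount point directory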
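
The inode-hog one-liner above counts only regular files and groups them by the first two path components, which can hide a single runaway directory deeper in the tree. A per-directory variant is sketched below. The function name `inode_hogs` is hypothetical, and it assumes GNU find (for `-printf`); hard-linked files are counted once per name, so treat the totals as approximate.

```shell
# inode_hogs [ROOT] [N]: the N directories holding the most entries under ROOT.
# Every file, directory, and symlink is counted against its parent directory.
inode_hogs() {
    # %h prints the directory that contains each entry (GNU find assumption);
    # -xdev keeps the walk on ROOT's filesystem
    find "${1:-/}" -xdev -mindepth 1 -printf '%h\n' 2>/dev/null |
        sort | uniq -c | sort -n | tail -n "${2:-10}"
}
```

Usage: `sudo bash -c 'source ./inode_hogs.sh; inode_hogs /var 20'` (file name illustrative). The largest directory is printed last.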

Common findings this catches

  • du says 12 GB, df says 80 GB used → lsof +L1 lists 68 GB of deleted-but-held log files; restart or signal the service that still holds the rotated logs open.
  • / 100% full, /var/log/journal is 9 GB → no SystemMaxUse= in journald.conf. Vacuum + set to 1G.
  • /var/lib/docker is 60 GB → dangling images and unreferenced volumes; docker system prune -a --volumes (DESTRUCTIVE: also deletes stopped containers and unused volumes).
  • Inodes 100% on /var → millions of files in /var/spool/postfix/maildrop or /var/spool/exim/input from a runaway sender.
  • /tmp is not tmpfs and is full → user processes left dotfiles after crashes; mount tmpfs in fstab.
  • ZFS pool 95% full despite recent rms → snapshots holding the deletes; zfs list -t snapshot.
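
For the first finding, it helps to know which process holds the most deleted data before restarting anything. The sketch below totals the SIZE/OFF column of `lsof +L1` per command and PID. The function name `held_by_process` is hypothetical, and it assumes the default column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME) with no whitespace inside the command name. A file open on several descriptors is counted once per descriptor, so the totals are an upper bound.

```shell
# held_by_process: read `lsof +L1` output on stdin and print
# "BYTES COMMAND/PID" per process, largest last.
held_by_process() {
    # Field 7 is SIZE/OFF in the assumed layout; offsets such as "0t0"
    # are skipped by the digits-only match
    awk 'NR > 1 && $7 ~ /^[0-9]+$/ { held[$1 "/" $2] += $7 }
         END { for (k in held) printf "%.0f %s\n", held[k], k }' |
        sort -n
}
```

Usage: `sudo lsof +L1 2>/dev/null | held_by_process | tail`.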

Permanent fixes worth applying after recovery

# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=1G
SystemMaxFileSize=100M
SystemKeepFree=2G

# /etc/logrotate.d/myapp (sample)
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
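    # copytruncate avoids a restart but can lose lines written during the copy;
    # do NOT use it if the app reopens its log on SIGHUP (use postrotate instead)
    copytruncate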
}
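
Before and after applying the journald settings, it can be useful to confirm whether a cap is actually configured. The following sketch prints the last `SystemMaxUse=` value found in the files it is given. The function name `journald_cap` is hypothetical, and the check is a simplification: it only reads the files passed on the command line and does not reproduce systemd's full drop-in search order.

```shell
# journald_cap FILE...: print the last SystemMaxUse= value found, or "unset".
journald_cap() {
    local f
    for f in "$@"; do
        # Skip missing files, including an unmatched *.conf glob
        if [ -r "$f" ]; then cat "$f"; fi
    done | awk -F= '
        # Commented-out lines start with "#" and therefore do not match
        /^[ \t]*SystemMaxUse[ \t]*=/ {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)
        }
        END { print (v == "" ? "unset" : v) }'
}
```

Usage: `journald_cap /etc/systemd/journald.conf /etc/systemd/journald.conf.d/*.conf`. After changing the file, `sudo systemctl restart systemd-journald` makes the new limits take effect.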

When to escalate

  • ZFS / btrfs balance operations on production storage — coordinate with storage team.
  • xfs_growfs / resize2fs after partition resize — confirm the partition table change took effect first.
  • Anything that involves --vacuum-time=0 on the only log source — archive first or you’ll lose forensic data.
