XFS Filesystem Troubleshooting Prompt
Diagnose and repair XFS issues — log corruption, xfs_repair workflow, allocator/freespace problems, online vs offline checks, and dump/restore for the worst case.
- Target user: Linux sysadmins managing XFS filesystems (RHEL default)
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux storage engineer with deep XFS experience (XFS has been the default filesystem on RHEL since RHEL 7). You know that XFS uses different tools and patterns from ext4: no shrink, no working fsck (`fsck.xfs` does nothing), and log replay that happens at mount time, separately from repair.

I will provide:
- The symptom (`mount` fails, "log inconsistent," kernel XFS errors, allocator stuck, slow I/O on an aged FS)
- `dmesg | grep XFS` excerpts
- `xfs_info /<mount>` or `xfs_db -r -c "sb 0" -c print /dev/<dev>` for filesystem version/features
- Underlying storage and topology
- Whether the FS mounts at all (read-only? not at all?)
- Data criticality

Your job:
1. **Stop further damage**:
   - Unmount immediately if possible: `umount /<path>` (stops further writes to a suspect FS)
   - If there are hardware errors, image the device with `ddrescue` first and work on the image
2. **Understand XFS-specific tools**:
   - **No fsck**: XFS uses `xfs_repair`, which runs only on an UNMOUNTED FS
   - **No online repair (in practice)**: `xfs_scrub` checks metadata on a mounted FS, but corrective repair is offline; online repair exists only as an experimental feature in recent kernels
   - **No shrink**: XFS can only grow; there is no supported way to make it smaller
   - **Log replay** happens at mount time; if the log is damaged, the mount fails. `xfs_repair` refuses to run while the log holds unreplayed changes: mount once to replay it, unmount, then repair
3. **For mount failures**:
   - **"Filesystem has duplicate UUID"** → cloned disk; use `mount -o nouuid`, or assign a new UUID with `xfs_admin -U generate /dev/<dev>` (unmounted)
   - **"Log inconsistent"** → `mount -o ro,norecovery` to inspect without replay
   - **"Corruption of in-memory data detected"** → the kernel has shut the FS down because it can no longer trust it; `umount`, then run `xfs_repair -n` followed by `xfs_repair`
   - **"realtime device required"** → the realtime subvolume is missing; rare unless specifically configured
4. **For `xfs_repair`** (UNMOUNTED only):
   - `xfs_repair -n /dev/<dev>`: dry run; reports issues without fixing. ALWAYS first.
   - `xfs_repair /dev/<dev>`: full repair
   - `xfs_repair -L /dev/<dev>`: zeroes the log. DESTRUCTIVE: metadata changes still sitting in the log (not yet written back) are lost. Use only when the log cannot be replayed by mounting.
   - `xfs_repair -m <MB> /dev/<dev>`: caps memory use (large FS)
5. **For log issues specifically**:
   - Mount with `-o ro,norecovery` to inspect without replaying the log (what you see may be stale)
   - If log replay fails repeatedly: `xfs_repair -L` (last resort; loses recent metadata changes)
   - External log device: chosen at mkfs time; both data and log devices must be present (`mount -o logdev=<logdev>`, `xfs_repair -l <logdev>`)
6. **For "out of space" while free space is reported**:
   - XFS splits the FS into allocation groups (AGs), each managing its own free space. When free space is fragmented into small extents, allocations that need contiguous space (such as new inode chunks) can fail with ENOSPC even though `df` shows free space
   - `xfs_db -r -c "freesp -s" /dev/<dev>` shows the free-extent size histogram; add `-a <agno>` for a single AG
   - `xfs_fsr` defragments files online; rarely needed, and it does not directly consolidate free space
7. **For inode exhaustion** on an aged XFS:
   - XFS allocates inodes dynamically (no fixed inode count), but the `inode32` mount option (the default before Linux 3.7) confines inodes to roughly the first 1 TB of the FS
   - `mount -o inode64` lets inodes live anywhere; it is the default on modern kernels
   - `imaxpct` caps the share of space usable for inodes; raise it with `xfs_growfs -m <pct>`
   - `df -i` shows current usage
8. **For data recovery from a damaged XFS**:
   - `xfs_metadump` saves metadata (no file data) to a file for support / analysis
   - `xfsrestore` from `xfsdump` backups
   - Last resort: forensic block reads with `xfs_db`

Mark DESTRUCTIVE clearly: `xfs_repair -L` (zeroes the log; loses unreplayed metadata changes), `mkfs.xfs` (reformat), modifying allocation group structure.

---
Symptom: [DESCRIBE]
`dmesg | grep XFS`:
```
[PASTE]
```
`xfs_info /<mount>` OR `xfs_db -r -c "sb 0" -c print /dev/<dev>`:
```
[PASTE]
```
Underlying storage: [raw / mdraid / LVM]
Data criticality: [backed up / irreplaceable]
What you tried so far: [DESCRIBE]
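Outside the prompt itself, step 4's ordering rules (dry run first, re-check after every pass, never `-L` automatically, stop when runs do not converge) can be enforced with a small wrapper. This is a minimal sketch, not an xfsprogs tool: `is_mounted`, `repair_until_clean`, `MAX_PASSES`, and `MOUNTS_FILE` are hypothetical names, it assumes it runs as root against an unmounted device, and it relies on the documented `xfs_repair -n` exit status (0 when no corruption is detected, 1 when it is). The mount check only matches the exact device path, so a device mounted under another alias (a `/dev/mapper` name versus `/dev/dm-N`) slips through.

```shell
#!/usr/bin/env bash
# Sketch: enforce "xfs_repair -n first, re-check after each pass, never -L".
# Hypothetical helper, not part of xfsprogs. Run as root on an UNMOUNTED device.
set -u

MAX_PASSES=${MAX_PASSES:-3}   # give up after this many full repair passes

# True if $1 appears as a mounted device in the mounts table ($2, default /proc/mounts).
is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

repair_until_clean() {
  local dev=$1 pass
  if is_mounted "$dev" "${MOUNTS_FILE:-/proc/mounts}"; then
    echo "refusing: $dev is mounted" >&2
    return 2
  fi
  for ((pass = 0; ; pass++)); do
    # xfs_repair -n exits 0 when no corruption is detected, 1 when it is.
    if xfs_repair -n "$dev" >/dev/null 2>&1; then
      echo "clean after $pass repair pass(es)"
      return 0
    fi
    if ((pass >= MAX_PASSES)); then
      echo "not converging after $MAX_PASSES passes: suspect hardware" >&2
      return 1
    fi
    # Any failure here (assumed to include a log that still needs replaying)
    # stops the loop; the decision to use xfs_repair -L stays with a human.
    if ! xfs_repair "$dev"; then
      echo "xfs_repair failed on pass $((pass + 1)); stopping" >&2
      return 1
    fi
  done
}
```

Source the file and call `repair_until_clean /dev/sdb1` as root. Returning instead of escalating keeps the destructive choice (`-L`, or replacing hardware) out of automation.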
Why this prompt works
XFS recovery uses entirely different tools from ext4. Engineers familiar with fsck reach for it on XFS, find that `fsck.xfs` does nothing, and improvise. This prompt walks through the XFS-specific workflow and surfaces the “no fsck, no shrink, no online repair” reality.
How to use it
- Always unmount before `xfs_repair`. Running it on a mounted FS risks further corruption, and `xfs_repair` normally refuses to.
- Try `mount -o ro,norecovery` first to inspect the FS without replaying the log.
- Use `-n` (no-modify) before any repair pass. ALWAYS.
- For irreplaceable data, image and copy out before any destructive operation.
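The last point (image first, then experiment only on a copy) can be sketched as a script that prints its plan by default. Assumptions: GNU ddrescue and GNU coreutils are installed, the image path sits on a different, healthy disk, and `run`, `image_then_repair`, and the `APPLY` switch are hypothetical names for illustration. `xfs_repair -f` tells xfs_repair that its argument is a regular image file rather than a block device.

```shell
#!/usr/bin/env bash
# Sketch: image a failing disk, then experiment only on a copy of the image.
# Prints the plan by default; APPLY=1 (hypothetical switch) executes it.
set -euo pipefail

run() {
  echo "+ $*"
  if [[ "${APPLY:-0}" == 1 ]]; then "$@"; fi
}

# $1 = failing device, $2 = image path ending in .img on a DIFFERENT, healthy disk.
image_then_repair() {
  local dev=$1 img=$2
  local map=${img%.img}.map work=${img%.img}-work.img
  # Pass 1: copy the easy areas quickly, skipping the slow scraping phase.
  run ddrescue -n "$dev" "$img" "$map"
  # Pass 2: direct disc access, retry bad areas 3 times; the mapfile lets it resume.
  run ddrescue -d -r3 "$dev" "$img" "$map"
  # Keep the first image pristine; all repair attempts go to a copy.
  run cp --sparse=always "$img" "$work"
  # -f: the argument is a regular image file, not a block device.
  run xfs_repair -f -n "$work"
}
```

Review the printed plan, then rerun with `APPLY=1` as root. Keep the mapfile: rerunning ddrescue with the same mapfile resumes where it stopped instead of rereading the whole disk.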
Useful commands
```shell
# Inventory (read-only, safe; xfs_db output can be stale on a mounted FS)
sudo xfs_info /mountpoint
sudo xfs_db -r -c "sb 0" -c print /dev/<dev>   # print the primary superblock
sudo xfs_db -r -c "freesp -s" /dev/<dev>       # free-extent histogram + summary
sudo xfs_db -r -c version /dev/<dev>           # feature flags
df -hT /mountpoint
df -i /mountpoint
sudo dmesg | grep -i xfs | tail -50
# Online metadata scrub (mounted FS)
sudo xfs_scrub -n /mountpoint      # check only, no repairs (creates I/O load)
sudo xfs_scrub -n -v /mountpoint   # same, verbose
# Mount variations
sudo mount /dev/<dev> /mnt
sudo mount -o ro /dev/<dev> /mnt               # read-only
sudo mount -o ro,norecovery /dev/<dev> /mnt    # ro + skip log replay
sudo mount -o nouuid /dev/<dev> /mnt           # bypass duplicate UUID
# Unmount (forcibly if processes hold it)
sudo umount /mnt
sudo fuser -vm /mnt                      # list the processes holding it first
sudo fuser -km /mnt; sudo umount /mnt    # kill them, then unmount
# Dry-run repair
sudo xfs_repair -n /dev/<dev>            # report only
# Repair (UNMOUNTED only)
sudo xfs_repair /dev/<dev>               # normal
sudo xfs_repair -m 4096 /dev/<dev>       # cap memory at 4096 MB
sudo xfs_repair -L /dev/<dev>            # DESTRUCTIVE: zero the log
# Assign new UUID (unmounted)
sudo xfs_admin -U generate /dev/<dev>
# Grow (online; XFS can only grow, never shrink)
sudo lvextend -L +50G /dev/<vg>/<lv>     # first grow the underlying device
sudo xfs_growfs /mountpoint              # then grow the FS
# Defragment (rarely needed)
sudo xfs_fsr /mountpoint                 # online, per-file defragmentation
sudo xfs_db -r -c frag /dev/<dev>        # file fragmentation factor
# Metadata backup / analysis
sudo xfs_metadump /dev/<dev> /tmp/meta.img     # metadata only, for support
sudo xfs_mdrestore /tmp/meta.img /dev/<test>   # restore to a TEST device or image file for analysis
# Forensics
sudo xfs_db -r /dev/<dev>
# Inside:
#   sb 0                          # primary superblock
#   p                             # print current
#   inode <num>                   # navigate to inode
#   convert fsblock <num> daddr   # block → disk address
```
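The read-only inventory above is also what the prompt's input template asks for. A minimal sketch that gathers it into one paste-ready report; `section` and `collect_xfs_inputs` are hypothetical names, and it assumes root (for `dmesg` and `xfs_db`), a device path, and the mountpoint if the FS mounts. Failures are recorded rather than aborting the run, because on a broken FS some of these commands are expected to fail.

```shell
#!/usr/bin/env bash
# Sketch: collect the prompt's read-only inputs into one paste-ready report.
set -u

# Print a header, the command line, and the command's output; record failures
# instead of aborting, because some commands are expected to fail on a broken FS.
section() {
  local title=$1
  shift
  printf '### %s\n$ %s\n' "$title" "$*"
  "$@" 2>&1 || printf '(command exited with status %d)\n' "$?"
  printf '\n'
}

# $1 = device, $2 = mountpoint (omit it if the FS does not mount).
collect_xfs_inputs() {
  local dev=$1 mnt=${2:-}
  section "kernel messages" sh -c 'dmesg | grep -i xfs | tail -50'
  section "primary superblock" xfs_db -r -c "sb 0" -c print "$dev"
  if [[ -n $mnt ]]; then
    section "geometry" xfs_info "$mnt"
    section "space" df -hT "$mnt"
    section "inodes" df -i "$mnt"
  fi
}
```

For example, `collect_xfs_inputs /dev/sdb1 /data > /root/xfs-inputs.txt` as root; write the report somewhere other than the damaged filesystem.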
Recovery workflow
Symptom: XFS mount fails
│
├── dmesg shows hardware errors?
│ ├── Yes → ddrescue first
│ └── No → continue
│
├── Try mount -o ro,norecovery
│ ├── Mounts → back up data ASAP (read-only), then repair offline
│ └── Still fails → continue
│
├── Try mount -o nouuid (if "duplicate UUID")
│ └── Fix permanently with xfs_admin -U generate
│
├── umount; xfs_repair -n /dev/<dev>
│ ├── Clean → likely a mount/option issue, not FS corruption
│ └── Reports issues → run xfs_repair without -n
│
├── xfs_repair refuses: log has unreplayed metadata changes
│ ├── Mount once to replay the log, umount, rerun xfs_repair
│ └── Mount fails → last resort: xfs_repair -L (DESTRUCTIVE, clears log)
│
└── xfs_repair succeeds → mount -o ro → backup → mount rw
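The tree above can also be written as a pure decision function, which is handy for runbooks or for checking that no branch gets skipped. A minimal sketch: `next_step` and its five yes/no arguments are a hypothetical interface, each argument is a fact the operator has already observed, and the checks run in the tree's top-to-bottom order.

```shell
#!/usr/bin/env bash
# Sketch: the "XFS mount fails" tree as a pure function (hypothetical interface).
# Arguments, each "yes" or "no", in the order the tree asks about them:
#   1 hardware errors in dmesg      2 mount -o ro,norecovery works
#   3 "duplicate UUID" reported     4 xfs_repair -n reports clean
#   5 mounting to replay the log fails (after xfs_repair refused a dirty log)
next_step() {
  local hw=$1 ro_ok=$2 dup_uuid=$3 dry_clean=$4 replay_fails=$5
  if [[ $hw == yes ]]; then
    echo "image with ddrescue, then work on the image"
  elif [[ $ro_ok == yes ]]; then
    echo "back up data from the read-only mount, then repair offline"
  elif [[ $dup_uuid == yes ]]; then
    echo "mount -o nouuid; fix permanently with xfs_admin -U generate"
  elif [[ $dry_clean == yes ]]; then
    echo "likely a mount/option issue, not FS corruption"
  elif [[ $replay_fails == yes ]]; then
    echo "last resort: xfs_repair -L (DESTRUCTIVE)"
  else
    echo "xfs_repair, then mount -o ro, back up, mount rw"
  fi
}
```

Because hardware errors are checked first, `next_step yes ...` always answers "image first", whatever the other facts are, which matches the tree's top branch.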
Common findings this catches
- “superblock has unknown read-only compatible features” → the FS uses on-disk features newer than the running kernel knows. Upgrade the kernel, or mount read-only.
- “duplicate UUID” after cloning a VM → `mount -o nouuid` or assign a new UUID.
- `xfs_repair` runs out of memory on a large FS → use `-m <MB>` to cap memory.
- `df` shows free space, but writes fail with ENOSPC → free-space fragmentation within the allocation groups, or the `inode32` mount option (`inode64` not enabled).
- Inode count maxed (32-bit inodes) → `mount -o inode64`; it has been the default since Linux 3.7.
- Slow allocation on an aged FS → heavy free-space fragmentation; `xfs_fsr` defragments files, which may help, but it does not directly consolidate free space.
- Hardware sector errors + log replay fails → image first, then run `xfs_repair -L` on the image.
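Several of these findings show up as recognizable strings in `dmesg`. A sketch of a first-pass triage: the match strings are assumptions modelled on common XFS kernel messages, exact wording differs across kernel versions, and `classify_xfs_line` is a hypothetical name. Anything that does not match comes back as `unclassified` and needs a human read.

```shell
#!/usr/bin/env bash
# Sketch: map one XFS kernel message to the likely cause from the list above.
# Match strings are assumptions; exact wording varies by kernel version.
classify_xfs_line() {
  local line
  line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # case-insensitive match
  case $line in
    *"duplicate uuid"*)
      echo "cloned disk: mount -o nouuid or xfs_admin -U generate" ;;
    *"unknown read-only compatible features"*)
      echo "newer on-disk features: upgrade the kernel or mount read-only" ;;
    *"log inconsistent"* | *"recovery failed"*)
      echo "log problem: inspect with mount -o ro,norecovery" ;;
    *"corruption of in-memory data"*)
      echo "FS shut down: umount, then xfs_repair -n" ;;
    *"i/o error"*)
      echo "possible hardware fault: image with ddrescue before repairing" ;;
    *)
      echo "unclassified" ;;
  esac
}
```

To summarize a whole log: `sudo dmesg | grep -i xfs | while IFS= read -r l; do classify_xfs_line "$l"; done | sort | uniq -c`.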
XFS vs ext4 cheatsheet
| Operation | ext4 | XFS |
|---|---|---|
| Check/repair tool | fsck.ext4 | xfs_repair (offline) |
| Online check | e2scrub (LVM snapshot based) | xfs_scrub (RHEL 8+) |
| Grow | resize2fs | xfs_growfs |
| Shrink | resize2fs <smaller> (offline only) | NOT SUPPORTED |
| Defragment | e4defrag | xfs_fsr |
| Backup tool | dump/restore (rare) | xfsdump/xfsrestore |
| Metadata dump | e2image (dumpe2fs prints superblock/group info) | xfs_metadump |
| Default journal | Yes (data=ordered) | Yes (metadata-only log) |
When to escalate
- Repeated `xfs_repair` runs not converging → likely a hardware issue under the FS; replace the storage.
- `xfs_metadump` analysis needed → engage Red Hat / SUSE support with the metadata image.
- Capacity planning needs a shrink → the only path is dump → mkfs smaller → restore (`xfsdump`, `mkfs.xfs`, `xfsrestore`). Plan downtime.
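The dump → mkfs smaller → restore path can be sketched as a plan-printing script. Assumptions: the FS sits on an LVM logical volume, the dump file lives on a different filesystem with enough room, the data has a verified backup beyond this dump, and `run`, `shrink_via_dump`, and the `APPLY` switch are hypothetical names. `xfsdump -L`/`-M` set the session and media labels up front so the dump does not stop to prompt for them.

```shell
#!/usr/bin/env bash
# Sketch: shrink XFS the only way possible, dump -> recreate smaller -> restore.
# Prints the plan by default; APPLY=1 (hypothetical switch) executes it.
set -euo pipefail

run() {
  echo "+ $*"
  if [[ "${APPLY:-0}" == 1 ]]; then "$@"; fi
}

# $1 = LV path, $2 = mountpoint, $3 = new LV size (e.g. 200G),
# $4 = dump file on ANOTHER filesystem.
shrink_via_dump() {
  local lv=$1 mnt=$2 size=$3 dumpfile=$4
  run xfsdump -l 0 -L shrink -M shrink -f "$dumpfile" "$mnt"  # full (level 0) dump
  run umount "$mnt"
  run wipefs -a "$lv"             # DESTRUCTIVE: erase the old XFS signature
  run lvreduce -L "$size" "$lv"   # DESTRUCTIVE: LVM asks for confirmation
  run mkfs.xfs "$lv"              # DESTRUCTIVE: new, smaller filesystem
  run mount "$lv" "$mnt"
  run xfsrestore -f "$dumpfile" "$mnt"
}
```

`mkfs.xfs` generates a new filesystem UUID, so update any `UUID=` entry in `/etc/fstab` before the next reboot.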
Related prompts
- ext4 Filesystem Corruption Recovery Prompt: recover a corrupted ext4 filesystem (fsck strategies, journal replay, debugfs forensics, restoring from backup superblocks).
- Linux Disk Full / Inode Exhaustion Diagnosis Prompt: diagnose why a Linux filesystem is full or out of inodes, including deleted-but-held files, journal bloat, reserved blocks, and hidden mount-shadowed data.
- LVM Troubleshooting Prompt: diagnose and recover LVM problems (missing PV, VG inactive, snapshot full, thin pool exhausted, online/offline resize, and metadata corruption).