Skip to content
CloudOps
Newsletter
All prompts
AI for Linux Admins Difficulty: Advanced ClaudeChatGPT

XFS Filesystem Troubleshooting Prompt

Diagnose and repair XFS issues — log corruption, xfs_repair workflow, allocator/freespace problems, online vs offline checks, and dump/restore for the worst case.

Target user
Linux sysadmins managing XFS filesystems (RHEL default)
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior Linux storage engineer with deep XFS experience — the default FS on RHEL since RHEL 7. You know that XFS has different tools and patterns from ext4 — including no online shrink, no fsck, and a log replay that's separate from repair.

I will provide:
- The symptom (`mount` fails, "log inconsistent," kernel XFS errors, allocator stuck, slow I/O on aged FS)
- `dmesg | grep XFS` excerpts
- `xfs_info /<mount>` or `xfs_db -r -c sb /dev/<dev>` for filesystem version/features
- Underlying storage and topology
- Whether the FS mounts at all (read-only? not at all?)
- Data criticality

Your job:

1. **Stop further damage**:
   - Unmount immediately if possible
   - `umount /<path>` (xfs doesn't support shrink, so reducing isn't an option)
   - If hardware errors, image with `ddrescue` first
2. **Understand XFS-specific tools**:
   - **No fsck**: XFS uses `xfs_repair`, run only on UNMOUNTED FS
   - **No online repair**: XFS has online metadata scrub (`xfs_scrub`) but corrective is offline-only
   - **No shrink, ever**: XFS cannot be made smaller. Only grow.
   - **Log replay** happens at mount time; if log is damaged, mount fails
3. **For mount failures**:
   - **"Filesystem has duplicate UUID"** → cloned disk; use `mount -o nouuid` or `xfs_admin -U generate /dev/<dev>` to assign a new UUID
   - **"Log inconsistent"** → `mount -o ro,norecovery` to inspect without replay
   - **"Corruption of in-memory data detected"** → kernel can't trust the FS; `umount`, run `xfs_repair`
   - **"realtime device required"** → realtime subvolume missing; rare unless specifically configured
4. **For `xfs_repair`** (UNMOUNTED only):
   - `xfs_repair -n /dev/<dev>` — dry run; reports issues without fixing. ALWAYS first.
   - `xfs_repair -L /dev/<dev>` — clears the log. DESTRUCTIVE: any unjournaled metadata is lost. Use only when log replay won't work.
   - `xfs_repair /dev/<dev>` — full repair
   - `xfs_repair -m <MB>` — bound memory usage (large FS)
5. **For log issues specifically**:
   - Mount with `-o norecovery,ro` to inspect without playing log
   - If log replay fails repeatedly: `xfs_repair -L` (last resort; loses recent metadata)
   - "External log device": specified at mkfs time; both data and log devices must be present
6. **For "out of space" with reported free space**:
   - XFS allocation groups (AGs) — each manages its own freespace. Fragmentation across AGs can cause "ENOSPC" with overall free space
   - `xfs_db -r -c freesp /dev/<dev>` shows freespace per AG
   - `xfs_fsr` defragments (online); rare to need
7. **For inode exhaustion** on aged XFS:
   - XFS allocates inodes dynamically (no fixed inode count) BUT default config caps inodes in first 1TB
   - `mount -o inode64` enables full inode range; should be default on modern distros
   - `df -i` shows the current usage
8. **For data recovery from damaged XFS**:
   - `xfs_metadump` saves metadata to a file (for support / analysis)
   - `xfs_restore` from `xfs_dump` backups
   - Last resort: forensic block reads with `xfs_db`

Mark DESTRUCTIVE clearly: `xfs_repair -L` (clears log; loses unjournaled changes), `mkfs.xfs` (reformat), modifying allocation group structure.

---

Symptom: [DESCRIBE]
`dmesg | grep XFS`:
```
[PASTE]
```
`xfs_info /<mount>` OR `xfs_db -r -c sb /dev/<dev>`:
```
[PASTE]
```
Underlying storage: [raw / mdraid / LVM]
Data criticality: [backed up / irreplaceable]
What you tried so far:
[DESCRIBE]

Why this prompt works

XFS recovery uses entirely different tools from ext4. Engineers familiar with fsck reach for it on XFS, find it doesn’t exist, and improvise. This prompt walks the XFS-specific workflow and surfaces the “no fsck, no shrink, no online repair” reality.

How to use it

  1. Always unmount before xfs_repair. Running on a mounted FS is undefined behavior.
  2. Try mount -o ro,norecovery first to inspect a non-replaying state.
  3. Use -n (no-modify) before any repair pass. ALWAYS.
  4. For irreplaceable data, image and copy out before any destructive operation.

Useful commands

# Inventory (read-only, safe)
sudo xfs_info /mountpoint
sudo xfs_db -r -c sb /dev/<dev>
sudo xfs_db -r -c "freesp -s" /dev/<dev>
sudo xfs_db -r -c version /dev/<dev>
df -hT /mountpoint
df -i /mountpoint
sudo dmesg | grep -i xfs | tail -50

# Online metadata scrub (read-only)
sudo xfs_scrub /mountpoint                   # check (creates I/O load)
sudo xfs_scrub -v /mountpoint

# Mount variations
sudo mount /dev/<dev> /mnt
sudo mount -o ro /dev/<dev> /mnt              # read-only
sudo mount -o ro,norecovery /dev/<dev> /mnt   # ro + skip log replay
sudo mount -o nouuid /dev/<dev> /mnt          # bypass duplicate UUID

# Unmount (forcibly if processes hold)
sudo umount /mnt
sudo fuser -mk /mnt && sudo umount /mnt

# Dry-run repair
sudo xfs_repair -n /dev/<dev>                 # report only

# Repair (UNMOUNTED only)
sudo xfs_repair /dev/<dev>                    # normal
sudo xfs_repair -m 4096 /dev/<dev>            # cap memory at 4096 MB
sudo xfs_repair -L /dev/<dev>                 # DESTRUCTIVE: clear log

# Assign new UUID
sudo xfs_admin -U generate /dev/<dev>

# Grow (online; XFS can only grow, never shrink)
sudo lvextend -L +50G /dev/<vg>/<lv>           # first grow underlying
sudo xfs_growfs /mountpoint                    # then grow FS

# Defragment (rarely needed)
sudo xfs_fsr /mountpoint                       # online
sudo xfs_db -r -c "frag" /dev/<dev>            # show fragmentation %

# Metadata backup / analysis
sudo xfs_metadump /dev/<dev> /tmp/meta.img    # for support
sudo xfs_mdrestore /tmp/meta.img /dev/<test>  # restore to a TEST device for analysis

# Forensics
sudo xfs_db -r /dev/<dev>
# Inside:
#   sb 0                           # primary superblock
#   p                              # print current
#   inode <num>                    # navigate to inode
#   convert fsblock <num> daddr    # block → disk address

Recovery workflow

Symptom: XFS mount fails

├── dmesg shows hardware errors?
│   ├── Yes → ddrescue first
│   └── No  → continue

├── Try mount -o ro,norecovery
│   ├── Mounts → backup data ASAP (read-only), then repair offline
│   └── Still fails → continue

├── Try mount -o nouuid (if "duplicate UUID")
│   └── Fix permanently with xfs_admin -U generate

├── umount; xfs_repair -n /dev/<dev>
│   ├── Clean → likely a mount/option issue, not FS corruption
│   └── Reports issues → run xfs_repair without -n

├── xfs_repair fails: "log inconsistent, cannot replay"
│   └── Last resort: xfs_repair -L (clears log)

└── xfs_repair succeeds → mount -o ro → backup → mount rw

Common findings this catches

  • “superblock has unknown read-only compatible features” → newer FS features than the kernel knows. Upgrade kernel or use compatible mount options.
  • “duplicate UUID” after cloning a VM → mount -o nouuid or assign new UUID.
  • xfs_repair runs OOM on a large FS → use -m <MB> to cap memory.
  • df shows free, but writes fail with ENOSPC → allocation group exhaustion or inode64 not enabled.
  • Inode count maxed (32-bit inodes)mount -o inode64; should be default on modern installs.
  • Slow allocation on aged FS → high freespace fragmentation; xfs_fsr defrag.
  • Hardware sector errors + log replay fails → image first, then xfs_repair -L on the image.

XFS vs ext4 cheatsheet

Operationext4XFS
Check/repair toolfsck.ext4xfs_repair (offline)
Online checkNonexfs_scrub (RHEL 8+)
Growresize2fsxfs_growfs
Shrinkresize2fs <smaller>NOT SUPPORTED
Defragmente4defragxfs_fsr
Backup tooldump/restore (rare)xfs_dump/xfs_restore
Metadata dumpdumpe2fsxfs_metadump
Default journalYes (ordered)Yes (writeback-style)

When to escalate

  • Repeated xfs_repair runs not converging → likely hardware issue under the FS; replace storage.
  • xfs_metadump analysis needed → engage Red Hat / SUSE support with the metadata image.
  • Capacity planning needs a shrink — only path is dump → mkfs smaller → restore. Plan downtime.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week