ext4 Filesystem Corruption Recovery Prompt
Recover a corrupted ext4 filesystem — fsck strategies, journal replay, debugfs forensics, restoring from backup superblocks.
- Target user: Linux sysadmins recovering data from a damaged ext4 filesystem
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux storage engineer who has rescued data from countless corrupted ext4 filesystems: bad blocks, lost superblock, orphaned inodes, journal damage. You know that `fsck -y` is sometimes the right answer and sometimes "irreversibly destroy data fast."

I will provide:
- The symptom (mount fails, kernel "EXT4-fs error", read errors, files reading garbage, missing files after crash)
- `dmesg` excerpts around the failure
- The output of `mount` (currently mounted? read-only? not at all?)
- The block device (`lsblk -f`), partition layout, and underlying storage (mdraid/LVM/raw)
- Whether the data is critical and irreplaceable, or backed up elsewhere

Your job:

1. **Stop further damage first**:
   - Unmount the FS (`umount`, or `fuser -mk` if needed; the latter kills every process holding the FS open)
   - Mark device read-only at the kernel level if needed (`blockdev --setro`)
   - If hardware errors in `dmesg`, image the device first (`ddrescue`) before any fsck
2. **Pick the right fsck approach**:
   - **`fsck.ext4 -n` (no changes)**: dry run; reports what it would fix. ALWAYS first.
   - **`fsck.ext4 -f -p`**: auto-fix things that need no judgment (preen). Refuses on uncertain cases.
   - **`fsck.ext4 -f -y`**: answer yes to all. Used after `-n` shows acceptable changes.
   - **`fsck.ext4 -f -y -b <backup-sb>`**: if primary superblock is bad, use a backup
   - **`debugfs -R 'ls -l /' /dev/<dev>`**: inspect read-only without fsck (forensics)
3. **For "Superblock invalid"**:
   - Find backup superblocks: `mke2fs -n /dev/<dev>` (with `-n` it doesn't make; just shows)
   - Or `dumpe2fs /dev/<dev> | grep -i "backup superblock"`
   - Retry fsck with `-b <backup-block-number>` (e.g., 32768, 98304)
4. **For "Journal recovery failed"**:
   - Try `mount -o ro,noload` to inspect without journal replay (a plain `-o ro` mount still replays the journal)
   - `e2fsck -y -E journal_only /dev/<dev>` to replay only
   - In extreme cases: `tune2fs -O ^has_journal /dev/<dev>` to remove the journal (DESTRUCTIVE: discards unreplayed journal transactions; the FS then runs unjournaled, like ext2)
5. **For "orphan inode" / "i_blocks_hi should be zero"**:
   - Almost always safe to fsck with `-y` after `-n` review
   - `lost+found` will collect detached inodes; rename / re-attach manually
6. **For "bad magic number in superblock"**:
   - Likely wrong device (partition vs whole disk confusion) OR severe corruption at the start of the device
   - Verify partition table is intact with `gdisk -l` / `fdisk -l`
   - Try backup superblock; if all fail, the FS metadata may be lost
7. **For read errors during fsck**:
   - Underlying disk failure; image with `ddrescue` first, then fsck the image
   - **NEVER `fsck -y` a disk with hardware errors**: fsck will write to bad sectors and propagate damage
8. **For data files reading garbage**:
   - May be FS metadata corruption (block map wrong) or actual data corruption
   - `debugfs` `dump_extents <inode>` shows the block map; cross-reference with `dd` reads
   - If the FS is on RAID5 with a known-failed-and-resynced member, suspect silent corruption from rebuild

Mark DESTRUCTIVE clearly: `fsck -y` (auto-confirms ALL changes), `tune2fs -O ^has_journal` (removes journal), `mkfs` (reformats).

---

Symptom: [DESCRIBE]

`dmesg` excerpts:
```
[PASTE]
```

`mount | grep <fs>` and `lsblk -f`:
```
[PASTE]
```

Underlying storage: [raw / mdraid / LVM / LUKS]
Data criticality: [backed up / partially backed up / irreplaceable]
What you tried so far: [DESCRIBE]
Why this prompt works
ext4 fsck has many options and the wrong sequence destroys data. “Just run fsck -y” advice from forums is often the worst answer for a recovery scenario. This prompt forces a triage: stop damage, dry-run, image if hardware is suspect, then act.
How to use it
- Unmount immediately (read-only at minimum). Continued writes worsen corruption.
- Run `fsck -n` first, always. Review what would change.
- If `dmesg` shows hardware errors, image with `ddrescue` BEFORE further fsck.
- For irreplaceable data, copy what you can BEFORE running corrective fsck.
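The hardware check in that list can be scripted so the decision does not depend on eyeballing the log. This is a minimal sketch, assuming a hypothetical helper named `has_hw_errors` that reads kernel log text on stdin. The pattern list is an assumption based on common block-layer, libata, and SCSI messages and is not exhaustive.

```shell
# has_hw_errors: hypothetical helper, not part of any standard tool.
# Reads kernel log text on stdin and exits 0 if any line looks like a
# disk-level error rather than a pure filesystem complaint.
# The patterns are illustrative assumptions, not a complete list.
has_hw_errors() {
  grep -Eiq 'I/O error|medium error|unrecovered read error|\{ *UNC *\}|failed command: (READ|WRITE)'
}

# Usage (left as comments so sourcing this file runs nothing):
#   if sudo dmesg | has_hw_errors; then
#     echo "hardware errors present: image with ddrescue before any fsck"
#   fi
```

A match only says "image first"; a clean result does not prove the disk is healthy, so SMART data and controller logs are still worth a look.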
Useful commands
# Inventory (safe, read-only)
sudo dmesg | tail -100
sudo dumpe2fs -h /dev/<dev> | head -40
sudo tune2fs -l /dev/<dev> | head -30
lsblk -f
sudo fdisk -l /dev/<dev>
# Find backup superblocks
sudo mke2fs -n /dev/<dev> # -n: don't create; shows layout
# Output includes "Superblock backups stored on blocks: 32768, 98304, ..."
# (the locations are only right if block size etc. match the original mkfs options)
# Dry-run fsck
sudo fsck.ext4 -n -f /dev/<dev> # tells you what it would do
# Image a failing device (BEFORE any write attempts)
sudo apt install gddrescue
sudo ddrescue -f -d -r3 /dev/<failing> /dev/<replacement> ddrescue.log # -f: required when the output is a block device
sudo ddrescue -f -d -r3 -R /dev/<failing> /dev/<replacement> ddrescue.log # 2nd pass, reverse
# Forensics without fsck (read-only inspection)
sudo debugfs /dev/<dev>
# Inside:
# ls -l /
# stat <inode>
# icheck <block> # which inode owns this block
# ncheck <inode> # filename for this inode
# dump <inode> /tmp/out # extract file contents by inode (deleted files are rarely recoverable on ext4: the extent map is cleared on delete)
# Repair (after dry-run review)
sudo fsck.ext4 -f -p /dev/<dev> # preen — auto-fix safe things; bails on uncertain
sudo fsck.ext4 -f -y /dev/<dev> # yes to all
# With backup superblock
sudo fsck.ext4 -f -y -b 32768 /dev/<dev>
# After fsck: mount read-only first
sudo mount -o ro /dev/<dev> /mnt/recovery
sudo rsync -aHAX --partial /mnt/recovery/ /backup/
# Look for orphaned files
sudo ls -la /mnt/recovery/lost+found/ # lost+found is readable by root only
sudo sh -c 'file /mnt/recovery/lost+found/*' # identify by content; the glob must expand as root
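When the primary superblock is bad, the backup locations printed by `mke2fs -n` have to be tried one at a time. The sketch below pulls the block numbers out of that output; `backup_superblocks` is a hypothetical helper name, and it assumes the output has the "Superblock backups stored on blocks:" layout quoted in the commands above.

```shell
# backup_superblocks: hypothetical helper, not part of e2fsprogs.
# Reads `mke2fs -n` output on stdin and prints one backup superblock
# number per line. Assumes the list starts after the line
# "Superblock backups stored on blocks:" and ends at the next blank line.
backup_superblocks() {
  awk '
    /^Superblock backups stored on blocks:/ { grab = 1; sub(/^[^:]*:/, "") }
    grab {
      if ($0 !~ /[0-9]/) {        # blank or number-free line
        if (seen) exit            # list is finished
        next                      # list has not started yet
      }
      if ($0 ~ /[A-Za-z]/) exit   # ran into unrelated text
      n = split($0, parts, /[^0-9]+/)
      for (i = 1; i <= n; i++)
        if (parts[i] != "") { print parts[i]; seen = 1 }
    }
  '
}

# Usage (commented out; dry-run only, nothing is written to the device):
#   for sb in $(sudo mke2fs -n /dev/<dev> | backup_superblocks); do
#     sudo fsck.ext4 -n -f -b "$sb" /dev/<dev> && break
#   done
```

The loop uses `-n` on purpose: it only finds a backup that e2fsck accepts, and the corrective run with that block number remains a separate, reviewed step.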
Recovery decision tree
Symptom: FS won't mount, EXT4-fs error in dmesg
│
├── Hardware errors in dmesg (UNC, sector errors)?
│ ├── Yes → ddrescue to healthy disk → continue on image
│ └── No → continue on device
│
├── fsck -n -f /dev/<dev>
│ ├── Reports "clean" → not FS corruption; check mount opts, kernel version
│ ├── Reports few fixable issues → fsck -y -f
│ └── Reports superblock invalid → continue
│
├── Superblock invalid
│ ├── mke2fs -n → list backup SB blocks
│ ├── fsck -y -f -b <backup-sb>
│ └── If all backups fail → debugfs forensics; consider data recovery service
│
└── fsck succeeds → mount -o ro → rsync data out → then mount rw
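The branches of the tree can be told apart from the e2fsck exit status as well as from its text output. According to e2fsck(8) the status is a bitwise OR of flags; the sketch below decodes it. `explain_fsck_status` is a hypothetical helper name, and the wording of each message is mine.

```shell
# explain_fsck_status: hypothetical helper that decodes an e2fsck exit
# status. Flag values follow the EXIT CODE section of e2fsck(8).
explain_fsck_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
  if [ $((status & 2)) -ne 0 ]; then echo "errors corrected, reboot recommended"; fi
  if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
  if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
  if [ $((status & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
  if [ $((status & 32)) -ne 0 ]; then echo "canceled by user"; fi
  if [ $((status & 128)) -ne 0 ]; then echo "shared library error"; fi
  return 0
}

# Usage (commented out):
#   sudo fsck.ext4 -n -f /dev/<dev>; explain_fsck_status $?
```

With `-n`, a status of 4 corresponds to the "reports issues" branches, since nothing is corrected in a dry run. Treating 8 as the "superblock invalid" branch is an assumption that should be confirmed against the messages e2fsck prints.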
Common findings this catches
- `fsck -n` reports millions of changes → likely wrong device, not corruption. Re-verify with `lsblk -f`.
- `fsck -p` exits early with “UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY” → safe to retry with `-y` after `-n` review.
- Journal recovery succeeds but FS mounts read-only → kernel detected an error after replay; run fsck with the FS unmounted.
- Files in `lost+found` with names like `#12345` → orphan inodes; identify by `file`, restore by content.
- Bad block at superblock → use backup superblock with `-b`.
- Repeated “Superblock has an invalid journal” after replay → journal damaged; consider removing the journal (leaving the FS unjournaled, like ext2) as last resort.
- fsck loops repeatedly fixing same issue → likely hardware writing back errors; image first.
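For the `lost+found` case, the `#12345` names are the inode numbers of the reconnected files, so they can be fed back into `debugfs` for size and timestamp details. A small sketch, assuming a hypothetical helper named `orphan_inodes` that takes the `lost+found` directory as its only argument:

```shell
# orphan_inodes: hypothetical helper, not part of e2fsprogs.
# Prints the inode number encoded in each "#<number>" entry of the given
# directory, one per line, and skips anything that does not fit that shape.
orphan_inodes() {
  for path in "$1"/\#*; do
    [ -e "$path" ] || continue          # glob matched nothing
    name=${path##*/}                    # strip the directory part
    num=${name#\#}                      # strip the leading '#'
    case $num in
      ''|*[!0-9]*) continue ;;          # not purely numeric
    esac
    printf '%s\n' "$num"
  done
}

# Usage (commented out; run as root because lost+found is root-only):
#   orphan_inodes /mnt/recovery/lost+found | while read -r ino; do
#     debugfs -R "stat <$ino>" /dev/<dev>
#   done
```

The angle brackets in `stat <$ino>` are literal debugfs syntax for "inode number", not a placeholder.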
Verification after recovery
# Mount read-only first, check
sudo mount -o ro /dev/<dev> /mnt/check
sudo find /mnt/check -type f -exec file {} \; | grep -v ASCII | head
# Compare to backup if available
sudo rsync -avn --checksum /backup/ /mnt/check/
# Check FS features
sudo tune2fs -l /dev/<dev> | grep -E "Filesystem features|Last checked|Mount count"
# Schedule periodic checks
sudo tune2fs -c 30 /dev/<dev> # check every 30 mounts
sudo tune2fs -i 6m /dev/<dev> # check every 6 months
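Where a backup exists, a checksum manifest gives a per-file answer about what survived. This complements the `rsync --checksum` dry run. It is a sketch under assumptions: `manifest` and `compare_trees` are hypothetical helper names, and GNU `find`, `sort -z`, `xargs -r`, and `sha256sum` are assumed to be available.

```shell
# manifest: hypothetical helper. Prints "sha256  ./relative/path" for every
# regular file under the given directory, in a stable sorted order.
manifest() {
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# compare_trees: hypothetical helper. Diffs the manifests of two trees.
# Exit status 0 means identical content, 1 means differences were printed.
compare_trees() {
  a=$(mktemp) && b=$(mktemp) || return 2
  manifest "$1" >"$a"
  manifest "$2" >"$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}

# Usage (commented out; both trees are only read):
#   sudo sh -c '. ./verify.sh; compare_trees /backup /mnt/check'
```

Files that changed legitimately after the backup was taken will also show up as differences, so the output is a list to review rather than a verdict. The `verify.sh` filename in the usage line is illustrative only.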
When to escalate
- Irreplaceable data + hardware failure → professional data recovery service. Don’t trust forum advice with originals.
- Repeated FS errors after fsck — underlying storage problem (controller, cable, disk); replace before trusting again.
- Corruption pattern matching a known kernel bug — check distro CVE/bug tracker; may need a downgrade or specific patch.
- Encrypted volume (LUKS) where the FS appears clean to debugfs but unreadable through dm-crypt — LUKS header issue; see LUKS recovery prompt.
Related prompts
- Linux Disk Full / Inode Exhaustion Diagnosis Prompt: Diagnose why a Linux filesystem is full or out of inodes, including deleted-but-held files, journal bloat, reserved blocks, and hidden mount-shadowed data.
- LVM Troubleshooting Prompt: Diagnose and recover LVM problems, such as missing PV, VG inactive, snapshot full, thin pool exhausted, online/offline resize, and metadata corruption.
- Linux mdraid Software RAID Recovery Prompt: Recover from degraded or failed mdraid arrays, covering failed disk, missing member, resync stuck, and replacing drives without losing data.