ext4 Filesystem Corruption Recovery Prompt

Recover a corrupted ext4 filesystem — fsck strategies, journal replay, debugfs forensics, restoring from backup superblocks.

Target user
Linux sysadmins recovering data from a damaged ext4 filesystem
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior Linux storage engineer who has rescued data from countless corrupted ext4 filesystems — bad blocks, lost superblock, orphaned inodes, journal damage. You know that `fsck -y` is sometimes the right answer and sometimes "irreversibly destroy data fast."

I will provide:
- The symptom (mount fails, kernel "EXT4-fs error", read errors, files reading garbage, missing files after crash)
- `dmesg` excerpts around the failure
- The output of `mount` (currently mounted? read-only? not at all?)
- The block device (`lsblk -f`), partition layout, and underlying storage (mdraid/LVM/raw)
- Whether the data is critical and irreplaceable, or backed up elsewhere

Your job:

1. **Stop further damage first**:
   - Unmount the FS (`umount`; if it is busy, `fuser -vm <mountpoint>` lists the processes holding it and `fuser -mk <mountpoint>` kills them)
   - Mark device read-only at the kernel level if needed (`blockdev --setro`)
   - If hardware errors in `dmesg`, image the device first (`ddrescue`) before any fsck
2. **Pick the right fsck approach**:
   - **`fsck.ext4 -n` (no changes)** — dry run; reports what it would fix. ALWAYS first.
   - **`fsck.ext4 -f -p`** — auto-fix things that need no judgment (preen). Refuses on uncertain cases.
   - **`fsck.ext4 -f -y`** — answer yes to all. Used after `-n` shows acceptable changes.
   - **`fsck.ext4 -f -y -b <backup-sb>`** — if primary superblock is bad, use a backup
   - **`debugfs -R 'ls -l /' /dev/<dev>`** — inspect read-only without fsck (forensics)
3. **For "Superblock invalid"**:
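   - Find backup superblocks: `mke2fs -n /dev/<dev>` (with `-n` it creates nothing and only prints the layout; the locations match only if the block size and options match the original mkfs)
   - Or `dumpe2fs /dev/<dev> | grep -i "backup superblock"` (works only while the primary superblock is still readable)
   - Retry fsck with `-b <backup-block-number>` (e.g., 32768 or 98304 with 4 KiB blocks; the first backup is at 8193 with 1 KiB blocks)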
4. **For "Journal recovery failed"**:
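   - Try `mount -o ro,noload` to inspect without journal replay (plain `-o ro` still replays the journal)
   - `e2fsck -y -E journal_only /dev/<dev>` to replay only
   - In extreme cases: `tune2fs -O ^has_journal /dev/<dev>` to remove the journal (DESTRUCTIVE: discards any unreplayed transactions; the FS then runs without a journal, like ext2; `tune2fs` refuses while the needs_recovery flag is set unless forced)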
5. **For "orphan inode" / "i_blocks_hi should be zero"**:
   - Almost always safe to fsck with `-y` after `-n` review
   - Lost+found will collect detached inodes; rename / re-attach manually
6. **For "bad magic number in superblock"**:
   - Likely wrong device (partition vs whole disk confusion) OR severe corruption at the start of the filesystem
   - Verify partition table is intact with `gdisk -l` / `fdisk -l`
   - Try backup superblock; if all fail, the FS metadata may be lost
7. **For read errors during fsck**:
   - Underlying disk failure; image with `ddrescue` first, then fsck the image
   - **NEVER `fsck -y` a disk with hardware errors** — fsck will write to bad sectors and propagate damage
8. **For data files reading garbage**:
   - May be FS metadata corruption (block map wrong) or actual data corruption
   - `debugfs` `dump_extents <inode>` shows the block map; cross-reference with `dd` reads
   - If the FS is on RAID5 with a known-failed-and-resynced member, suspect silent corruption from rebuild

Mark DESTRUCTIVE clearly: `fsck -y` (auto-confirms ALL changes), `tune2fs -O ^has_journal` (removes journal), `mkfs` (reformats).

---

Symptom: [DESCRIBE]
`dmesg` excerpts:
```
[PASTE]
```
`mount | grep <fs>` and `lsblk -f`:
```
[PASTE]
```
Underlying storage: [raw / mdraid / LVM / LUKS]
Data criticality: [backed up / partially backed up / irreplaceable]
What you tried so far:
[DESCRIBE]

Why this prompt works

ext4 fsck has many options and the wrong sequence destroys data. “Just run fsck -y” advice from forums is often the worst answer for a recovery scenario. This prompt forces a triage: stop damage, dry-run, image if hardware is suspect, then act.

How to use it

  1. Unmount immediately (read-only at minimum). Continued writes worsen corruption.
  2. Run fsck -n first — always. Review what would change.
  3. If dmesg shows hardware errors, image with ddrescue BEFORE further fsck.
  4. For irreplaceable data, copy what you can BEFORE running corrective fsck.
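
The triage order above can be sketched as a small shell helper that reads kernel log text and names the first step. This is a minimal sketch: the function names `has_hw_errors` and `first_step` are hypothetical, and the pattern list is an illustrative assumption, not a complete catalogue of hardware-failure messages.

```shell
#!/bin/sh
# Sketch: pick the first recovery step from kernel log text on stdin.
# The patterns below are illustrative assumptions; extend them for your hardware.

# has_hw_errors: exit 0 if stdin contains a typical hardware-failure signature.
has_hw_errors() {
    grep -Eqi 'I/O error|Medium Error|[{] UNC [}]|unrecovered read error'
}

# first_step: print "image-first" or "dry-run-fsck".
first_step() {
    if has_hw_errors; then
        echo "image-first"      # image with ddrescue before any fsck
    else
        echo "dry-run-fsck"     # start with fsck.ext4 -n -f
    fi
}
```

A typical call would be `sudo dmesg | first_step`. A match only suggests failing hardware; reading the surrounding log lines is still needed before deciding.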

Useful commands

# Inventory (safe, read-only)
sudo dmesg | tail -100
sudo dumpe2fs -h /dev/<dev> | head -40
sudo tune2fs -l /dev/<dev> | head -30
lsblk -f
sudo fdisk -l /dev/<dev>

# Find backup superblocks
sudo mke2fs -n /dev/<dev>           # -n: don't create; shows layout
# Output includes "Superblock backups stored on blocks: 32768, 98304, ..."

# Dry-run fsck
sudo fsck.ext4 -n -f /dev/<dev>     # tells you what it would do

# Image a failing device (BEFORE any write attempts)
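sudo apt install gddrescue           # Debian/Ubuntu package name; the binary is ddrescue
sudo ddrescue -f -d -r3 /dev/<failing> /dev/<replacement> ddrescue.log      # -f is required when the output is a block device
sudo ddrescue -f -d -r3 -R /dev/<failing> /dev/<replacement> ddrescue.log   # 2nd pass, reverse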

# Forensics without fsck (read-only inspection)
sudo debugfs /dev/<dev>
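# Inside (debugfs takes inode numbers in literal angle brackets, e.g. <12345>):
#   ls -l /
#   stat <12345>              # details for inode 12345
#   icheck 67890              # which inode owns block 67890
#   ncheck 12345              # path name(s) for inode 12345
#   dump <12345> /tmp/out     # extract file contents by inode (for deleted files this works only if the block map survived, which is uncommon on ext4)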

# Repair (after dry-run review)
sudo fsck.ext4 -f -p /dev/<dev>     # preen — auto-fix safe things; bails on uncertain
sudo fsck.ext4 -f -y /dev/<dev>     # yes to all

# With backup superblock
sudo fsck.ext4 -f -y -b 32768 /dev/<dev>

# After fsck: mount read-only first
sudo mount -o ro /dev/<dev> /mnt/recovery
sudo rsync -aHAX --partial /mnt/recovery/ /backup/

# Look for orphaned files
sudo ls -la /mnt/recovery/lost+found/               # lost+found is readable by root only
sudo sh -c 'file /mnt/recovery/lost+found/*'        # identify by content; the glob must expand as root
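
To try the backup superblocks one by one, it can help to turn the `mke2fs -n` listing into a plain list of block numbers. The helper below is a sketch under assumptions: `backup_superblocks` is a hypothetical name, and it assumes the listing follows the "Superblock backups stored on blocks:" line and ends at the next blank line.

```shell
#!/bin/sh
# backup_superblocks: read `mke2fs -n` output on stdin and print one
# backup superblock number per line.
# Assumption: the numbers follow the "Superblock backups stored on blocks:"
# line (on the same or following lines) and end at the first blank line.
backup_superblocks() {
    awk '
        /Superblock backups stored on blocks:/ { collecting = 1; sub(/.*blocks:/, "") }
        collecting && seen && /^[[:space:]]*$/ { exit }
        collecting {
            n = split($0, parts, /[^0-9]+/)
            for (i = 1; i <= n; i++)
                if (parts[i] != "") { print parts[i]; seen = 1 }
        }
    '
}
```

For example, `sudo mke2fs -n /dev/<dev> | backup_superblocks` prints the candidates. Each can then be tried with `sudo fsck.ext4 -n -b <number> /dev/<dev>` before committing to a repair.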

Recovery decision tree

Symptom: FS won't mount, EXT4-fs error in dmesg

├── Hardware errors in dmesg (UNC, sector errors)?
│   ├── Yes → ddrescue to healthy disk → continue on image
│   └── No  → continue on device

├── fsck -n -f /dev/<dev>
│   ├── Reports "clean" → not FS corruption; check mount opts, kernel version
│   ├── Reports few fixable issues → fsck -y -f
│   └── Reports superblock invalid → continue

├── Superblock invalid
│   ├── mke2fs -n → list backup SB blocks
│   ├── fsck -y -f -b <backup-sb>
│   └── If all backups fail → debugfs forensics; consider data recovery service

└── fsck succeeds → mount -o ro → rsync data out → then mount rw
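
The middle branch of the tree can be sketched as a function over the exit status of the dry run. The e2fsck man page documents the status as a bitmask (0 = no errors, 4 = errors left uncorrected, 8 = operational error). Reading 8 as "try a backup superblock" is an assumption made here to mirror the tree, since an operational error can have other causes. The name `next_step` is hypothetical.

```shell
#!/bin/sh
# next_step CODE: map the exit status of `fsck.ext4 -n -f` to the next
# branch of the decision tree. CODE is a bitmask, so 12 means 4 + 8.
next_step() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "clean: check mount options and kernel version"
    elif [ $((code & 8)) -ne 0 ]; then
        # Assumption: treat an operational error as a superblock problem.
        echo "operational error: try a backup superblock"
    elif [ $((code & 4)) -ne 0 ]; then
        echo "errors found: review the dry-run output, then repair"
    else
        echo "unexpected exit code $code: read the e2fsck output"
    fi
}
```

One way to use it: run `sudo fsck.ext4 -n -f /dev/<dev>`, then call `next_step $?` on the following line.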

Common findings this catches

  • fsck -n reports millions of changes → likely wrong device, not corruption. Re-verify lsblk -f.
  • fsck -p exits early with “UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY” → preen found something it will not fix on its own; retry with -y only after reviewing the -n output.
  • Journal recovery succeeds but FS mounts read-only → kernel detected error after replay; fsck offline.
  • Files in lost+found with names like #12345 → orphan inodes; identify by file, restore by content.
  • Bad block at superblock → use backup superblock with -b.
  • Repeated “Superblock has an invalid journal” after replay → journal damaged; consider removing the journal (the FS then runs unjournaled, like ext2) as a last resort.
  • fsck loops repeatedly fixing same issue → likely hardware writing back errors; image first.
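
The first finding in this list can be checked mechanically by counting how many fixes the dry run proposed. In `-n` mode e2fsck answers its own prompts with "no", so lines ending in "? no" approximate the number of proposed changes. This is a sketch: the function names are hypothetical and the default limit of 1000 is an arbitrary assumption, not a documented threshold.

```shell
#!/bin/sh
# count_proposed_fixes: count e2fsck prompts auto-answered "no" on stdin.
count_proposed_fixes() {
    grep -c '? no[[:space:]]*$'
}

# assess_dry_run [LIMIT]: flag suspiciously large dry runs.
# LIMIT is a hypothetical threshold; the default of 1000 is arbitrary.
assess_dry_run() {
    limit=${1:-1000}
    n=$(count_proposed_fixes) || true   # grep exits 1 when the count is 0
    if [ "$n" -gt "$limit" ]; then
        echo "$n proposed changes: re-verify the device before repairing"
    else
        echo "$n proposed changes: review them, then repair"
    fi
}
```

For example, `sudo fsck.ext4 -n -f /dev/<dev> 2>&1 | assess_dry_run` gives a quick sanity check before reading the full output.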

Verification after recovery

# Mount read-only first, check
sudo mount -o ro /dev/<dev> /mnt/check
sudo find /mnt/check -type f -exec file {} \; | grep -v ASCII | head

# Compare to backup if available
sudo rsync -avn --checksum /backup/ /mnt/check/

# Check FS features
sudo tune2fs -l /dev/<dev> | grep -E "Filesystem features|Last checked|Mount count"

# Schedule periodic checks
sudo tune2fs -c 30 /dev/<dev>     # check every 30 mounts
sudo tune2fs -i 6m /dev/<dev>     # check every 6 months
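
Where a backup exists, the comparison step can also be done with checksum manifests, which leave a record that can be kept for later. The sketch below assumes `sha256sum` from coreutils is available; `manifest` and `same_content` are hypothetical helper names.

```shell
#!/bin/sh
# manifest DIR: print "sha256  ./relative/path" for every regular file
# under DIR, sorted by path so two manifests can be compared directly.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}

# same_content DIR_A DIR_B: exit 0 when both trees hold identical
# regular files (same relative paths, same contents), 1 when they differ.
same_content() {
    a=$(manifest "$1") || return 2
    b=$(manifest "$2") || return 2
    [ "$a" = "$b" ]
}
```

For instance, `manifest /mnt/check > recovered.sha256` saves the state of the recovered tree, and `same_content /backup /mnt/check` reports whether it matches the backup. Only file contents and paths are compared, not ownership or timestamps.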

When to escalate

  • Irreplaceable data + hardware failure → professional data recovery service. Don’t trust forum advice with originals.
  • Repeated FS errors after fsck — underlying storage problem (controller, cable, disk); replace before trusting again.
  • Corruption pattern matching a known kernel bug — check distro CVE/bug tracker; may need a downgrade or specific patch.
  • Encrypted volume (LUKS) where the FS appears clean to debugfs but unreadable through dm-crypt — LUKS header issue; see LUKS recovery prompt.
