Recovering Corrupted Linux Filesystems with fsck (and AI)

The scariest two minutes in sysadmin work come when a server drops you to an (initramfs) prompt or an emergency shell with a message about an unclean filesystem. Your instinct is to fix it fast, and that instinct is exactly what corrupts data permanently. Running fsck on a mounted filesystem, or fsck -y on a disk you don’t understand, can turn a recoverable problem into a restore-from-backup problem.

I’ve recovered enough of these to know the only real skill is staying calm and reading carefully. That’s also where AI helps most: it’s a level-headed second reader of dense fsck output when my own adrenaline is up. It drafts interpretations and suggests next steps, and I treat every one of them as a hypothesis to verify — never as a command to run, because filesystem repair is the last place you want to be fast and wrong.

First rule: never fsck a mounted filesystem

Running fsck on a mounted, writable filesystem is how you destroy data. Always confirm the target is unmounted or read-only first:

mount | grep /dev/sdb1      # or: findmnt /dev/sdb1
sudo umount /dev/sdb1       # if it's mounted
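A small guard can make that check mechanical. This is a minimal sketch, not a complete safety net: the function name is_mounted is hypothetical, and it assumes a mounts table in the /proc/self/mounts layout, where the first field is the mount source. A device mounted under another name (a device-mapper or LVM path, for example) would not be caught.

```shell
# Return 0 if DEVICE appears as a mount source in the mounts table, else 1.
# The optional second argument lets you try the function on a saved copy.
is_mounted() {
    device=$1
    table=${2:-/proc/self/mounts}
    # Compare field 1 exactly so /dev/sdb1 does not match /dev/sdb10.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Example use (not executed here):
#   if is_mounted /dev/sdb1; then echo "still mounted, do not fsck"; fi
```

The exact field comparison is the design point: a plain grep for /dev/sdb1 would also match /dev/sdb10.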

If it’s your root filesystem, you can’t unmount it while booted normally. Either boot from rescue media, or check the recorded state and schedule a check for the next boot:

sudo tune2fs -l /dev/sda1 | grep "Filesystem state"
sudo touch /forcefsck       # legacy flag file; systemd-based distros may need fsck.mode=force on the kernel command line instead

Confirm the state before you act. A filesystem marked “clean” usually doesn’t need a repair pass, and running one anyway adds risk for no benefit.
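To make the state check scriptable, the value can be pulled out of the tune2fs listing. The sketch below assumes the "Filesystem state:" line looks the way tune2fs -l prints it for ext2/3/4; the function names fs_state and state_is_clean are hypothetical helpers, not part of e2fsprogs.

```shell
# Print the "Filesystem state" value from `tune2fs -l` output given on stdin.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Return 0 only when the reported state is exactly "clean".
state_is_clean() {
    [ "$(fs_state)" = "clean" ]
}

# Example use (not executed here):
#   sudo tune2fs -l /dev/sda1 | fs_state
```

Anything other than an exact "clean", such as "not clean" or "clean with errors", is treated as a reason to look closer.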

Know your filesystem — ext4 and XFS are not the same

This trips people up constantly. fsck.ext4 and xfs_repair are completely different tools with different rules:

sudo blkid /dev/sdb1        # tells you the TYPE

For ext4:

sudo fsck.ext4 -n /dev/sdb1    # -n = read-only, answer "no" to all

For XFS, fsck is essentially a no-op — you use xfs_repair, and you check first with -n:

sudo xfs_repair -n /dev/sdb1   # dry run, reports without changing
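The type-to-tool mapping above can be written down once so nobody picks the wrong tool under pressure. This is a sketch under assumptions: dry_run_cmd is a hypothetical helper, it only knows the filesystem types discussed here, and it prints the command instead of running it so a human still decides.

```shell
# Print (do not run) the read-only check command for a filesystem type.
# TYPE is the value blkid reports, e.g. from:
#   sudo blkid -o value -s TYPE /dev/sdb1
dry_run_cmd() {
    fstype=$1
    device=$2
    case $fstype in
        ext2|ext3|ext4) echo "fsck.$fstype -n $device" ;;
        xfs)            echo "xfs_repair -n $device" ;;
        *)              echo "no dry-run known for type '$fstype'" >&2
                        return 1 ;;
    esac
}

# Example use (not executed here):
#   dry_run_cmd "$(sudo blkid -o value -s TYPE /dev/sdb1)" /dev/sdb1
```

Unknown types fail loudly on purpose; guessing a repair tool is worse than stopping.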

The -n flag is your best friend. It reports what’s wrong without writing to the disk. (On ext4, a filesystem already flagged clean is skipped unless you also pass -f to force the full check.) Capture that output. It’s also the perfect thing to hand an AI:

Here’s the output of fsck.ext4 -n on an unmounted ext4 volume. Explain in plain terms what’s damaged, whether this looks like metadata-only corruption or possible data loss, and what the safest repair command would be. Don’t assume I’ve run anything yet.

This is the kind of reading the model is good at. I keep a few of these recovery prompts saved with my other Linux admin prompts.

Image the disk before you repair anything important

If the data matters, take a copy before you let any tool write to the disk. ddrescue is purpose-built for failing or suspect drives:

sudo ddrescue -d -r3 /dev/sdb1 /mnt/spare/sdb1.img /mnt/spare/rescue.log

Now you can run repair tools against the image, or keep the image as a safety net while you repair the original.

Pro Tip: If you hear clicking, or dmesg shows I/O errors and SMART reports pending sectors, stop using the drive immediately and image it first. Every fsck pass on dying hardware accelerates the failure, so copy the bits off before you do anything else.
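Before trusting the image, it helps to know how much of the partition ddrescue actually recovered. The sketch below totals the byte counts in the mapfile (rescue.log in the command above). It assumes the GNU ddrescue mapfile layout: '#' comment lines, one current-position line, then rows of "pos size status" with hexadecimal values, where '+' means rescued and '-' means a bad sector. The function name summarize_mapfile is hypothetical.

```shell
# Total the bytes per status in a ddrescue mapfile.
# '+' rows count as rescued, '-' rows as bad, everything else as pending.
summarize_mapfile() {
    rescued=0; bad=0; pending=0
    while read -r pos size status rest; do
        case $pos in '#'*) continue ;; esac
        # Data rows carry a hex size in field 2; the current-position
        # line and blank lines fail this test and are skipped.
        case $size in 0x*) ;; *) continue ;; esac
        case $status in
            '+') rescued=$((rescued + $size)) ;;
            '-') bad=$((bad + $size)) ;;
            *)   pending=$((pending + $size)) ;;
        esac
    done < "$1"
    echo "rescued=$rescued bad=$bad pending=$pending"
}

# Example use (not executed here):
#   summarize_mapfile /mnt/spare/rescue.log
```

A non-zero bad or pending total means the image has holes, which is worth knowing before you interpret any fsck output run against it.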

Read the errors, then decide

fsck errors fall into a few buckets: bad inodes, unattached inodes (they land in lost+found), incorrect block counts, and superblock damage. The interpretation matters because the fix differs. For superblock corruption specifically, ext4 keeps backups:

sudo dumpe2fs /dev/sdb1 | grep -i superblock   # lists backup locations
sudo fsck.ext4 -b 32768 /dev/sdb1              # use a backup superblock; 32768 is an example, pick one dumpe2fs listed
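Instead of guessing at 32768, the backup locations can be extracted from the dumpe2fs listing and tried one at a time. A minimal sketch, assuming dumpe2fs prints lines of the form "Backup superblock at N, ..."; the helper name backup_superblocks is hypothetical.

```shell
# Read `dumpe2fs` output on stdin and print one backup superblock
# location per line, in the order dumpe2fs listed them.
backup_superblocks() {
    sed -n 's/.*Backup superblock at \([0-9][0-9]*\).*/\1/p'
}

# Example use (not executed here): dry-run against each backup first.
#   for sb in $(sudo dumpe2fs /dev/sdb1 2>/dev/null | backup_superblocks); do
#       echo "== backup superblock $sb =="
#       sudo fsck.ext4 -n -b "$sb" /dev/sdb1
#   done
```

Keeping -n in the loop means no backup superblock is written back to the disk until you have read the results and chosen one.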

This is a great moment to use AI as an interpreter. Paste the error block and ask it to classify the damage and explain what a backup superblock even is. The incident response helper is also built for exactly this — feed it the symptoms and it suggests an investigation path. But the model interprets; you decide whether to write to the disk, because only you know whether there’s a backup and what the data is worth.

Run the repair deliberately

Once you understand the damage and have an image, run the real repair. For ext4, run it without -y the first time if you can stand to answer prompts — blindly answering yes to “clone multiply-claimed blocks?” can be exactly wrong:

sudo fsck.ext4 /dev/sdb1       # interactive, you approve each fix

For XFS, drop the -n once you’ve reviewed the dry run:

sudo xfs_repair /dev/sdb1

If xfs_repair complains it can’t proceed because of a dirty log, it’ll tell you to mount and unmount to replay the log first — do exactly that, don’t reach straight for -L, which zeroes the log and will lose in-flight data. This is precisely where an over-eager AI suggestion (“just run xfs_repair -L”) can ruin your day. Verify the recommendation against the tool’s own guidance before you destroy the log.
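One way to slow yourself down is to pass any suggested command through a check for the two flags this article warns about before you type it. This is only a reminder, not a safety mechanism: flag_risky is a hypothetical helper, it looks only at the tool name and its option words, and it knows nothing about other destructive options.

```shell
# Warn about -L for xfs_repair and -y for fsck/e2fsck in a proposed
# command, given as separate words. Returns 1 if anything was flagged.
flag_risky() {
    [ "$1" = "sudo" ] && shift
    tool=$1
    [ $# -gt 0 ] && shift
    risky=0
    for word in "$@"; do
        case "$tool $word" in
            "xfs_repair "-*L*)
                echo "warning: -L zeroes the XFS log; mount and unmount first if you can"
                risky=1 ;;
            "fsck"*" "-*y*|"e2fsck "-*y*)
                echo "warning: -y accepts every fix; review the -n output first"
                risky=1 ;;
        esac
    done
    return "$risky"
}

# Example use (not executed here):
#   flag_risky sudo xfs_repair -L /dev/sdb1 || echo "stop and re-read the tool's message"
```

The function never runs the command it inspects, which keeps the decision to write to the disk with the person at the keyboard.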

After the repair: verify and watch the hardware

A clean fsck run doesn’t mean a healthy disk. Mount the filesystem read-only, check what lost+found caught, and look at SMART data:

sudo mkdir -p /mnt/check
sudo mount -o ro /dev/sdb1 /mnt/check
sudo ls -la /mnt/check/lost+found      # lost+found is normally readable by root only
sudo smartctl -a /dev/sdb | grep -iE "reallocated|pending|uncorrect"

If reallocated or pending sector counts are climbing, the filesystem was a symptom and the disk is the disease — plan a replacement. You can hand the SMART output to the AI to summarize the failure indicators, then take that summary to your own judgment about replacement. The monitoring alerts helper is useful for turning recurring I/O errors into an alert rule so the next failing disk announces itself before it drops you to a rescue shell.
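For a quick first pass over the SMART attributes, the raw values can be filtered directly. The sketch below assumes the ten-column ATA attribute table that smartctl -A prints, with RAW_VALUE in the last column; NVMe drives report health differently and are not covered. The function name smart_red_flags is hypothetical.

```shell
# Read `smartctl -A` output on stdin and print "NAME RAW_VALUE" for the
# watched attributes whose raw value is above zero.
smart_red_flags() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
         $10 + 0 > 0 { print $2, $10 }'
}

# Example use (not executed here):
#   sudo smartctl -A /dev/sdb | smart_red_flags
```

A non-zero value is a cue to look closer, not a verdict. Whether the count is climbing only shows up when you compare snapshots taken over time.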

Keep AI out of the write path

The through-line of every filesystem recovery: AI reads, humans write. The model can classify damage, explain a backup superblock, and draft the repair sequence — all enormously helpful when you’re stressed. It must never have credentials to your box and must never be the thing that runs a command that mutates a disk. Filesystem repair is irreversible in a way config edits aren’t, and that’s exactly the work that needs a human who understands the stakes holding the keyboard. I keep my vetted recovery checklists in the prompt packs and broader prompts library so the next emergency starts from a calm, known-good sequence.

Conclusion

Corrupted filesystems are recoverable far more often than people fear — but only if you slow down. Confirm the type, unmount, dry-run with -n, image anything valuable, and read before you write. AI is a superb calm second reader for the dense output and a fast drafter of the repair plan. The decision to actually write to the disk stays with you, every time.