DevOps AI ToolKit
AI for Linux Admins · By James Joyner IV · 9 min read

Linux Error Guide: 'EXT4-fs error ... bad extent' Filesystem Corruption and fsck

Fix the EXT4-fs error 'bad header/extent' that remounts your filesystem read-only. Diagnose ext4 metadata corruption, recover the journal, and run e2fsck safely.

  • #linux-admins
  • #troubleshooting
  • #errors
  • #filesystem

Exact Error Message

When the kernel detects corrupted ext4 metadata, it logs a cluster of messages to the ring buffer and, if the filesystem uses the errors=remount-ro policy, drops it to read-only. A typical sequence looks like this:

EXT4-fs error (device sda1): ext4_find_extent:925: inode #1835012: comm nginx: pblk 0 bad header/extent: invalid magic - magic 0x5a3e, entries 41233, max 4 (340), depth 0(0)
EXT4-fs error (device sda1): ext4_ext_check_inode:512: inode #1835012: comm nginx: pblk 0 bad header/extent: invalid magic
EXT4-fs error (device sda1) in ext4_reserve_inode_write:5876: IO failure
EXT4-fs (sda1): Remounting filesystem read-only
Aborting journal on device sda1-8.
EXT4-fs (sda1): Delayed block allocation failed for inode 1835012 with -30

The key signals are bad header/extent: invalid magic, Remounting filesystem read-only, and Aborting journal. Together they mean the kernel can no longer trust the on-disk structure of this filesystem.
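When many of these lines scroll past, it helps to reduce each one to the fields that matter. The following is a minimal sketch, assuming the message layout shown above; parse_ext4_error is a hypothetical helper name, not part of any tool, and a process name that itself contains a colon would be truncated.

```shell
# Extract device, kernel function, inode number and process name from
# "EXT4-fs error" lines read on stdin. Prints: device function inode comm
# Lines that do not carry an inode/comm pair are skipped.
parse_ext4_error() {
    sed -n -E 's/.*EXT4-fs error \(device ([^)]+)\): ([A-Za-z0-9_]+):[0-9]+: inode #([0-9]+): comm ([^:]+):.*/\1 \2 \3 \4/p'
}

# Possible use against the live ring buffer (dmesg may need root):
#   dmesg | parse_ext4_error | sort | uniq -c
```

For the first line of the sample log this prints "sda1 ext4_find_extent 1835012 nginx", which tells you at a glance which device, inode, and process were involved.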

What the Error Means

ext4 stores file block locations in an extent tree, a B-tree-like structure rooted in each inode. Every extent node begins with a header whose magic number must be 0xF30A. When the kernel reads an extent block and finds something else - in the example above, magic 0x5a3e with an absurd entries 41233 against max 4 - it concludes the metadata is corrupt and refuses to act on it. Acting on a bad extent could mean reading or writing arbitrary disk blocks, so the kernel fails loudly instead.
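To make the header check concrete, here is a small sketch that decodes the first eight bytes of an extent header the way the kernel interprets them: four little-endian 16-bit fields (magic, entries, max, depth). The helper names le16_at and decode_extent_header are hypothetical, and how you obtain the raw bytes (for example from a hex dump of the inode) is outside this sketch.

```shell
# Print the 16-bit little-endian field that starts at character offset $2
# (1-based) of hex string $1, with its two bytes swapped into numeric order.
le16_at() {
    printf '%s\n' "$1" | cut -c"$2"-"$(($2 + 3))" | sed -E 's/(..)(..)/\2\1/'
}

# Decode an extent header given as hex in on-disk byte order.
# Returns 0 only when the magic number is the expected 0xF30A.
decode_extent_header() {
    hex=$(printf '%s' "$1" | tr -d ' ' | tr 'A-F' 'a-f')
    if [ "${#hex}" -lt 16 ]; then
        echo "need at least 8 header bytes (16 hex digits)" >&2
        return 2
    fi
    magic=$(le16_at "$hex" 1)
    entries=$(le16_at "$hex" 5)
    max=$(le16_at "$hex" 9)
    depth=$(le16_at "$hex" 13)
    printf 'magic=0x%s entries=%d max=%d depth=%d\n' \
        "$magic" "$((0x$entries))" "$((0x$max))" "$((0x$depth))"
    [ "$magic" = "f30a" ]
}
```

A healthy root header such as 0af3010004000000 decodes to magic=0xf30a entries=1 max=4 depth=0. The bytes 3e5a11a104000000 decode to magic=0x5a3e entries=41233 max=4 depth=0, matching the values in the sample log, and the function returns a nonzero status.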

Because the filesystem was mounted with the errors=remount-ro option (set in /etc/fstab or stored in the superblock, and common on root and data filesystems), the kernel responds to the corruption by remounting the volume read-only. This freezes writes to prevent further damage. The journal is aborted because the kernel can no longer guarantee that committing it would produce a consistent result. From that point on, every write returns EROFS (read-only filesystem, errno 30, the -30 in the last log line above), and applications like nginx, databases, or log writers start failing.

This is a protective halt, not the original fault. The corruption already happened; the read-only remount is the kernel telling you to stop and investigate.
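If you want a script or health check to notice the read-only state rather than waiting for applications to fail, the mount table is enough. This is a rough sketch, assuming the standard /proc/mounts layout (device in field 1, options in field 4); mount_mode is a hypothetical helper name.

```shell
# Print "ro" or "rw" for a device, given /proc/mounts-formatted text on stdin.
# A device mounted in several places produces one line per mount.
mount_mode() {
    awk -v dev="$1" '$1 == dev {
        n = split($4, opts, ",")
        mode = "rw"
        for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
        print mode
    }'
}

# Possible use on a live system:
#   mount_mode /dev/sda1 < /proc/mounts
```

The function only reads text, so it is safe to run on a degraded system. No output means the device is not mounted at all.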

Common Causes

  • Failing physical media. Bad sectors, reallocated sectors, or an SSD nearing wear-out can flip or lose the bytes that hold extent headers. This is one of the most common root causes, so check it first.
  • Unclean shutdowns and power loss. A crash mid-write can leave metadata partially updated, though the journal usually replays cleanly. Repeated unclean shutdowns raise the odds of damage.
  • Storage controller or cable faults. A flaky SATA cable, HBA, or RAID controller can silently corrupt data in transit.
  • Hypervisor or SAN issues. On VMs, a thin-provisioned volume that ran out of backing space, a snapshot rollback, or a misbehaving storage backend can present a corrupt block device.
  • Memory errors. Non-ECC RAM bit-flips can corrupt metadata before it is ever written to disk.
  • Software bugs. Rare kernel or driver bugs, or forcibly killing a resize/conversion operation, can damage the extent tree.

How to Reproduce the Error

You should never deliberately corrupt a production filesystem, but you can safely observe this behavior on a throwaway loopback image in a lab:

# Create a small image and an ext4 filesystem on it
dd if=/dev/zero of=/tmp/test.img bs=1M count=64
mkfs.ext4 /tmp/test.img
# Create the mount point, then loop-mount with the same error policy
sudo mkdir -p /mnt/lab
sudo mount -o loop,errors=remount-ro /tmp/test.img /mnt/lab

Writing garbage over the metadata region of that image and then accessing files will trigger bad header/extent messages and a read-only remount, exactly as on a real disk. Destroy the image afterward. This confirms the mechanism without risking real data.

Diagnostic Commands

All of the commands below are read-only. They inspect logs, mount state, filesystem headers, and disk health without modifying anything. Run these first, before any repair.

# Pull EXT4-related messages from the kernel ring buffer with timestamps
dmesg -T | grep -i ext4

# Same view from the persistent journal, with a grep pattern
journalctl -k -g EXT4

# Confirm the filesystem is now mounted read-only (look for "ro" in the options)
mount | grep sda1
grep sda1 /proc/mounts

# Read the superblock header only (-h), which does NOT write
sudo dumpe2fs -h /dev/sda1

# List all filesystem parameters from the superblock (read-only)
sudo tune2fs -l /dev/sda1

# Check the physical disk's SMART health (read-only)
sudo smartctl -a /dev/sda

# See whether the volume has filled up (df does not show read-only state)
df -h

A dumpe2fs -h header on an affected volume often shows the filesystem flagged as having errors:

Filesystem volume name:   <none>
Filesystem UUID:          7c2e4b9a-1f3d-4a90-9c11-22a0d6f1e8b4
Filesystem state:         clean with errors
Errors behavior:          Remount read-only
Last error time:          Sat Jun 27 09:14:02 2026
First error time:         Sat Jun 27 09:14:02 2026
Last error function:      ext4_find_extent
FS Error count:           37

The Filesystem state: clean with errors line and the populated error fields confirm the kernel recorded corruption. The Errors behavior: Remount read-only line confirms why the volume went read-only.
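The same two fields can be checked from a script. The sketch below assumes the dumpe2fs -h field names shown in the sample output; fs_error_summary is a hypothetical helper name, and it only reads text from stdin.

```shell
# Summarise the error fields of `dumpe2fs -h` output read on stdin.
# Prints "state=<state> errors=<count>" and returns 1 when the state is not
# exactly "clean" or the recorded error count is nonzero.
fs_error_summary() {
    awk -F': *' '
        /^Filesystem state:/ { state = $2 }
        /^FS Error count:/   { count = $2 }
        END {
            if (count == "") count = 0   # field is absent when no errors are recorded
            printf "state=%s errors=%d\n", state, count
            exit ((state != "clean" || count > 0) ? 1 : 0)
        }'
}

# Possible use (dumpe2fs -h is read-only):
#   sudo dumpe2fs -h /dev/sda1 2>/dev/null | fs_error_summary
```

For the sample header above this prints "state=clean with errors errors=37" and returns a nonzero status, which makes it easy to wire into a monitoring check.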

SMART output is where you find out whether the underlying disk is dying:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   088   088   010    Pre-fail  Always       -       412
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       24
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       24

A nonzero and growing Reallocated_Sector_Ct means the drive has already remapped bad sectors to spares. A nonzero Current_Pending_Sector or Offline_Uncorrectable means it is holding sectors it could not read. A repaired filesystem on this disk will likely corrupt again - replace the hardware.
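A quick way to apply that rule is to filter the attribute table for the three sector counters. This is a sketch under two assumptions: the drive reports classic ATA attributes 5, 197, and 198 (NVMe devices expose different health data), and the raw value is the last column of each row. smart_bad_sectors is a hypothetical helper name.

```shell
# Read `smartctl -A` attribute rows on stdin and print NAME=RAW for
# attributes 5, 197 and 198 whose raw value (last column) is nonzero.
# Returns 1 if any of them is nonzero, 0 otherwise.
smart_bad_sectors() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $NF + 0 > 0 {
        printf "%s=%d\n", $2, $NF
        bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}

# Possible use (read-only):
#   sudo smartctl -A /dev/sda | smart_bad_sectors
```

A single snapshot only shows that the counters are nonzero. To see whether they are growing, record the output on a schedule and compare runs.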

Step-by-Step Resolution

Warning: e2fsck and fsck modify the filesystem. Never run them on a mounted or read-write filesystem - doing so can cause catastrophic, unrecoverable corruption. The target must be fully unmounted first.

  1. Stop writers and unmount. Stop services using the volume (databases, web servers), then unmount it:

    sudo systemctl stop nginx
    sudo umount /dev/sda1

    If the volume is the root filesystem, you cannot unmount it while booted from it. Boot from a rescue ISO or a live USB, or drop to the initramfs rescue shell (the (initramfs) prompt), so the target is offline.

  2. Image the disk first if data is precious. Before repairing, clone the device with ddrescue to another disk so you can retry repairs without further degrading a failing drive.

  3. Run e2fsck on the unmounted device. Force a full check even if the superblock looks clean:

    sudo e2fsck -f /dev/sda1

    To let it apply all repairs non-interactively after you understand the risk, use -y:

    sudo e2fsck -fy /dev/sda1

    e2fsck replays the journal first, then walks inodes and extent trees, fixing bad headers, orphaned inodes, and reference counts. Damaged files may be moved to lost+found.

  4. Recover from a backup superblock if needed. If the primary superblock is too damaged for e2fsck to start, find a backup location and point e2fsck at it. 32768 is the usual first backup on filesystems with 4 KiB blocks; use whichever location dumpe2fs reports for your volume:

    sudo dumpe2fs /dev/sda1 | grep -i superblock   # read-only, lists backups
    sudo e2fsck -fy -b 32768 /dev/sda1             # repair using a backup

  5. Re-check, then remount. Run e2fsck -f a second time; a clean pass should report no errors. Then remount and start services:

    sudo mount /dev/sda1 /mnt && sudo systemctl start nginx

  6. Verify the disk before trusting the repair. Re-run smartctl -a /dev/sda. If SMART shows reallocated or pending sectors climbing, schedule the disk for replacement regardless of a clean fsck.
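When you script the check and re-check steps above, the exit status of e2fsck tells you whether it is safe to remount. The status is a bit mask documented in the e2fsck manual page; the sketch below decodes it. explain_fsck_status is a hypothetical helper name.

```shell
# Translate an e2fsck exit status (a bit mask) into readable messages,
# one per line. Bits follow the EXIT CODE section of the e2fsck man page.
explain_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "errors corrected"
    [ $((status & 2)) -ne 0 ]   && echo "errors corrected, reboot recommended"
    [ $((status & 4)) -ne 0 ]   && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && echo "operational error"
    [ $((status & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && echo "cancelled by user"
    [ $((status & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# Possible use on an unmounted device:
#   sudo e2fsck -f /dev/sda1; explain_fsck_status $?
```

A first pass that returns 1 followed by a second pass that returns 0 is the outcome you want. Any status with the 4 bit set means errors remain, so do not remount read-write yet.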

Prevention and Best Practices

  • Monitor SMART proactively. Run smartctl checks on a schedule and alert on rising Reallocated_Sector_Ct and Current_Pending_Sector. Replace disks before they corrupt filesystems.
  • Keep errors=remount-ro. This policy is your friend - it limits damage. Avoid errors=continue on important volumes.
  • Use ECC memory on servers to eliminate RAM-induced corruption.
  • Ensure clean shutdowns. Protect against power loss with a UPS, and let writes flush before powering down.
  • Schedule periodic e2fsck runs. Set a sane mount-count or time interval with tune2fs -c or tune2fs -i, or check during maintenance windows.
  • Back up, and test restores. fsck recovers structure, not your data’s correctness. A current backup is the only guaranteed recovery path. See our broader Linux administration guides for monitoring and backup patterns, and our notes on observability for infrastructure for alerting on disk health.
  • On VMs, watch the backend. Avoid over-thin-provisioning and verify the hypervisor storage is healthy.

Related Errors

  • EXT4-fs error: ext4_lookup: deleted inode referenced - directory metadata corruption, also fixed by e2fsck.
  • JBD2: Detected IO errors while flushing file data - the journal layer hitting disk I/O errors, often a precursor to a read-only remount.
  • Buffer I/O error on dev sda1, logical block N - raw block device read failures pointing at failing media.
  • mount: wrong fs type, bad option, bad superblock - a damaged primary superblock; recover with a backup superblock as shown above.
  • EXT4-fs (sda1): error count since last fsck - a reminder that recorded errors are accumulating and a check is overdue.
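The "error count since last fsck" reminder often appears on volumes where periodic checks were never configured, which ties back to the scheduled-check advice in the prevention list. The sketch below reads tune2fs -l output and reports whether a mount-count or time-based check is set. It assumes the "Maximum mount count" and "Check interval" field names that tune2fs -l prints; periodic_check_status is a hypothetical helper name.

```shell
# Read `tune2fs -l` output on stdin and report whether periodic checks are
# configured. A maximum mount count of -1 (or 0) together with a check
# interval of 0 means no periodic check is scheduled.
periodic_check_status() {
    awk -F': *' '
        /^Maximum mount count:/ { max = $2 + 0 }
        /^Check interval:/      { interval = $2 + 0 }
        END {
            if (max > 0 || interval > 0)
                printf "enabled max_mounts=%d interval_seconds=%d\n", max, interval
            else
                print "disabled"
        }'
}

# Possible use (read-only):
#   sudo tune2fs -l /dev/sda1 | periodic_check_status
```

If this prints "disabled", the tune2fs -c and tune2fs -i options mentioned above are how you would turn a schedule on.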

Frequently Asked Questions

Can I just run fsck on the mounted filesystem to fix it quickly? No. Running fsck on a mounted, writable filesystem can destroy it. The filesystem must be unmounted (or read-only and idle, but unmounting is far safer). For a root filesystem, boot from rescue media or the initramfs shell.

Why did my filesystem go read-only instead of crashing the server? The errors=remount-ro mount option tells the kernel to remount read-only on metadata corruption rather than continue and risk spreading damage. It is a deliberate safety mechanism. The corruption was detected just before the remount, but it may have occurred much earlier.

e2fsck reported “clean” - does that mean my data is safe? Not necessarily. fsck repairs filesystem structure, which can mean discarding or truncating damaged files. Files moved to lost+found or zero-filled blocks may have lost content. Always validate critical data and restore from backup if in doubt.

Do I really need to replace the disk if fsck fixed everything? If SMART shows nonzero, growing Reallocated_Sector_Ct or Current_Pending_Sector, yes. A clean fsck on dying media is temporary - corruption will recur. Treat the SMART numbers as the deciding factor.

What does the “invalid magic” message actually mean? Every ext4 extent tree node starts with the magic number 0xF30A. When the kernel reads a value other than that, the extent metadata is corrupt, so it refuses to trust the block pointers and reports bad header/extent: invalid magic.
