DevOps AI ToolKit
AI for Linux Admins · By James Joyner IV · 9 min read

Linux Error Guide: 'Buffer I/O error on dev' Disk I/O Errors and Bad Sectors

Fix the 'Buffer I/O error on dev' and blk_update_request I/O error kernel messages: diagnose bad sectors, medium errors, EIO in apps, and SMART data on a failing disk.

  • #linux-admins
  • #troubleshooting
  • #errors
  • #storage

Exact Error Message

These messages appear in dmesg and the kernel journal, usually as a cluster fired off by the same underlying failure:

blk_update_request: I/O error, dev sdb, sector 1953120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Buffer I/O error on dev sdb1, logical block 244139, async page read
sd 0:0:1:0: [sdb] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
sd 0:0:1:0: [sdb] tag#12 Sense Key : Medium Error [current]
sd 0:0:1:0: [sdb] tag#12 Add. Sense: Unrecovered read error
sd 0:0:1:0: [sdb] tag#12 CDB: Read(10) 28 00 00 1d cd 60 00 00 08 00

At the same time, an application trying to touch the bad region fails with EIO:

cp: error reading 'file': Input/output error

What the Error Means

The kernel block layer (blk_update_request) tried to complete a read or write to a sector and the device returned an error. Buffer I/O error on dev sdb1, logical block ... is the page-cache layer reporting that the buffered I/O it queued never completed successfully. The sd ... Sense Key : Medium Error / Unrecovered read error lines are the SCSI layer translating the raw sense data from the drive: the physical media at that location could not be read back.

The key distinction: hostbyte=DID_OK driverbyte=DRIVER_OK means the transport (cable, controller, driver) delivered the command fine, and the drive itself answered “I cannot read this sector.” That points at the media. If you instead see hostbyte=DID_BAD_TARGET, timeouts, link resets, or DRIVER_TIMEOUT, the problem is more likely the cable, port, or controller rather than the platter or flash cell.
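
As a rough illustration of that media-versus-transport split, the sketch below reads kernel log text on stdin and prints a one-word verdict. The function name classify_io_error is hypothetical, and the patterns are only the indicators named in this guide, so treat the output as a first hint rather than a diagnosis.

```sh
# classify_io_error: hypothetical helper, not part of any package.
# Reads kernel log lines on stdin and prints "transport", "media", or "unknown".
# Assumption: the patterns below are just the indicators named in this guide.
classify_io_error() {
    awk '
        # Any transport-style indicator takes priority over a medium error.
        /DID_BAD_TARGET|DRIVER_TIMEOUT|[Tt]imeout|link reset|resetting link/ { transport = 1 }
        /hostbyte=DID_OK/ { host_ok = 1 }
        /Medium Error/    { medium = 1 }
        END {
            if (transport)              print "transport"
            else if (host_ok && medium) print "media"
            else                        print "unknown"
        }
    '
}
```

One way to use it is dmesg | classify_io_error. A "transport" verdict suggests checking cables and the controller before condemning the drive.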

When the kernel gives up on the I/O, the failure does not stay buried. It propagates up as the POSIX error code EIO, which surfaces to userspace as the string “Input/output error.” That is why cp, tar, a database, or your application abruptly fails reading one specific file: the bytes live on a sector the drive can no longer return.

Common Causes

  • Bad sectors on a spinning disk. A few sectors developed read errors. The drive may have spare sectors and can remap them, but it cannot recover the data in a sector that fails to read. This can be a localized, stable defect rather than a dying disk.
  • A dying / failing disk. Rising counts of pending and uncorrectable sectors, growing reallocation, or a SMART overall-health FAILED verdict mean the drive is degrading and will likely keep getting worse. Treat the data as at risk now.
  • Dying SSD. Flash wears out. As cells exhaust their program/erase cycles or the controller runs out of spare blocks, reads start returning uncorrectable errors. SSD failures are often more sudden than HDD failures.
  • Loose cables, backplane, or controller issues. A marginal SATA/SAS cable, a flaky backplane connector, or an overheating HBA produces I/O errors that look like medium errors but move around or disappear after reseating hardware. Look for DRIVER_TIMEOUT, link resets, or DID_* transport errors as the tell.
  • SCSI/SATA transport faults. Power issues, electrical noise, or a failing port can cause intermittent command failures across the whole device rather than at fixed sectors.
  • Multipath / SAN path failures. On a SAN, one path to a LUN can fail while others stay healthy. You will see I/O errors on a single sdX path device, but multipath should fail over to a surviving path so the mapped device keeps working.

How to Reproduce the Error

You usually do not need to “reproduce” this — it is a hardware symptom, not a configuration mistake — but you can confirm it is tied to specific media. Reading the whole device sequentially will hit the bad region again:

# Read the entire device to /dev/null; watch dmesg light up at the bad offset.
# This is a read-only operation, but on a failing disk heavy reads add stress.
dd if=/dev/sdb of=/dev/null bs=1M status=progress

If the same sectors fail every time you read them, it is media. If the failures wander or vanish after you reseat cables, suspect the transport. Do not attempt to “reproduce” by writing to the disk — that risks losing data you may still be able to rescue.
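
To check whether the same sectors fail on every read, it can help to pull the failing device and sector pairs out of the log first. The following is a minimal sketch; failed_sectors is a hypothetical name, and it assumes the "I/O error, dev X, sector N" wording shown in this guide.

```sh
# failed_sectors: hypothetical helper.
# Reads kernel log lines on stdin and prints unique "device sector" pairs.
# Assumption: lines follow the "I/O error, dev sdb, sector 1953120" wording.
failed_sectors() {
    sed -n 's/.*I\/O error, dev \([^ ,]*\), sector \([0-9][0-9]*\).*/\1 \2/p' | sort -u
}
```

Run it as dmesg | failed_sectors. Each reported sector can then be re-read on its own, which is far gentler than a full-device pass, for example: dd if=/dev/sdb of=/dev/null bs=512 skip=1953120 count=8 iflag=direct. That command assumes GNU dd and a drive with 512-byte logical sectors; the kernel reports sector numbers in 512-byte units.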

Diagnostic Commands

All of the following are read-only. Run these before touching anything.

# Pull the relevant kernel messages with human timestamps
dmesg -T | grep -i 'i/o error\|medium error\|sense key'

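# Same from the journal. -k shows only the current boot; add -b -1 for the
# previous boot, which needs persistent journaling (a /var/log/journal directory)
journalctl -k -g 'I/O error'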

# Full SMART attributes (read-only) — the most important diagnostic
smartctl -a /dev/sdb

# Quick overall health verdict (read-only)
smartctl -H /dev/sdb

# See the device, partitions, and what is mounted where
lsblk

# Is the kernel still treating the device as running?
cat /sys/block/sdb/device/state

# RAID arrays, if this disk is a member
cat /proc/mdstat

# SAN multipath topology and per-path state (read-only)
multipath -ll

A typical smartctl -a excerpt from a disk with real media trouble:

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   094   094   010    Pre-fail  72
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   0

Here SMART still says PASSED, but Current_Pending_Sector and Offline_Uncorrectable are nonzero — sectors the drive could not read and has not yet been able to remap. Reallocated_Sector_Ct of 72 shows it has already swapped out spares. A PASSED overall verdict with pending sectors is not a clean bill of health; it means failure is in progress. A drive that has tipped over fully looks like this:

SMART overall-health self-assessment test result: FAILED!

Note that UDMA_CRC_Error_Count is 0 in the attribute table above. A nonzero CRC count would instead implicate the cable or controller rather than the media.
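
The attribute check described above can be roughly automated. The sketch below reads smartctl attribute output on stdin and reports nonzero raw values for attributes 5, 197, 198, and 199. The name smart_flags is hypothetical, and it assumes the raw value is the last column on the row, which is usual for these four attributes but not guaranteed for every drive.

```sh
# smart_flags: hypothetical helper.
# Reads `smartctl -A` (or -a) output on stdin and prints one line per watched
# attribute whose raw value is nonzero.
# Assumption: RAW_VALUE is the last field on the attribute row.
smart_flags() {
    awk '
        NF >= 7 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) {
            raw = $NF + 0
            if (raw > 0) {
                # 199 counts CRC errors on the link; the rest describe the media.
                kind = ($1 == 199) ? "transport" : "media"
                printf "%s %s raw=%d\n", kind, $2, raw
            }
        }
    '
}
```

A typical invocation would be smartctl -A /dev/sdb | smart_flags. Fed the excerpt above, it would flag the three media attributes and stay silent about UDMA_CRC_Error_Count.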

Step-by-Step Resolution

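  1. Stop and back up first. Before any repair, fsck, or heavy scan, copy the data off while the disk still responds. Use a tool that skips errors instead of aborting, such as GNU ddrescue, which records bad regions in a mapfile and rescues everything readable: ddrescue -f /dev/sdb /dev/sdc rescue.log. The -f (--force) flag is required because the destination is a block device. Everything on /dev/sdc is overwritten, so double-check the target and make sure it is at least as large as the source. Running fsck or a full read pass on a failing disk can be the read that finally kills it.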

  2. Identify whether it is media or transport. Review the sense data. Medium Error / Unrecovered read error with hostbyte=DID_OK means the platters or flash are at fault. Timeouts, link resets, or a nonzero UDMA_CRC_Error_Count point at cables, the backplane, or the HBA — reseat or replace the cable and re-test before condemning the drive.

  3. Read the SMART verdict honestly. FAILED or rising Current_Pending_Sector / Offline_Uncorrectable / Reallocated_Sector_Ct over time means replace the disk. A single stable pending sector on an otherwise healthy drive may just be one defect.

  4. For a SAN LUN, confirm failover. Run multipath -ll. If one path shows failed/faulty but the multipath device still has an active path, your data is safe and you only need to fix or replace the failed path/HBA. The mapped device should keep serving I/O.

  5. Force reallocation only as a last resort, and only after backups. A pending sector is resolved when it is written: the drive either rewrites it in place successfully or remaps it to a spare. Overwriting just that LBA (carefully targeted, e.g. with hdparm --write-sector, which also requires the --yes-i-know-what-i-am-doing flag, or by rewriting the file) can clear a single bad sector. This destroys the data in that sector and assumes the rest of the disk is sound; it is a stopgap on a disk you still do not fully trust.

  6. Optionally run a read-only surface test. badblocks -sv /dev/sdb does a non-destructive read scan and lists bad blocks with progress. Avoid the write-mode options (-w / -n) on a disk that holds data — they write to the device. Treat even the read scan with caution on a failing disk because of the I/O stress.

  7. Replace the disk and restore. If SMART says the drive is failing, swap it out, then rebuild from RAID (/proc/mdstat) or restore from backup. Do not return a disk with growing uncorrectable counts to production.
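
The backup in step 1 is often done in two passes: a fast pass that grabs everything readable, then a slower pass that retries the bad areas. The sketch below only prints such a plan so it can be reviewed before anything runs. The name rescue_plan is hypothetical; the flags are GNU ddrescue's -f (force writing to a device), -n (skip the scraping phase), -d (direct disc access), and -r3 (three retry passes).

```sh
# rescue_plan: hypothetical helper that PRINTS a two-pass GNU ddrescue plan.
# It runs nothing; copy the lines out once source and target are verified.
# Arguments: SRC (failing disk), DST (target disk, will be overwritten), MAPFILE.
rescue_plan() {
    if [ "$#" -ne 3 ] || [ "$1" = "$2" ]; then
        echo "usage: rescue_plan SRC DST MAPFILE (SRC and DST must differ)" >&2
        return 1
    fi
    # Pass 1: copy whatever reads cleanly and skip the slow scraping phase.
    echo "ddrescue -f -n $1 $2 $3"
    # Pass 2: return to the bad areas with direct access and up to 3 retries.
    echo "ddrescue -f -d -r3 $1 $2 $3"
}
```

For example, rescue_plan /dev/sdb /dev/sdc rescue.log prints both commands. Reusing the same mapfile in the second pass lets ddrescue skip what it already recovered.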

Prevention and Best Practices

  • Run smartctl short/long self-tests on a schedule via smartd and alert on rising Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable.
  • Keep tested, restorable backups. The first time you learn a disk is dying should not be the first time you need the backup.
  • Use redundancy — RAID or multipath — so a single drive or path failure is survivable instead of an outage.
  • Monitor dmesg/journalctl -k for I/O error and Medium Error and treat the first occurrence as actionable, not noise.
  • For SSDs, watch wear indicators (media wearout / percentage used) and replace proactively before exhaustion.
  • Keep good airflow and reseat marginal cabling; transport errors masquerade as media errors and waste good drives.

Related Errors

  • EXT4-fs error (device sdb1): ... Input/output error: the filesystem layer reporting the same underlying I/O failure.
  • end_request: I/O error: the older kernel phrasing of blk_update_request: I/O error.
  • READ FPDMA QUEUED / failed command ATA errors: SATA-level errors that often accompany medium errors on the same drive.
  • multipathd: path checker failed: the SAN counterpart, where a path goes faulty and failover kicks in.

See our Linux admin guides and troubleshooting articles for more.

Frequently Asked Questions

Is a single bad sector the same as a failing disk? No. One stable, remapped sector on a drive that is otherwise healthy can be a localized defect. A failing disk shows growing Current_Pending_Sector and Offline_Uncorrectable counts over time, or a SMART FAILED verdict. Watch the trend, not a single number.
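
Watching the trend means comparing snapshots. As a minimal sketch, assuming two saved copies of smartctl -A output taken some time apart, the hypothetical smart_trend function below reports which of attributes 5, 197, and 198 have a higher raw value in the newer file. It assumes the raw value is the last column on each attribute row.

```sh
# smart_trend: hypothetical helper.
# Usage: smart_trend OLD_FILE NEW_FILE
# Both files are saved `smartctl -A` outputs for the same drive.
# Assumption: RAW_VALUE is the last field on each attribute row.
smart_trend() {
    awk '
        NF >= 7 && ($1 == 5 || $1 == 197 || $1 == 198) {
            # While reading the first file, remember the old raw values.
            if (FNR == NR) { old[$2] = $NF + 0; next }
            # In the second file, report any attribute that has grown.
            if (($2 in old) && ($NF + 0) > old[$2])
                printf "%s rose from %d to %d\n", $2, old[$2], $NF + 0
        }
    ' "$1" "$2"
}
```

Snapshots could be taken with something like smartctl -A /dev/sdb > smart-$(date +%F).txt from a scheduled job. No output from smart_trend means none of the three counters increased between the two files.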

Why does my application get “Input/output error” instead of a disk message? The kernel cannot read the sector, so it returns EIO to the read/write call. Userspace translates EIO to the string “Input/output error.” The detailed Buffer I/O error and Medium Error lines are in dmesg/journalctl -k — always check there when an app reports EIO on one file.

Can I just run fsck to fix it? Not as your first move. fsck repairs filesystem metadata, not bad media, and a full pass adds heavy I/O that can finish off a dying disk. Back up the data first (ddrescue), then run fsck only once your data is safe.

Will writing to the bad sector fix it? Sometimes. When a pending sector is written, the drive either rewrites it in place or reallocates it to a spare, which can clear that one error. But it destroys whatever was in that sector and only makes sense on a disk you have already backed up and that is not broadly failing. On a degrading drive, replace it.

One path on my SAN disk shows I/O errors — is my data gone? Usually not. Run multipath -ll. If another path is still active, multipath has failed over and the mapped device keeps working. Fix or replace the failed path/HBA, and your data on the LUN is unaffected.
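
Counting usable paths by eye gets tedious on hosts with many LUNs. The sketch below summarizes multipath -ll output read from stdin. The name path_summary is hypothetical, and it assumes path rows of the form "H:C:T:L device major:minor dm-state checker-state ...", which is the common layout but can differ between multipath-tools versions, so verify it against your own output first.

```sh
# path_summary: hypothetical helper.
# Reads `multipath -ll` output on stdin, lists degraded paths, and prints totals.
# Assumption: each path row contains "H:C:T:L dev maj:min dm_state chk_state".
path_summary() {
    awk '
        {
            for (i = 1; i <= NF - 4; i++) {
                # A field like 3:0:0:1 marks the start of a path row.
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                    if ($(i + 3) == "active" && $(i + 4) == "ready")
                        ok++
                    else {
                        bad++
                        printf "degraded path: %s (%s %s)\n", $(i + 1), $(i + 3), $(i + 4)
                    }
                }
            }
        }
        END { printf "usable=%d degraded=%d\n", ok, bad }
    '
}
```

Run it as multipath -ll | path_summary. A result with usable of at least 1 matches the failover case described here; usable=0 means no path is serving I/O. The totals cover every LUN in the input, so filter to one map if you need a per-LUN answer.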
