Linux Error Guide: 'Buffer I/O error on dev' Disk I/O Errors and Bad Sectors
Fix the 'Buffer I/O error on dev' and blk_update_request I/O error kernel messages: diagnose bad sectors, medium errors, EIO in apps, and SMART data on a failing disk.
- #linux-admins
- #troubleshooting
- #errors
- #storage
Exact Error Message
These messages appear in dmesg and the kernel journal, usually as a cluster fired off by the same underlying failure:
blk_update_request: I/O error, dev sdb, sector 1953120 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Buffer I/O error on dev sdb1, logical block 244139, async page read
sd 0:0:1:0: [sdb] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=8s
sd 0:0:1:0: [sdb] tag#12 Sense Key : Medium Error [current]
sd 0:0:1:0: [sdb] tag#12 Add. Sense: Unrecovered read error
sd 0:0:1:0: [sdb] tag#12 CDB: Read(10) 28 00 00 1d cf a0 00 00 08 00
At the same time, an application trying to touch the bad region fails with EIO:
cp: error reading 'file': Input/output error
What the Error Means
The kernel block layer (blk_update_request) tried to complete a read or write to a sector and the device returned an error. Buffer I/O error on dev sdb1, logical block ... is the page-cache layer reporting that the buffered I/O it queued never completed successfully. The sd ... Sense Key : Medium Error / Unrecovered read error lines are the SCSI layer translating the raw sense data from the drive: the physical media at that location could not be read back.
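The sector and the logical block in the sample messages describe the same location in different units. As a rough sketch: the block layer counts sectors in 512-byte units from the start of the whole disk, while the buffer layer counts logical blocks from the start of the partition in units of the block size in use. The 4096-byte block size below is an assumption (common, but not guaranteed), the partition start of 8 is an assumed value chosen so the sample figures line up, and sector_to_block is a hypothetical helper name.

```shell
# Hypothetical helper: convert an absolute 512-byte sector on the disk
# into a block number relative to the start of a partition.
# Usage: sector_to_block SECTOR PART_START_SECTOR [BLOCK_SIZE_BYTES]
sector_to_block() {
    sector=$1
    part_start=$2
    block_size=${3:-4096}   # assumption: 4 KiB blocks
    echo $(( (sector - part_start) * 512 / block_size ))
}

# Assumed partition start of 8 sectors. On a real system, read it with:
#   cat /sys/block/sdb/sdb1/start
sector_to_block 1953120 8 4096   # prints 244139
```

Working the arithmetic in this direction helps you check that a block-layer line and a Buffer I/O line in the same cluster really point at the same spot on the disk.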
The key distinction: hostbyte=DID_OK driverbyte=DRIVER_OK means the transport (cable, controller, driver) delivered the command fine, and the drive itself answered “I cannot read this sector.” That points at the media. If you instead see hostbyte=DID_BAD_TARGET, timeouts, link resets, or DRIVER_TIMEOUT, the problem is more likely the cable, port, or controller rather than the platter or flash cell.
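The media-versus-transport distinction can be turned into a quick first-pass filter over the kernel log. This is a minimal sketch, not a definitive classifier: classify_io_error is a hypothetical helper, and the list of transport markers is an assumption that covers only a handful of common host-byte codes and libata reset messages.

```shell
# Hypothetical helper: read kernel log text on stdin and print a rough
# verdict: "transport", "media", or "unknown".
classify_io_error() {
    log=$(cat)
    # Transport markers are checked first: if the link itself is
    # misbehaving, a medium error in the same log is less trustworthy.
    if printf '%s\n' "$log" | grep -Eq \
        'hostbyte=DID_(BAD_TARGET|NO_CONNECT|TIME_OUT|ERROR|RESET)|driverbyte=DRIVER_TIMEOUT|resetting link'; then
        echo "transport"
    elif printf '%s\n' "$log" | grep -Eq \
        'Sense Key : Medium Error|Unrecovered read error'; then
        echo "media"
    else
        echo "unknown"
    fi
}

# Typical use (read-only):
#   dmesg | classify_io_error
```

Treat the result as a hint for where to look next, not as a diagnosis; SMART data and a cable reseat are still needed to confirm it.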
When the kernel gives up on the I/O, the failure does not stay buried. It propagates up as the POSIX error code EIO, which surfaces to userspace as the string “Input/output error.” That is why cp, tar, a database, or your application abruptly fails reading one specific file: the bytes live on a sector the drive can no longer return.
Common Causes
- Bad sectors on a spinning disk. A few sectors developed read errors. The drive may have spare sectors and can remap them, but it cannot recover the data in a sector that fails to read. This can be a localized, stable defect rather than a dying disk.
- A dying / failing disk. Rising counts of pending and uncorrectable sectors, growing reallocation, or a SMART overall-health FAILED verdict mean the drive is degrading and will likely keep getting worse. Treat the data as at risk now.
- Dying SSD. Flash wears out. As cells exhaust their program/erase cycles or the controller runs out of spare blocks, reads start returning uncorrectable errors. SSD failures are often more sudden than HDD failures.
- Loose cables, backplane, or controller issues. A marginal SATA/SAS cable, a flaky backplane connector, or an overheating HBA produces I/O errors that look like medium errors but move around or disappear after reseating hardware. Look for DRIVER_TIMEOUT, link resets, or DID_* transport errors as the tell.
- SCSI/SATA transport faults. Power issues, electrical noise, or a failing port can cause intermittent command failures across the whole device rather than at fixed sectors.
- Multipath / SAN path failures. On a SAN, one path to a LUN can fail while others stay healthy. You will see I/O errors on a single sdX path device, but multipath should fail over to a surviving path so the mapped device keeps working.
How to Reproduce the Error
You usually do not need to “reproduce” this — it is a hardware symptom, not a configuration mistake — but you can confirm it is tied to specific media. Reading the whole device sequentially will hit the bad region again:
# Read the entire device to /dev/null; watch dmesg light up at the bad offset.
# This is a read-only operation, but on a failing disk heavy reads add stress.
dd if=/dev/sdb of=/dev/null bs=1M status=progress
If the same sectors fail every time you read them, it is media. If the failures wander or vanish after you reseat cables, suspect the transport. Do not attempt to “reproduce” by writing to the disk — that risks losing data you may still be able to rescue.
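To check whether one specific location fails repeatedly without reading the whole device, you can read just a few sectors around the reported offset. The following is a minimal sketch: probe_sectors is a hypothetical helper, and it assumes the sector number comes from a kernel message, where sectors are counted in 512-byte units.

```shell
# Hypothetical helper: read COUNT 512-byte sectors starting at START
# and report whether the read succeeded. Read-only.
# Usage: probe_sectors DEVICE START [COUNT]
probe_sectors() {
    dev=$1
    start=$2
    count=${3:-8}
    # On a real block device, adding iflag=direct to dd bypasses the
    # page cache so every run actually reaches the drive.
    if dd if="$dev" of=/dev/null bs=512 skip="$start" count="$count" 2>/dev/null; then
        echo "sectors $start+$count: readable"
    else
        echo "sectors $start+$count: read error"
    fi
}

# Example with the sector from the sample kernel message (run as root):
#   probe_sectors /dev/sdb 1953120 8
```

Running it a few times, before and after reseating cables, gives a cheap signal: a location that fails every time suggests media, while one that recovers suggests transport.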
Diagnostic Commands
All of the following are read-only. Run these before touching anything.
# Pull the relevant kernel messages with human timestamps
dmesg -T | grep -i 'i/o error\|medium error\|sense key'
# Same from the journal; -k implies the current boot, so add -b -1 for the previous boot (needs persistent journal storage)
journalctl -k -g 'I/O error'
# Full SMART attributes (read-only) — the most important diagnostic
smartctl -a /dev/sdb
# Quick overall health verdict (read-only)
smartctl -H /dev/sdb
# See the device, partitions, and what is mounted where
lsblk
# Is the kernel still treating the device as running?
cat /sys/block/sdb/device/state
# RAID arrays, if this disk is a member
cat /proc/mdstat
# SAN multipath topology and per-path state (read-only)
multipath -ll
A typical smartctl -a excerpt from a disk with real media trouble:
SMART overall-health self-assessment test result: PASSED
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   094   094   010    Pre-fail  72
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   0
Here SMART still says PASSED, but Current_Pending_Sector and Offline_Uncorrectable are nonzero — sectors the drive could not read and has not yet been able to remap. Reallocated_Sector_Ct of 72 shows it has already swapped out spares. A PASSED overall verdict with pending sectors is not a clean bill of health; it means failure is in progress. A drive that has tipped over fully looks like this:
SMART overall-health self-assessment test result: FAILED!
Note UDMA_CRC_Error_Count is 0 above — a nonzero CRC count would instead implicate the cable or controller rather than the media.
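The same reading of the attribute table can be scripted. This is a sketch under stated assumptions: check_smart_attrs is a hypothetical helper, it assumes the ATA attribute names shown in the excerpt (names vary by vendor, and NVMe drives report different fields), and it treats the last column as the raw value.

```shell
# Hypothetical helper: read `smartctl -A` style output on stdin and
# print every watched attribute whose raw value (last column) is nonzero.
check_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($NF + 0 > 0) print "media: " $2 "=" $NF
        }
        $2 == "UDMA_CRC_Error_Count" {
            if ($NF + 0 > 0) print "transport: " $2 "=" $NF
        }
    '
}

# Typical use (read-only):
#   smartctl -A /dev/sdb | check_smart_attrs
```

For the excerpt above it reports the three media attributes and stays silent about the CRC count, which matches the manual reading.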
Step-by-Step Resolution
- Stop and back up first. Before any repair, fsck, or heavy scan, copy the data off while the disk still responds. Use a tool that skips errors instead of aborting, such as ddrescue, which logs bad regions and rescues everything readable: ddrescue -f /dev/sdb /dev/sdc rescue.log. GNU ddrescue needs -f (--force) to write to a block device, and it overwrites everything on /dev/sdc, so double-check the target. Running fsck or a full read pass on a failing disk can be the read that finally kills it.
- Identify whether it is media or transport. Review the sense data. Medium Error / Unrecovered read error with hostbyte=DID_OK means the platters or flash are at fault. Timeouts, link resets, or a nonzero UDMA_CRC_Error_Count point at cables, the backplane, or the HBA; reseat or replace the cable and re-test before condemning the drive.
- Read the SMART verdict honestly. FAILED or rising Current_Pending_Sector / Offline_Uncorrectable / Reallocated_Sector_Ct over time means replace the disk. A single stable pending sector on an otherwise healthy drive may just be one defect.
- For a SAN LUN, confirm failover. Run multipath -ll. If one path shows failed/faulty but the multipath device still has an active path, your data is safe and you only need to fix or replace the failed path/HBA. The mapped device should keep serving I/O.
- Force reallocation only as a last resort, and only after backups. A pending sector is remapped when it is written. Overwriting just that LBA (carefully targeted, e.g. with hdparm --write-sector or by rewriting the file) can clear a single bad sector and let the spare take over. This destroys the data in that sector and assumes the rest of the disk is sound. It is a stopgap on a disk you still do not fully trust.
- Optionally run a read-only surface test. badblocks -sv /dev/sdb does a non-destructive read scan and lists bad blocks with progress. Avoid the write-mode options (-w/-n) on a disk that holds data, because they write to the device. Treat even the read scan with caution on a failing disk because of the I/O stress.
- Replace the disk and restore. If SMART says the drive is failing, swap it out, then rebuild from RAID (/proc/mdstat) or restore from backup. Do not return a disk with growing uncorrectable counts to production.
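After the ddrescue copy, the rescue.log mapfile tells you how much of the disk could not be read. The sketch below sums it up. It assumes the GNU ddrescue mapfile layout (comment lines starting with #, one status line, then position/size/status rows with hex values, where - marks bad sectors and ? marks areas not yet tried); mapfile_summary is a hypothetical helper name.

```shell
# Hypothetical helper: total the bytes a ddrescue mapfile marks as
# bad ('-') and as not yet tried ('?').
# Usage: mapfile_summary MAPFILE
mapfile_summary() {
    bad=0
    untried=0
    seen_status=0
    while read -r pos size status rest; do
        case $pos in
            ''|'#'*) continue ;;             # skip blank and comment lines
        esac
        if [ "$seen_status" -eq 0 ]; then    # first data line is the status line
            seen_status=1
            continue
        fi
        case $status in
            -)   bad=$((bad + $size)) ;;         # sizes are hex, e.g. 0x200
            '?') untried=$((untried + $size)) ;;
        esac
    done < "$1"
    echo "bad=$bad untried=$untried"
}

# Example:
#   mapfile_summary rescue.log
```

A small, stable bad count usually means a handful of lost sectors; a large untried count means the copy has not finished and should be resumed with the same mapfile before you draw conclusions.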
Prevention and Best Practices
- Run smartctl short/long self-tests on a schedule via smartd and alert on rising Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable.
- Keep tested, restorable backups. The first time you learn a disk is dying should not be the first time you need the backup.
- Use redundancy (RAID or multipath) so a single drive or path failure is survivable instead of an outage.
- Monitor dmesg / journalctl -k for I/O error and Medium Error and treat the first occurrence as actionable, not noise.
- For SSDs, watch wear indicators (media wearout / percentage used) and replace proactively before exhaustion.
- Keep good airflow and reseat marginal cabling; transport errors masquerade as media errors and waste good drives.
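Log monitoring can start very small. The following sketch counts block-layer I/O error lines per device so a cron job or monitoring check can alert on the first nonzero count. count_io_errors is a hypothetical helper, and it assumes the "I/O error, dev NAME," wording shown in the sample messages.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "DEVICE COUNT" for every device with block-layer I/O errors.
count_io_errors() {
    awk '
        /I\/O error, dev / {
            for (i = 1; i < NF; i++) {
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)   # strip the trailing comma
                    n[dev]++
                    break
                }
            }
        }
        END { for (d in n) print d, n[d] }
    ' | sort
}

# Typical use (read-only):
#   journalctl -k -b | count_io_errors
```

Empty output means no matching lines were found; any output at all is worth a look at SMART data for that device.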
Related Errors
- 'EXT4-fs error (device sdb1): ... Input/output error' is the filesystem layer reporting the same underlying I/O failure.
- 'end_request: I/O error' is the older kernel phrasing of 'blk_update_request: I/O error'.
- 'READ FPDMA QUEUED' / 'failed command' ATA errors are SATA-level errors that often accompany medium errors on the same drive.
- 'multipathd: path checker failed' is the SAN counterpart, where a path goes faulty and failover kicks in.
Frequently Asked Questions
Is a single bad sector the same as a failing disk?
No. One stable, remapped sector on a drive that is otherwise healthy can be a localized defect. A failing disk shows growing Current_Pending_Sector and Offline_Uncorrectable counts over time, or a SMART FAILED verdict. Watch the trend, not a single number.
Why does my application get “Input/output error” instead of a disk message?
The kernel cannot read the sector, so it returns EIO to the read/write call. Userspace translates EIO to the string “Input/output error.” The detailed Buffer I/O error and Medium Error lines are in dmesg/journalctl -k — always check there when an app reports EIO on one file.
Can I just run fsck to fix it?
Not as your first move. fsck repairs filesystem metadata, not bad media, and a full pass adds heavy I/O that can finish off a dying disk. Back up the data first (ddrescue), then run fsck only once your data is safe.
Will writing to the bad sector fix it?
Sometimes. A pending sector is reallocated to a spare when it is written, which can clear that one error. But it destroys whatever was in that sector and only makes sense on a disk you have already backed up and that is not broadly failing. On a degrading drive, replace it.
One path on my SAN disk shows I/O errors — is my data gone?
Usually not. Run multipath -ll. If another path is still active, multipath has failed over and the mapped device keeps working. Fix or replace the failed path/HBA, and your data on the LUN is unaffected.
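The failover check can be reduced to counting path states. This is a rough sketch: count_paths is a hypothetical helper, and it assumes path lines contain the state words "active ready" or "failed faulty", which is the usual multipath -ll layout but may differ between versions.

```shell
# Hypothetical helper: read `multipath -ll` output on stdin and count
# paths reported as active and as failed.
count_paths() {
    awk '
        / active ready /  { active++ }
        / failed faulty / { failed++ }
        END { printf "active=%d failed=%d\n", active, failed }
    '
}

# Typical use (read-only):
#   multipath -ll | count_paths
```

As long as active is at least 1 for the LUN, the mapped device should still be serving I/O, and a nonzero failed count tells you there is a path to repair.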