You are a senior Linux performance engineer who has tuned storage stacks across NVMe, SATA SSD, spinning disk, SAN, and software RAID. You can read `iostat -xz` output the way other engineers read application logs.

I will provide:

- The symptom (app latency spike, high iowait, queue depth alerts, slow restore, slow startup)
- System type: physical / VM / cloud (instance type + EBS/Persistent Disk/Local SSD class)
- Output of `iostat -xz 1 10`, `vmstat 1 5`, `mpstat -P ALL 1 5`
- The affected filesystem and mount options (`mount | grep <fs>`)
- The underlying device topology (`lsblk -f`, LVM/mdraid layers)
- Workload characteristics (random vs sequential, read vs write, IO size, fsync rate)

Your job:

1. **Decompose iowait honestly**: `%wa` from `iostat`/`top` is "CPU was idle waiting on I/O", not a saturation metric. High `%wa` with low queue depth means a few synchronous waiters, not saturation.
2. **For each device** in `iostat -xz`, evaluate:
   - **`r/s` + `w/s`**: IOPS; compare to device-class spec (NVMe ~100k+, SATA SSD ~30k, 7200rpm spinner ~150)
   - **`rkB/s` + `wkB/s`**: throughput; compare to device or link bandwidth (SATA 600MB/s, NVMe 3GB/s+, cloud-EBS per-volume cap)
   - **`avgqu-sz`** / **`aqu-sz`**: average queue depth; high means the device is the bottleneck
   - **`await`**: average ms per I/O (queue + service); compare to device class
   - **`r_await` / `w_await`**: the same, split by direction; slow reads point to seek-heavy or contended devices, slow writes to dirty-page flush
   - **`%util`**: busy time; misleading on parallel-capable devices (NVMe can sustain 100% util at half capacity)
3. **Common saturation patterns**:
   - **Cloud EBS / Persistent Disk burst credits exhausted** → throughput crashes from "OK" to baseline; check provider metrics
   - **High `await` with low queue depth** → device is slow per-I/O (latency); look at link, controller, contention
   - **High queue depth with low `%util`** → request submission bottleneck (driver, CPU, mq tuning)
   - **Asymmetric reads vs writes** → write cache flush (`barrier`, fsync) bursts; consider `commit=` mount option
   - **Suddenly slow after running fine** → log device space pressure (LVM thin pool 100%, ZFS pool 90%+, dm-crypt under pressure)
4. **For random IOPS-bound workloads** (databases): suggest `noatime,nodiratime`, `data=writeback` (ext4, with risk explanation), correct block scheduler (`mq-deadline` vs `none` vs `bfq`), readahead tuning.
5. **For throughput-bound workloads**: suggest larger I/O sizes, parallelism (`fio --iodepth=`), driver multiqueue (`nr_requests`).
6. **For VMs in cloud**: surface the per-instance and per-volume caps; tuning inside the VM doesn't beat the provider's throttle.
7. **Mark DESTRUCTIVE actions**: changing scheduler on production, resizing under load, mount-option changes that require remount.

---

System: [physical/VM/cloud + instance class]
Storage stack: [NVMe / SATA / EBS / etc. + filesystem + LVM/mdraid layers]
Symptom: [DESCRIBE]

`iostat -xz 1 10`:
```
[PASTE]
```

`vmstat 1 5`:
```
[PASTE]
```

`lsblk -f`, mount options:
```
[PASTE]
```

Workload: [DESCRIBE: IO pattern, sync rate, app]

Why this prompt works

iostat output is a wall of numbers and most “high iowait” debugging stops at “more IOPS!” This prompt forces a column-by-column read so you distinguish slow per-I/O (await) from saturation (queue depth) — entirely different fixes.
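
The link between the two columns is Little's law: average requests in flight is roughly IOPS multiplied by average time per I/O. Below is a minimal sketch of that arithmetic; the function name `expected_queue_depth` and the sample figures are illustrative assumptions, not measurements.

```shell
# expected_queue_depth IOPS AWAIT_MS
# Little's law: requests in flight ~= arrival rate (per second) x time per request (seconds).
expected_queue_depth() {
  awk -v iops="$1" -v await_ms="$2" 'BEGIN { printf "%.2f\n", iops * await_ms / 1000 }'
}

# Same IOPS, different await: only the second case implies a deep queue.
expected_queue_depth 20000 0.2    # prints 4.00
expected_queue_depth 20000 2      # prints 40.00
```

Read it the other way as well: 500 IOPS at an `await` of 2 ms works out to about one request in flight, which points to a slow device rather than a saturated one.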

How to use it

Always include the device class. “Slow disk” on an NVMe is different from a 7200rpm spinner.
Run iostat over a window, not a single snapshot. iostat -xz 1 10 captures bursts.
For cloud VMs, include provider metrics alongside iostat — internal view can’t see the throttle.
Identify the workload pattern (random vs sequential, sync vs buffered). Tuning differs.
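
To make the "window, not a snapshot" advice concrete, here is a rough sketch that reduces a saved iostat window to one line per device with the mean and peak queue depth and the peak write await. The function name `summarize_iostat` and the output format are assumptions for illustration. It finds columns by header name, since the layout differs between sysstat versions.

```shell
# summarize_iostat: read `iostat -x` text on stdin, print one summary line per device.
summarize_iostat() {
  awk '
    $1 ~ /^Device/ {                       # header row: map column name -> index
      split("", col)                       # portable way to clear the array
      for (i = 1; i <= NF; i++) col[$i] = i
      q = ("aqu-sz" in col) ? col["aqu-sz"] : col["avgqu-sz"]
      w = col["w_await"]                   # empty on very old sysstat; handled below
      next
    }
    q && NF >= q && $q ~ /^[0-9.]+$/ {     # data row for one device
      d = $1
      n[d]++
      qsum[d] += $q
      if ($q + 0 > qmax[d] + 0) qmax[d] = $q
      if (w && $w + 0 > wmax[d] + 0) wmax[d] = $w
    }
    END {
      for (d in n)
        printf "%s samples=%d aqu_mean=%.2f aqu_max=%.2f w_await_max=%.2f\n", d, n[d], qsum[d] / n[d], qmax[d], wmax[d]
    }'
}

# Usage (assumes a sysstat build that supports -y, which drops the since-boot first report):
#   LC_ALL=C iostat -xzy 1 10 | summarize_iostat
```

A large gap between `aqu_mean` and `aqu_max` suggests bursts that a single snapshot would likely have missed.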

Useful commands

# Triage
iostat -xz 1 10                    # per-device extended stats
vmstat 1 5                          # bi/bo, blocked tasks, swap
mpstat -P ALL 1 5                   # per-CPU %wa
dstat -tcdmn 1                      # combined view (if installed)

# Per-process I/O
sudo iotop -oP
pidstat -d 1 5

# Block trace (deep)
sudo blktrace -d /dev/nvme0n1 -o trace
# In another shell: hit the workload, then Ctrl-C
sudo blkparse -i trace | head -200

# Filesystem latency
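sudo bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }'   # eBPF: vfs_read latency histogram per command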
sudo perf trace -e 'block:*' -a sleep 5

# Queue / scheduler
cat /sys/block/<dev>/queue/scheduler
cat /sys/block/<dev>/queue/nr_requests
cat /sys/block/<dev>/queue/read_ahead_kb

# Benchmark (NEVER on live production data)
fio --name=randread --rw=randread --bs=4k --iodepth=32 \
    --runtime=30 --time_based --direct=1 --filename=/dev/<TEST-DEV>

# Mount options
mount | grep <fs>
sudo tune2fs -l /dev/<dev> | head -20    # ext4

# Cloud-specific
# AWS:   CloudWatch VolumeReadBytes/VolumeWriteBytes, BurstBalance
# GCP:   Cloud Monitoring disk/* metrics
# Azure: Premium SSD perf tier vs IOPS used
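
The three `cat` commands under "Queue / scheduler" can be looped over every device. This is a small sketch under a few assumptions: the function name `show_queue_settings` is made up, the `rotational` attribute is an addition not mentioned above, and the optional root argument exists only so the function can be pointed at a test directory.

```shell
# show_queue_settings [SYSFS_BLOCK_ROOT]
# Print the active scheduler and queue limits for every block device under the root.
show_queue_settings() {
  root="${1:-/sys/block}"
  for q in "$root"/*/queue; do
    [ -r "$q/scheduler" ] || continue
    dev=$(basename "$(dirname "$q")")
    # The active scheduler is the bracketed entry, e.g. "none [mq-deadline] kyber".
    sched=$(sed 's/.*\[\(.*\)\].*/\1/' "$q/scheduler")
    printf '%s scheduler=%s nr_requests=%s read_ahead_kb=%s rotational=%s\n' \
      "$dev" "$sched" "$(cat "$q/nr_requests")" "$(cat "$q/read_ahead_kb")" "$(cat "$q/rotational")"
  done
}

show_queue_settings
```

Paste the output alongside `lsblk -f` so the scheduler and readahead of each layer are visible in one place.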

Differential cheatsheet

| Symptom | Likely cause | Confirm |
| --- | --- | --- |
| High `await`, low `aqu-sz` | Per-I/O latency (link, controller, encryption) | Per-link tests; check dm-crypt; LUN paths |
| High `aqu-sz` (>2-4), high `%util` | Device saturation | Compare IOPS/throughput to spec |
| `%wa` high, `aqu-sz` low | Few synchronous waiters (fsync-heavy) | `pidstat -d`; identify app sync pattern |
| Sudden cliff | Burst credits exhausted / dm-thin full / ZFS ARC pressure | Provider metrics; `dmsetup status`; `zpool list` |
| High write `await` only | Cache flush / journal pressure | `commit=`, mount options, log device |
| Stable IOPS at exactly N | Provider cap | Provider docs for volume class |
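
The first rows of the cheatsheet can be written as a tiny decision function. This is only a sketch: the name `classify_io` is hypothetical. The default thresholds (10 ms await, queue depth 4) echo figures used elsewhere on this page, while the 80% `%util` cutoff is purely an assumption. All three depend on device class and should be overridden.

```shell
# classify_io AWAIT_MS AQU_SZ UTIL_PCT [AWAIT_HIGH] [QUEUE_HIGH] [UTIL_HIGH]
classify_io() {
  awk -v await="$1" -v q="$2" -v util="$3" \
      -v await_hi="${4:-10}" -v q_hi="${5:-4}" -v util_hi="${6:-80}" '
    BEGIN {
      if (q + 0 >= q_hi + 0 && util + 0 >= util_hi + 0)
        print "device saturation: compare IOPS/throughput to spec"
      else if (q + 0 >= q_hi + 0)
        print "submission bottleneck: check driver, CPU, mq tuning"
      else if (await + 0 >= await_hi + 0)
        print "per-I/O latency: check link, controller, encryption"
      else
        print "no pattern matched"
    }'
}

classify_io 25 0.8 30     # prints the per-I/O latency line
```

It deliberately covers only the rows that can be decided from a single iostat line; cliffs and provider caps need provider metrics and a time series.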

Common findings this catches

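EBS gp3 pinned at its 3,000 IOPS baseline → the volume is throttling; gp3 has no burst credits (bursting is a gp2 behavior). Provision more IOPS on the volume, or move to io2.
NVMe at 100% util but 60% of spec → `%util` only shows that some I/O was always in flight, so too little parallelism is reaching the device. Raise application queue depth or thread count first, then check nr_requests and the blk-mq queue count.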
await >10ms on local SSD → likely controller/firmware issue or saturation; benchmark to confirm.
data=ordered (ext4 default) with fsync-heavy workload → high write await. Test data=writeback carefully.
dm-thin pool >80% full → silent latency spike from metadata pressure; expand or rebalance.
LVM cache (lvmcache) thrashing when working set > cache size → disable cache or grow.
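
The dm-thin finding is easy to check mechanically. A rough sketch follows: the function name `flag_full_pools` is made up, and it assumes input lines of the form "name data_percent metadata_percent", as produced by the `lvs` invocation in the usage comment. Volumes without both percentages are skipped.

```shell
# flag_full_pools [THRESHOLD_PCT]
# stdin: lines of "name data_percent metadata_percent"; prints pools at or above the threshold.
flag_full_pools() {
  awk -v limit="${1:-80}" '
    NF == 3 && ($2 + 0 >= limit + 0 || $3 + 0 >= limit + 0) {
      printf "%s data=%s%% metadata=%s%%\n", $1, $2, $3
    }'
}

# Usage on a host with LVM thin pools:
#   sudo lvs --noheadings -o lv_name,data_percent,metadata_percent | flag_full_pools 80
```

Metadata usage is checked alongside data usage because either one filling up can stall the pool.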

Tuning starter pack

# Per-workload mount options (ext4, IOPS-heavy DB)
sudo mount -o remount,noatime,nodiratime /data

# Scheduler (NVMe: usually `none`; SSD: `mq-deadline`; HDD: `bfq`)
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

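# Increase request queue depth (NVMe); the write is rejected if it exceeds what the device queue supports, so read the current value first
cat /sys/block/nvme0n1/queue/nr_requests
echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests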

# Reduce dirty page pressure (large memory boxes)
sudo sysctl -w vm.dirty_background_bytes=$((256*1024*1024))
sudo sysctl -w vm.dirty_bytes=$((1024*1024*1024))
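
Since these changes touch a live queue, it helps to record the current values before applying them. The sketch below prints the commands that would restore a device's scheduler, `nr_requests` and `read_ahead_kb`. The function name `save_queue_settings` and the optional root argument are assumptions for illustration.

```shell
# save_queue_settings DEVICE [SYSFS_BLOCK_ROOT]
# Print the commands that would restore the device's current queue settings.
save_queue_settings() {
  q="${2:-/sys/block}/$1/queue"
  # The active scheduler is the bracketed entry in the scheduler file.
  sched=$(sed 's/.*\[\(.*\)\].*/\1/' "$q/scheduler")
  printf 'echo %s | sudo tee %s\n' "$sched" "$q/scheduler"
  for f in nr_requests read_ahead_kb; do
    printf 'echo %s | sudo tee %s\n' "$(cat "$q/$f")" "$q/$f"
  done
}

# Usage, before changing anything:
#   save_queue_settings nvme0n1 > rollback.sh
```

Values written under `/sys` or with `sysctl -w` do not survive a reboot, so a reboot is itself a rollback unless the settings are persisted separately.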

When to escalate

Provider throttle suspected — open a support ticket with metric evidence; tuning inside the VM won’t help.
Hardware errors in dmesg (UNC, CRC, link reset) — disk replacement, not tuning.
Database-specific (PostgreSQL/MySQL) sync patterns dominating — coordinate with DBA on commit=, sync vs async replication.
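
For the hardware-error case, a quick filter over the kernel log gives a count to attach to the ticket. A minimal sketch: the function name `count_disk_errors` is hypothetical, and the pattern list is an illustrative assumption. It covers the UNC, CRC and link-reset markers named above plus two common phrasings, and is not an exhaustive catalogue of kernel messages.

```shell
# count_disk_errors: read kernel log text on stdin, print the number of suspicious lines.
count_disk_errors() {
  # UNC/CRC must stand alone as tokens so that words like "function" do not match.
  grep -cE '(^|[^A-Za-z])(UNC|ICRC|CRC)([^A-Za-z]|$)|link reset|resetting link|I/O error' || true
}

# Usage:
#   sudo dmesg | count_disk_errors
```

A non-zero count is a prompt to read the matching lines, not a verdict; swap `-c` for `-n` to print them.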

Linux Block I/O Performance Investigation Prompt

Related prompts

Linux High Load & CPU Runaway Investigation Prompt

Linux Disk Full / Inode Exhaustion Diagnosis Prompt

Linux NUMA Imbalance Investigation Prompt
