Linux Block I/O Performance Investigation Prompt

Diagnose slow disk I/O, high iowait, queue depth saturation, and storage performance regressions using iostat, blktrace, fio, and per-device metrics.

Target user: Linux sysadmins, SREs, and DBAs debugging storage performance
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior Linux performance engineer who has tuned storage stacks across NVMe, SATA SSD, spinning disk, SAN, and software RAID. You can read `iostat -xz` output the way other engineers read application logs.

I will provide:
- The symptom (app latency spike, high iowait, queue depth alerts, slow restore, slow startup)
- System type: physical / VM / cloud (instance type + EBS/Persistent Disk/Local SSD class)
- Output of `iostat -xz 1 10`, `vmstat 1 5`, `mpstat -P ALL 1 5`
- The affected filesystem and mount options (`mount | grep <fs>`)
- The underlying device topology (`lsblk -f`, LVM/mdraid layers)
- Workload characteristics (random vs sequential, read vs write, IO size, fsync rate)

Your job:

1. **Decompose iowait honestly**: `%iowait` in `iostat` (`wa` in `top` and `vmstat`) means "CPU was idle while I/O was outstanding"; it is not a saturation metric. High iowait with low queue depth means a few synchronous waiters, not saturation.
2. **For each device** in `iostat -xz`, evaluate:
   - **`r/s` + `w/s`**: IOPS; compare to device-class spec (NVMe ~100k+, SATA SSD ~30k, 7200rpm spinner ~150)
   - **`rkB/s` + `wkB/s`**: throughput; compare to device or link bandwidth (SATA 600MB/s, NVMe 3GB/s+, cloud-EBS per-volume cap)
   - **`avgqu-sz`** / **`aqu-sz`**: average queue depth (the column name depends on the sysstat version); sustained high values mean requests are queuing at the device
   - **`await`**: average ms per I/O (queue + service); compare to device class. Newer sysstat releases print only `r_await` / `w_await`
   - **`r_await` / `w_await`**: the same latency split by direction; slow reads point to seek-heavy or contended access, slow writes to dirty-page flushing
   - **`%util`**: busy time; misleading on parallel-capable devices (NVMe can sustain 100% util at half capacity)
3. **Common saturation patterns**:
   - **Cloud EBS / Persistent Disk burst credits exhausted** → throughput crashes from "OK" to baseline; check provider metrics
   - **High `await` with low queue depth** → device is slow per-I/O (latency); look at link, controller, contention
   - **High queue depth with low `%util`** → request submission bottleneck (driver, CPU, mq tuning)
   - **Asymmetric reads vs writes** → write cache flush (`barrier`, fsync) bursts; consider `commit=` mount option
   - **Suddenly slow after running fine** → space pressure in a lower layer (LVM thin pool at 100%, ZFS pool 90%+ full) or dm-crypt under CPU pressure
4. **For random IOPS-bound workloads** (databases): suggest `noatime` (which already implies `nodiratime`), `data=writeback` (ext4, with risk explanation), the correct block scheduler (`mq-deadline` vs `none` vs `bfq`), and readahead tuning.
5. **For throughput-bound workloads**: suggest larger I/O sizes, parallelism (`fio --iodepth=`), driver multiqueue (`nr_requests`).
6. **For VMs in cloud**: surface the per-instance and per-volume caps; tuning inside the VM doesn't beat the provider's throttle.
7. **Mark DESTRUCTIVE actions**: changing scheduler on production, resizing under load, mount-option changes that require remount.

---

System: [physical/VM/cloud + instance class]
Storage stack: [NVMe / SATA / EBS / etc. + filesystem + LVM/mdraid layers]
Symptom: [DESCRIBE]
`iostat -xz 1 10`:
```
[PASTE]
```
`vmstat 1 5`:
```
[PASTE]
```
`lsblk -f`, mount options:
```
[PASTE]
```
Workload: [DESCRIBE — IO pattern, sync rate, app]

Why this prompt works

iostat output is a wall of numbers and most “high iowait” debugging stops at “more IOPS!” This prompt forces a column-by-column read so you distinguish slow per-I/O (await) from saturation (queue depth) — entirely different fixes.
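
One way to sanity-check the await versus queue-depth distinction is Little's law: the average number of requests in flight is roughly IOPS multiplied by average latency. The sketch below is illustrative only; `expected_queue_depth` is a hypothetical helper (not part of sysstat) and the sample numbers are invented.

```bash
# Little's law: average requests in flight = arrival rate x time in system.
# For a block device: aqu-sz ~= (r/s + w/s) * await_ms / 1000.
# expected_queue_depth is a hypothetical helper name used for illustration.
expected_queue_depth() {
  # $1 = IOPS (r/s + w/s), $2 = average await in milliseconds
  awk -v iops="$1" -v await_ms="$2" \
    'BEGIN { printf "%.2f\n", iops * await_ms / 1000 }'
}

# A fast device doing many I/Os versus a slow device doing few:
expected_queue_depth 20000 0.1   # 20000 IOPS at 0.1 ms each
expected_queue_depth 150 40      # 150 IOPS at 40 ms each
```

The first call prints 2.00 and the second 6.00: the device doing far fewer I/Os carries the deeper queue because each request takes longer. If the observed `aqu-sz` is far from this estimate, the sampling window probably mixes idle and bursty periods.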

How to use it

  1. Always include the device class. “Slow disk” on an NVMe is different from a 7200rpm spinner.
  2. Run iostat over a window, not a single snapshot. iostat -xz 1 10 captures bursts; the first report is the average since boot, so ignore it or add -y to skip it.
  3. For cloud VMs, include provider metrics alongside iostat — internal view can’t see the throttle.
  4. Identify the workload pattern (random vs sequential, sync vs buffered). Tuning differs.
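
To make the "window, not a snapshot" advice concrete, here is a minimal sketch that reduces a multi-report iostat capture to per-device peaks. `iostat_peaks` is a hypothetical helper; it looks columns up by header name because column order and the queue-depth column name differ between sysstat versions. The sample input is synthetic and trimmed to a few columns.

```bash
# Summarise peak queue depth and await per device across all reports
# in an `iostat -x` capture read from stdin.
# iostat_peaks is a hypothetical helper, not part of sysstat.
iostat_peaks() {
  awk '
    $1 ~ /^Device/ {                      # header line: map column names to positions
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      q  = ("aqu-sz" in col) ? col["aqu-sz"] : col["avgqu-sz"]
      ra = col["r_await"]; wa = col["w_await"]
      in_block = 1
      next
    }
    NF == 0 { in_block = 0; next }        # blank line ends a device block
    in_block {
      seen[$1] = 1
      if (q  && $q  + 0 > peak_q[$1]) peak_q[$1] = $q  + 0
      if (ra && $ra + 0 > peak_r[$1]) peak_r[$1] = $ra + 0
      if (wa && $wa + 0 > peak_w[$1]) peak_w[$1] = $wa + 0
    }
    END {
      for (d in seen)
        printf "%s peak_aqu=%.2f peak_r_await=%.2f peak_w_await=%.2f\n",
               d, peak_q[d], peak_r[d], peak_w[d]
    }
  ' | sort
}

# Synthetic two-report sample (real output has more columns):
iostat_peaks <<'EOF'
Device r/s w/s r_await w_await aqu-sz %util
sda 10.00 20.00 1.50 3.00 0.50 12.00

Device r/s w/s r_await w_await aqu-sz %util
sda 15.00 80.00 2.50 40.00 6.20 98.00
EOF
```

On a live system the same function can be fed with `iostat -xzy 1 10 | iostat_peaks`. The peaks show the burst in the second report that an average over the window would flatten.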

Useful commands

# Triage
iostat -xz 1 10                    # per-device extended stats
vmstat 1 5                          # bi/bo, blocked tasks, swap
mpstat -P ALL 1 5                   # per-CPU %wa
dstat -tcdmn 1                      # combined view (if installed)

# Per-process I/O
sudo iotop -oP
pidstat -d 1 5

# Block trace (deep)
sudo blktrace -d /dev/nvme0n1 -o trace
# In another shell: hit the workload, then Ctrl-C
sudo blkparse -i trace | head -200

# Filesystem latency
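sudo bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }'   # eBPF: vfs_read latency histogram per command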
sudo perf trace -e 'block:*' -a sleep 5

# Queue / scheduler
cat /sys/block/<dev>/queue/scheduler
cat /sys/block/<dev>/queue/nr_requests
cat /sys/block/<dev>/queue/read_ahead_kb

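# Benchmark (NEVER on live production data)
# --ioengine=libaio is needed for --iodepth to take effect; --readonly guards against accidental writes
fio --name=randread --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --runtime=30 --time_based --direct=1 --readonly --filename=/dev/<TEST-DEV>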

# Mount options
mount | grep <fs>
sudo tune2fs -l /dev/<dev> | head -20    # ext4

# Cloud-specific
# AWS:   CloudWatch VolumeReadBytes/VolumeWriteBytes, BurstBalance
# GCP:   Cloud Monitoring disk/* metrics
# Azure: Premium SSD perf tier vs IOPS used
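
When provider metrics are not at hand, a flat ceiling in the guest's own IOPS samples is a useful hint that a cap is in play. The sketch below counts how many samples sit within a tolerance of a suspected cap. `pinned_at_cap` is a hypothetical helper; the 2% default tolerance and the sample values are assumptions for illustration.

```bash
# Count how many IOPS samples (one number per line on stdin) fall within
# TOLERANCE_PCT of a suspected provider cap.
# pinned_at_cap is a hypothetical helper; the 2% default is an assumption.
pinned_at_cap() {
  # usage: pinned_at_cap CAP [TOLERANCE_PCT] < samples
  awk -v cap="$1" -v tol="${2:-2}" '
    NF {
      n++
      if ($1 >= cap * (1 - tol / 100) && $1 <= cap * (1 + tol / 100)) hit++
    }
    END {
      if (n == 0) { print "no samples"; exit }
      printf "%d/%d samples within %s%% of %s\n", hit, n, tol, cap
    }'
}

# Invented samples of r/s + w/s taken once per second:
printf '%s\n' 2990 3001 1500 3000 | pinned_at_cap 3000
```

This prints `3/4 samples within 2% of 3000`. A high fraction under sustained load is suggestive, not proof; confirm against the provider's own volume metrics.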

Differential cheatsheet

| Symptom | Likely cause | Confirm |
| --- | --- | --- |
| High await, low aqu-sz | Per-I/O latency (link, controller, encryption) | Per-link tests; check dm-crypt; LUN paths |
| High aqu-sz (>2-4), high %util | Device saturation | Compare IOPS/throughput to spec |
| %wa high, aqu-sz low | Few synchronous waiters (fsync-heavy) | pidstat -d; identify app sync pattern |
| Sudden cliff | Burst credits exhausted / dm-thin full / ZFS ARC pressure | Provider metrics; dmsetup status; zpool list |
| High write await only | Cache flush / journal pressure | commit=, mount options, log device |
| Stable IOPS at exactly N | Provider cap | Provider docs for volume class |
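
The first rows of the cheatsheet can be sketched as a small decision function. This is a rough illustration, not a diagnostic tool: `classify_device` is a hypothetical helper, the queue limit of 4 and the 10 ms await limit are taken from figures mentioned on this page, and the 90% utilisation cut-off is an assumption.

```bash
# Map one device's await / queue depth / utilisation onto a cheatsheet row.
# classify_device is a hypothetical helper; thresholds are illustrative
# defaults and should be adjusted to the device class.
classify_device() {
  # usage: classify_device AWAIT_MS AQU_SZ UTIL_PCT [AWAIT_LIMIT_MS] [QUEUE_LIMIT]
  awk -v await_ms="$1" -v q="$2" -v util="$3" \
      -v alim="${4:-10}" -v qlim="${5:-4}" 'BEGIN {
    if (q > qlim && util >= 90)
      print "device saturation"
    else if (await_ms > alim && q <= qlim)
      print "per-I/O latency"
    else if (q > qlim)
      print "queueing without full utilisation"
    else
      print "no obvious device bottleneck"
  }'
}

classify_device 25 0.8 30    # slow I/Os, shallow queue
classify_device 12 8 99      # deep queue, device busy
classify_device 1 0.2 5      # healthy
```

The three calls print `per-I/O latency`, `device saturation`, and `no obvious device bottleneck` in that order. Pass a lower await limit as the fourth argument for NVMe, where 10 ms is already far outside the normal range.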

Common findings this catches

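  • EBS gp2 burst credits exhausted → IOPS drops from the 3,000 burst level to the size-based baseline. gp3 has no burst bucket; it holds a flat 3,000 IOPS baseline unless more is provisioned. Provision more, or move to io2.
  • NVMe at 100% util but 60% of spec → %util is not a capacity signal on parallel devices; the queue depth reaching the device is too low. Raise application parallelism first, then check nr_requests and the number of blk-mq hardware queues.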
  • await >10ms on local SSD → likely controller/firmware issue or saturation; benchmark to confirm.
  • data=ordered (ext4 default) with fsync-heavy workload → high write await. Test data=writeback carefully.
  • dm-thin pool >80% full → silent latency spike from metadata pressure; expand or rebalance.
  • LVM cache (lvmcache) thrashing when working set > cache size → disable cache or grow.

Tuning starter pack

# Per-workload mount options (ext4, IOPS-heavy DB)
sudo mount -o remount,noatime /data    # noatime already implies nodiratime

# Scheduler (NVMe: usually `none`; SSD: `mq-deadline`; HDD: `bfq`)
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

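# Increase request queue depth (NVMe); read the current value first, the kernel may reject values above what the device supports
cat /sys/block/nvme0n1/queue/nr_requests
echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests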

# Reduce dirty page pressure (large memory boxes)
sudo sysctl -w vm.dirty_background_bytes=$((256*1024*1024))
sudo sysctl -w vm.dirty_bytes=$((1024*1024*1024))
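
Since none of the settings above persist across reboots and some are disruptive on a busy host, it helps to record the current values before changing anything. A minimal sketch follows; `snapshot_queue_settings` is a hypothetical helper, and the optional second argument (a sysfs root) exists only so the function can be exercised against a scratch directory.

```bash
# Record the current queue settings for one block device so they can be
# restored later. snapshot_queue_settings is a hypothetical helper.
snapshot_queue_settings() {
  # usage: snapshot_queue_settings DEVICE [SYSFS_ROOT]
  local dev="$1" root="${2:-/sys/block}" f
  for f in scheduler nr_requests read_ahead_kb; do
    if [ -r "$root/$dev/queue/$f" ]; then
      printf '%s=%s\n' "$f" "$(cat "$root/$dev/queue/$f")"
    fi
  done
}

# Example (device name is a placeholder):
#   snapshot_queue_settings nvme0n1 > /root/nvme0n1-queue.before
```

The active scheduler is the bracketed entry in the `scheduler=` line. Keeping the snapshot next to the change record makes a rollback a matter of echoing the old values back.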

When to escalate

  • Provider throttle suspected: open a support ticket with metric evidence; tuning inside the VM won’t help.
  • Hardware errors in dmesg (UNC, CRC, link reset): disk replacement, not tuning.
  • Database-specific (PostgreSQL/MySQL) sync patterns dominating: coordinate with the DBA on commit durability settings and sync vs async replication.
