Dirty Page Writeback & VM Tuning Review Prompt
Review Linux dirty-page writeback tunables (vm.dirty_ratio, vm.dirty_background_ratio, vm.dirty_expire_centisecs, vm.dirty_writeback_centisecs, vm.vfs_cache_pressure) against a workload and storage backend to reduce I/O stalls and fsync latency.
- Target user: Linux sysadmins and performance engineers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux performance engineer who tunes the page-cache writeback path to eliminate periodic I/O stalls. Recommend changes incrementally and tie each to an observed symptom, since blind dirty-ratio changes can shift the problem rather than fix it.

I will provide:
- The workload and storage (NVMe, spinning disk, RAID with BBU cache, network/iSCSI, cloud EBS-style) and its capacity/throughput
- Current values: `sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes vm.dirty_expire_centisecs vm.dirty_writeback_centisecs vm.vfs_cache_pressure vm.swappiness`
- Total RAM, and symptoms (periodic latency spikes, fsync stalls, write bursts, `kworker`/flush threads pegged, `nr_dirty` spiking in `/proc/vmstat`)

Your job:
1. **Read current behavior**: explain what the present ratios mean in approximate absolute bytes for this RAM size (the kernel applies them to available memory, meaning free plus reclaimable pages, which is below total RAM), and why large dirty caches on slow storage produce big, bursty flushes.
2. **Diagnose the stall pattern**: correlate symptoms with `/proc/vmstat` (`nr_dirty`, `nr_writeback`), `Dirty` and `Writeback` in `/proc/meminfo`, and `iostat` to confirm writeback bursts are the cause.
3. **Recommend ratio vs bytes**: advise using `dirty_bytes`/`dirty_background_bytes` instead of percentages on large-RAM hosts to cap the dirty window predictably, with concrete starting values for the storage tier. Note that each `*_bytes` knob and its `*_ratio` counterpart are mutually exclusive: setting one resets the other to 0.
4. **Tune expiry/writeback cadence**: adjust `dirty_expire_centisecs` and `dirty_writeback_centisecs` to flush sooner and more smoothly where latency matters.
5. **Mind related knobs**: note `vfs_cache_pressure` and `swappiness` interactions and when not to touch them.
6. **Persist and verify**: give a `/etc/sysctl.d/*.conf` snippet and a verification plan (re-measure tail latency, watch the `nr_dirty` ceiling, confirm no throughput regression).

Output: (a) interpretation of current settings in bytes, (b) diagnosis tying symptom to writeback, (c) recommended sysctl values with rationale per storage tier, (d) sysctl.d persistence + before/after verification. Change one knob at a time.
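
To illustrate the arithmetic behind step 1 of the prompt, here is a minimal Python sketch that converts ratio settings into approximate byte thresholds and estimates how long a full dirty window would take to drain. The function names, the example host (256 GiB RAM, ratios of 20/10, a disk sustaining 200 MB/s), and the use of total RAM as the base are all assumptions for illustration. The kernel computes the real thresholds from free plus reclaimable memory, so the figures here are an upper bound.

```python
GIB = 1024 ** 3


def dirty_thresholds(mem_bytes, dirty_ratio, dirty_background_ratio,
                     dirty_bytes=0, dirty_background_bytes=0):
    """Approximate (background, hard) dirty thresholds in bytes.

    A nonzero *_bytes value takes precedence over its ratio counterpart,
    since the kernel treats the two forms as mutually exclusive.
    mem_bytes is an assumption: pass total RAM for an upper bound, or
    MemAvailable from /proc/meminfo for a closer estimate.
    """
    hard = dirty_bytes or mem_bytes * dirty_ratio // 100
    background = (dirty_background_bytes
                  or mem_bytes * dirty_background_ratio // 100)
    return background, hard


def drain_seconds(queued_bytes, write_bytes_per_sec):
    """Time to flush queued dirty data at a sustained write rate."""
    return queued_bytes / write_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical host: 256 GiB RAM, ratios 20/10, disk sustaining 200 MB/s.
    background, hard = dirty_thresholds(256 * GIB, 20, 10)
    print(f"background flush starts near {background / GIB:.1f} GiB")
    print(f"writers are throttled near {hard / GIB:.1f} GiB")
    print(f"draining the hard limit takes ~{drain_seconds(hard, 200e6):.0f} s")
```

On this hypothetical host the hard limit works out to roughly 51 GiB, which would take several minutes to drain at 200 MB/s. That gap between cache size and device throughput is what the prompt asks the reviewer to spell out before recommending `dirty_bytes` caps.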