Linux Context Switch & Lock Contention Diagnosis Prompt
Diagnose context-switch storms, futex contention, kernel-level lock waits, and CPU scheduling pathologies that masquerade as 'app is slow.'
- Target user: Linux performance engineers and SREs
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux performance engineer who can spot context-switch storms (>100k/sec) and futex contention from `vmstat`, `pidstat -w`, and `perf sched` output, and tell whether the switching is voluntary (waiting for I/O or a lock) or involuntary (preempted by the scheduler).

I will provide:
- The symptom (app slow under load, latency spike, CPU% low but throughput poor, "too many threads" report)
- System info: vCPU count, kernel version, container/VM/bare metal
- Output of `vmstat 1 5` (focus on `cs` and `in` columns)
- `pidstat -w 1 5` (voluntary / involuntary switches per process)
- `pidstat -t -w 1 5` (per-thread)
- For a specific process: `grep ctxt /proc/<pid>/status` to see lifetime totals
- Optional: `perf stat -e context-switches,cpu-migrations,page-faults -p <pid> -- sleep 10`, `perf sched record`, futex stats

Your job:

1. **Classify the rate**:
   - `vmstat cs` < 10k/sec → normal
   - 10k-50k/sec → notable; investigate if correlated with latency
   - 50k-500k/sec → high; likely a problem
   - >500k/sec → pathological; almost always lock contention or a runaway thread pool
2. **Voluntary vs involuntary switches** (from `pidstat -w`):
   - **Voluntary (`cswch/s`)**: the thread willingly gives up the CPU (sleeping for I/O, futex wait, condition variable)
   - **Involuntary (`nvcswch/s`)**: the kernel preempted the thread (timeslice expired, higher-priority work)
   - High voluntary + low CPU utilization → contention or I/O blocking
   - High involuntary + high CPU → too many runnable threads competing for CPU (over-threading)
3. **For futex contention**: `perf stat -e syscalls:sys_enter_futex -p <pid> -- sleep 10` counts futex syscalls; a high rate with little forward progress indicates a contended mutex or condvar.
4. **For runqueue pressure**: a `vmstat` `r` column above the vCPU count means threads are waiting for CPU. Pair with high involuntary switches.
5. **Common root causes**:
   - **Over-threaded app**: for CPU-bound work, a thread pool larger than 2-4 × vCPU count means constant preemption. Reduce the pool.
   - **Hot mutex**: one lock serializes all threads. Refactor (lock-free / striped locking) or accept the cap.
   - **Spin-loops**: app polling for state; CPU-bound, frequent kernel transitions
   - **Excessive timer interrupts** (`vmstat in`): high-resolution timers from monitoring, kernel tick rate
   - **CPU and NUMA cross-node migration**: `perf stat -e cpu-migrations` shows the migration rate; pin threads if hot
   - **Container with low CPU quota**: cgroup throttling causes mass involuntary switches at the period boundary
6. **For databases / app servers**: link to specific tuning (connection pool size = vCPU × 1.5-2 for I/O bound, = vCPU for CPU bound).
7. **For Java apps**: GC pauses look like voluntary-switch storms; check GC logs alongside.
8. **For Go apps**: GOMAXPROCS should match the container CPU quota. Go 1.25+ derives it from the cgroup limit by default; older versions use the host CPU count unless you set GOMAXPROCS or use `go.uber.org/automaxprocs`. Too high a value means futex contention in the runtime.

Mark as DESTRUCTIVE: changing thread pool size live (may queue/drop work), `nice`-ing a critical process, `taskset` while a load is running.

---

Symptom: [DESCRIBE]
vCPU count: [N]

`vmstat 1 5`:
```
[PASTE]
```

`pidstat -w 1 5` (top processes):
```
[PASTE]
```

`pidstat -t -w -p <pid> 1 5` (per-thread for suspect):
```
[PASTE]
```

App context: [language, thread/worker pool size, expected concurrency]
Why this prompt works
“App is slow but CPU is low” is the most-misdiagnosed performance issue. The answer is usually contention — futex, hot mutex, or scheduler pressure from over-threading — visible in cs rates and voluntary/involuntary split, not in top CPU%. This prompt forces those metrics into focus.
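To make the "cs rates" part concrete, here is a minimal sketch that averages the `cs` column of pasted `vmstat` output and maps it onto the rate bands the prompt uses. The function name `classify_vmstat_cs` is invented for this example, the cut-offs are the prompt's rules of thumb rather than kernel-defined limits, and it assumes the usual procps header line that starts with `r`.

```shell
# Read `vmstat` output on stdin, print "<average cs/sec> <band>".
# Hypothetical helper; band names and thresholds are rules of thumb.
classify_vmstat_cs() {
  awk '
    # First header line: locate the "cs" column by name instead of hard-coding it.
    !col && $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next }
    # Data lines start with a number; the first one is the since-boot average, so skip it.
    col && $1 ~ /^[0-9]+$/ { if (seen++ == 0) next; sum += $col; n++ }
    END {
      if (n == 0) { print "no-data"; exit 1 }
      avg = sum / n
      band = "normal"
      if (avg >= 10000)  band = "notable"
      if (avg >= 50000)  band = "high"
      if (avg >= 500000) band = "pathological"
      printf "%d %s\n", avg, band
    }'
}
```

A typical call would be `vmstat 1 10 | classify_vmstat_cs`. The band only says whether the system-wide rate deserves attention; the per-process voluntary/involuntary split is still needed to say why.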
How to use it
- Always capture `pidstat -w` alongside `vmstat`. The per-process split is diagnostic.
- Capture during a load, not at rest.
- For container workloads, include cgroup CPU stats (`cat /sys/fs/cgroup/cpu.stat`). Throttling shows up there.
- Mention the language runtime: JVM, Go, and Python each have distinct contention patterns.
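For the cgroup point, a small sketch of how the throttling counters might be summarized before pasting them into the prompt. It assumes the cgroup v2 `cpu.stat` field names `nr_periods`, `nr_throttled`, and `throttled_usec`; the helper name `throttle_summary` is made up for illustration.

```shell
# Read a cgroup v2 cpu.stat file on stdin and report how often the group was throttled.
# Hypothetical helper; the counters are cumulative since the cgroup was created.
throttle_summary() {
  awk '
    $1 == "nr_periods"     { periods = $2 }
    $1 == "nr_throttled"   { throttled = $2 }
    $1 == "throttled_usec" { usec = $2 }
    END {
      # No enforcement periods recorded usually means no CPU quota is set.
      if (periods == 0) { print "no-quota-or-no-data"; exit }
      printf "throttled_periods=%.1f%% throttled_ms=%d\n", 100 * throttled / periods, usec / 1000
    }'
}
```

Usage would look like `throttle_summary < /sys/fs/cgroup/cpu.stat`. Because the counters never reset, comparing two readings taken a few seconds apart during load says more than a single reading.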
Useful commands
# Triage
vmstat 1 10 # cs (switches) + in (interrupts) + r (runq) + b (blocked)
pidstat -w 1 5 # per-process voluntary/involuntary
pidstat -t -w -p <pid> 1 5 # per-thread
# Lifetime totals
grep ctxt /proc/<pid>/status
# voluntary_ctxt_switches: N
# nonvoluntary_ctxt_switches: N
# Perf counters (low overhead)
perf stat -e context-switches,cpu-migrations -p <pid> -- sleep 10
# Futex syscall count (syscall tracepoint; usually needs root)
sudo perf stat -e syscalls:sys_enter_futex -p <pid> -- sleep 10
# Sched detail (heavier)
sudo perf sched record -p <pid> -- sleep 5
sudo perf sched latency | head -20
sudo perf sched timehist | head -50
# Lock-contention specific (Java): thread dump with lock details
jstack -l <pid> | grep -B1 -A5 "BLOCKED"
# Off-CPU profile (where is time spent NOT running?)
sudo perf record -e sched:sched_switch -p <pid> -g -- sleep 5
sudo perf script | awk '/sched_switch/ { print $0 }' | head
# eBPF (more efficient on busy systems)
sudo bpftrace -e 'tracepoint:sched:sched_switch { @[comm] = count(); }'
# Cgroup throttling
cat /sys/fs/cgroup/<slice>/cpu.stat # nr_throttled, throttled_usec
cat /sys/fs/cgroup/<slice>/cpu.max # quota / period
# CPU migration rate
perf stat -e cpu-migrations -p <pid> sleep 10
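The lifetime totals in `/proc/<pid>/status` only become useful as rates. The sketch below turns two saved `grep ctxt` snapshots into per-second figures; `ctxt_rates` is a hypothetical helper name, and it takes the snapshots as text so it can be tried on saved output.

```shell
# Print per-second voluntary and involuntary switch rates between two snapshots.
# Usage: ctxt_rates "$before" "$after" <seconds>
# Each snapshot is the output of: grep ctxt /proc/<pid>/status
ctxt_rates() {
  printf '%s\n--\n%s\n' "$1" "$2" | awk -v secs="$3" '
    BEGIN { snap = 0 }
    # The "--" separator marks the start of the second snapshot.
    $1 == "--" { snap = 1; next }
    $1 == "voluntary_ctxt_switches:"    { v[snap] = $2 }
    $1 == "nonvoluntary_ctxt_switches:" { n[snap] = $2 }
    END {
      # Refuse to compute on incomplete snapshots or a non-positive interval.
      if (secs <= 0 || !(0 in v) || !(1 in v) || !(0 in n) || !(1 in n)) exit 1
      printf "voluntary/s=%d involuntary/s=%d\n", (v[1] - v[0]) / secs, (n[1] - n[0]) / secs
    }'
}
```

One way to use it: `b=$(grep ctxt /proc/<pid>/status); sleep 10; a=$(grep ctxt /proc/<pid>/status); ctxt_rates "$b" "$a" 10`. The same two fields exist per thread under `/proc/<pid>/task/<tid>/status`, so the helper can be pointed at a single suspect thread as well.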
Common findings this catches
- Java thread pool = 200, vCPU = 4 → massive involuntary switches; reduce pool to 8-16 for CPU-bound, 32-64 for I/O-bound.
- Go service with GOMAXPROCS unset in a container with a 2 vCPU quota → on Go versions before 1.25 the runtime sizes itself to the host's core count (for example 64); futex storms. Set `GOMAXPROCS=2`.
- Container at 100% CPU quota for 100ms then idle 900ms → cgroup CPU throttling causing periodic latency. Raise quota or remove.
- MySQL `Threads_running` >> CPU count under load → connection pool over-sized; latency goes up, not throughput.
- `vmstat` `in` > 100k/sec → interrupt storm; check `/proc/interrupts` for runaway NIC IRQ or timer.
- Voluntary switches at 50k/sec, app idle by CPU → blocked on I/O or hot lock; correlate with `iostat` or futex syscall counts from `perf`.
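Several of these findings come down to knowing how many CPUs the container is actually allowed to use. As a rough sketch, the helper below converts the contents of a cgroup v2 `cpu.max` file into a whole CPU count that could be compared with a pool size or a `GOMAXPROCS` value. The name `effective_vcpus` and the choice to round up are assumptions of this example, not something the kernel or any runtime prescribes.

```shell
# Convert cpu.max contents ("<quota> <period>" or "max <period>") to a CPU count.
# Usage: effective_vcpus "$(cat /sys/fs/cgroup/cpu.max)"
# Hypothetical helper; rounds fractional quotas up (for example 1.5 CPUs becomes 2).
effective_vcpus() {
  printf '%s\n' "$1" | awk '
    # "max" means no quota is configured for this cgroup.
    $1 == "max" { print "unlimited"; ok = 1; exit }
    # Integer ceiling of quota / period.
    $1 ~ /^[0-9]+$/ && $2 > 0 { print int(($1 + $2 - 1) / $2); ok = 1; exit }
    END { if (!ok) exit 1 }'
}
```

With a quota of `200000 100000` this prints `2`, which lines up with the `GOMAXPROCS=2` finding above; a result of `unlimited` means the host CPU count is the relevant ceiling.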
Differential cheatsheet
| Pattern | Likely | Fix |
|---|---|---|
| High voluntary, low CPU | I/O blocking or lock contention | Profile blocked threads; reduce contention |
| High involuntary, high CPU | Over-threaded | Reduce thread/worker pool |
| High cs + spiky latency | GC pauses or stop-the-world | GC tuning |
| High in (interrupts) | IRQ storm | Check /proc/interrupts, NIC pinning |
| r > vCPU consistently | Runqueue pressure | Scale up CPU, reduce thread count |
| Throttled time > 0 | Container CPU cap | Raise quota |
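The "r > vCPU consistently" row can be checked mechanically. This sketch counts how many `vmstat` samples had more runnable tasks than vCPUs; `runq_over` is an invented name, and what fraction counts as "consistently" is left to the reader.

```shell
# Read `vmstat` output on stdin and count samples where the r column exceeds the vCPU count.
# Usage: vmstat 1 10 | runq_over <vcpu_count>
runq_over() {
  awk -v ncpu="$1" '
    # Data lines start with a number; drop the first report for consistency with rate columns.
    $1 ~ /^[0-9]+$/ {
      if (seen++ == 0) next
      n++
      if (($1 + 0) > (ncpu + 0)) over++
    }
    END { printf "%d/%d samples with r > %d\n", over, n, ncpu }'
}
```

For example, `vmstat 1 10 | runq_over "$(nproc)"` on bare metal, or with the container's quota-derived CPU count instead of `nproc` when a cgroup limit applies.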
When to escalate
- Hot kernel-side lock contention (visible in `perf top` showing `_raw_spin_lock` near top): kernel-level issue; engage platform team.
- App-internal lock contention requiring code change: escalate to app owner with `perf` evidence.
- Suspected hypervisor steal hidden as "involuntary switches": check `%st` in `top`; cloud-side fix.
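When steal is the suspect, a number over a fixed window is easier to hand to a cloud provider than a glance at `top`. The following sketch computes the steal share between two readings of the aggregate `cpu` line in `/proc/stat`; `steal_pct` is a hypothetical name, and it assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Compute the share of CPU time stolen by the hypervisor between two /proc/stat "cpu" lines.
# Usage: steal_pct "$first_line" "$second_line"
steal_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    $1 == "cpu" {
      k++
      total = 0
      # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
      for (i = 2; i <= 9; i++) total += $i
      if (k == 1) { t0 = total; s0 = $9 } else { t1 = total; s1 = $9 }
    }
    END {
      if (k < 2 || t1 <= t0) { print "no-data"; exit 1 }
      printf "steal=%.1f%%\n", 100 * (s1 - s0) / (t1 - t0)
    }'
}
```

A possible invocation: `a=$(head -n1 /proc/stat); sleep 5; b=$(head -n1 /proc/stat); steal_pct "$a" "$b"`. A steal share that rises together with the involuntary switch rate points away from the application and toward the host.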
Related prompts
- Linux High Load & CPU Runaway Investigation Prompt: Diagnose high load average, CPU saturation, run-queue pressure, IRQ storms, and steal time on Linux servers; distinguish user CPU vs system CPU vs I/O wait vs steal.
- Linux NUMA Imbalance Investigation Prompt: Diagnose NUMA-related performance issues, including cross-node memory access, allocation imbalance, scheduler migration, and how to pin workloads to nodes.
- Linux `perf` & Flame Graph Profiling Prompt: Profile a Linux process with `perf record` and generate flame graphs to find CPU hotspots, off-CPU waits, and frequent stack patterns.