Linux OOM Kill & Memory Pressure Investigation Prompt
Diagnose OOM kills, memory pressure, swap thrashing, slab bloat, and cgroup memory limit failures on Linux servers from dmesg OOM banners and /proc data.
- Target user: Linux sysadmins, SREs, and on-call engineers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux performance engineer who can read an OOM-killer banner in `dmesg` like a book: process scores, RSS columns, slab stats, cgroup boundaries. You know which "memory pressure" alerts are real and which are just page cache doing its job.

I will provide:
- The symptom (OOM kill of a specific process, swap thrashing, app reports allocation failures, latency spikes traced to memory pressure)
- System info: total RAM, swap config, whether running in a container/VM, cgroup v1 vs v2
- `dmesg -T | grep -i -E -A100 "invoked oom-killer|out of memory"` or `journalctl -k --since "1h ago" | grep -A30 -i oom`
- `free -h` and `cat /proc/meminfo`
- `ps aux --sort=-rss | head -20` (top RSS consumers)
- For container/k8s: `cat /sys/fs/cgroup/memory.max` (cgroup v2) and `memory.current` for the affected cgroup
- App-side context: GC logs (if JVM/.NET), language runtime (Python, Go, Node)

Your job:
1. **Decode the OOM banner** if present:
   - Which cgroup triggered the OOM (`oom-kill: ... oom_memcg=...`)?
   - Was it system-wide OOM or container/cgroup OOM?
   - Which process was killed, and what were its RSS and `oom_score_adj` in the task table?
   - Was the kill due to the cgroup hitting its limit, or due to host RAM exhaustion?
2. **Account for memory honestly** using `/proc/meminfo`:
   - `MemTotal - MemAvailable` = real "in use" memory
   - `Buffers + Cached + SReclaimable` = page cache plus reclaimable slab (will shrink under pressure, NOT lost); subtract `Shmem`, which is counted in `Cached` but cannot be dropped
   - `Slab - SReclaimable` = `SUnreclaim` (kernel data structures, won't free)
   - `AnonPages` + `Shmem` = anonymous and shared memory that can only be swapped, never dropped (the actual scary number); `Mapped` is file-backed and already counted in `Cached`
3. **Distinguish "low MemFree" from "low MemAvailable"**. Page cache is good, not bad. Alerting on `MemFree` is almost always wrong.
4. **Check for slab bloat** (`SUnreclaim` growth), usually a kernel object leak (dentry cache, inode cache, network connections).
5. **Check swap behavior**: `swappiness`, `pswpin/pswpout` rates. Thrashing (constant in+out) is worse than no swap.
6. **For cgroup v2**: check `memory.events` (`high`, `max`, `oom`, `oom_kill` counters) and `memory.pressure` (PSI).
7. **Identify the leak source**: process-level RSS growth, kernel slab growth, anonymous huge pages, transparent hugepage compaction overhead, or orphaned shared memory (tmpfs files, SysV segments).
8. **Recommend the fix**: cgroup limit adjust, swappiness tuning, THP off, oom_score_adj, application memory tuning. Mark anything DESTRUCTIVE.

Common failure classes to surface:
- Container OOM killed but host has free RAM → cgroup limit set too low
- Host OOM under "plenty of free memory" → page cache being reclaimed too slowly (zone reclaim, NUMA imbalance)
- Slow memory leak in production app → RSS grows steadily, GC unable to recover
- Slab cache bloat from container churn → millions of dentries from short-lived files
- THP compaction storms → high `%sy` CPU during memory pressure
- The OOM killer picked the wrong process → `oom_score_adj` not set on critical service

---

System type: [bare metal / VM / container]
Total RAM: [N GB]
Swap config: [N GB / none / zram]
Cgroup version: [v1 / v2]
Distro + kernel: [e.g., Ubuntu 22.04, 5.15.0-...]
Symptom: [DESCRIBE]

OOM banner (`dmesg` or `journalctl -k`):
```
[PASTE]
```

`free -h`:
```
[PASTE]
```

`cat /proc/meminfo`:
```
[PASTE first 30 lines]
```

Top RSS processes:
```
[PASTE `ps aux --sort=-rss | head -20`]
```

Cgroup limits (if applicable):
```
[PASTE memory.max, memory.current, memory.events]
```
Why this prompt works
OOM kills are usually misdiagnosed. The most common wrong answer is “we need more RAM” when the actual problem is a cgroup limit too low, a slab leak, or page cache being mistaken for “used” memory. This prompt forces honest memory accounting via /proc/meminfo and decodes the OOM banner properly.
How to use it
- Always include the OOM banner. It tells you which cgroup, what limit, which process, and the per-task table with RSS and `oom_score_adj`. Without it you’re guessing.
- `free -h` alone is not enough. Include `cat /proc/meminfo | head -30`. The page cache vs anon split matters.
- For containers: include the cgroup files. Without them, you can’t distinguish container OOM from host OOM.
- If you suspect a slab leak, capture `slabtop -o | head -30`: the top kernel slab consumers tell you what’s leaking (often `dentry` or `kmalloc-*`).
Useful commands
# OOM evidence
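# The banner starts at "invoked oom-killer"; the task table and the kill line follow it
sudo dmesg -T | grep -i -E -A100 "invoked oom-killer|out of memory"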
sudo journalctl -k --since "1 hour ago" | grep -A30 -i oom
sudo journalctl _TRANSPORT=kernel | grep -A20 oom
# Honest memory accounting
free -h
cat /proc/meminfo | head -30
cat /proc/vmstat | grep -E "pgscan|pgsteal|pgfault|pswp|oom"
# Top RSS consumers
# (no "f": forest mode groups children under parents, which breaks the top-N ordering)
ps aux --sort=-rss | head -20
# Sum RSS by command name
# (rss=,comm= suppresses the header line; values are KiB)
ps -eo rss=,comm= | awk '{a[$2]+=$1} END {for (i in a) print a[i], i}' | sort -n | tail
# Process-detailed memory
cat /proc/<pid>/status | grep -E "Vm|Rss"
cat /proc/<pid>/smaps_rollup
pmap -X <pid> | tail -5
# Slab cache
sudo slabtop -o | head -30
sudo cat /proc/slabinfo | sort -k2 -n -r | head -20
# Cgroup v2 (modern systemd / k8s)
cat /sys/fs/cgroup/<slice>/memory.max
cat /sys/fs/cgroup/<slice>/memory.current
cat /sys/fs/cgroup/<slice>/memory.events
cat /sys/fs/cgroup/<slice>/memory.pressure
# Cgroup v1 (legacy)
cat /sys/fs/cgroup/memory/<slice>/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/<slice>/memory.usage_in_bytes
# Swap activity
vmstat 1 5 # si/so columns
sar -B 1 5 # paging stats
sar -W 1 5 # swap rate
# THP
cat /sys/kernel/mm/transparent_hugepage/enabled
grep -i AnonHugePages /proc/meminfo
# Per-process OOM score
cat /proc/<pid>/oom_score
cat /proc/<pid>/oom_score_adj
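Once the banner is captured, the first triage question (cgroup OOM or not) can be answered mechanically. The following is a minimal sketch, assuming a kernel recent enough to print the one-line `oom-kill:constraint=...` summary (roughly 5.x and later); `oom_scope` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# oom_scope (hypothetical helper): reads kernel log text on stdin and reports,
# for each "oom-kill:" summary line, whether the kill was cgroup-scoped.
oom_scope() {
    awk '
        /oom-kill:constraint=/ {
            seen = 1
            constraint = "unknown"
            # "constraint=" is 11 characters; keep only the value after it.
            if (match($0, /constraint=[A-Z_]+/))
                constraint = substr($0, RSTART + 11, RLENGTH - 11)
            if (constraint == "CONSTRAINT_MEMCG") {
                memcg = "unknown"
                # "oom_memcg=" is 10 characters; the value runs to the next comma.
                if (match($0, /oom_memcg=[^,]*/))
                    memcg = substr($0, RSTART + 10, RLENGTH - 10)
                print "cgroup OOM: limit hit in " memcg
            } else {
                print "not a cgroup OOM: " constraint
            }
        }
        END { if (!seen) print "no oom-kill summary line found" }'
}
```

Typical use would be `sudo dmesg | oom_scope` or `journalctl -k | oom_scope`. Older kernels that lack the summary line fall through to the "not found" message, in which case the banner has to be read by hand.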
Common findings this catches
- Container OOM but host has 20G free → `memory.max` set too low. Either raise the limit or fix the app’s working set.
- Slow leak in `kmalloc-128` slab → kernel object leak; often a driver bug. Check `slabtop` deltas over time.
- “OOM” but `Cached` is 80% of RAM → page cache wasn’t reclaimed fast enough (first confirm it is not mostly `Shmem`, which cannot be dropped). Often a NUMA-zone issue; check `numastat -m`.
- OOM killed sshd → `oom_score_adj` not set on critical services. Add `OOMScoreAdjust=-900` under `[Service]` in the systemd unit.
- THP compaction storms → `%sy` CPU spikes during memory pressure. Disable for databases: `echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled`
- Swap thrashing (`si` and `so` in `vmstat` both sustained above about 1000 KiB/s) → set `vm.swappiness=1` and consider zram for ephemeral nodes.
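To put a number on swap thrashing without `vmstat`, the cumulative `pswpin`/`pswpout` counters in `/proc/vmstat` can be sampled twice and divided by the interval. This is a minimal sketch; `swap_rate` is a hypothetical helper name, and the counters are in pages (usually 4 KiB each), not the KiB that `vmstat` reports.

```shell
#!/bin/sh
# swap_rate BEFORE AFTER SECONDS (hypothetical helper): prints the swap-in and
# swap-out rate in pages per second between two saved copies of /proc/vmstat.
swap_rate() {
    awk -v secs="$3" '
        FNR == NR { before[$1] = $2; next }          # first file: remember counters
        $1 == "pswpin" || $1 == "pswpout" {          # second file: print the deltas
            printf "%s %d pages/s\n", $1, ($2 - before[$1]) / secs
        }' "$1" "$2"
}

# Example capture (commented out because it sleeps for 10 seconds):
#   cat /proc/vmstat > /tmp/vmstat.before; sleep 10; cat /proc/vmstat > /tmp/vmstat.after
#   swap_rate /tmp/vmstat.before /tmp/vmstat.after 10
```

Both rates staying high at the same time is the thrashing signature; a burst of swap-out alone is usually just cold pages being evicted.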
Memory accounting cheatsheet
MemTotal      - total usable RAM (excludes a few reserved regions)
MemFree       - completely unused (NOT what you should alert on)
MemAvailable  - kernel's estimate of free + reclaimable memory (THIS is "free for new allocs")
Buffers       - block device cache
Cached        - file page cache (good thing!); also includes Shmem
SReclaimable  - reclaimable slab (dentry, inode caches)
SUnreclaim    - non-reclaimable slab (kernel objects; leaks live here)
AnonPages     - anonymous (process heap/stack); the real "memory in use"
Mapped        - mmap'd files (in Cached but counted here too)
Shmem         - tmpfs / shared memory (counts as "used"; inside Cached but cannot be dropped)
PageTables    - kernel page-table overhead (grows with process count × mapped memory)
Slab          - SReclaimable + SUnreclaim
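The cheatsheet can be turned into a quick breakdown with a few lines of awk. This is a minimal sketch rather than a complete accounting tool: `meminfo_breakdown` is a hypothetical helper name, the bucket labels are illustrative, and `Shmem` is subtracted from the cache figure on the assumption that `Cached` includes it.

```shell
#!/bin/sh
# meminfo_breakdown (hypothetical helper): reads /proc/meminfo-format text on
# stdin and prints the main buckets in MiB (values are truncated, not rounded).
meminfo_breakdown() {
    awk '
        # Lines look like "MemTotal:  16384000 kB"; strip the colon from the key.
        { sub(":", "", $1); kb[$1] = $2 }
        END {
            printf "in_use_MiB %d\n", (kb["MemTotal"] - kb["MemAvailable"]) / 1024
            printf "anon_MiB %d\n", kb["AnonPages"] / 1024
            printf "file_cache_MiB %d\n", (kb["Buffers"] + kb["Cached"] - kb["Shmem"]) / 1024
            printf "shmem_MiB %d\n", kb["Shmem"] / 1024
            printf "slab_reclaimable_MiB %d\n", kb["SReclaimable"] / 1024
            printf "slab_unreclaimable_MiB %d\n", kb["SUnreclaim"] / 1024
            printf "page_tables_MiB %d\n", kb["PageTables"] / 1024
        }'
}

# Run against the live system when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    meminfo_breakdown < /proc/meminfo
fi
```

If `anon_MiB`, `shmem_MiB`, and `slab_unreclaimable_MiB` together approach `MemTotal`, the pressure is real; if `file_cache_MiB` dominates, it is mostly reclaimable cache.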
Permanent fixes worth applying after recovery
# systemd unit hardening for a critical service
[Service]
OOMScoreAdjust=-900
# soft pressure throttle (cgroup v2)
MemoryHigh=4G
# hard limit (cgroup v2)
MemoryMax=6G
# sysctl baseline (review per workload)
vm.swappiness = 10
# 512 MB reserve on a 16+ GB box
vm.min_free_kbytes = 524288
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
When to escalate
- Suspected kernel slab leak (`SUnreclaim` growing for days with no userspace correlation): engage kernel team; reproduce, capture `slabinfo` deltas.
- Repeated OOMs on a container with stable working set: application change or limit is wrong; coordinate with app owner.
- OOM killing system-critical processes (sshd, systemd, kubelet): fix `OOMScoreAdjust` urgently and root-cause the leaker.
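For the slab-leak escalation, a before/after comparison of `/proc/slabinfo` is usually the evidence a kernel team asks for first. The sketch below assumes the slabinfo 2.x column layout (name, active objects, total objects, object size) and estimates each cache as total objects × object size; `slab_growth` is a hypothetical helper name.

```shell
#!/bin/sh
# slab_growth BEFORE AFTER (hypothetical helper): compares two saved copies of
# /proc/slabinfo and lists the ten caches that grew the most, in KiB.
slab_growth() {
    awk '
        # Skip the two header lines ("slabinfo - version ..." and "# name ...").
        $1 == "slabinfo" || $1 == "#" { next }
        FNR == NR { before[$1] = $3 * $4; next }     # first file: bytes per cache
        {
            delta = ($3 * $4 - before[$1]) / 1024    # second file: growth in KiB
            if (delta > 0) printf "%d KiB %s\n", delta, $1
        }' "$1" "$2" | sort -rn | head -10
}

# Example capture (reading /proc/slabinfo needs root):
#   sudo cat /proc/slabinfo > slab.before   # ...wait hours or days...
#   sudo cat /proc/slabinfo > slab.after
#   slab_growth slab.before slab.after
```

A cache that tops this list across several consecutive intervals, with no matching growth in userspace activity, is the one worth naming in the escalation.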
Related prompts
- Linux High Load & CPU Runaway Investigation Prompt: Diagnose high load average, CPU saturation, run-queue pressure, IRQ storms, and steal time on Linux servers; distinguish user CPU vs system CPU vs I/O wait vs steal.
- Linux Disk Full / Inode Exhaustion Diagnosis Prompt: Diagnose why a Linux filesystem is full or out of inodes, including deleted-but-held files, journal bloat, reserved blocks, and hidden mount-shadowed data.
- Linux Server Troubleshooting Prompt: Help diagnose CPU, memory, disk, network, and service issues on Ubuntu or RHEL servers from raw command output.