Taming the Linux OOM Killer: Tuning Out-of-Memory Behavior
The OOM killer always seems to kill the wrong process. Here's how Linux decides what to kill, and how to tune oom_score, cgroups, and overcommit to control it.
There’s a particular flavor of 3am page where a server is up, the load is fine, but your main service is just… gone. No crash log, no panic, just an entry in dmesg that says the kernel killed it. That’s the OOM killer, and the cruel irony is that it almost always seems to kill the one process you cared about and spare the leaky one that caused the problem. After 25 years I’ve made peace with it, mostly by learning how it actually decides. Here’s that, plus how to bend it to your will.
When the OOM killer fires (and when it doesn’t)
The OOM killer is invoked when the kernel cannot satisfy a memory allocation and cannot reclaim enough by other means. Crucially, it fires when the kernel must back memory with real pages and has none left to give, not when the friendly “free memory” number people watch gets low. A box with 90% memory used can be perfectly healthy if most of that is reclaimable cache.
First, confirm it was actually OOM and find the victim:
dmesg -T | grep -i -E 'killed process|out of memory|oom'
journalctl -k | grep -i oom
The kernel prints a table of every candidate process with its RSS (in pages, not kB) and its oom_score_adj at kill time, as long as vm.oom_dump_tasks is enabled (the default). That table is gold: it tells you exactly what was consuming memory at the moment of death, which is often a different process than the one that died.
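If you want the victims as a short list rather than a grep dump, here is a minimal sketch. The function name `oom_victims` is my own, and it assumes the "Killed process <pid> (<name>) ... anon-rss:<n>kB" line format that recent kernels print. Older or patched kernels may word it differently, so treat the pattern as a starting point.

```shell
# oom_victims: read kernel log lines on stdin and print "pid name anon_rss_kB"
# for each OOM kill found. Hypothetical helper; assumes the common
# "Killed process N (name) ... anon-rss:NkB" message format.
# Limitation: a process name containing ")" will not parse cleanly.
oom_victims() {
  sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*anon-rss:([0-9]+)kB.*/\1 \2 \3/p'
}
```

Typical use is `dmesg -T | oom_victims`, which leaves one line per kill and makes it easy to spot a service that keeps getting sacrificed.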
How the kernel picks a victim
Each process gets an oom_score derived mainly from how much memory it uses, adjusted by oom_score_adj (range -1000 to +1000). Higher score, more likely to die.
cat /proc/<pid>/oom_score # current computed score
cat /proc/<pid>/oom_score_adj # your tunable bias
Because the score scales with memory footprint, big well-behaved processes (databases, JVMs) are natural targets even when they’re the victim, not the cause. That’s the root of the “it killed the wrong thing” feeling.
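To see who the kernel would pick right now, you can rank every process by its current score. This is a rough sketch, not a polished tool: `top_oom_scores` is a hypothetical name, and the optional second argument (an alternate proc root) exists only so the function can be exercised against a fake directory.

```shell
# top_oom_scores [count] [proc_root]
# Print "oom_score oom_score_adj pid comm" for the highest-scoring processes.
# Hypothetical helper; proc_root defaults to /proc and is only overridable
# for testing. Processes that exit mid-scan are skipped silently.
top_oom_scores() {
  proc_root="${2:-/proc}"
  for d in "$proc_root"/[0-9]*; do
    [ -r "$d/oom_score" ] || continue
    score=$(cat "$d/oom_score" 2>/dev/null) || continue
    adj=$(cat "$d/oom_score_adj" 2>/dev/null) || continue
    name=$(cat "$d/comm" 2>/dev/null) || continue
    printf '%s %s %s %s\n' "$score" "$adj" "${d##*/}" "$name"
  done | sort -rn | head -n "${1:-10}"
}
```

Running `top_oom_scores 5` on a healthy box is a useful sanity check: if your database sits at the top of the list, that is the process an OOM event will take first.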
Protecting a critical process
Lower a process’s chance of being chosen by setting a negative oom_score_adj. Setting it to -1000 makes a process effectively unkillable by OOM:
# Protect a running process (-o picks the oldest match, so the path gets exactly one PID)
echo -800 | sudo tee /proc/$(pgrep -o -f myservice)/oom_score_adj
For services, do it declaratively in the systemd unit so it survives restarts:
[Service]
OOMScoreAdjust=-800
Conversely, you can mark a known-greedy batch job as more killable with a positive value, so the kernel sacrifices it first and leaves your database alone. This is the single most effective OOM tuning move: don’t try to stop OOM, just steer it toward the disposable process.
Use cgroups to contain the leak instead
Tuning oom_score_adj is triage. The real fix is to stop one process from being able to starve the whole box. cgroup v2 memory limits do that — when a cgroup hits its limit, the kernel reclaims or OOM-kills within that cgroup, leaving the rest of the system untouched.
With systemd, that’s two directives:
[Service]
# Hard cap: OOM-kills inside this unit at the limit
MemoryMax=2G
# Soft cap: throttles and reclaims before the hard limit
MemoryHigh=1500M
MemoryHigh is underrated. It puts back-pressure on the cgroup well before the hard limit, so a slow leak gets throttled and shows up as latency you can alert on, instead of a sudden kill. Check live usage:
systemctl show myservice -p MemoryCurrent
cat /sys/fs/cgroup/system.slice/myservice.service/memory.current
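For a one-line view of how close a unit is to its cap, something like the sketch below may help. `cgroup_mem_report` is a hypothetical helper name. The files it reads (memory.current, memory.max, memory.events) are standard cgroup v2 interface files, but the directory you pass in depends on how your units are laid out.

```shell
# cgroup_mem_report <cgroup_dir>
# Print current usage, the hard limit, percent used, and how many times the
# kernel has OOM-killed inside this cgroup. Hypothetical helper; assumes
# cgroup v2 files in the given directory.
cgroup_mem_report() {
  cg="$1"
  cur=$(cat "$cg/memory.current") || return 1
  max=$(cat "$cg/memory.max") || return 1
  # memory.events holds "key value" lines; oom_kill counts kills in this cgroup
  kills=$(awk '$1 == "oom_kill" { print $2 }' "$cg/memory.events" 2>/dev/null)
  if [ "$max" = "max" ]; then
    printf 'current=%s max=unlimited oom_kill=%s\n' "$cur" "${kills:-0}"
  else
    printf 'current=%s max=%s used=%s%% oom_kill=%s\n' \
      "$cur" "$max" "$(( cur * 100 / max ))" "${kills:-0}"
  fi
}
```

For the unit above that would be `cgroup_mem_report /sys/fs/cgroup/system.slice/myservice.service`. A nonzero oom_kill count is the tell that the limit, not the system-wide killer, took the process down.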
The overcommit knob
Linux lets processes allocate more virtual memory than physically exists, betting they won’t touch it all. That’s vm.overcommit_memory:
- 0 (default): heuristic; allows reasonable overcommit
- 1: always overcommit; never refuse an allocation (risky, used by some in-memory DBs)
- 2: strict; refuse allocations beyond swap + RAM * overcommit_ratio / 100 (overcommit_ratio is a percentage, 50 by default)
sysctl vm.overcommit_memory
sysctl vm.overcommit_ratio
Mode 2 trades “random late OOM kill” for “malloc fails early and predictably.” For a single-purpose box running one critical service, that predictability can be worth it — the app gets a clean allocation error instead of a surprise execution. Test it hard before committing; many applications handle malloc failure poorly.
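Before flipping to mode 2, it is worth checking how much the box has already promised. The kernel publishes both numbers in /proc/meminfo as CommitLimit and Committed_AS. The sketch below just subtracts them. `commit_headroom` is a hypothetical name, and the optional file argument is there only for testing against a saved copy.

```shell
# commit_headroom [meminfo_file]
# Print the commit limit, what is already committed, and the difference (kB).
# Hypothetical helper. A negative headroom means the box is already past the
# limit mode 2 would enforce, so new allocations would start failing at once.
commit_headroom() {
  awk '
    $1 == "CommitLimit:"  { limit = $2 }
    $1 == "Committed_AS:" { used = $2 }
    END {
      printf "limit_kB=%d committed_kB=%d headroom_kB=%d\n", limit, used, limit - used
    }
  ' "${1:-/proc/meminfo}"
}
```

Keep in mind that CommitLimit is calculated in every mode but only enforced in mode 2, so a negative headroom on a mode 0 box is common and not an emergency by itself.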
Don’t forget swap and the early-OOM idea
A little swap gives the kernel somewhere to push cold pages so it isn’t forced to kill on a transient spike. But a system that’s thrashing in swap is arguably worse than a clean OOM kill: it goes unresponsive for minutes. Userspace tools act earlier and more selectively than the kernel’s last-resort killer, keeping the box responsive: earlyoom watches available memory and free swap, while systemd-oomd watches pressure (PSI) and swap usage per cgroup:
systemctl status systemd-oomd
cat /proc/pressure/memory # PSI: how stalled the system is on memory
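/proc/pressure/memory is the metric to alert on. The some line is the share of time at least one task was stalled waiting on memory, and full is the share when every non-idle task was stalled at once. Rising averages on either mean memory pressure before anything dies.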
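If your monitoring agent doesn't scrape PSI natively, a small shell check can bridge the gap. This is a sketch under assumptions: `psi_value` and `psi_exceeds` are names I made up, the threshold is whatever you pick, and the optional file argument exists only for testing.

```shell
# psi_value <some|full> <avg10|avg60|avg300> [file]
# Print one averaged stall percentage from a PSI file. Hypothetical helper.
psi_value() {
  awk -v kind="$1" -v key="$2" '
    $1 == kind {
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == key) print kv[2]
      }
    }
  ' "${3:-/proc/pressure/memory}"
}

# psi_exceeds <threshold_percent> [file]
# Exit 0 when the "some" avg10 value is above the threshold, 1 when it is
# not, and 2 when the PSI file cannot be read.
psi_exceeds() {
  v=$(psi_value some avg10 "${2:-/proc/pressure/memory}") || return 2
  awk -v v="$v" -v t="$1" 'BEGIN { exit !(v + 0 > t + 0) }'
}
```

A cron or timer job can then run `psi_exceeds 10 && logger "memory pressure rising"`. The 10 here is only an illustration; the right threshold depends on how latency-sensitive the workload is.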
A practical playbook
- Confirm OOM and read the kill-time table in dmesg.
- Identify the cause process vs the victim process from that table.
- Cap the cause with cgroup MemoryMax/MemoryHigh so it can’t starve the box.
- Protect the critical service with OOMScoreAdjust.
- Alert on /proc/pressure/memory, not on free memory.
Where AI helps
The kill-time process table in dmesg is dense and easy to misread under pressure. Pasting it into a model and asking “which process actually caused this and which was collateral, ranked by RSS and oom_score_adj” turns a wall of numbers into a ranked answer fast. I keep a few Linux admin prompts for exactly this kind of log triage.
The OOM killer isn’t your enemy; it’s a last-resort safety valve doing its best with bad information. Give it better information — cgroup limits and score adjustments — and it’ll start killing the right thing.
Generated commands and configs are assistive, not authoritative. Always verify against your own systems before applying changes in production.