Linux Error: Out of memory: Killed process — Cause, Fix, and Troubleshooting Guide
How to fix Out of memory: Killed process on Linux: diagnose host OOM vs cgroup memory.max, oom_score, overcommit, swap, and container exit code 137.
- #linux
- #troubleshooting
- #errors
- #memory
Summary
Out of memory: Killed process is the Linux kernel’s OOM killer terminating a running process because an allocation could not be satisfied and there was nothing left to reclaim. It fires at the moment of a failing allocation (a growing heap, a fork, a page fault that cannot be backed), and the victim is chosen by oom_score, so it is not necessarily the process that triggered the kill. This contrasts with Cannot allocate memory/ENOMEM, where the allocation is rejected up front and no process dies. The kill is either host-wide (CONSTRAINT_NONE) or scoped to a single cgroup hitting its memory.max/systemd MemoryMax (CONSTRAINT_MEMCG); containers killed this way exit with code 137 (128 + signal 9, SIGKILL).
Common Symptoms
- A process disappears with no application-level error; the service restarts or simply vanishes.
- dmesg/journal shows Out of memory: Killed process or Memory cgroup out of memory.
- A container shows OOMKilled status and exit code 137.
- free -h shows near-zero available and little or no swap right before the kill.
- A service on a host with plenty of free RAM keeps dying, which is a tell for a cgroup-scoped kill.
Most Likely Causes of the ‘Out of memory: Killed process’ Error
Ordered by how often they cause Out of memory: Killed process in production:
- A cgroup hitting its memory.max / systemd MemoryMax while the host still has free RAM, reported as CONSTRAINT_MEMCG. The single most common self-inflicted case.
- A container with a memory limit set too low, OOM-killed with exit code 137 (OOMKilled), or with no limit at all, which lets it grow until the host itself runs out.
- A leaking process accumulating the largest RSS and the highest oom_score, selected as the victim.
- Host RAM exhausted with no swap to absorb the spike: CONSTRAINT_NONE, host-wide.
- Page-cache and dirty-page pressure against large anonymous memory that reclaim cannot free fast enough.
- Too-strict overcommit (vm.overcommit_memory=2) denying allocations at the CommitLimit. This normally surfaces as ENOMEM in the application rather than an OOM-killer line, but it is often mistaken for one and is worth ruling out.
Quick Triage
# Did an OOM kill occur, and was it host-wide or cgroup-scoped?
dmesg -T | grep -i -E 'oom|killed process'
journalctl -k | grep -i oom
# Memory state at the time of the kill.
free -h
swapon --show
In the kernel line, the constraint= field is the pivot: CONSTRAINT_NONE is host-wide, CONSTRAINT_MEMCG is a cgroup/container limit. That single word drives the whole investigation.
Diagnostic Commands
dmesg -T | grep -i -E 'oom|killed process' | tail -10
journalctl -k | grep -i oom
Note the victim pid/comm and the constraint= field. Example: oom-kill:constraint=CONSTRAINT_MEMCG,oom_memcg=/system.slice/app.service,task=python3,pid=8123.
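The fields in that line can be pulled apart mechanically. The following is a minimal sketch, assuming the comma-separated key=value layout of the example above; oom_field and oom_scope are hypothetical helper names, not part of any existing tool.

```shell
# Hypothetical helper: extract one key=value field from an oom-kill line.
# Assumes the comma-separated layout of the example line; real kernel lines
# carry extra fields (nodemask, cpuset, ...) which are simply skipped.
oom_field() {
  # $1 = field name (e.g. pid), $2 = the kernel log line
  printf '%s\n' "$2" | awk -v key="$1" -F',' '{
    for (i = 1; i <= NF; i++) {
      tok = $i
      sub(/^.*oom-kill:/, "", tok)      # drop any timestamp and the oom-kill: prefix
      if (index(tok, key "=") == 1) {   # key must match at the start of the token
        print substr(tok, length(key) + 2)
        exit
      }
    }
  }'
}

# Hypothetical helper: map the constraint= value to the scope of the kill.
oom_scope() {
  case "$(oom_field constraint "$1")" in
    CONSTRAINT_MEMCG) echo "cgroup" ;;
    CONSTRAINT_NONE)  echo "host" ;;
    *)                echo "other" ;;
  esac
}
```

Called as oom_scope "$line", this prints cgroup for a CONSTRAINT_MEMCG line and host for CONSTRAINT_NONE; oom_field oom_memcg "$line" then names the cgroup to inspect.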
free -h # available memory + swap
swapon --show # is there any swap to spill to?
cat /proc/meminfo # AnonPages, Cached, Dirty, Committed_AS
Swap: 0B with near-zero available means there is no cushion left: the next sizeable allocation is likely to trigger a host-wide kill.
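To turn that reading into numbers you can alert on, the same values can be read from /proc/meminfo. This is a rough sketch; the helper names are hypothetical, and the optional file argument exists only so the functions can be tried against a saved copy of meminfo.

```shell
# Hypothetical helper: percentage of RAM still available.
# $1 = path to a meminfo-format file (defaults to /proc/meminfo).
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: total swap in kB (0 means there is nothing to spill to).
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

A low percentage from mem_available_pct combined with 0 from swap_total_kb matches the danger state described above.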
# Highest oom_score = the kernel's next victim.
for p in /proc/[0-9]*; do s=$(cat "$p/oom_score" 2>/dev/null) || continue; \
printf '%5s %s %s\n' "$s" "${p#/proc/}" "$(cat "$p/comm" 2>/dev/null)"; done | sort -rn | head -5
cat /proc/<pid>/oom_score /proc/<pid>/oom_score_adj
A single process far above the rest is your leaking or largest workload.
systemctl show <unit>.service -p MemoryMax -p MemoryCurrent
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.max
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.current
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.events # oom / oom_kill counters
MemoryCurrent near MemoryMax with a non-zero oom_kill in memory.events pins the kill to that unit’s own limit (cgroup v2 layout, default on modern Ubuntu/Debian and RHEL/Rocky).
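The two checks in that sentence (a non-zero oom_kill and usage close to the ceiling) can be scripted. A minimal sketch, assuming the cgroup v2 file formats shown above; the function names are hypothetical.

```shell
# Hypothetical helper: print the oom_kill counter.
# $1 = path to a cgroup v2 memory.events file
oom_kill_count() {
  awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Hypothetical helper: usage as a percentage of the limit.
# $1 = memory.current value, $2 = memory.max value ("max" means no limit)
cgroup_usage_pct() {
  if [ "$2" = "max" ]; then
    echo "unlimited"
  else
    echo $(( $1 * 100 / $2 ))
  fi
}
```

For example, cgroup_usage_pct "$(cat .../memory.current)" "$(cat .../memory.max)" against the unit paths listed above gives a quick headroom figure for that unit.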
cat /proc/sys/vm/overcommit_memory # 2 = strict, denies at CommitLimit
cat /proc/sys/vm/overcommit_ratio
sysctl vm.swappiness
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
Rule out a too-strict overcommit policy or a missing swap device before blaming the workload.
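Under vm.overcommit_memory=2 the kernel refuses allocations once Committed_AS would exceed CommitLimit, so the gap between the two is the number to watch. A small sketch of that arithmetic follows; the helper name is hypothetical, and in overcommit modes 0 and 1 the result is informational only.

```shell
# Hypothetical helper: CommitLimit minus Committed_AS, in kB.
# A small or negative value means strict overcommit is at or past its ceiling.
# $1 = path to a meminfo-format file (defaults to /proc/meminfo).
commit_headroom_kb() {
  awk '/^CommitLimit:/  { limit = $2 }
       /^Committed_AS:/ { used = $2 }
       END { printf "%d\n", limit - used }' "${1:-/proc/meminfo}"
}
```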
systemd-cgtop --order=memory -n1 # live per-cgroup memory ranking
vmstat 1 5 # si/so swap-in/out, r/b queues
smem -tk # PSS-accurate per-process memory (if installed)
# Containers: confirm the OOM kill and the limit.
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}} mem={{.HostConfig.Memory}}' <container>
docker stats --no-stream <container>
State.OOMKilled=true, ExitCode=137, and usage pinned at the limit confirm a container hit its own ceiling.
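Exit code 137 follows the shell convention that a status above 128 means "terminated by signal (status minus 128)". A minimal sketch of that decoding, using a hypothetical helper name:

```shell
# Hypothetical helper: decode an exit status into the signal that ended the process.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
exit_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0   # exited on its own; not terminated by a signal
  fi
}
```

exit_signal 137 yields 9 (SIGKILL). Any SIGKILL produces 137, so treat the code as a hint and rely on State.OOMKilled or the kernel log for confirmation.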
Fix / Remediation
Match the fix to the constraint you found — safe, non-destructive changes first.
- Raise a cgroup / unit ceiling to the real working set (cgroup-scoped kill):
  sudo systemctl edit <unit>.service
  # [Service]
  # MemoryMax=1G
  sudo systemctl daemon-reload
  sudo systemctl restart <unit>.service
- Raise a container’s limit to measured usage plus headroom: docker run --memory=1g ..., or set resources.limits.memory in Kubernetes.
- Protect critical daemons so the kernel kills the right workload first:
  sudo systemctl edit sshd.service
  # [Service]
  # OOMScoreAdjust=-800
  sudo systemctl daemon-reload
- Add swap so transient spikes spill instead of triggering an immediate kill:
  sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile
  sudo mkswap /swapfile && sudo swapon /swapfile
  echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
- Relax strict overcommit if vm.overcommit_memory=2 denies valid allocations at the CommitLimit:
  echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf
  sudo sysctl --system
- Fix the underlying leak. Raising limits only buys time if RSS climbs without bound.
Warning: Manually killing the top-RSS process to recover a wedged host is destructive and can lose in-flight work. Prefer raising limits or adding swap. As a last resort, after identifying the offender: sudo kill <pid> (escalate to kill -9 only if it ignores SIGTERM).
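The "working set plus headroom" sizing used for the unit and container limits is simple arithmetic. A minimal sketch, assuming you have already measured a peak usage figure in bytes (for example memory.current sampled under load); the helper name and the 25% headroom in the usage note are illustrative assumptions, not a documented recommendation.

```shell
# Hypothetical sizing helper: observed peak plus a headroom percentage.
# $1 = peak usage in bytes, $2 = headroom in percent (e.g. 25)
suggest_limit_bytes() {
  echo $(( $1 * (100 + $2) / 100 ))
}
```

suggest_limit_bytes 800000000 25 yields 1000000000, a value that could then be used for MemoryMax= or --memory.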
Validation
# oom_kill should stop incrementing after the fix.
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.events
systemctl show <unit>.service -p MemoryMax -p MemoryCurrent
# No new OOM lines since the change.
journalctl -k --since "10 min ago" | grep -i oom
# Container should no longer report OOMKilled.
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}}' <container>
After a correct fix, memory.current peaks below memory.max, the oom_kill counter holds steady, and the restart loop ends. A counter that keeps climbing means the working set still exceeds the limit or the process is leaking.
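Whether the counter "holds steady" is easiest to judge against a saved baseline. A rough sketch, assuming you noted the oom_kill value at the time of the fix; oom_kill_delta is a hypothetical helper.

```shell
# Hypothetical check: how far has oom_kill moved since a saved baseline?
# $1 = baseline count, $2 = path to the unit's memory.events file
oom_kill_delta() {
  now=$(awk '$1 == "oom_kill" { print $2 }' "$2")
  echo $(( ${now:-0} - $1 ))
}
```

A result of 0 some time after the change suggests the fix held; any positive number means new kills have occurred inside that cgroup.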
Prevention
- Right-size cgroup limits (MemoryMax, container --memory) to the measured working set plus headroom; under-sized limits cause self-inflicted OOM while the host sits idle.
- Use systemd MemoryHigh= as a soft throttle below the hard MemoryMax= so a workload is slowed before it is killed.
- Tune vm.overcommit_memory / overcommit_ratio deliberately and always provision some swap or zram, even a modest swapfile, on hosts that lack it.
- Protect critical daemons with OOMScoreAdjust (e.g. -800 for sshd/monitoring) so the kernel picks the right victim.
- Alert on cgroup v2 memory.events oom_kill counters and on container exit code 137; a rising count is an early warning before a full outage.
- Baseline leak-prone services with systemd-cgtop and free -h trends so a slow climb is caught before it reaches the ceiling.
Related Errors
- Cannot allocate memory: the ENOMEM counterpart that rejects allocations before any process dies.
- fork: Resource temporarily unavailable (process/thread creation failing under memory or PID pressure).
- TCP: out of memory (kernel socket-buffer memory exhaustion).
- Too many open files: a related resource-limit failure.
- Segmentation fault (core dumped): a different class of process crash.
- Taming the Linux OOM killer: deep dive on oom_score and tuning the killer.
- Linux Error Guides hub: the full index of Linux error walkthroughs.
Final Notes
Out of memory: Killed process means an allocation could not be satisfied and the kernel killed a process to recover — either host-wide or inside a single cgroup. Read the kernel message first and let the constraint= field decide the path: CONSTRAINT_MEMCG sends you to a unit/container limit, CONSTRAINT_NONE to host memory and swap. Then size limits and swap to the real working set — most OOM kills are simply a limit set smaller than the workload actually needs.