AI for Linux Admins · By James Joyner IV · 10 min read

Linux Error: Out of memory: Killed process — Cause, Fix, and Troubleshooting Guide

How to fix Out of memory: Killed process on Linux: diagnose host OOM vs cgroup memory.max, oom_score, overcommit, swap, and container exit code 137.

  • #linux
  • #troubleshooting
  • #errors
  • #memory

Summary

Out of memory: Killed process is the Linux kernel’s OOM killer terminating a running process because an allocation could not be satisfied and there was nothing left to reclaim. It fires at the moment of a failing allocation — a growing heap, a fork, a page fault that cannot be backed — and the victim is chosen by oom_score, not necessarily the process that triggered it. This is the opposite of Cannot allocate memory/ENOMEM, which rejects an allocation up front before any process dies. The kill is either host-wide (CONSTRAINT_NONE) or scoped to a single cgroup hitting its memory.max/systemd MemoryMax (CONSTRAINT_MEMCG); containers killed this way exit with code 137 (128 + signal 9).
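The 128 + signal arithmetic behind exit code 137 can be sketched in a few lines of shell. The helper name decode_exit_status is hypothetical, and the sketch only decodes the number: a status of 137 says the process received SIGKILL, which is consistent with an OOM kill but does not prove one on its own.

```shell
# Decode a shell-style exit status: values above 128 mean
# "terminated by signal (status - 128)".
# decode_exit_status is a hypothetical helper name, not a standard tool.
decode_exit_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    if [ "$sig" -eq 9 ]; then
      echo "signal $sig (SIGKILL, consistent with an OOM kill)"
    else
      echo "signal $sig"
    fi
  else
    echo "exit code $status (not a signal)"
  fi
}

decode_exit_status 137
```

Confirm the cause in the kernel log before treating a 137 as an OOM kill, since any SIGKILL produces the same status.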

Common Symptoms

  • A process disappears with no application-level error; the service restarts or simply vanishes.
  • dmesg/journal shows Out of memory: Killed process or Memory cgroup out of memory.
  • A container shows OOMKilled status and exit code 137.
  • free -h shows near-zero available and little or no swap right before the kill.
  • A service on a host with plenty of free RAM keeps dying — a tell for a cgroup-scoped kill.

Most Likely Causes of the ‘Out of memory: Killed process’ Error

Ordered by how often they cause Out of memory: Killed process in production:

  1. A cgroup hitting its memory.max / systemd MemoryMax while the host still has free RAM — reported as CONSTRAINT_MEMCG. The single most common self-inflicted case.
  2. A container whose memory limit is too low for its working set, or one with no limit that a host-wide kill selects, terminated with exit code 137 (OOMKilled).
  3. A leaking process accumulating the largest RSS and the highest oom_score, selected as the victim.
  4. Host RAM exhausted with no swap to absorb the spike — CONSTRAINT_NONE, host-wide.
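  5. Page-cache and dirty-page pressure: reclaim cannot write back or drop cached pages fast enough while large anonymous memory keeps growing.
  6. Overcommit policy mismatch: heuristic or always-overcommit modes (vm.overcommit_memory=0 or 1) let processes reserve more memory than the host can back, so the kill arrives at page-fault time. Strict mode (vm.overcommit_memory=2) instead fails allocations with ENOMEM at the CommitLimit, which crashes applications in a way that is easy to mistake for an OOM kill.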

Quick Triage

# Did an OOM kill occur, and was it host-wide or cgroup-scoped?
dmesg -T | grep -i -E 'oom|killed process'
journalctl -k | grep -i oom

# Memory state at the time of the kill.
free -h
swapon --show

In the kernel line, the constraint= field is the pivot: CONSTRAINT_NONE is host-wide, CONSTRAINT_MEMCG is a cgroup/container limit. That single word drives the whole investigation.
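As a minimal sketch of that pivot, the function below classifies one kernel log line by its constraint= field. The name classify_oom_scope is hypothetical; the catch-all branch exists because the kernel also defines cpuset and memory-policy constraints that this guide does not cover.

```shell
# Classify one kernel "oom-kill:" line by its constraint= field.
# classify_oom_scope is a hypothetical helper name.
classify_oom_scope() {
  case "$1" in
    *constraint=CONSTRAINT_MEMCG*) echo "cgroup-scoped" ;;
    *constraint=CONSTRAINT_NONE*)  echo "host-wide" ;;
    *constraint=*)                 echo "other constraint" ;;
    *)                             echo "no constraint field" ;;
  esac
}

# Possible usage against the live kernel log (not run here):
#   dmesg -T | grep 'oom-kill:' | while IFS= read -r line; do
#     classify_oom_scope "$line"
#   done
```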

Diagnostic Commands

dmesg -T | grep -i -E 'oom|killed process' | tail -10
journalctl -k | grep -i oom

Note the victim pid/comm and the constraint= field. Example: oom-kill:constraint=CONSTRAINT_MEMCG,oom_memcg=/system.slice/app.service,task=python3,pid=8123.
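Pulling individual fields out of that line is easy to script. The sketch below assumes the comma-separated key=value layout shown in the example and that values contain no commas or equals signs; oom_field is a hypothetical helper name.

```shell
# oom_field KEY LINE: print the value of KEY from a kernel oom-kill line.
# Assumes comma-separated key=value pairs, as in the example line.
# oom_field is a hypothetical helper name.
oom_field() {
  printf '%s\n' "$2" | tr ',' '\n' |
    awk -F= -v key="$1" '
      { sub(/^.*[: ]/, "", $1) }      # drop any timestamp or "oom-kill:" prefix
      $1 == key { print $2; exit }
    '
}

line='oom-kill:constraint=CONSTRAINT_MEMCG,oom_memcg=/system.slice/app.service,task=python3,pid=8123'
oom_field task "$line"
oom_field pid "$line"
```

The exact key match means task does not accidentally pick up the separate task_memcg field that real kernel lines also carry.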

free -h                                  # available memory + swap
swapon --show                            # is there any swap to spill to?
cat /proc/meminfo                        # AnonPages, Cached, Dirty, Committed_AS

Swap: 0B with near-zero available means the next sizeable allocation is likely to trigger a host-wide kill, because reclaim has nowhere to push anonymous pages.
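A rough way to flag that combination automatically is sketched below. The function name memory_risk and the 5% threshold are illustrative assumptions, not kernel-defined values; the function reads /proc/meminfo by default or any file in the same format.

```shell
# Flag the risky combination: very low MemAvailable and no swap.
# memory_risk and the 5% threshold are illustrative assumptions.
memory_risk() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "SwapTotal:"    { swap = $2 }
    END {
      if (total == 0) { print "unknown"; exit }
      pct = int(avail * 100 / total)
      if (pct < 5 && swap + 0 == 0)
        print "high risk: " pct "% available, no swap"
      else
        print "ok: " pct "% available, swap " swap + 0 " kB"
    }
  ' "${1:-/proc/meminfo}"
}

# Possible usage on a live host (not run here):
#   memory_risk
```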

# Highest oom_score = the kernel's most likely next victim.
# Processes can exit mid-loop, so read errors are silenced.
for p in /proc/[0-9]*; do s=$(cat "$p/oom_score" 2>/dev/null) || continue; \
  printf '%5s %s %s\n' "$s" "${p#/proc/}" "$(cat "$p/comm" 2>/dev/null)"; done | sort -rn | head -5
cat /proc/<pid>/oom_score /proc/<pid>/oom_score_adj

A single process far above the rest is your leaking or largest workload.

systemctl show <unit>.service -p MemoryMax -p MemoryCurrent
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.max
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.current
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.events   # oom / oom_kill counters

MemoryCurrent near MemoryMax with a non-zero oom_kill in memory.events pins the kill to that unit’s own limit (cgroup v2 layout, default on modern Ubuntu/Debian and RHEL/Rocky).
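The same judgment can be sketched as a small function that takes the three numbers read above. cgroup_verdict is a hypothetical name, and treating 90% of memory.max as "near" is an assumption chosen for illustration; note that memory.current can be low again right after a kill and restart, so the oom_kill counter is the stronger signal.

```shell
# cgroup_verdict CURRENT_BYTES MAX_BYTES OOM_KILL_COUNT
# Hypothetical helper; the 90% "near the limit" threshold is an assumption.
cgroup_verdict() {
  current=$1
  max=$2
  oom_kill=$3
  if [ "$max" = "max" ]; then       # cgroup v2 writes the literal "max" for no limit
    echo "no limit set"
    return
  fi
  pct=$((current * 100 / max))
  if [ "$oom_kill" -gt 0 ] && [ "$pct" -ge 90 ]; then
    echo "limit hit: ${pct}% of memory.max, oom_kill=${oom_kill}"
  else
    echo "limit not implicated: ${pct}% of memory.max, oom_kill=${oom_kill}"
  fi
}

cgroup_verdict 996147200 1048576000 3
```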

cat /proc/sys/vm/overcommit_memory       # 2 = strict, denies at CommitLimit
cat /proc/sys/vm/overcommit_ratio
sysctl vm.swappiness
grep -E 'CommitLimit|Committed_AS' /proc/meminfo

Rule out a too-strict overcommit policy or a missing swap device before blaming the workload.
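To sanity-check the CommitLimit value, the arithmetic can be reproduced by hand. The sketch below follows the kernel's documented formula for the case where vm.overcommit_kbytes is 0: RAM minus huge pages, scaled by overcommit_ratio, plus swap. The function name commit_limit_kb is hypothetical, and the limit is only enforced when vm.overcommit_memory=2.

```shell
# commit_limit_kb MEMTOTAL_KB SWAPTOTAL_KB RATIO [HUGETLB_KB]
# Reproduces CommitLimit assuming vm.overcommit_kbytes=0.
# commit_limit_kb is a hypothetical helper name.
commit_limit_kb() {
  mem=$1
  swap=$2
  ratio=$3
  huge=${4:-0}
  echo $(( (mem - huge) * ratio / 100 + swap ))
}

# 8 GB-class host, 2 GB swap, default ratio of 50:
commit_limit_kb 8000000 2000000 50
```

If Committed_AS sits at this number under strict mode, applications see ENOMEM rather than an OOM kill.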

systemd-cgtop --order=memory -n1         # live per-cgroup memory ranking
vmstat 1 5                               # si/so swap-in/out, r/b queues
smem -tk                                 # PSS-accurate per-process memory (if installed)

# Containers: confirm the OOM kill and the limit.
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}} mem={{.HostConfig.Memory}}' <container>
docker stats --no-stream <container>

State.OOMKilled=true, ExitCode=137, and usage pinned at the limit confirm a container hit its own ceiling.
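The three fields printed by that inspect template can be turned into a one-line verdict. This is a sketch only: interpret_inspect is a hypothetical helper, and it relies on Docker reporting HostConfig.Memory in bytes with 0 meaning no limit.

```shell
# interpret_inspect OOMKILLED EXITCODE mem=BYTES
# Expects the three fields from the docker inspect template above.
# interpret_inspect is a hypothetical helper name.
interpret_inspect() {
  oom=$1
  code=$2
  limit=${3#mem=}                       # strip the "mem=" label
  if [ "$limit" -eq 0 ]; then
    limit_text="no memory limit"
  else
    limit_text="limit $((limit / 1048576)) MiB"
  fi
  if [ "$oom" = "true" ] && [ "$code" -eq 137 ]; then
    echo "OOM-killed, $limit_text"
  else
    echo "not OOM-killed (exit $code), $limit_text"
  fi
}

interpret_inspect true 137 mem=536870912
```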

Fix / Remediation

Match the fix to the constraint you found — safe, non-destructive changes first.

  1. Raise a cgroup / unit ceiling to the real working set (cgroup-scoped kill):

    sudo systemctl edit <unit>.service
    # [Service]
    # MemoryMax=1G
    sudo systemctl daemon-reload
    sudo systemctl restart <unit>.service
  2. Raise a container’s limit to measured usage plus headroom: docker run --memory=1g ..., or set resources.limits.memory in Kubernetes.

  3. Protect critical daemons so the kernel kills the right workload first:

    sudo systemctl edit sshd.service
    # [Service]
    # OOMScoreAdjust=-800
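    sudo systemctl daemon-reload
    sudo systemctl restart sshd.service   # the new score applies only after a restart (the unit is ssh.service on Debian/Ubuntu)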
  4. Add swap so transient spikes spill instead of triggering an immediate kill:

    sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile
    sudo mkswap /swapfile && sudo swapon /swapfile
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
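  5. Relax strict overcommit if vm.overcommit_memory=2 is denying valid allocations at the CommitLimit. This cures ENOMEM failures rather than OOM kills, so apply it only when Committed_AS is pinned at CommitLimit: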

    echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf
    sudo sysctl --system
  6. Fix the underlying leak — raising limits only buys time if RSS climbs without bound.

Warning: Manually killing the top-RSS process to recover a wedged host is destructive and can lose in-flight work. Prefer raising limits or adding swap. As a last resort, after identifying the offender: sudo kill <pid> (escalate to kill -9 only if it ignores SIGTERM).

Validation

# oom_kill should stop incrementing after the fix.
cat /sys/fs/cgroup/system.slice/<unit>.service/memory.events
systemctl show <unit>.service -p MemoryMax -p MemoryCurrent

# No new OOM lines since the change.
journalctl -k --since "10 min ago" | grep -i oom

# Container should no longer report OOMKilled.
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}}' <container>

After a correct fix, memory.current peaks below memory.max, the oom_kill counter holds steady, and the restart loop ends. A counter that keeps climbing means the working set still exceeds the limit or the process is leaking.
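One way to check that the counter holds steady is to snapshot memory.events before the fix and compare it later. The sketch below assumes two saved copies of the file; oom_kill_delta and the snapshot paths in the usage comment are hypothetical.

```shell
# oom_kill_delta BEFORE_FILE AFTER_FILE
# Prints how many OOM kills occurred between two memory.events snapshots.
# oom_kill_delta is a hypothetical helper name.
oom_kill_delta() {
  before=$(awk '$1 == "oom_kill" { print $2 }' "$1")
  after=$(awk '$1 == "oom_kill" { print $2 }' "$2")
  echo $((after - before))
}

# Possible usage (paths are placeholders, not run here):
#   cp /sys/fs/cgroup/system.slice/<unit>.service/memory.events /tmp/events.before
#   ... apply the fix, wait for a representative load period ...
#   oom_kill_delta /tmp/events.before /sys/fs/cgroup/system.slice/<unit>.service/memory.events
```

A result of 0 over a representative load period suggests the limit now fits the working set. The counter belongs to the cgroup, so a restart of the unit may reset it if systemd recreates the cgroup.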

Prevention

  • Right-size cgroup limits (MemoryMax, container --memory) to the measured working set plus headroom — under-sized limits cause self-inflicted OOM while the host sits idle.
  • Use systemd MemoryHigh= as a soft throttle below the hard MemoryMax= so a workload is slowed before it is killed.
  • Tune vm.overcommit_memory/overcommit_ratio deliberately and always provision some swap or zram, even a modest swapfile, on hosts that lack it.
  • Protect critical daemons with OOMScoreAdjust (e.g. -800 for sshd/monitoring) so the kernel picks the right victim.
  • Alert on cgroup v2 memory.events oom_kill counters and on container exit code 137; a rising count is an early warning before a full outage.
  • Baseline leak-prone services with systemd-cgtop and free -h trends so a slow climb is caught before it reaches the ceiling.
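The right-sizing advice can be sketched as simple arithmetic from a measured peak. The 25% headroom and the choice of MemoryHigh at 90% of MemoryMax are illustrative assumptions, not recommendations from systemd; suggest_limits is a hypothetical helper that prints a drop-in fragment.

```shell
# suggest_limits PEAK_MIB [HEADROOM_PCT]
# Prints a systemd drop-in fragment sized from a measured peak.
# The 25% default headroom and the 90% MemoryHigh ratio are assumptions.
suggest_limits() {
  peak_mib=$1
  headroom_pct=${2:-25}
  max_mib=$((peak_mib * (100 + headroom_pct) / 100))
  high_mib=$((max_mib * 90 / 100))
  printf '[Service]\nMemoryHigh=%sM\nMemoryMax=%sM\n' "$high_mib" "$max_mib"
}

suggest_limits 800
```

Keeping MemoryHigh below MemoryMax gives the kernel room to throttle and reclaim before the hard limit forces a kill.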

Final Notes

Out of memory: Killed process means an allocation could not be satisfied and the kernel killed a process to recover — either host-wide or inside a single cgroup. Read the kernel message first and let the constraint= field decide the path: CONSTRAINT_MEMCG sends you to a unit/container limit, CONSTRAINT_NONE to host memory and swap. Then size limits and swap to the real working set — most OOM kills are simply a limit set smaller than the workload actually needs.
