AI for Linux Admins · By James Joyner IV · 10 min read

Linux Error: Cannot allocate memory — Cause, Fix, and Troubleshooting Guide

How to fix Cannot allocate memory on Linux: diagnose ENOMEM at fork/alloc time, overcommit policy, PID/thread ceilings, cgroup TasksMax, and per-user ulimits.

  • #linux
  • #troubleshooting
  • #errors
  • #memory

Summary

Cannot allocate memory (errno ENOMEM) is the kernel refusing an allocation or task-creation request up front, most often on fork()/clone() or mmap(). Nothing is killed: the syscall simply fails. It has a close sibling, EAGAIN, which fork(2) returns for the task-count limits and which bash reports as fork: retry: Resource temporarily unavailable while it retries the fork. Both are the opposite of the OOM killer, which reaps a running process after memory is committed (see Out of memory: Killed process). Just as often as a raw shortage, these errors mean a limit was hit: strict overcommit accounting, a PID/thread ceiling, a cgroup pids.max/TasksMax, a per-user nproc cap, or vm.max_map_count.

Common Symptoms

  • The shell prints fork: Cannot allocate memory (ENOMEM) or fork: retry: Resource temporarily unavailable (EAGAIN), the latter repeated several times before bash gives up.
  • New SSH sessions for one user are refused while root logs in fine.
  • A service fails to spawn worker threads and logs Cannot allocate memory or pthread_create failed.
  • systemctl restart of a busy unit fails with Resource temporarily unavailable.
  • The same host forks fine as root but fails as a service account — a strong hint it is a per-user/per-cgroup limit, not the box running dry.

Most Likely Causes of the ‘Cannot allocate memory’ Error

Ordered by how often they cause Cannot allocate memory in production:

  1. A cgroup pids.max / systemd TasksMax hit by one unit. The classic fork: retry inside a single service while the rest of the host is idle.
  2. A per-user nproc (RLIMIT_NPROC) cap exhausted. One UID owns all its allowed processes; other users and root are unaffected.
  3. The system-wide PID/thread ceiling reached (kernel.pid_max / kernel.threads-max) — no process anywhere can fork.
  4. Strict overcommit (vm.overcommit_memory=2) with a low overcommit_ratio rejecting commits while free -h still shows headroom.
  5. vm.max_map_count or ulimit -v (RLIMIT_AS) exhausted — common with the JVM and Elasticsearch, returns ENOMEM from mmap().
  6. Genuine memory + swap exhaustion (MemAvailable/SwapFree near zero) — the true shortage case.

Quick Triage

# Real shortage, or a limit? Free memory vs commit accounting.
free -h
grep -E 'MemAvailable|SwapFree|Committed_AS|CommitLimit' /proc/meminfo

# Live thread/process count — compare against the ceilings below.
ps -eLf | wc -l

If MemAvailable is healthy and SwapFree is well above zero, this is a limit rather than a shortage, so skip to the ceilings. A thread count in the thousands against a low pid_max points straight at a count limit.

Diagnostic Commands

free -h                                  # available memory + swap at a glance
grep -E 'Committed_AS|CommitLimit' /proc/meminfo   # commit accounting vs cap

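Committed_AS at or near CommitLimit means new commitments are being refused, but only in strict mode (vm.overcommit_memory=2); in modes 0 and 1 the kernel does not enforce CommitLimit, so the same numbers are harmless. With no swap, CommitLimit is backed by RAM alone and is reached sooner.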
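A small awk helper can turn Committed_AS and CommitLimit into a percentage and flag whether the limit is even enforced. This is a sketch: the function name commit_usage and the 95% warning threshold are illustrative choices, not kernel values; the optional path arguments exist only so a saved copy of /proc/meminfo can be inspected.

```shell
# Sketch: report Committed_AS as a share of CommitLimit and say whether the
# limit is enforced. Paths are overridable so saved copies can be inspected.
commit_usage() {
  local meminfo=${1:-/proc/meminfo}
  local mode_file=${2:-/proc/sys/vm/overcommit_memory}
  local mode
  mode=$(cat "$mode_file") || return 1
  awk -v mode="$mode" '
    /^Committed_AS:/ { used = $2 }
    /^CommitLimit:/  { limit = $2 }
    END {
      if (limit == 0) { print "CommitLimit not found"; exit 1 }
      pct = 100 * used / limit
      printf "Committed_AS=%d kB CommitLimit=%d kB (%.1f%%)\n", used, limit, pct
      # The kernel only enforces CommitLimit in strict mode 2.
      if (mode != 2) {
        print "overcommit_memory=" mode ": CommitLimit is informational only"
      } else if (pct >= 95) {   # 95% is an arbitrary warning threshold
        print "strict mode near the cap: new commitments are likely being refused"
      }
    }' "$meminfo"
}
```

Run commit_usage with no arguments on the affected host; a "likely being refused" line together with healthy MemAvailable is the strict-overcommit case.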

cat /proc/sys/vm/overcommit_memory       # 0 heuristic, 1 always, 2 strict
cat /proc/sys/vm/overcommit_ratio        # cap = swap + RAM*ratio/100 in mode 2

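Mode 2 with a low ratio caps commitments well below physical RAM: malloc() returns NULL and fork() fails with ENOMEM even though free -h shows memory available. With the default ratio of 50 and no swap, the cap is roughly half of RAM.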
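To see what a different ratio would buy before changing it, the mode-2 formula can be evaluated against /proc/meminfo. A minimal sketch, assuming vm.overcommit_kbytes is 0 (when it is set, it replaces the ratio) and using the Hugetlb field that recent kernels expose; on older kernels the field is absent and the sketch treats it as zero. expected_commit_limit is a hypothetical helper name.

```shell
# Sketch: estimate the strict-mode CommitLimit for a given overcommit_ratio.
#   CommitLimit = (MemTotal - Hugetlb) * ratio / 100 + SwapTotal   (all in kB)
# Assumes vm.overcommit_kbytes is 0; when it is set, it replaces the ratio.
expected_commit_limit() {
  local ratio=$1 meminfo=${2:-/proc/meminfo}
  awk -v ratio="$ratio" '
    /^MemTotal:/  { mem = $2 }
    /^SwapTotal:/ { swap = $2 }
    /^Hugetlb:/   { huge = $2 }   # absent on older kernels, treated as 0
    END { printf "%.0f\n", (mem - huge) * ratio / 100 + swap }' "$meminfo"
}
```

For example, expected_commit_limit 80 prints the cap in kB that a ratio of 80 would give; passing the current value from /proc/sys/vm/overcommit_ratio should land close to the live CommitLimit line.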

cat /proc/sys/kernel/pid_max             # system-wide PID ceiling
cat /proc/sys/kernel/threads-max         # system-wide thread ceiling
ps -eLf | wc -l                          # live thread count

A live count near either ceiling means the whole box is out of PIDs.
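Because every thread consumes a PID, the live count and both ceilings can be folded into one headroom figure. A sketch that takes the smaller of pid_max and threads-max as the effective limit, which is an approximation; pid_headroom is a hypothetical name, and the optional arguments exist only so the numbers can be supplied by hand.

```shell
# Sketch: live threads versus the tighter of pid_max and threads-max.
# Arguments are optional overrides: LIVE PID_MAX THREADS_MAX.
pid_headroom() {
  local live=${1:-$(ps -eL --no-headers | wc -l)}
  local pid_max=${2:-$(cat /proc/sys/kernel/pid_max)}
  local threads_max=${3:-$(cat /proc/sys/kernel/threads-max)}
  local ceiling=$(( pid_max < threads_max ? pid_max : threads_max ))
  echo "threads=$live ceiling=$ceiling used=$(( 100 * live / ceiling ))%"
}
```

With no arguments it reads the live host; a used figure in the high nineties means fork failures are system-wide rather than tied to one unit or user.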

systemctl show -p TasksMax -p TasksCurrent <unit>.service
cat /sys/fs/cgroup/system.slice/<unit>.service/pids.current
cat /sys/fs/cgroup/system.slice/<unit>.service/pids.max
cat /sys/fs/cgroup/system.slice/<unit>.service/pids.events   # 'max' counter increments on each rejection

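TasksCurrent == TasksMax (or pids.current == pids.max) means only that unit is being refused new tasks. The paths above assume cgroup v2; on a cgroup v1 host the pids controller has its own hierarchy, under /sys/fs/cgroup/pids/system.slice/<unit>.service/.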
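On a host with many services, checking units one at a time is slow. This sketch walks every service cgroup under system.slice and prints the ones within 90% of pids.max. It assumes the cgroup v2 layout, does not look at user.slice or other slices, and both the helper name and the 90% cut-off are arbitrary choices.

```shell
# Sketch: list service cgroups at or above 90% of pids.max (cgroup v2 layout).
units_near_pids_max() {
  local root=${1:-/sys/fs/cgroup/system.slice} dir cur max
  for dir in "$root"/*.service; do
    [ -r "$dir/pids.max" ] || continue
    max=$(cat "$dir/pids.max")
    [ "$max" = "max" ] && continue        # unit has no task ceiling
    cur=$(cat "$dir/pids.current")
    if (( cur * 100 >= max * 90 )); then  # 90% is an arbitrary cut-off
      echo "${dir##*/} $cur/$max"
    fi
  done
}
```

Each output line is a unit name followed by current/max; any unit listed is a candidate for the TasksMax drop-in in the fix section, or for a leak investigation.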

ulimit -u                                # nproc / RLIMIT_NPROC for this user
ulimit -v                                # address space / RLIMIT_AS
cat /proc/sys/vm/max_map_count           # mmap region cap (JVM/Elasticsearch)
cat /proc/<pid>/limits                   # effective limits of a running service
ps -u <user> -L --no-headers | wc -l     # live threads owned by one UID

Compare the live count to the limit. On both Ubuntu/Debian and RHEL/Rocky, pam_limits applies /etc/security/limits.conf and /etc/security/limits.d/*.conf to login sessions. Services started by systemd normally do not pass through pam_limits, so their cap comes from LimitNPROC= in the unit file or a drop-in.
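To find which UID is closest to its cap, rank users by live thread count. RLIMIT_NPROC counts threads per real UID, so the sketch groups by ruser. The helper names are hypothetical; nproc_limit_of reads the soft limit from /proc/<pid>/limits, which is what a running service actually inherited rather than what a login shell reports.

```shell
# Sketch: rank real UIDs by live thread count (RLIMIT_NPROC counts threads).
top_thread_owners() {
  ps -eLo ruser= | sort | uniq -c | sort -rn | head -n "${1:-5}"
}

# Soft "Max processes" limit a running process actually inherited.
nproc_limit_of() {
  awk '/^Max processes/ { print $3 }' "/proc/$1/limits"
}
```

For a service, nproc_limit_of "$(systemctl show -p MainPID --value <unit>.service)" gives the number to compare against that service account's row in top_thread_owners.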

vmstat 1 5                               # r/b queues, si/so swap activity
smem -tk -c "pid user command rss pss"   # per-process PSS if smem is installed

Fix / Remediation

Apply the fix that matches the ceiling you identified — safe changes first.

  1. Raise a unit’s task ceiling. Preferred when one service hits TasksMax:

    sudo mkdir -p /etc/systemd/system/<unit>.service.d
    printf '[Service]\nTasksMax=4096\n' | sudo tee /etc/systemd/system/<unit>.service.d/tasks.conf
    sudo systemctl daemon-reload
    sudo systemctl restart <unit>.service
  2. Raise a per-user process limit in /etc/security/limits.d/90-nproc.conf (both Ubuntu/Debian and RHEL/Rocky):

    appuser  soft  nproc  8192
    appuser  hard  nproc  16384

    For systemd services set LimitNPROC= in a drop-in instead. Re-login for a shell session to pick up the new limit.

  3. Raise vm.max_map_count for map-heavy apps (JVM, Elasticsearch):

    echo 'vm.max_map_count = 262144' | sudo tee /etc/sysctl.d/99-maps.conf
    sudo sysctl --system
  4. Relax strict overcommit if vm.overcommit_memory=2 is rejecting valid allocations — either raise the ratio or move to heuristic mode 0:

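    echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf
    # or stay in mode 2 and raise the cap instead (80 is only an example):
    # echo 'vm.overcommit_ratio = 80' | sudo tee /etc/sysctl.d/99-overcommit.conf
    sudo sysctl --system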
  5. Raise the system-wide PID ceiling if the whole box is out of PIDs:

    echo 'kernel.pid_max = 4194304' | sudo tee /etc/sysctl.d/99-pidmax.conf
    sudo sysctl --system
  6. Add swap or free real memory only when it is a genuine shortage.
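For the genuine-shortage case in step 6, a swap file is the quickest stopgap. A minimal sketch, to be run as root, assuming a hypothetical /swapfile path on a filesystem that supports swap files (btrfs, for example, needs extra steps). dd is used rather than fallocate because swapon(8) notes that preallocated files can be treated as files with holes and rejected on some filesystems.

```shell
# Sketch: create and enable a swap file. Run as root.
# Assumptions: the default /swapfile path is hypothetical, and the filesystem
# supports swap files without extra steps (btrfs, for example, does not).
add_swapfile() {
  local size_mb=$1 path=${2:-/swapfile}
  if [ -z "$size_mb" ]; then
    echo "usage: add_swapfile SIZE_MB [PATH]" >&2
    return 2
  fi
  # dd writes real blocks; swapon(8) may reject files with holes.
  dd if=/dev/zero of="$path" bs=1M count="$size_mb" || return
  chmod 600 "$path" || return
  mkswap "$path" || return
  swapon "$path" || return
  # Persist across reboots, without duplicating an existing entry.
  grep -qs "^$path " /etc/fstab || echo "$path none swap defaults 0 0" >> /etc/fstab
}
```

For example, add_swapfile 2048 adds 2 GiB; confirm with swapon --show and free -h afterwards.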

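Warning: Killing processes to reclaim PIDs/memory is destructive and can lose in-flight work. Prefer raising the specific ceiling. Only as a last resort, and after identifying the offender, run sudo kill <pid> (escalate to kill -9 if it ignores SIGTERM). If the box is completely out of PIDs, sudo itself may fail to fork; from an existing root shell, the kill builtin needs no new process.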
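Before killing anything, identify the real consumer. This sketch lists the processes holding the most threads (each one a PID) and the ones with the most mapped regions, for the max_map_count case. The helper names are hypothetical, and without root the map scan only sees processes it is allowed to read.

```shell
# Sketch: the usual suspects before anything is killed.
# Processes holding the most threads (each thread is one PID).
top_thread_procs() {
  ps -eo pid,user,nlwp,comm --sort=-nlwp | head -n "$(( ${1:-5} + 1 ))"   # +1 for header
}

# Processes with the most mapped regions, for vm.max_map_count exhaustion.
# Without root, only readable /proc/<pid>/maps files are counted.
top_map_users() {
  local maps pid n
  for maps in /proc/[0-9]*/maps; do
    pid=${maps#/proc/}; pid=${pid%/maps}
    n=$(wc -l 2>/dev/null < "$maps") || continue
    echo "$n $pid"
  done | sort -rn | head -n "${1:-5}"
}
```

top_map_users prints "region_count pid" pairs; a count close to the value in /proc/sys/vm/max_map_count identifies the process whose mmap() calls are failing.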

Validation

# The relevant counter should now sit below its ceiling.
systemctl show -p TasksMax -p TasksCurrent <unit>.service
cat /proc/sys/vm/max_map_count
ps -eLf | wc -l

# Confirm the syscall now succeeds where it failed.
sudo -u appuser bash -c 'true & wait'    # a fork that previously errored

Fork failures should stop and TasksCurrent/thread counts should settle well below their limits. A recurring climb back to the ceiling signals a thread or process leak, not a number to keep bumping.
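For a unit that was hitting pids.max, a steady max counter in pids.events is a sharper signal than a single TasksCurrent reading. A sketch, assuming cgroup v2 and hypothetical helper names; pass the unit's cgroup directory and a sampling interval in seconds.

```shell
# Sketch: watch the cgroup v2 pids.events "max" counter across an interval.
# A rising counter means forks are still being refused at pids.max.
pids_max_hits() {
  awk '$1 == "max" { print $2 }' "$1/pids.events"
}

check_pids_rejections() {
  local cg=$1 interval=${2:-60} before after
  if [ ! -r "$cg/pids.events" ]; then
    echo "no pids.events under $cg" >&2
    return 2
  fi
  before=$(pids_max_hits "$cg")
  sleep "$interval"
  after=$(pids_max_hits "$cg")
  if (( after > before )); then
    echo "still rejecting: $(( after - before )) new hits in ${interval}s"
    return 1
  fi
  echo "no new rejections in ${interval}s"
}
```

For example, check_pids_rejections /sys/fs/cgroup/system.slice/<unit>.service 300 during normal load; a non-zero return after the fix points back at a leak.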

Prevention

  • Right-size TasksMax=/LimitNPROC= on services that spawn workers to their real concurrency, and treat a unit hitting its ceiling as a paging event — it usually means a leak.
  • Standardize nproc, nofile, and vm.max_map_count across hosts via config management; drift is the classic “works on one box, fails on another.”
  • Prefer systemd MemoryHigh= (soft throttle) over a hard MemoryMax= for spiky workloads, and keep some swap or zram so a transient spike is absorbed instead of failing allocations outright.
  • Monitor ps -eLf | wc -l against kernel.pid_max and alert on Committed_AS / CommitLimit and pids.events/memory.events under cgroup v2.
  • For map-heavy workloads (JVM, Elasticsearch), bake vm.max_map_count into provisioning rather than discovering it at an outage.
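Drift across hosts can be caught by comparing live values against a baseline. A sketch assuming a hypothetical baseline file in sysctl.d style (one "key = value" per line, spaces around the equals sign); it reads /proc/sys directly and is only meant for single-value keys such as vm.max_map_count and kernel.pid_max.

```shell
# Sketch: compare live sysctl values with a baseline in sysctl.d style
# ("key = value" per line). Suited to single-value keys only.
sysctl_drift() {
  local baseline=$1 key eq want have path rc=0
  while read -r key eq want; do
    [[ -z $key || $key == \#* ]] && continue      # skip blanks and comments
    path=/proc/sys/$(tr . / <<< "$key")           # vm.max_map_count -> vm/max_map_count
    if ! have=$(cat "$path" 2>/dev/null); then
      echo "MISSING $key"; rc=1; continue
    fi
    if [[ $have != "$want" ]]; then
      echo "DRIFT $key: want $want, have $have"
      rc=1
    fi
  done < "$baseline"
  return "$rc"
}
```

A non-zero exit status makes it easy to wire into a cron job or config-management check; the baseline itself should come from the same source that writes /etc/sysctl.d/.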

Final Notes

Cannot allocate memory / fork: retry: Resource temporarily unavailable is the kernel rejecting a request before anything new exists — distinct from the OOM killer, which kills after the fact. Check free -h first to separate a true shortage from a limit, then work the limits in order (unit TasksMax, per-user nproc, system pid_max, overcommit, max_map_count). Raise the specific ceiling being hit, and treat a recurring failure as a leak to fix, not just a number to increase.
