Linux Error: fork: Resource temporarily unavailable

Summary

fork: Resource temporarily unavailable (errno EAGAIN) is what the kernel returns when it refuses to create a new process or thread. It is almost never about RAM (that would be ENOMEM); it means you hit a count limit on tasks. Bash retries a few times, so you first see fork: retry: Resource temporarily unavailable before commands stop working entirely. On Linux, threads are tasks too, so a single JVM or Go service spawning thousands of threads counts exactly like thousands of processes against the same ceiling.

Common Symptoms

Interactive commands fail: bash: fork: retry: Resource temporarily unavailable, then bash: fork: Resource temporarily unavailable.
A JVM logs java.lang.OutOfMemoryError: unable to create new native thread (despite plenty of free RAM).
sshd refuses logins: error: fork: Resource temporarily unavailable in /var/log/auth.log.
systemd journal shows Failed to fork: Resource temporarily unavailable and status=219/CGROUP.
The failures hit all users at once (global limit) or just one user or service (per-user or cgroup limit).

Most Likely Causes of the ‘fork: Resource temporarily unavailable’ Error

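Four independent ceilings can trigger fork: Resource temporarily unavailable, and two kinds of runaway task growth (leaks and unreaped zombies) commonly push a system into them. The list below is ordered from most to least common in production: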

RLIMIT_NPROC (ulimit -u) too low — a per-user cap on tasks (processes + threads). Shared hosts often set nproc to a few hundred in /etc/security/limits.conf.
A thread-pool or process leak — unbounded executors, leaked connections, or a thread-per-request server under load spawning tasks faster than they exit.
cgroup pids.max / systemd TasksMax — a restrictive per-cgroup cap on a container or unit.
kernel.pid_max — the largest PID the kernel will allocate, a global cap on concurrent tasks.
kernel.threads-max — a global cap on total threads system-wide.
Zombie or stuck processes never reaped, slowly filling the task table.
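To check the zombie cause from the list above, the sketch below groups zombie processes by parent PID, since the parent is the process that is failing to call wait(). The helper name zombies_by_parent is illustrative, not a standard tool, and the sketch assumes a procps-style ps.

```shell
#!/usr/bin/env bash
# Count zombie (state Z) processes grouped by parent PID.
# Reads "STAT PPID" pairs on stdin, so it works on live ps output or a saved capture.
zombies_by_parent() {
  awk '$1 ~ /^Z/ { n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Live usage: "stat=" and "ppid=" give both columns an empty header,
# so no header line is printed. Output columns: zombie count, parent PID.
ps -e -o stat= -o ppid= | zombies_by_parent | head
```

An empty result means no zombies are present; a parent PID with a steadily growing count is the process to inspect.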

Quick Triage

# Per-user soft/hard task limit for the affected user
ulimit -u; ulimit -Hu

# Live, authoritative limit the kernel enforces for a running service
cat /proc/$(pgrep -n java)/limits | grep -E 'processes|Limit'

# Per-user thread count — the number that matters for RLIMIT_NPROC
ps -eLo user= | sort | uniq -c | sort -rn | head

If a user’s live thread count is at or near its ulimit -u value, the ceiling you hit is RLIMIT_NPROC. If every user fails at once, suspect a global ceiling.
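The comparison described above can be scripted. This is a minimal sketch that assumes it runs as the affected user, because ulimit -u reports the calling shell's own soft limit. The helper name limit_usage_pct is made up for illustration.

```shell
#!/usr/bin/env bash
# Percentage of a numeric limit already used.
# Prints "unlimited" when the limit is not a plain number (ulimit -u may print "unlimited").
limit_usage_pct() {
  local current=$1 limit=$2
  case $limit in
    ''|*[!0-9]*) echo "unlimited"; return 0 ;;
  esac
  if [ "$limit" -eq 0 ]; then echo 100; return 0; fi
  echo $(( current * 100 / limit ))
}

user=$(id -un)
# -L prints one row per thread, -U matches the real user ID, "pid=" drops the header.
current=$(ps -L -U "$user" -o pid= 2>/dev/null | wc -l)
limit=$(ulimit -u 2>/dev/null)
echo "user=$user tasks=$current soft_limit=$limit used_pct=$(limit_usage_pct "$current" "$limit")"
```

A used_pct close to 100 points at RLIMIT_NPROC; a low value suggests looking at the cgroup or global ceilings instead.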

Diagnostic Commands

# Total tasks (threads included) on the host
ps -eLf | wc -l

# Threads per process — find the greedy one
ps -eo pid,nlwp,user,comm --sort=-nlwp | head

nlwp is the thread count per PID; a single java PID holding thousands of threads is the usual culprit.

# Global ceilings
cat /proc/sys/kernel/pid_max
cat /proc/sys/kernel/threads-max

Compare the ps -eLf | wc -l total against threads-max (and pid_max) to confirm global exhaustion.
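When ps itself can no longer fork, the same comparison can be made with shell built-ins by counting task directories under /proc. The following is a sketch under that assumption. count_tasks is an illustrative helper, and its optional argument exists only so it can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Count every task (thread) by walking <proc_root>/<pid>/task/<tid> directories.
# Uses only shell globbing and built-ins, so the count itself needs no fork.
count_tasks() {
  local proc_root=${1:-/proc} n=0 d
  for d in "$proc_root"/[0-9]*/task/[0-9]*; do
    [ -d "$d" ] && n=$((n + 1))
  done
  echo "$n"
}

tasks=$(count_tasks)
threads_max=$(cat /proc/sys/kernel/threads-max 2>/dev/null || echo 0)
pid_max=$(cat /proc/sys/kernel/pid_max 2>/dev/null || echo 0)
echo "tasks=$tasks threads-max=$threads_max pid_max=$pid_max"
```

The command substitutions in the last four lines still fork; on a host that is fully exhausted, call count_tasks directly from an already-open shell and read the two /proc/sys files with the shell's read built-in.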

# cgroup / systemd task budget (the TasksMax = pids.max mapping)
systemctl show <SERVICE> -p TasksMax -p TasksCurrent
# cgroup v2 (modern Ubuntu/Debian and RHEL/Rocky):
cat /sys/fs/cgroup/system.slice/<SERVICE>.service/pids.max
cat /sys/fs/cgroup/system.slice/<SERVICE>.service/pids.current

When pids.current equals pids.max, the cgroup — not the user limit — is the bottleneck. Match the symptom to the ceiling: status=219/CGROUP in the journal points at TasksMax; EAGAIN with ulimit -u exceeded points at RLIMIT_NPROC; a threads-max/pid_max hit shows across all users.
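The pids.current versus pids.max check can be wrapped in a small helper. This sketch assumes cgroup v2, where pids.max contains either a number or the literal string max. The function name pids_headroom and the example path in the comment are illustrative.

```shell
#!/usr/bin/env bash
# Print how many more tasks a cgroup v2 directory may still create.
# pids.max holds a number, or the literal "max" when no cap is set.
pids_headroom() {
  local cg=$1 max current
  max=$(cat "$cg/pids.max") || return 1
  current=$(cat "$cg/pids.current") || return 1
  if [ "$max" = "max" ]; then
    echo "unlimited"
  else
    echo $(( max - current ))
  fi
}

# Hypothetical usage (the unit name is a placeholder):
#   pids_headroom /sys/fs/cgroup/system.slice/myapp.service
cg=${1:-}
if [ -n "$cg" ]; then
  pids_headroom "$cg"
fi
```

A headroom of 0 means the cgroup is the bottleneck, regardless of what ulimit -u reports.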

Fix / Remediation

Identify which ceiling you hit by comparing live counts to each limit (steps above). Apply only the fix that matches.

Stop the bleeding. If one process is leaking threads, restart it through its manager so the cgroup task count resets:

watch -n 2 'ps -o pid,nlwp,comm -p <PID>'   # confirm growth first (read-only)
sudo systemctl restart <SERVICE>

Raise the per-user limit (RLIMIT_NPROC) via /etc/security/limits.d/:

appuser    soft    nproc    16384
appuser    hard    nproc    32768

For a systemd service, set the task cap and per-process limit in a drop-in (limits.conf does not apply — services skip PAM login):
```
# /etc/systemd/system/<SERVICE>.service.d/limits.conf
[Service]
TasksMax=8192
LimitNPROC=16384
```
Then sudo systemctl daemon-reload && sudo systemctl restart <SERVICE>.
Only when the exhaustion is truly global, raise the kernel limits at runtime and persist the same values in /etc/sysctl.d/99-limits.conf:
```
# Runtime change only; add the same key=value pairs to the file above to survive reboots
sudo sysctl -w kernel.threads-max=1000000
sudo sysctl -w kernel.pid_max=4194304
```

Warning: Raising kernel.threads-max / kernel.pid_max or setting TasksMax=infinity removes a guardrail. A leaking service can then exhaust the global table and take down the whole host instead of just itself. Prefer a sized limit plus fixing the leak.

Validation

# New per-process limit is live
cat /proc/$(pgrep -n <SERVICE>)/limits | grep -i 'processes'

# systemd task budget updated
systemctl show <SERVICE> -p TasksMax -p TasksCurrent

# Task count stays bounded under load (leak contained)
watch -n 5 'ps -o pid,nlwp,comm -p $(pgrep -n <SERVICE>)'

A limit reflecting the new value and a task count that plateaus well below it confirm the fix.
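For a non-interactive version of the plateau check, the sketch below samples the Threads field of /proc/<pid>/status a few times and prints the growth between the first and last sample. The three one-second samples and the helper names thread_count and growth are arbitrary choices for illustration; a real check would sample over a longer window under load.

```shell
#!/usr/bin/env bash
# Read the current thread count of one process from /proc/<pid>/status.
thread_count() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Given a series of samples as arguments, print last minus first.
growth() {
  local first=$1 last
  for last in "$@"; do :; done
  echo $(( last - first ))
}

pid=${1:-$$}   # defaults to this shell when no PID is given
samples=""
for i in 1 2 3; do
  samples="$samples $(thread_count "$pid" 2>/dev/null || echo 0)"
  sleep 1
done
# Unquoted on purpose: split the space-separated samples into arguments.
echo "pid=$pid samples=$samples growth=$(growth $samples)"
```

Growth that stays near zero under steady load suggests the leak is contained; growth that keeps climbing means the limit was raised but the leak remains.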

Prevention

Bound your thread pools. Most “unable to create new native thread” incidents are application bugs — use fixed-size executors and connection pools; raising kernel limits only delays the crash.
Set TasksMax deliberately per systemd unit, sized for the real workload, to contain a runaway service to its own cgroup.
Monitor the pids.current / pids.max ratio and per-user thread counts, and alert before saturation.
Watch DefaultTasksMax (often 15% of kernel.pid_max), which governs units without an explicit value — systemctl show -p DefaultTasksMax.
Reap zombies — ensure parents wait() on children; in containers run a proper init (PID 1) that reaps orphans.
Manage limits via config management (Ansible/Puppet) so per-service LimitNPROC/TasksMax and sysctl values are consistent and survive reboots.

Final Notes

fork: Resource temporarily unavailable is EAGAIN — a count limit, not a memory limit. Match the symptom to one of the four ceilings (RLIMIT_NPROC, cgroup pids.max/TasksMax, kernel.pid_max, kernel.threads-max) before changing anything, then raise only that limit. If a single process keeps climbing under steady load, the real fix is bounding its threads in the code, not lifting the guardrail.
