NGINX Error Guide: 'recv() failed (104: Connection reset by peer)' from Upstream

Exact Error Message

When NGINX proxies a request and the backend abruptly drops the TCP connection, it logs an entry like this in your error.log and returns a 502 Bad Gateway to the client:

2026/06/27 15:09:33 [error] 2841#2841: *10544 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 203.0.113.7, server: app.example.com, request: "POST /api/jobs HTTP/1.1", upstream: "http://127.0.0.1:9000/api/jobs", host: "app.example.com"

The key fragments are recv() failed (104: Connection reset by peer) and while reading response header from upstream. Together they tell you exactly what happened: NGINX called recv() to read the backend’s response, and the kernel returned errno 104 (ECONNRESET).

What the Error Means

NGINX had successfully connected to the upstream and sent the request. It was waiting to read the response headers when the backend’s TCP stack sent a RST (reset) packet, tearing down the connection immediately. The 104 is the Linux errno value for ECONNRESET (POSIX defines the name; the number is platform-specific): the peer forcibly closed the socket.

This is distinct from a graceful shutdown. There are two ways a connection ends:

Clean close (FIN): The backend finishes (or chooses to close) and sends a TCP FIN. NGINX logs upstream prematurely closed connection while reading response header from upstream. This usually means the backend closed an idle keepalive connection or exited normally without writing a full response.
Reset (RST): The backend’s socket is torn down hard, no clean handshake. NGINX logs recv() failed (104: Connection reset by peer). This typically means the backend process died (crash, OOM kill, segfault), was killed by its own timeout/limit logic, or a stale keepalive socket was reused after the kernel had already discarded the connection.

Both produce a 502, but the cause and fix differ. A RST points at something violent happening to the connection, not a polite goodbye.
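
To see which of the two endings dominates in your own logs, a small tally helps. The following is a minimal sketch, assuming bash and awk are available; the function name classify_upstream_closes is hypothetical, and it only matches the two message strings quoted above.

```shell
#!/usr/bin/env bash
# Tally upstream failures by how the connection ended.
# Reads NGINX error.log lines on stdin and prints "<kind> <count>" lines.
# classify_upstream_closes is an illustrative name, not an NGINX tool.
classify_upstream_closes() {
    awk '
        # Hard teardown: the backend side answered with a TCP RST.
        /recv\(\) failed \(104: Connection reset by peer\)/ { rst++ }
        # Clean teardown: the backend sent a FIN before a full response.
        /upstream prematurely closed connection/            { fin++ }
        END {
            printf "rst %d\n", rst
            printf "fin %d\n", fin
        }
    '
}
```

Typical use would be classify_upstream_closes < /var/log/nginx/error.log (adjust the path to your error_log setting). A high rst count points toward crashes or stale sockets; a high fin count points toward graceful backend exits.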

Common Causes

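Backend worker crashed, was OOM-killed, or segfaulted. PHP-FPM, Gunicorn, a Node process, or a Java app died mid-response. The kernel closes every socket the dead process owned, and on Linux it sends a RST rather than a FIN when a socket still holds unread data, which is common for a request cut off mid-flight. OOM kills are a frequent silent culprit because the killed process never gets a chance to log anything.
Stale keepalive connection reused. With an upstream keepalive pool, NGINX holds idle connections open and reuses them. If the backend closed its side (idle timeout, max_requests, restart) but NGINX still has the socket in its pool, the next request written to it is answered with a RST. This typically shows up when the backend’s idle timeout is shorter than the upstream keepalive_timeout in NGINX, or when keepalive is enabled without the matching proxy_http_version 1.1 and proxy_set_header Connection "" directives.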
Backend request or timeout limit killed the connection. Gunicorn’s --timeout, PHP-FPM’s request_terminate_timeout, or a Java container’s request limit can kill the worker handling a slow request, resetting the socket before a response is sent.
Request body too large for the upstream. The backend (or an app-level limit like client_max_body_size on a second proxy, or a framework body cap) rejects an oversized upload by closing the connection hard instead of returning a 413.
MTU / network resets. Path MTU issues, a firewall or load balancer with an aggressive idle timeout, or a NAT table eviction can inject a RST on a connection NGINX believed was open. Common across container/overlay networks.

How to Reproduce the Error

The cleanest reproduction is killing the backend mid-request. With a Gunicorn or PHP-FPM app behind NGINX, send a request that the backend starts handling, then kill the worker:

# Terminal 1: send a slow request through NGINX
curl -s -X POST http://app.example.com/api/jobs -d '{"sleep":10}'

# Terminal 2: hard-kill the backend worker while the request is in flight
# (the 'gunicorn: worker' process title only appears when setproctitle is installed)
pkill -9 -f 'gunicorn: worker'

SIGKILL (-9) prevents a clean shutdown, so the kernel tears down the open socket and NGINX typically logs recv() failed (104: Connection reset by peer). If the worker had already read the entire request, you may see upstream prematurely closed connection instead, because the kernel then sends a FIN. To reproduce the stale-keepalive variant, set a very short keepalive timeout on the backend and a long one in NGINX, then send two requests spaced just beyond the backend’s idle timeout: the second reuses a dead socket and is reset.

Diagnostic Commands

Start by confirming the NGINX config is valid and inspect the upstream/keepalive settings (all read-only):

# Validate config syntax
sudo nginx -t

# Dump the full effective config and check keepalive / version / Connection headers
sudo nginx -T 2>/dev/null | grep -nE 'proxy_http_version|keepalive|Connection|upstream|proxy_pass'

Hit the backend directly, bypassing NGINX, to see whether the backend itself is healthy:

# Talk to the upstream directly (adjust host:port to your proxy_pass target)
curl -sv http://127.0.0.1:9000/api/jobs -X POST -d '{"sleep":1}'

Confirm the backend is actually listening and look at connection states:

# Is the upstream port listening, and which process owns it?
ss -ltnp | grep ':9000'

# Look for connections stuck in CLOSE-WAIT or piling up in TIME-WAIT toward the backend
ss -tan | grep ':9000'
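
When the raw ss output is long, a per-state count is easier to read. This is a rough sketch, assuming the usual ss -tan column layout (state first, peer address:port fifth); summarize_tcp_states is a hypothetical helper name and 9000 is the example upstream port used in this guide.

```shell
#!/usr/bin/env bash
# Count TCP states for sockets whose PEER address uses the given port,
# i.e. the NGINX side of connections to the backend.
# Usage: ss -tan | summarize_tcp_states 9000
summarize_tcp_states() {
    local port="$1"
    awk -v suffix=":${port}" '
        # $1 is the state, $5 is "peer-address:port" in ss -tan output.
        # The header row never matches because its 5th field is not numeric.
        substr($5, length($5) - length(suffix) + 1) == suffix { count[$1]++ }
        END { for (s in count) printf "%s %d\n", s, count[s] }
    ' | sort
}
```

On the NGINX side, a CLOSE-WAIT entry means the backend has already sent its FIN while NGINX still holds the socket, which is the kind of connection that can be reused stale.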

Check logs on both sides, and crucially the kernel OOM killer:

# NGINX service + the backend service logs
journalctl -u nginx --since '15 min ago' --no-pager
journalctl -u gunicorn --since '15 min ago' --no-pager   # or php-fpm, your-app.service

# Backend application/error logs (read)
tail -n 100 /var/log/php-fpm/error.log
tail -n 100 /var/log/gunicorn/error.log

# The smoking gun for OOM kills (-T prints human-readable timestamps for correlation)
sudo dmesg -T | grep -iE 'oom|killed process'

If dmesg shows Out of memory: Killed process ... (gunicorn) lines that line up with the 502 timestamps, you have your answer: the backend is being OOM-killed, and the RST is a symptom.
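
Lining up timestamps is easier when the resets are bucketed by minute. The sketch below assumes the default error.log timestamp layout shown earlier (date, then HH:MM:SS); resets_per_minute is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print "<date> <HH:MM> <count>" for each minute that logged an upstream reset.
# Reads NGINX error.log lines on stdin.
resets_per_minute() {
    awk '
        /recv\(\) failed \(104: Connection reset by peer\)/ {
            # $1 is the date, $2 is HH:MM:SS; keep only HH:MM.
            print $1, substr($2, 1, 5)
        }
    ' | sort | uniq -c | awk '{ print $2, $3, $1 }'
}
```

Run it as resets_per_minute < /var/log/nginx/error.log and compare the busiest minutes against the kernel log timestamps. Minutes with a burst of resets and a matching Killed process line are strong evidence of an OOM kill.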

Step-by-Step Resolution

Correlate timestamps. Match the error.log 502 times against dmesg OOM lines and the backend’s own logs. If the backend logged a crash, traceback, or was OOM-killed at that instant, fix the backend, not NGINX.
If it is a crash or OOM kill: Raise the memory limit (or container memory cgroup), reduce per-worker memory (fewer threads, smaller caches), or spread the load across more instances. Adding workers on the same host raises total memory use, so do that only if there is headroom. For PHP-FPM check pm.max_children and memory_limit; for Gunicorn watch worker memory growth and consider --max-requests with --max-requests-jitter to recycle leaky workers gracefully.
If it is stale keepalive reuse: This is the most common and most fixable case. In the location (or server) block that proxies to the upstream, ensure HTTP/1.1 and an empty Connection header so connections are reused correctly and not poisoned:
```
proxy_http_version 1.1;
proxy_set_header Connection "";
```
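And in the upstream block, set a keepalive pool size (e.g. keepalive 32;). The pool only takes effect with proxy_http_version 1.1 plus an empty Connection header; without them NGINX has traditionally sent HTTP/1.0 with Connection: close, so the backend closes after every response and the pool goes unused. Once the pool is active, make the upstream keepalive_timeout in NGINX shorter than the backend’s idle timeout so NGINX retires sockets first; otherwise it can reuse a socket the backend already closed, which produces the reset.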
If a request/timeout limit is killing workers: Align timeouts. Raise the backend’s per-request timeout (Gunicorn --timeout, PHP-FPM request_terminate_timeout) for genuinely slow endpoints, or offload long work to a queue. Make sure NGINX proxy_read_timeout is consistent with the backend.
If oversized request bodies are the trigger: Set a sane client_max_body_size in NGINX so it returns a clean 413 before forwarding, and align the backend’s body limit so it does not close the socket hard.
Apply and reload. After editing the config, validate and reload without dropping connections:
```
sudo nginx -t && sudo systemctl reload nginx
```
Verify. Re-run the direct curl and a request through NGINX, then watch error.log to confirm the resets are gone.
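
Putting the keepalive and timeout steps together, a configuration along these lines is one possible starting point. It is a sketch under assumptions, not a drop-in config: the upstream name app_backend is hypothetical, the 20s keepalive_timeout assumes the backend keeps idle connections longer than that, and the upstream keepalive_timeout directive requires NGINX 1.15.3 or later.

```nginx
upstream app_backend {                  # hypothetical upstream name
    server 127.0.0.1:9000;
    keepalive 32;                       # idle connections cached per worker process
    keepalive_timeout 20s;              # assumption: backend idle timeout is longer than 20s
}

server {
    listen 80;
    server_name app.example.com;

    location /api/ {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;         # required for upstream keepalive
        proxy_set_header Connection ""; # do not forward "Connection: close"
        proxy_read_timeout 60s;         # NGINX default; compare with the backend's request timeout
    }
}
```

The key design choice is that NGINX gives up an idle socket before the backend does, so the pool should never contain a connection the backend has already closed.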

Prevention and Best Practices

Always pair upstream keepalive with proxy_http_version 1.1; and proxy_set_header Connection "";. Without both, the keepalive pool does not behave as configured.
Keep the upstream keepalive_timeout in NGINX below the backend’s idle timeout so NGINX never reuses a socket the backend has already closed.
Right-size memory and recycle workers. Use --max-requests (Gunicorn) or pm.max_requests with a sensible pm.max_children (PHP-FPM) to bound memory and avoid OOM kills.
Monitor for OOM events. Alert on dmesg OOM lines and on 502 rate spikes so resets surface before users complain. See the /dashboard/incident-response/ tooling for triaging upstream 502 floods.
Log timestamps consistently across NGINX and the backend so correlation takes seconds, not minutes.
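
As a starting point for alerting on 502 spikes, a check like the following can run from cron or a monitoring agent. It is a minimal sketch that assumes the access log uses the default combined format, where the status code is the 9th whitespace-separated field; check_502_rate and the 5 percent default threshold are illustrative choices, not recommendations from NGINX.

```shell
#!/usr/bin/env bash
# Return non-zero when the share of 502 responses exceeds a threshold (percent).
# Reads access log lines in the "combined" format on stdin.
# check_502_rate and its default threshold are hypothetical.
check_502_rate() {
    local max_percent="${1:-5}"
    awk -v max="$max_percent" '
        { total++ }
        $9 == 502 { bad++ }
        END {
            pct = (total > 0) ? 100 * bad / total : 0
            printf "502s: %d of %d (%.1f%%)\n", bad, total, pct
            exit (pct > max ? 1 : 0)
        }
    '
}
```

For example, tail -n 1000 /var/log/nginx/access.log | check_502_rate 5 prints a one-line summary and fails when more than 5 percent of the last 1000 requests returned 502.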

Related Errors

upstream prematurely closed connection while reading response header from upstream: the clean-FIN cousin of this error; the backend closed gracefully rather than resetting. Usually keepalive or a backend exit without a full response.
connect() failed (111: Connection refused) while connecting to upstream: the backend is not listening at all (down or wrong port), as opposed to dying mid-response.
upstream timed out (110: Connection timed out) while reading response header from upstream: the backend is alive but too slow; tune proxy_read_timeout and the backend’s processing.
For more NGINX guides, see /categories/nginx/.

Frequently Asked Questions

Is recv() failed (104) the same as upstream prematurely closed connection?

No. recv() failed (104: Connection reset by peer) means the backend sent a TCP RST — a hard, abnormal teardown, typically from a crash, OOM kill, or a stale socket. upstream prematurely closed connection means a clean TCP FIN — a graceful close. The reset version almost always points at the backend dying or a poisoned keepalive socket.

Why does adding proxy_http_version 1.1; and Connection ""; fix the resets?

Upstream keepalive needs HTTP/1.1 and no Connection: close header. Unless told otherwise, NGINX has traditionally proxied with HTTP/1.0 and Connection: close, so a configured keepalive pool goes unused or behaves unpredictably. The two directives let NGINX hold connections open and reuse them the way the backend expects. They do not cure stale sockets by themselves: the upstream keepalive_timeout in NGINX must also be shorter than the backend’s idle timeout, or a reused socket can still land on a connection the backend has abandoned and draw a RST.

How do I confirm the OOM killer is the cause?

Run dmesg | grep -i oom and look for Out of memory: Killed process lines, then match their timestamps to the 502s in error.log. If they line up with your backend process name, the kernel is killing workers under memory pressure and the reset is just the visible symptom.

Can a network device cause this without the backend crashing?

Yes. A firewall, NAT gateway, load balancer, or container overlay with an aggressive idle timeout can evict a connection and inject a RST while both NGINX and the backend believe it is still open. ss -tan shows connection states but not resets, so capture traffic with tcpdump (for example with the filter 'tcp[tcpflags] & tcp-rst != 0') to see which hop sends the RST, and align idle timeouts end to end, including any intermediate proxies.

