Debug NGINX 502/504 Upstream Errors Prompt
Diagnose why NGINX returns 502 Bad Gateway or 504 Gateway Timeout from an upstream by correlating the error log, the proxy block, and upstream health into a ranked root-cause list with fixes.
- Target user: Platform and SRE engineers running NGINX as a reverse proxy
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who debugs NGINX reverse-proxy failures for a living. I am getting intermittent 502/504 responses from a service behind NGINX and need a ranked root-cause analysis.
I will provide:
- The relevant `server` and `location` blocks plus the `upstream {}` definition from nginx.conf
- error_log lines around the failures (paste the `upstream timed out`, `connect() failed`, `no live upstreams`, or `recv() failed` entries)
- access_log samples for the failing requests (status, $request_time, $upstream_response_time, $upstream_addr)
- What the upstream is (gunicorn/PHP-FPM/node/another nginx) and its worker/connection limits
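Before pasting access_log samples, a small script can pull out the four fields and flag requests whose upstream time sits close to a timeout. This is a minimal sketch that assumes a hypothetical custom log_format emitting `status=$status rt=$request_time urt="$upstream_response_time" ua="$upstream_addr"`. The default `combined` format does not log the upstream variables, so adapt the regex to whatever format you actually use. The 0.5 s slack is an arbitrary illustration value.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical log_format fragment this regex expects (not an NGINX default):
#   status=$status rt=$request_time urt="$upstream_response_time" ua="$upstream_addr"
FIELDS = re.compile(
    r'status=(?P<status>\d{3}) rt=(?P<rt>[\d.]+) '
    r'urt="(?P<urt>[^"]*)" ua="(?P<ua>[^"]*)"'
)


@dataclass
class Sample:
    status: int
    request_time: float
    upstream_times: list[float]  # one value per upstream attempt that recorded a time
    upstream_addrs: list[str]


def split_attempts(value: str) -> list[str]:
    # NGINX separates multiple upstream attempts with ", " and internal
    # redirects with " : "; matching the spaces keeps "host:port" intact.
    return [part for part in re.split(r", | : ", value) if part]


def parse(line: str) -> Sample | None:
    m = FIELDS.search(line)
    if m is None:
        return None
    # "-" means no value was recorded; skip it rather than failing on float().
    times = [float(t) for t in split_attempts(m["urt"]) if t != "-"]
    return Sample(int(m["status"]), float(m["rt"]), times, split_attempts(m["ua"]))


def near_timeout(sample: Sample, timeout_s: float, slack_s: float = 0.5) -> bool:
    """True if any upstream attempt ran to within slack_s of timeout_s."""
    return any(t >= timeout_s - slack_s for t in sample.upstream_times)


if __name__ == "__main__":
    line = ('10.0.0.5 [12/Mar/2024:10:00:01 +0000] "GET /api HTTP/1.1" '
            'status=504 rt=60.003 urt="60.001" ua="10.0.0.21:8000"')
    s = parse(line)
    print(s, near_timeout(s, timeout_s=60.0))
```

Multiple values in `$upstream_response_time` and `$upstream_addr` mean NGINX contacted more than one server for a single request (for example via `proxy_next_upstream`), which is itself useful evidence to include.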
Your job:
1. **Classify the error**: distinguish 502 (upstream refused, reset, or crashed) from 504 (upstream could not be reached or did not answer before a proxy timeout expired) using the exact error_log signature, and explain what NGINX observed.
2. **Correlate timings**: compare $upstream_response_time against `proxy_connect_timeout`, `proxy_send_timeout`, and `proxy_read_timeout` to confirm whether a timeout or a hard failure occurred (note that `proxy_read_timeout` bounds the gap between two successive reads, not the total response time).
3. **Check upstream capacity**: flag exhausted PHP-FPM/gunicorn workers, keepalive mismatches (for example NGINX reusing idle upstream connections the upstream has already closed), or `no live upstreams` after every server was marked unavailable.
4. **Rank causes**: list the most likely root causes in priority order, each with the evidence from my logs that supports it.
5. **Prescribe fixes**: give exact directive changes (timeouts, `keepalive`, `proxy_next_upstream`, buffer sizes) and the matching upstream-side change.
6. **Verify**: provide a curl command that reproduces the failure and the log lines that confirm the fix worked.
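Steps 1 and 3 hinge on recognizing error_log signatures. The sketch below tallies a pasted error_log excerpt by signature so the dominant failure mode is visible before causes are ranked. The pattern list is an illustrative assumption drawn from commonly seen NGINX messages, not an exhaustive or official catalogue; exact wording (errno text in particular) varies by NGINX version and operating system.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative (not exhaustive) error_log signatures: (regex, status, meaning).
# Exact wording, especially errno text, varies by NGINX version and OS.
SIGNATURES = [
    (r"upstream timed out .* while connecting to upstream", 504,
     "TCP connect did not finish within proxy_connect_timeout"),
    (r"upstream timed out .* while sending request to upstream", 504,
     "request write stalled longer than proxy_send_timeout"),
    (r"upstream timed out .* while reading response header from upstream", 504,
     "no response header within proxy_read_timeout"),
    (r"connect\(\) .*failed \(111: ", 502,
     "connection refused: nothing listening on the upstream address"),
    (r"connect\(\) to unix:.*failed \(11: ", 502,
     "unix socket backlog full (e.g. PHP-FPM workers saturated)"),
    (r"no live upstreams", 502,
     "every server in the upstream block is currently marked unavailable"),
    (r"recv\(\) failed \(104: ", 502,
     "upstream reset the connection (crash, killed worker, or stale keepalive)"),
    (r"upstream prematurely closed connection", 502,
     "upstream closed before sending a complete response"),
    (r"upstream sent too big header", 502,
     "response headers exceed proxy_buffer_size"),
]
COMPILED = [(re.compile(p), status, why) for p, status, why in SIGNATURES]


def classify(line: str) -> tuple[int, str] | None:
    """Return (status, meaning) for the first matching signature, else None."""
    for pattern, status, why in COMPILED:
        if pattern.search(line):
            return status, why
    return None


def summarize(lines: list[str]) -> list[tuple[tuple[int, str], int]]:
    """Count recognized signatures, most frequent first."""
    counts = Counter(hit for hit in map(classify, lines) if hit is not None)
    return counts.most_common()
```

The first matching pattern wins, so more specific patterns should stay ahead of broader ones if you extend the list.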
Output as: (a) error classification, (b) ranked causes with evidence, (c) exact config diff, (d) verification steps.
Related prompts
- Performance-Tune NGINX Workers, Keepalive & gzip Prompt: Tune NGINX for throughput and latency by setting worker/connection limits, keepalive (client and upstream), buffering, sendfile, and compression correctly for the hardware and workload, with measured before/after.
- Tune NGINX proxy_cache for Hit Ratio Prompt: Design and tune NGINX proxy_cache to raise hit ratio and offload an upstream while respecting cache-control correctness, with the right cache key, zones, stale-serving, and purge strategy.