NGINX 502/504 Bad Gateway Triage Prompt

Turn a wall of error.log lines plus your upstream config into a ranked root-cause list and a concrete fix for 502/504 errors — without guessing or restarting blindly.

Target user

On-call engineers and platform teams debugging NGINX-to-backend failures under pressure

Difficulty

Intermediate

Tools

Claude, ChatGPT, Cursor

You are a senior SRE who has triaged hundreds of NGINX gateway failures. You read error.log like a flight recorder: every line points at a layer (NGINX, network, or upstream). You never restart prod on a hunch.

I will provide:

- Relevant lines from the NGINX error log: [PASTE ERROR.LOG LINES]
- The relevant `upstream {}` block and the `location` that proxies to it: [PASTE NGINX CONFIG]
- What the backend is (app server, port, container, socket): [DESCRIBE BACKEND]
- When it started and whether it is constant or intermittent: [DESCRIBE PATTERN]

Do this, in order:

1. **Classify the error.** Distinguish 502 (`connect() failed`, `recv() failed`, `upstream prematurely closed connection`) from 504 (`upstream timed out`). Quote the exact log substring that tells you which.
2. **Localize the failure.** Decide whether the upstream is down, refusing connections, slow, returning a malformed response, or whether NGINX itself is misconfigured (wrong port/socket, wrong `proxy_pass` host, SELinux/AppArmor blocking the socket).
3. **Rank causes** most-to-least likely with the evidence behind each rank.
4. **Prescribe fixes.** For timeouts, show `proxy_connect_timeout` / `proxy_read_timeout` / `proxy_send_timeout` and whether raising them masks a real backend problem. For premature-close, address `keepalive` in the upstream and `proxy_http_version 1.1` + `proxy_set_header Connection ""`. For socket/permission issues, give the exact check.
5. **Verification commands** the user can run safely: `curl` direct to the backend, `ss -ltnp` on the upstream port, `nginx -t`, and a single-request reproduction.

Output: (a) a one-line verdict, (b) the ranked cause table with evidence, (c) a minimal config diff (only the lines that change), (d) the ordered verification commands.

After any config change, run `nginx -t` and then `nginx -s reload`; never hot-edit a running prod config in place.

Why this prompt works

A 502 and a 504 look identical to a user but point at different failures: a 502 means the backend refused, reset, or prematurely closed the connection (or sent an invalid response), while a 504 means the backend did not answer within NGINX's timeout. The prompt forces the model to quote the exact error.log substring that disambiguates them, so the diagnosis is grounded in evidence rather than a generic “check your backend.”
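The substring check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the prompt: the marker lists come straight from the prompt text, while the function name `classify` and the sample log lines are made up for the example, and real error.log lines may carry other wording that this sketch does not cover.

```python
from typing import Optional, Tuple

# Substrings the prompt associates with each status code.
# Assumption: they appear verbatim somewhere in the error.log line.
MARKERS_504 = ("upstream timed out",)
MARKERS_502 = (
    "connect() failed",
    "recv() failed",
    "upstream prematurely closed connection",
)


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (status, matched substring) for one log line, or None if no marker matches."""
    for marker in MARKERS_504:
        if marker in line:
            return "504", marker
    for marker in MARKERS_502:
        if marker in line:
            return "502", marker
    return None


# Illustrative (invented) log lines, shaped like typical NGINX error entries.
refused = (
    "2024/05/01 12:00:00 [error] 1234#1234: *5 connect() failed "
    "(111: Connection refused) while connecting to upstream"
)
slow = (
    "2024/05/01 12:00:05 [error] 1234#1234: *6 upstream timed out "
    "(110: Connection timed out) while reading response header from upstream"
)

print(classify(refused))
print(classify(slow))
```

Returning the matched substring alongside the status mirrors the prompt's requirement to quote the evidence, not just name the code.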

By demanding a ranked cause list with evidence and a minimal config diff, the output stays reviewable. You see exactly which lines change and why, instead of a rewritten config you have to re-audit.
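As a rough illustration of what "only the lines that change" looks like, the sketch below uses Python's standard `difflib` with zero context lines. The `location` block, the upstream name `app_backend`, and the timeout values are hypothetical, chosen only to produce a one-line change; whether raising a timeout is the right fix is exactly what the prompt asks the model to justify.

```python
import difflib
from typing import List


def minimal_diff(before: str, after: str) -> List[str]:
    """Unified diff with no context lines, so only changed lines appear."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            n=0,          # zero lines of surrounding context
            lineterm="",  # inputs have no trailing newlines
        )
    )


# Hypothetical config fragment; names and values are assumptions.
BEFORE = """location /api/ {
    proxy_pass http://app_backend;
    proxy_read_timeout 60s;
}"""
AFTER = BEFORE.replace("proxy_read_timeout 60s;", "proxy_read_timeout 120s;")

for row in minimal_diff(BEFORE, AFTER):
    print(row)
```

A reviewer reading this output sees one removed line and one added line, which is the level of noise the prompt's output format is aiming for.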

The verification-commands step keeps a human in the loop. You confirm the backend is reachable with curl and ss before touching NGINX, and you gate every change behind nginx -t so a fat-fingered directive never reaches a live listener.
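The "gate every change behind nginx -t" rule can also be expressed as a small wrapper. The sketch below assumes `nginx` is on the PATH and that the caller has permission to signal the master process; the function names `run_command` and `reload_if_valid` are invented for this example. The command runner is injectable so the logic can be exercised without touching a real server.

```python
import subprocess
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def reload_if_valid(run: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Reload NGINX only if `nginx -t` exits 0. Returns True if the reload succeeded."""
    if run(["nginx", "-t"]) != 0:
        # Config test failed: leave the running config untouched.
        return False
    return run(["nginx", "-s", "reload"]) == 0
```

Calling `reload_if_valid()` with no arguments runs the real commands; passing a fake `run` function makes it safe to dry-run in CI or on a laptop.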

Related prompts

NGINX Config Security Audit Prompt
