NGINX Upstream Health Checks & Load Balancing Prompt

Design an upstream block with the right load-balancing algorithm, passive health checks, and failover tuning, so a single sick backend gets quietly removed from rotation instead of poisoning your error rate.

Target user

Engineers load-balancing multiple backend instances behind NGINX

Difficulty

Advanced

Tools

Claude, ChatGPT, Cursor

You are a senior infrastructure engineer who tunes NGINX upstreams for resilience. You know the difference between the load-balancing algorithms, you know passive health checks (open source) from active ones (NGINX Plus), and you never ship `max_fails=0` by accident.

I will provide:
- The backend instances (count, addresses, capacity differences): [DESCRIBE BACKENDS]
- Whether sessions are sticky / stateful: [STATELESS / STICKY + DETAILS]
- Your NGINX flavor: [open source / NGINX Plus]
- Traffic shape (steady, bursty, long requests): [DESCRIBE]
- How a backend typically fails (crash, slow, 5xx): [DESCRIBE]

Build the config:
1. **Algorithm choice**: recommend round-robin, `least_conn`, or `ip_hash`/`hash` for stickiness, and justify it from the traffic shape and session needs. Explain the trade-off of each in one line.
2. **Weights**: if instances differ in capacity, show `weight=` and explain how it skews distribution.
3. **Passive health checks**: set `max_fails` and `fail_timeout` so a backend that returns errors or times out is taken out of rotation, then probed again. Explain exactly what counts as a "fail" and how `proxy_next_upstream` interacts with this.
4. **Failover behavior**: configure `proxy_next_upstream` (and its `_tries`/`_timeout` caps) so a failed request retries another backend without retrying forever or duplicating non-idempotent writes.
5. **Active checks (if NGINX Plus)**: show the `health_check` directive and a `match {}` block validating status and body; otherwise state clearly that open-source NGINX only has passive checks and suggest an external prober.

Output: (a) the complete commented `upstream {}` block plus the relevant `location` directives, (b) a table of each resilience directive and the failure it guards against, (c) a note on how to observe ejections in the error log, plus the `nginx -t` line.

Validate with `nginx -t` and reload; do not edit a live prod upstream in place.

Why this prompt works

Load balancing looks trivial (list a few servers in an upstream block and you're done), which is exactly why it fails badly. The defaults give you plain round-robin with only minimal passive checking (max_fails=1, fail_timeout=10s), and HTTP 5xx responses do not count as failures unless proxy_next_upstream lists them, so a backend that's up but returning 500s keeps getting its full share of traffic (one-third, with three backends). This prompt forces the two decisions that actually matter: the algorithm (which depends on whether your sessions are sticky and your requests are uniform) and the passive health-check thresholds (max_fails/fail_timeout) that pull a sick instance out of rotation.
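
As a rough illustration of those two decisions, here is a minimal sketch of an upstream block. The upstream name, addresses, weights, and thresholds are hypothetical placeholders, not recommendations; the right values depend on your traffic shape and on how your backends fail.

```nginx
# Sketch only: app_backend, the 10.0.0.x addresses, and all numbers are assumptions.
upstream app_backend {
    # Send each request to the server with the fewest active connections
    # (weights are taken into account). Omit this line for plain round-robin.
    least_conn;

    # weight=2: this instance is assumed to have roughly twice the capacity.
    # max_fails=3 fail_timeout=30s: after 3 failed attempts within 30s,
    # the server is considered unavailable for the next 30s.
    server 10.0.0.11:8080 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080          max_fails=3 fail_timeout=30s;
    server 10.0.0.13:8080          max_fails=3 fail_timeout=30s;
}
```

Note that `fail_timeout` plays two roles: it is both the window in which failures are counted and the time a server stays out of rotation once the threshold is reached.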

The retry behavior is the subtle trap. proxy_next_upstream is what lets NGINX retry a failed request on another backend, but if you enable it for POSTs (via its non_idempotent parameter) you can duplicate a payment or a write when a backend is merely slow rather than dead. Making the model reason about idempotency and cap the retries turns a foot-gun into a deliberate, bounded policy.
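
A bounded retry policy along those lines might look like the sketch below. The upstream name `app_backend` and every timeout value are assumptions chosen for illustration; tune them against your real latency profile.

```nginx
# Sketch only: app_backend and all numeric values are hypothetical.
location / {
    proxy_pass http://app_backend;

    # Conditions under which the request is passed to the next server.
    # Listing http_502/503/504 here also makes those responses count
    # toward max_fails for the server that returned them.
    # non_idempotent is deliberately NOT listed, so POST, LOCK, and PATCH
    # are not retried once the request has been sent to a backend.
    proxy_next_upstream error timeout http_502 http_503 http_504;

    # Cap the number of attempts and the total time spent retrying.
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 5s;

    # Short connect timeout so a dead backend is detected quickly.
    proxy_connect_timeout 2s;
    proxy_read_timeout 10s;
}
```

The design choice here is to retry only on connection-level failures and gateway-style errors, and to leave non-idempotent requests alone unless the application is known to deduplicate them.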

The prompt also pins down a licensing reality that trips people up constantly: active health_check probes are an NGINX Plus feature. Asking the model to state your flavor and fall back to passive checks plus an external prober prevents the classic case of pasting a health_check directive into open-source NGINX and getting a config that fails nginx -t — or worse, one that quietly does nothing.
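
For the NGINX Plus case, an active check could be sketched as follows. This fragment assumes NGINX Plus and will not load on open-source NGINX; the `/healthz` path, the `app_ok` match name, the expected body text, and the addresses are all hypothetical.

```nginx
# NGINX Plus only. Sketch: names, paths, and values are assumptions.
upstream app_backend {
    zone app_backend 64k;        # shared memory zone, needed for health_check
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# A probe passes only if the status is 200 and the body contains "ok".
match app_ok {
    status 200;
    body ~ "ok";
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
        # Probe every 5 seconds; mark unhealthy after 3 failed probes,
        # healthy again after 2 consecutive passes.
        health_check uri=/healthz interval=5 fails=3 passes=2 match=app_ok;
    }
}
```

On open-source NGINX, the equivalent is to keep the passive `max_fails`/`fail_timeout` settings and run an external prober against each backend.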

Related prompts

NGINX 502/504 Bad Gateway Triage Prompt

NGINX Reverse-Proxy vhost Design Prompt
