AI-Assisted NGINX Performance Tuning Without Cargo-Culting

A teammate once pasted a 40-line nginx.conf snippet into our prod ingress because a chatbot told him it would “dramatically improve throughput.” It set worker_connections 65535, cranked proxy_buffers to absurd sizes, and turned on options that conflicted with our existing config. The config failed nginx -t, and once we fixed the syntax and actually reloaded, the box’s memory footprint jumped while p99 latency didn’t move at all. The numbers were plausible. They were also pulled from nowhere. That afternoon taught me the rule I still use: AI is great for explaining what a directive does and drafting a starting config, but it does not know your traffic, your CPU count, or your upstream behavior. You do, and you measure.

This post walks through the directives I actually touch when tuning NGINX, how I get AI to help reason about each one, and why “the defaults are fine” is a perfectly good conclusion most of the time.

Start by measuring, not editing

Before you change a single line, get a baseline. You cannot claim an improvement if you never recorded the “before.” I keep this dead simple: a load generator hitting a representative endpoint, plus NGINX’s own stub_status page (it comes from ngx_http_stub_status_module, so your build has to include that module).

# In an internal-only server block
location /nginx_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}

# Baseline run — capture this BEFORE any tuning
wrk -t4 -c200 -d30s https://example.com/api/health
curl -s http://127.0.0.1/nginx_status

Note your active connections, requests/sec, and latency distribution. Save the output somewhere. Every change after this gets the same test, and you compare. If the numbers don’t move, you revert — even if the AI sounded confident. This is the whole discipline, and it is what separates tuning from guessing.
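To make the before-and-after comparison less error-prone, it can help to flatten the stub_status page into one line you can append to a log. This is a minimal sketch, assuming the standard four-line stub_status layout; parse_stub_status is a made-up helper name, not something NGINX ships.

```shell
# Hypothetical helper: turn the stub_status page (read from stdin) into one
# key=value line. Assumes the standard four-line stub_status layout.
parse_stub_status() {
    awk '
        /^Active connections:/ { active = $3 }
        # The counters line is the only one made of exactly three numbers.
        NF == 3 && $1 ~ /^[0-9]+$/ { accepts = $1; handled = $2; requests = $3 }
        /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
        END {
            printf "active=%s accepts=%s handled=%s requests=%s reading=%s writing=%s waiting=%s\n",
                active, accepts, handled, requests, reading, writing, waiting
        }
    '
}
```

Something like curl -s http://127.0.0.1/nginx_status | parse_stub_status >> baseline.log during each run gives you one comparable line per sample. If handled ever falls behind accepts, connections were dropped at a resource limit, which is worth knowing before you touch anything else.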

worker_processes and worker_connections

This is the pair people break first. worker_processes auto lets NGINX match the number of worker processes to available CPU cores, which is almost always what you want. Don’t hardcode it to some number a blog post used in 2014.

worker_processes auto;

events {
    worker_connections 4096;
    multi_accept on;
}

worker_connections is the max simultaneous connections per worker, not total. Your theoretical ceiling is worker_processes × worker_connections, but each proxied request uses one connection to the client and one to the upstream, so a pure reverse proxy tops out at roughly half that many clients. Setting it to 65535 doesn’t make you faster; it just raises a limit you probably aren’t hitting and quietly increases memory reservations. Check your actual concurrent connections from stub_status before inflating this.
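The ceiling arithmetic is easy to sanity-check against your own numbers. A rough sketch, where max_clients is a hypothetical helper and the last argument is how many connections each client costs you (1 when NGINX serves the response itself, 2 when every request is proxied):

```shell
# Hypothetical helper: theoretical client ceiling.
# usage: max_clients WORKER_PROCESSES WORKER_CONNECTIONS CONNS_PER_CLIENT
max_clients() {
    echo $(( $1 * $2 / $3 ))
}

max_clients 4 4096 1   # serving files directly
max_clients 4 4096 2   # reverse proxy: client side plus upstream side
```

With 4 workers and 4096 connections each, that is 16384 clients served directly and 8192 behind a proxy. If your stub_status peak is a few hundred, the limit is not your bottleneck.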

multi_accept on lets a worker accept all pending connections at once instead of one per event loop iteration. It can help under bursty load but is genuinely marginal for most workloads — exactly the kind of thing to A/B test rather than assume.

Here’s the kind of AI exchange that’s actually useful — asking it to explain the tradeoff, not hand me a number:

Prompt: Explain what worker_connections does in NGINX and what happens to memory and file descriptors if I set it much higher than my real peak concurrency. Don’t suggest a value.

A good answer covers: each connection consumes a file descriptor and a slice of worker memory; the value must stay under the open-file limit that applies to the workers (ulimit -n, or worker_rlimit_nofile if you set it); oversizing wastes reserved resources without improving throughput; the right value is derived from observed peak concurrent connections plus headroom, not a fixed constant.

That’s AI doing what it’s good at: teaching the mechanism so I can pick the number.
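The file descriptor constraint from that answer is simple enough to check mechanically. A small sketch, assuming you pass in the limit that actually applies to the worker processes; check_fd_limit is a made-up name:

```shell
# Hypothetical helper: does worker_connections fit under the open-file limit?
# usage: check_fd_limit WORKER_CONNECTIONS FD_LIMIT
check_fd_limit() {
    worker_connections=$1
    fd_limit=$2
    if [ "$fd_limit" = "unlimited" ] || [ "$worker_connections" -lt "$fd_limit" ]; then
        echo "ok: worker_connections $worker_connections fits under fd limit $fd_limit"
    else
        echo "too high: worker_connections $worker_connections does not fit under fd limit $fd_limit"
        return 1
    fi
}
```

A quick first pass is check_fd_limit 4096 "$(ulimit -n)", but treat that as an approximation: the limit in your login shell is not necessarily the one the NGINX workers run with, since the service manager or worker_rlimit_nofile can change it.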

Keepalive — both directions, and people forget the upstream one

There are two separate keepalive concerns and they get conflated constantly.

Client-side keepalive controls how long NGINX holds idle client connections:

http {
    keepalive_timeout 65;
    keepalive_requests 1000;
}

Upstream keepalive is the one teams forget, and it’s often the bigger win. Without it, NGINX opens a fresh TCP connection to your backend for every request, paying handshake cost each time. To enable connection reuse to upstreams you need a named upstream block, a keepalive count (the number of idle connections each worker keeps cached, not a cap on total upstream connections), and, critically, HTTP/1.1 with the Connection header cleared:

upstream app_backend {
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
    keepalive 32;
}

server {
    location / {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Miss either proxy_http_version 1.1 or the cleared Connection header and your upstream keepalive silently does nothing. This is a fantastic thing to have AI sanity-check, because the failure mode is invisible: NGINX just quietly reconnects every time.
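Because the failure is silent, a crude text check can catch it before review. The sketch below is grep-level only: it looks at a single file, does not follow include directives, and does not know which location a directive belongs to, so treat a clean result as a hint rather than proof. lint_upstream_keepalive is a hypothetical name.

```shell
# Hypothetical helper: if the file sets an upstream keepalive pool, check that
# the two proxy directives it depends on appear somewhere in the same file.
lint_upstream_keepalive() {
    conf=$1
    missing=0
    if ! grep -Eq '^[[:space:]]*keepalive[[:space:]]+[0-9]+' "$conf"; then
        echo "no upstream keepalive directive found"
        return 0
    fi
    if ! grep -Eq '^[[:space:]]*proxy_http_version[[:space:]]+1\.1[[:space:]]*;' "$conf"; then
        echo "missing: proxy_http_version 1.1;"
        missing=1
    fi
    if ! grep -Eq '^[[:space:]]*proxy_set_header[[:space:]]+Connection[[:space:]]+""[[:space:]]*;' "$conf"; then
        echo 'missing: proxy_set_header Connection "";'
        missing=1
    fi
    if [ "$missing" -eq 0 ]; then
        echo "upstream keepalive prerequisites present"
    fi
    return "$missing"
}
```

Run it as lint_upstream_keepalive /path/to/your.conf; a non-zero exit status means at least one of the two directives was not found.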

Buffers — size to your responses, not to the max

proxy_buffers, proxy_buffer_size, and client_body_buffer_size decide whether NGINX holds responses and request bodies in memory or spills them to disk. Disk spillover is slow, but oversizing buffers multiplies memory by every concurrent connection.

location / {
    proxy_buffer_size      8k;
    proxy_buffers          16 8k;
    proxy_busy_buffers_size 16k;
    client_body_buffer_size 16k;
    proxy_pass http://app_backend;
}

proxy_buffer_size handles the first chunk of the response (the headers). proxy_buffers takes a count and a size, and holds the body. The right values come from your actual response sizes: watch the proxy_temp_path directory for spillover, and look in the error log for warnings that an upstream response was buffered to a temporary file. If you’re never spilling, your buffers are already big enough. AI will happily suggest proxy_buffers 256 16k; ask it instead how to tell whether your current buffers are too small. The honest answer is “look for disk writes to the temp path,” not “make them bigger.”
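To see why oversizing hurts, multiply it out. The sketch below computes a deliberately pessimistic upper bound: it assumes every proxied connection fills proxy_buffer_size plus every one of its proxy_buffers at the same moment, which real traffic rarely does. proxy_buffer_worst_case_kb is a hypothetical helper and the 200 concurrent connections are an illustrative figure.

```shell
# Hypothetical helper: pessimistic proxy buffer memory, in KB.
# usage: proxy_buffer_worst_case_kb BUFFER_SIZE_KB COUNT SIZE_KB CONNECTIONS
proxy_buffer_worst_case_kb() {
    per_conn=$(( $1 + $2 * $3 ))
    echo $(( per_conn * $4 ))
}

proxy_buffer_worst_case_kb 8 16 8 200     # the values in the block above
proxy_buffer_worst_case_kb 8 256 16 200   # the "make them bigger" suggestion
```

At 200 concurrent proxied connections the first comes to 27200 KB (about 27 MB) and the second to 820800 KB (about 800 MB), for a change that does nothing unless you were spilling to disk in the first place.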

gzip vs brotli — compress smart, not everything

Compression trades CPU for bandwidth. gzip is built in and fine. brotli (via ngx_brotli) compresses text a bit smaller at comparable cost but needs a module.

gzip on;
gzip_comp_level 5;
gzip_min_length 1024;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml image/svg+xml;
gzip_vary on;

AI gets two things wrong here if you let it. First, it will compress already-compressed formats (JPEG, PNG, MP4, gzip archives), which burns CPU for nothing. Second, it will max gzip_comp_level to 9, and the jump from 5 to 9 costs noticeably more CPU for a sliver of size. brotli is worth it if you serve a lot of static text to browsers, but it’s a decision to measure, not a default to flip on because it’s newer.
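You can get a feel for the level tradeoff on your own payloads before touching the config. A rough sketch, assuming the command-line gzip is a fair stand-in for what gzip_comp_level does inside NGINX (both use DEFLATE, but the byte counts will not match exactly); both function names are made up:

```shell
# Hypothetical helpers: compare compressed sizes of one sample response body.
gzip_size_at_level() {
    level=$1
    file=$2
    gzip -"$level" -c "$file" | wc -c | tr -d ' '
}

compare_gzip_levels() {
    file=$1
    echo "raw $(wc -c < "$file" | tr -d ' ')"
    for level in 1 5 9; do
        echo "level$level $(gzip_size_at_level "$level" "$file")"
    done
}
```

Save a representative JSON or HTML response to a file and run compare_gzip_levels on it. This only shows the size side of the trade; wrap the same gzip commands in time to see what the higher level costs in CPU.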

sendfile, tcp_nopush, tcp_nodelay, and open_file_cache

These four go together for static content.

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    open_file_cache max=10000 inactive=30s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
}

sendfile on lets the kernel copy file data straight to the socket, skipping userspace. tcp_nopush (only effective with sendfile) batches headers and the file start into full packets. tcp_nodelay disables Nagle’s algorithm so the last partial packet ships immediately — NGINX is smart enough to apply each at the right moment, so enabling both is correct, not contradictory. open_file_cache caches file descriptors and metadata so you’re not stat-ing the same files on every request — a real win for static-heavy sites, near-useless for a pure reverse proxy. Match the tool to the job.

Validate, then reload — every single time

No matter how clean the config looks, test it before it touches a worker. This is non-negotiable, and it’s the step that caught the chatbot’s snippet before it went any further:

nginx -t          # parse and validate; never reload without a clean result
nginx -s reload   # graceful reload — existing connections drain, workers respawn

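nginx -t catches syntax errors and bad directive contexts before a restart can take down your listener. A reload with a broken config is less dramatic, since NGINX rejects it and keeps running the old one, but then you are serving a config you thought you had replaced. nginx -s reload swaps workers gracefully without dropping in-flight requests. If -t fails, you fix it; you do not force it.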
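If you want the two steps to be impossible to run out of order, wrap them. A minimal sketch: safe_reload and the NGINX_BIN override are names invented here (the override only exists so the function can be pointed at a specific binary), while -t and -s reload are the real flags shown above.

```shell
# Hypothetical wrapper: reload only when the config test passes.
safe_reload() {
    nginx_bin=${NGINX_BIN:-nginx}
    if "$nginx_bin" -t; then
        "$nginx_bin" -s reload && echo "reloaded"
    else
        echo "config test failed, not reloading" >&2
        return 1
    fi
}
```

Calling safe_reload from your deploy script means a failed test stops the pipeline with a non-zero exit status instead of depending on someone reading the output.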

Where AI actually earns its keep

The pattern that works: have AI explain directives, draft a candidate block, and review yours for the silent footguns (the missing proxy_set_header Connection "", compressing binaries, a worker_connections above your ulimit). Then you run the baseline, apply one change, re-test, and keep it only if the graph moves. If you want reusable starting points for those conversations, I keep a set in the prompts library, and there’s more NGINX-specific material under the NGINX category.
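The keep-or-revert decision can be made mechanical so that nobody argues with a confident-sounding suggestion. A sketch under two assumptions: that your load generator is wrk, which prints a "Requests/sec:" line in its summary, and that you have picked a minimum gain below which you treat a change as noise (the 5% used in the example is arbitrary). Both function names are hypothetical.

```shell
# Hypothetical helper: pull the throughput figure out of saved wrk output.
wrk_rps() {
    awk '/^Requests\/sec:/ { print $2 }'
}

# Hypothetical helper: keep a change only if throughput rose by MIN_GAIN_PCT.
# usage: keep_or_revert BEFORE_RPS AFTER_RPS MIN_GAIN_PCT
keep_or_revert() {
    awk -v before="$1" -v after="$2" -v min="$3" 'BEGIN {
        change = (after - before) / before * 100
        verdict = (change >= min) ? "keep" : "revert"
        printf "%s (%.1f%% change)\n", verdict, change
    }'
}
```

For example, keep_or_revert "$(wrk_rps < before.txt)" "$(wrk_rps < after.txt)" 5 prints the verdict and the measured change. It compares throughput only, where higher is better; latency percentiles need the comparison flipped, and a single run on each side is rarely enough to rule out noise.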

The uncomfortable truth is that for a lot of deployments, the defaults plus upstream keepalive and worker_processes auto get you 95% of the way, and the rest is noise you can measure to confirm. AI is a brilliant explainer and a fast drafter. It is a terrible source of magic numbers. Keep nginx -t in the loop, keep yourself in control, and never ship a number you can’t defend with a before-and-after.