AI-Assisted NGINX Rate Limiting and Abuse Control

The page that finally pushed me to take NGINX rate limiting seriously was a login endpoint. We were getting credential-stuffing traffic — thousands of POSTs an hour from a rotating pool of IPs, each one trying a handful of passwords and moving on. Our app-layer lockout worked, but the requests still hit PHP-FPM, still touched the database, still burned CPU we were paying for. The fix lived three layers down from where the pain was: a limit_req_zone at the edge, sized correctly, keyed correctly. I’d written limit_req before, but every time I came back to it I had to relearn the burst-versus-rate math, and the failure mode for getting it wrong is locking out real users. That’s exactly the kind of fiddly, well-documented-but-easy-to-misjudge config where AI earns its keep — drafting the directives, explaining the tradeoffs, and doing the burst arithmetic out loud. What it can’t do is tell you whether your tuning survives real traffic. That part is still yours.

What AI is actually good for here

NGINX limiting is a small set of directives with a lot of subtle interactions. limit_req_zone defines a shared memory zone and a rate. limit_req applies it in a context, with optional burst and nodelay. The leaky-bucket model underneath is simple once you internalize it, but the parameters don’t map intuitively to “how many requests can a user actually make.” That gap is where AI helps: it’ll write a correct first draft, explain what burst=20 nodelay does to a traffic spike, and walk the math when you ask “if my rate is 10r/s and burst is 20, what does a client see at 50 requests in one second?”

Here’s the kind of prompt I use. Be specific about the endpoint, the threat, and where NGINX sits in the network:

I run NGINX in front of a Node app. I want to rate-limit /api/login to stop credential stuffing without blocking legitimate users who retry a failed password a couple times. NGINX is behind an AWS ALB, so $remote_addr is the load balancer. Draft a limit_req_zone and limit_req config, explain the burst sizing, and tell me which key variable to use and why. Don’t use nodelay if it would let an attacker burst through.

The “NGINX is behind an ALB” line matters more than anything else in that prompt, and it’s the thing people forget to mention. Get it wrong and you rate-limit by load balancer IP, which means one abusive client trips the limit for everyone.

Choosing the key: the part that’s actually load-bearing

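The single most consequential decision is what you key the limit on. $binary_remote_addr is the right default: it’s the client IP in compact binary form (4 bytes for IPv4, 16 for IPv6). Per the NGINX docs, each tracked state takes 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms, so a 10m zone holds roughly 160,000 addresses at 64 bytes per state and about half that, roughly 80,000, on a typical 64-bit server. But the moment a proxy sits in front of NGINX, $remote_addr becomes the proxy’s address, and you’re limiting the whole world as one bucket.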

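Behind a trusted proxy or load balancer, you want the real client IP from X-Forwarded-For. The clean way is the realip module (ngx_http_realip_module), which rewrites $remote_addr from the trusted header so the rest of your config still uses $binary_remote_addr transparently. It isn’t compiled in by default when building from source (--with-http_realip_module), though most distribution packages include it: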

# Trust the load balancer's forwarded header, but only from its subnet
set_real_ip_from 10.0.0.0/16;
real_ip_header   X-Forwarded-For;
real_ip_recursive on;

The danger with trusting $http_x_forwarded_for directly as the limit key is that it’s client-controlled — an attacker just rotates the header value and sidesteps your limit entirely. Only trust it after set_real_ip_from has pinned which upstream is allowed to set it. AI will flag this if you tell it there’s a proxy; it can’t flag it if you don’t.

A real config: protecting login and capping concurrency

This is the config that fits the credential-stuffing case. Two zones (one for request rate on the login path, one for connection count to blunt slow-loris-style abuse) plus an explicit status code so the limited responses are obvious in logs and to clients.

http {
    # 10 requests/second per client IP; a 10m zone tracks ~80k IPs on 64-bit builds
    limit_req_zone  $binary_remote_addr  zone=login:10m  rate=10r/s;

    # cap concurrent connections per client IP
    limit_conn_zone $binary_remote_addr  zone=conn_per_ip:10m;

    # 429 is clearer than the default 503 for "you're being limited"
    limit_req_status  429;
    limit_conn_status 429;

    server {
        listen 443 ssl;
        server_name app.example.com;

        location = /api/login {
            # burst absorbs a short spike; no nodelay so excess is queued,
            # not served instantly — an attacker can't dump 30 tries at once
            limit_req   zone=login burst=20;
            limit_conn  conn_per_ip 10;

            proxy_pass http://app_upstream;
        }
    }
}

The burst math is the part worth understanding. At rate=10r/s, NGINX refills the bucket one token every 100ms. burst=20 means up to 20 requests beyond the steady rate get queued instead of rejected. Without nodelay, those queued requests are released at the rate — so a client that fires 30 requests instantly gets the first served, 20 queued and trickled out, and the rest 429’d. That’s exactly what you want on a login endpoint: a real user retrying twice sails through, while a script hammering the endpoint gets throttled to a crawl.
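To make that arithmetic concrete, here is a rough worked trace written as comments on the same directive. It assumes the simplest case, where all 30 requests arrive from one IP in the same instant; requests spread across a second would see a few more accepted as the bucket drains, so treat the counts as a sketch rather than a guarantee.

```nginx
# rate=10r/s, burst=20, no nodelay; 30 requests from one IP at t=0
#   request  1      -> forwarded immediately
#   requests 2-21   -> queued, released one every 100 ms
#                      (request 2 at ~0.1s, request 21 at ~2.0s)
#   requests 22-30  -> rejected with limit_req_status (429 in this config)
limit_req zone=login burst=20;
```

The same logic covers a larger spike: with 50 simultaneous requests the accepted count stays at 21 and the rejected count rises to 29.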

nodelay flips that. With limit_req zone=login burst=20 nodelay, all 20 burst requests are served immediately and the bucket just has to refill before the next burst is allowed. That’s great for a bursty API that real users hit in clusters — a dashboard firing 15 XHRs on page load — but it’s wrong for a login endpoint, because it hands an attacker 20 free attempts before throttling kicks in. Same directive, opposite intent. This is the tradeoff I want AI to explain back to me, and it does it well. The decision is mine.
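For contrast, a sketch of where nodelay does fit. The /api/dashboard/ path and the zone name api are hypothetical, chosen only to illustrate a bursty, low-risk read endpoint; the login location would keep its queued burst.

```nginx
http {
    # Hypothetical second zone for a bursty API path
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        # (listen / server_name as in the main config)

        location /api/dashboard/ {
            # Serve up to 20 excess requests at once; further requests
            # are rejected until the bucket drains at 10r/s
            limit_req zone=api burst=20 nodelay;
            proxy_pass http://app_upstream;
        }
    }
}
```

Keeping the two paths on separate zones means a page-load burst against the dashboard never eats into the budget for login attempts.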

Allowlisting the traffic that shouldn’t be limited

Your health checks, your office, your monitoring — none of those should ever hit a 429. geo and map let you build a key that’s empty for allowlisted sources, and an empty key disables the limit:

http {
    geo $limit_exempt {
        default          0;
        10.0.0.0/16      1;   # internal / load balancer health checks
        203.0.113.10/32  1;   # office egress IP
    }

    map $limit_exempt $limit_key {
        0  $binary_remote_addr;  # limited: real key
        1  "";                   # exempt: empty key bypasses the zone
    }

    limit_req_zone $limit_key zone=login:10m rate=10r/s;
}

This pattern is cleaner than maintaining separate location blocks, and it’s another good thing to have AI draft — the geo/map interaction is easy to invert by accident. Read the generated version carefully: confirm default is 0 (limited), not 1 (exempt). A flipped default means you’ve accidentally exempted the entire internet.

Where the human stays in control

None of this ships until I’ve validated it twice. First, syntax and structure — never reload on faith:

# Reload only if the config test passes
nginx -t && nginx -s reload
# nginx: configuration file /etc/nginx/nginx.conf test is successful

nginx -t catches typos and bad directive contexts, but it tells you nothing about whether rate=10r/s burst=20 is right for your traffic. That answer only comes from load. I generate realistic traffic — a tool like hey or wrk against a staging copy, plus a replay of real access-log patterns — and watch three things: the 429 rate in the access log, p99 latency on the limited endpoint, and whether any legitimate client pattern trips the limit. If real users 429, the burst is too tight or the rate too low. If attackers still get through, it’s too loose. AI gives me a defensible starting point and the reasoning behind it; the traffic tells me the truth, and I tune from there.
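One way to make those signals easier to read during a load test is a dedicated access log for the limited endpoint. The sketch below assumes NGINX 1.17.6 or later for $limit_req_status and 1.17.1 or later for limit_req_dry_run; the format name limit_audit and the log path are illustrative, not part of the setup described above.

```nginx
http {
    # Illustrative format: status, limiter verdict, and request time per hit
    log_format limit_audit '$remote_addr [$time_local] "$request" '
                           '$status $limit_req_status $request_time';

    limit_req_zone $binary_remote_addr zone=login:10m rate=10r/s;
    limit_req_status 429;

    server {
        location = /api/login {
            limit_req zone=login burst=20;

            # Staging aid: evaluate and log the limit without enforcing it.
            # Remove (or set to off) once the limit should take effect.
            limit_req_dry_run on;

            access_log /var/log/nginx/login_limit.log limit_audit;
            proxy_pass http://app_upstream;
        }
    }
}
```

With dry run on, $limit_req_status reports REJECTED_DRY_RUN or DELAYED_DRY_RUN for requests that would have been limited, which gives a count of would-be 429s without affecting any client. With it off, the same column shows PASSED, DELAYED, or REJECTED.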

That division of labor is the whole game. AI is a fast, knowledgeable pair for the parts that are about syntax, tradeoffs, and arithmetic — and it’s genuinely good at all three. It is not a substitute for testing config that decides whether your real users can log in. Draft with it, interrogate its reasoning, then validate with nginx -t and a load test before it touches production.

If you want more in this vein, the rest of the NGINX category digs into edge config, and reviewing NGINX security configuration with AI pairs naturally with this — limiting is abuse control, that post is exposure control. The prompt above is a starting template; I keep refined versions of these in my prompt library so I’m not rewriting the “here’s where NGINX sits in my network” context every time.
