AI-Assisted NGINX Proxy Caching and Microcaching

Last quarter our upstream app servers started buckling under a traffic spike that wasn’t even that large — a few thousand requests per second hitting an endpoint that was mostly the same JSON for every anonymous visitor. The app was rendering it fresh every single time. NGINX was sitting in front doing nothing but proxying. The fix was obvious in hindsight: cache the responses. But the config to do that correctly — without serving stale data to logged-in users or stampeding the origin on every cache expiry — is fiddly enough that people copy-paste it from a 2014 blog post and pray. I drafted ours with an AI assistant, then spent the real time validating that the cache key, the bypass rules, and the stale behavior actually did what I thought. That second part is the job. The AI gets you a first draft fast; it does not get to decide what’s correct.

Set up the cache zone

Everything starts with proxy_cache_path. This lives in the http block, not inside a server, because the cache is shared across all your virtual hosts.

proxy_cache_path /var/cache/nginx/proxy
    levels=1:2
    keys_zone=app_cache:50m
    max_size=2g
    inactive=60m
    use_temp_path=off;

keys_zone=app_cache:50m reserves 50MB of shared memory for cache keys and metadata — roughly 400k entries — not the cached bodies, which go on disk under max_size=2g. inactive=60m evicts anything not requested in an hour regardless of its validity. use_temp_path=off writes directly into the cache directory instead of staging through a temp dir, which avoids a needless cross-filesystem copy.

This is exactly the kind of thing AI is good at explaining. I pasted the raw directive and asked what each parameter controlled and what happens when max_size is hit. The answer correctly described the LRU-style eviction the cache manager performs. But “sounds right” is not “is right” — I still confirmed the behavior against the official directive docs before shipping.

Wire it into a location

server {
    listen 443 ssl;
    server_name app.example.com;

    location /api/ {
        proxy_pass http://backend;
        proxy_cache app_cache;
        proxy_cache_key "$scheme$request_method$host$request_uri";
        proxy_cache_valid 200 301 302 10m;
        proxy_cache_valid 404 1m;

        add_header X-Cache-Status $upstream_cache_status always;
    }
}

The proxy_cache_key is the single most important line here, and it’s where AI reasoning genuinely helps. The default key omits things you might care about and includes things you might not. Ask the assistant to reason through it:

Prompt: “My NGINX proxy_cache_key is $scheme$request_method$host$request_uri. The same URL returns different JSON depending on an Accept-Language header. What’s wrong with my cache key, and what’s the risk if I leave it?”

A good answer points out that two clients requesting the same URL with different Accept-Language values will collide on one cache entry — whoever populates it first wins, and everyone else gets the wrong language. The fix is to fold the varying input into the key, e.g. append $http_accept_language. The assistant can spot that class of bug instantly. What it cannot do is know which headers actually vary your responses — that’s your domain knowledge, and getting the key wrong means silently serving wrong content to real users. Validate it.

The X-Cache-Status header is your debugger

$upstream_cache_status is the difference between guessing and knowing. Add it as a header and inspect with curl:

curl -sI https://app.example.com/api/widgets | grep -i x-cache-status

You’ll see one of MISS, HIT, EXPIRED, STALE, UPDATING, REVALIDATED, or BYPASS. The first request to a fresh key is MISS; the next, within the valid window, is HIT. Hit this in a loop and watch it flip:

for i in 1 2 3; do
  curl -sI https://app.example.com/api/widgets \
    | grep -i x-cache-status
done

If every request says MISS, your cache key is varying when you didn’t intend it to (a Set-Cookie from the upstream, a no-store Cache-Control, a query param ordering issue). If it says HIT for a user who should see personalized data, your bypass rules are broken. Either way the header tells you the truth — don’t trust the AI’s mental model of your traffic, trust this.

Microcaching: 1 second is not zero

The trick that saved us on that JSON endpoint: cache “dynamic” content for one second. At a few thousand req/s, a 1-second TTL collapses thousands of identical origin hits into one, while no user ever sees data more than a second stale. That’s a trade most product owners will happily take once you frame it that way.

location /api/feed {
    proxy_pass http://backend;
    proxy_cache app_cache;
    proxy_cache_key "$scheme$request_method$host$request_uri";

    proxy_cache_valid 200 1s;

    proxy_cache_use_stale error timeout updating
                          http_500 http_502 http_503 http_504;
    proxy_cache_lock on;

    add_header X-Cache-Status $upstream_cache_status always;
}

Two directives make microcaching production-grade. proxy_cache_lock on ensures that when an entry expires, only one request goes to the origin to refresh it while the others wait — no thundering herd. proxy_cache_use_stale ... updating lets NGINX serve the slightly-stale cached copy while that single request refreshes in the background, so latency stays flat. Adding http_500 http_502 http_503 http_504 means if your origin falls over, users keep getting the last good cached response instead of an error page. That last clause has saved more weekends than I can count.

This is stale-while-revalidate behavior, and it’s exactly where I lean on AI to talk through edge cases — “if the origin returns a 500 during a refresh, does the client see the stale copy or the 500?” — but I confirm the answer by actually killing the upstream in staging and watching X-Cache-Status go STALE. The reasoning is a hypothesis; the curl is the test.

Don’t cache logged-in users

The cardinal sin of proxy caching is serving one user’s authenticated response to everyone else. If your app sets a session cookie, exclude those requests:

map $http_cookie $no_cache {
    default          0;
    "~*sessionid="   1;
}

location /api/ {
    proxy_pass http://backend;
    proxy_cache app_cache;
    proxy_cache_key "$scheme$request_method$host$request_uri";
    proxy_cache_valid 200 10m;

    proxy_cache_bypass $no_cache;
    proxy_no_cache     $no_cache;

    add_header X-Cache-Status $upstream_cache_status always;
}

proxy_cache_bypass means “don’t read from cache for this request — go to the origin.” proxy_no_cache means “don’t write this response to cache.” You almost always want both set to the same condition. Here a map flips $no_cache to 1 whenever a sessionid= cookie is present, so authenticated traffic skips the cache entirely in both directions. Verify it with a cookie:

curl -sI https://app.example.com/api/widgets \
  -H 'Cookie: sessionid=abc123' | grep -i x-cache-status
# expect: X-Cache-Status: BYPASS

A real BYPASS there is your proof the personalization carve-out works. This is the highest-stakes check in the whole setup, so I never delegate the verdict to the model — I run the curl myself, with and without the cookie, and confirm BYPASS versus HIT.

Validate before you reload

Never reload on a hope. Test the config, then reload gracefully so in-flight connections drain:

sudo nginx -t && sudo nginx -s reload

nginx -t parses the full config and catches syntax errors, bad paths, and missing directives before they touch a live worker. If it prints syntax is ok and test is successful, the reload is safe. If the AI-drafted config has a typo or references a zone you never defined, this is where it surfaces — which is the whole point of keeping a human and nginx -t between the model’s output and production.

Where AI fits, and where it doesn’t

The division of labor that works for me: AI drafts the directives and explains the trade-offs faster than I’d reconstruct them from memory, especially the proxy_cache_use_stale matrix and cache-key reasoning. I own everything that determines correctness — which headers vary the response, which cookies mean “do not cache,” what stale window the product can tolerate, and whether the hit rate I’m seeing in X-Cache-Status matches the load I expected. If you want a starting point, the prompts library has a few I use for caching reviews, and there’s more NGINX material under AI for NGINX.

Ship the config, but earn the cache. Run the curl loop, watch the status header, kill the upstream in staging once to see STALE fire, and only then call it done. The AI gives you a strong first draft in a tenth of the time. The validation is still yours.