AI-Assisted NGINX Access and Error Log Analysis

During an incident, the difference between a five-minute fix and a one-hour fishing expedition is usually your logs. If NGINX is emitting the default combined format, you know what was requested and what came back — and almost nothing about why anything was slow. I’ve rebuilt enough log formats mid-incident to have learned the lesson: design the log once, when nothing is on fire, to capture the fields you’ll wish you had. AI is great at two things here: drafting a structured log format with correct escaping, and writing the jq/awk one-liners to query it. You still own the call about what to log and the discipline not to leak secrets into it.

This guide covers building NGINX logs that answer questions, using AI to draft the format and the queries.

Why the default format fails you

The combined log format predates everything we care about in a modern stack. It has no notion of which upstream served the request, how long the upstream took versus how long NGINX added, or whether a response was a cache hit, and it carries no request ID to correlate across services. So when someone says “the API got slow at 2pm,” the default log can tell you which URLs were hit but not whether the slowness was NGINX, the backend, or a specific upstream instance.

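The single most valuable field to add is $upstream_response_time next to $request_time. $request_time runs from the first bytes read from the client until the last bytes of the response are sent, so the gap between the two is time spent outside the backend: reading the request, buffering, and writing the response to a possibly slow client. For example, a request_time of 1.850 against an upstream_response_time of 0.120 means 1.73 seconds went somewhere other than the backend. That one comparison usually settles the “is it us or them” question.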

Drafting a structured log format

I want JSON so downstream tools can parse it without a custom regex. Here’s the prompt:

Draft an NGINX log_format that emits JSON with: time, remote_addr, request method and URI, status, body_bytes_sent, request_time, upstream_response_time, upstream_addr, upstream_cache_status, and the X-Request-ID header. Escape it correctly for JSON. Then give me the access_log directive that uses it. Output config only.

The result, after I read the escaping carefully:

log_format json_combined escape=json
  '{'
    '"time":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"method":"$request_method",'
    '"uri":"$request_uri",'
    '"status":$status,'
    '"bytes":$body_bytes_sent,'
    '"request_time":$request_time,'
    '"upstream_time":"$upstream_response_time",'
    '"upstream_addr":"$upstream_addr",'
    '"cache":"$upstream_cache_status",'
    '"request_id":"$http_x_request_id"'
  '}';

access_log /var/log/nginx/access.json.log json_combined buffer=32k flush=5s;

Two details worth verifying in any AI draft:

escape=json on the log_format is what keeps a quote or backslash in a URI from corrupting your JSON. Without it, one weird request breaks every parser downstream. AI sometimes omits it.
buffer= and flush= batch log writes so high-traffic sites don’t pay a syscall per request. The trade-off is a few seconds of delay before logs hit disk, which is fine for analysis and worth knowing during a live tail.

Don’t log your way into a breach

Structured logging makes over-capture dangerously easy. Once you’re building a JSON object, adding "cookie":"$http_cookie" or the full Authorization header is one line away — and now your log store holds session tokens and credentials. I’m explicit with AI about exclusions, and I review the final field list against retention policy. The rule of thumb: log enough to debug, never enough to impersonate a user. Query strings often carry tokens too, so if yours do, log the path without the args.
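
One way to log the path without the args, sketched under a few assumptions: a map block derives a hypothetical $request_path variable from $request_uri by capturing everything before the first question mark. The variable name, the capture name path_only, and the trimmed-down json_noargs format are all illustrative, not part of the format above. $uri would also drop the args, but it is normalized and can change during internal rewrites, so it may not match what the client actually sent.

```nginx
# Hypothetical variable: $request_path is $request_uri minus any query string.
# The map block belongs in the http context, alongside log_format.
map $request_uri $request_path {
    ~^(?<path_only>[^?]*) $path_only;
}

# Illustrative, shortened format that logs the stripped path instead of $request_uri.
log_format json_noargs escape=json
  '{'
    '"time":"$time_iso8601",'
    '"method":"$request_method",'
    '"uri":"$request_path",'
    '"status":$status'
  '}';
```

If the query string matters for debugging some endpoints, a narrower option is to keep $request_uri for those locations only and use the stripped variable everywhere else.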

Asking AI for the queries

This is where the structured format pays off. Instead of remembering awk field positions, I hand AI a sample line and ask for the query:

Here’s a sample line from my NGINX JSON access log: [PASTE ONE LINE]. Give me a jq command that shows the 10 endpoints with the highest p95 upstream_time over this log file.
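
A reply to that prompt might look something like the following sketch. It assumes the field names from the log format above (uri, and upstream_time logged as a string), uses a nearest-rank p95, and wraps the query in a hypothetical shell function, p95_by_uri, purely for reuse. Lines with no upstream time, or with several comma-separated times, are dropped rather than guessed at.

```shell
# p95_by_uri FILE: top 10 URIs by nearest-rank p95 of upstream_time.
# The function name is illustrative; field names assume the json_combined format.
p95_by_uri() {
  jq -rs '
    # Keep only lines with an upstream_time that parses as a single number.
    [ .[] | {uri, t: (.upstream_time | tonumber? // empty)} ]
    | group_by(.uri)
    | map({
        uri: .[0].uri,
        n: length,
        # Nearest-rank percentile: index ceil(0.95 * n) - 1 in the sorted list.
        p95: (map(.t) | sort | .[((length * 95 + 99) / 100 | floor) - 1])
      })
    | sort_by(-.p95)
    | .[:10][]
    | "\(.p95)s n=\(.n) \(.uri)"
  ' "$1"
}
```

Called as p95_by_uri access.json.log, it prints one line per endpoint with the p95, the sample count, and the URI. The -s flag slurps the whole file into memory, so on a large log it is safer to run it over a tail or a time-sliced extract, and the sample count is worth reading: a p95 over three requests says very little.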

For a JSON log, the queries stay readable:

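# URLs returning the most 5xx responses
jq -r 'select(.status >= 500) | .uri' access.json.log | sort | uniq -c | sort -rn | head

# Cache status breakdown (counts per status such as HIT or MISS, not a ratio)
jq -r '.cache' access.json.log | sort | uniq -c

# Every request whose upstream took longer than 2 seconds, with its upstream.
# upstream_time is logged as a string, so convert it; unparseable values count as 0.
jq -r 'select((.upstream_time | tonumber? // 0) > 2)
       | "\(.upstream_time)s \(.uri) -> \(.upstream_addr)"' access.json.log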
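
The same file can answer the “NGINX or backend” question directly. As a rough sketch, assuming request_time is numeric and upstream_time is a string as in the format above, this subtracts one from the other and lists the requests with the largest gap. The function name nginx_overhead and its default of 10 rows are made up for illustration, and requests with no upstream or with several comma-separated upstream times are skipped.

```shell
# nginx_overhead FILE [N]: the N requests (default 10) with the largest gap
# between request_time and upstream_time. Name and default are illustrative.
nginx_overhead() {
  jq -r '
    # Skip lines whose upstream_time is empty or not a single number.
    # The gap is rounded to milliseconds to hide floating-point noise.
    (.upstream_time | tonumber? // empty) as $up
    | (((.request_time - $up) * 1000 | round) / 1000) as $gap
    | "\($gap)\t\(.request_time)\t\($up)\t\(.uri)"
  ' "$1" | sort -rn | head -n "${2:-10}"
}
```

The output columns are tab-separated: gap, request_time, upstream time, URI. A large gap next to a small upstream time points away from the backend.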

I always run the AI’s query against a sample first and sanity-check the count, because a subtly wrong filter (comparing a string to a number, missing the tonumber) returns plausible-looking nonsense. The AI writes the query fast; you confirm it counts the right thing.
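
The string-versus-number trap is easy to reproduce. jq orders values by type before value, and every string sorts after every number, so a quoted time compared against 2 is always “greater”. A small demonstration, using a made-up value:

```shell
# A quoted duration compared against a number: true regardless of the value,
# because jq sorts all strings after all numbers.
jq -n '"0.005" > 2'                 # prints: true

# The same comparison after converting the string to a number.
jq -n '("0.005" | tonumber) > 2'    # prints: false
```

Against the format above, an unconverted filter such as select(.upstream_time > 2) would therefore match every line, which is exactly the kind of result a quick count check catches.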

The error log is the other half

The access log tells you what clients saw; the error log tells you what NGINX struggled with. When upstreams fail, the error log names the reason — connection refused, timeout, SSL handshake failure — and that’s often the actual root cause behind a wall of 502s in the access log. A quick triage pass I lean on:

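# Most common error reasons in the last 5000 lines.
# \K drops the PID and connection number so identical errors group together.
tail -n 5000 /var/log/nginx/error.log \
  | grep -oP '\[error\] \d+#\d+: (\*\d+ )?\K.*?(?=, client)' \
  | sort | uniq -c | sort -rn | head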
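
When the top reasons point at upstream failures, the next question is usually which backend. As a sketch, this assumes the upstream: "scheme://host:port/..." field that NGINX appends to upstream-related error lines, and counts errors per upstream address. The function name errors_by_upstream is hypothetical, and unix-socket upstreams would need a different pattern.

```shell
# errors_by_upstream FILE: count [error] lines per upstream host:port.
# The function name is illustrative.
errors_by_upstream() {
  grep -F '[error]' "$1" \
    | sed -n 's|.*upstream: "[a-z]*://\([^/"]*\).*|\1|p' \
    | sort | uniq -c | sort -rn | head
}
```

If one address dominates the counts, the problem is likely that instance rather than the service as a whole.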

Pasting a few representative error lines into an AI assistant and asking “what does this NGINX error mean and what are the usual causes” is a legitimately fast way to decode an unfamiliar message — as long as you treat the answer as a lead to verify, not a verdict.

Validate before you reload

Changing a log format is a config change, so it goes through the same nginx -t gate as any other, with one extra step: confirm your log shipper still parses the new format before you roll it out everywhere.

# Reload only if the config test passes
sudo nginx -t && sudo nginx -s reload

# Then, after at least one new request and the 5s flush, check that the newest line parses as JSON:
tail -n 1 /var/log/nginx/access.json.log | jq .

If that jq . errors, your format has an escaping bug — fix it before your collector starts dropping lines silently.
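
Checking only the last line can miss an intermittent escaping problem. A slightly broader check, sketched here with a hypothetical helper name, runs the most recent lines through jq in raw mode and prints any that fail to parse, so empty output means every sampled line is valid JSON.

```shell
# bad_json_lines FILE [N]: print lines among the last N (default 1000)
# that do not parse as JSON. Name and default are illustrative.
bad_json_lines() {
  tail -n "${2:-1000}" "$1" \
    | jq -rR '. as $line | try (fromjson | empty) catch $line'
}
```

Running bad_json_lines /var/log/nginx/access.json.log after a reload gives you the offending lines themselves, which is usually enough to see which field broke the escaping.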

Where AI fits

AI drafted the log format with correct JSON escaping, wrote the jq queries against my sample line, and decoded an unfamiliar error message faster than I’d have found it in the docs. What it didn’t decide was which fields were worth the bytes, which ones would leak PII, or whether a given query actually counted the right rows. Those judgment calls stay with you. Draft with AI, validate with nginx -t, and confirm both the format and the queries against real lines before you rely on them.
