NGINX Log Format Design & Analysis Prompt

Design a structured (JSON or key-value) access log format that captures upstream timing, request IDs, and cache status — then get a set of awk/jq one-liners to actually answer questions from it during an incident.

Target user

Engineers who want NGINX logs that are useful at 3am, not just a wall of combined-format text

Difficulty

Intermediate

Tools

Claude, ChatGPT, Cursor

You are a senior infrastructure engineer who treats access logs as a debugging tool, not exhaust. You log upstream timings, a request ID, and cache status, and you can answer "what's slow right now" from the log alone. I will provide: - What I most often need to debug: [DESCRIBE — slow upstreams, 5xx spikes, cache misses, specific clients] - Where logs go (file, stdout, shipped to a collector): [DESCRIBE] - Whether downstream tooling wants JSON or key=value: [DESCRIBE] - Any PII/compliance constraints (don't log full query strings, mask IPs): [DESCRIBE] Build the solution: 1. **The log_format** — a named `log_format` capturing at minimum: `$request`, `$status`, `$body_bytes_sent`, `$request_time`, `$upstream_response_time`, `$upstream_addr`, `$upstream_cache_status`, `$http_x_request_id` (or generate one), and `$remote_addr`. Output JSON if requested, with correct escaping. Explain why `$request_time` vs `$upstream_response_time` distinguishes NGINX overhead from backend slowness. 2. **PII handling** — show how to omit or mask anything the constraints forbid; never log secrets or full auth headers. 3. **Wiring** — the `access_log` directive referencing the format, plus a note on buffering (`buffer=`/`flush=`) for high-traffic sites. 4. **Analysis one-liners** — 4-6 ready-to-run `jq`/`awk` commands answering: slowest endpoints by p95 upstream time, top 5xx URLs, cache hit ratio, and requests over N seconds. Make them paste-ready for the format you defined. Output: (a) the `log_format` + `access_log` config, (b) the analysis command cheat-sheet, (c) a one-line `tail`/`jq` sanity check to confirm the format parses. Validate the config with `nginx -t` and reload — do not change a live log format without confirming your log shipper still parses it.

Why this prompt works

The default combined log format is built for the 1990s: it tells you what was requested and what status came back, but nothing about why a request was slow or which backend served it. The single most valuable upgrade is logging $upstream_response_time alongside $request_time, because the gap between them tells you instantly whether NGINX or the backend is the bottleneck — a distinction you otherwise burn an hour guessing at during an incident. This prompt makes those fields, plus cache status and a request ID, non-negotiable.

Pairing the format design with paste-ready jq/awk one-liners is the part that pays off at 3am. A structured log you can’t query is just prettier exhaust; having “top 5xx URLs” and “p95 upstream time by endpoint” already written for the exact fields you defined means you go from “something’s wrong” to “this endpoint, this backend” in seconds. Tying the queries to the format you just generated guarantees they actually match your field names.

The PII guardrail is there because structured logging makes it easy to over-capture. Once you’re emitting JSON, dumping the full query string or auth headers is one line away, and that line quietly turns your log store into a credential leak. Forcing an explicit masking decision — and confirming the downstream shipper still parses the new format before rollout — keeps a useful change from becoming a compliance incident.

NGINX Log Format Design & Analysis Prompt

Why this prompt works

Related prompts

NGINX 502/504 Bad Gateway Triage Prompt

NGINX Performance Tuning Prompt

Why this prompt works

Related prompts

NGINX 502/504 Bad Gateway Triage Prompt

NGINX Performance Tuning Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet