Reading Loki Logs With AI: Patterns That Work
Loki query syntax is unfamiliar to most engineers. AI can help write LogQL, but it can also produce queries that look right and return nothing. Here's how to use it well.
If you’ve adopted Loki for log aggregation, you’ve probably had this moment: you need to find something in your logs right now, you open Grafana, and you stare at the empty LogQL query bar trying to remember whether it’s |= or =~ for the substring filter. Five minutes later you’ve cobbled something together, run it, gotten zero results, and you’re not sure if the query is wrong or the logs aren’t there.
This is the kind of friction AI is good at removing. LogQL has a small, structured grammar; the model knows it; you describe what you want; you get a working query. But — and this is the recurring theme — the model will also sometimes produce queries that are syntactically valid and semantically wrong, and there’s a specific way to catch that.
The basics AI handles well
These are the patterns I use AI for daily without much verification:
Label filter + substring:
Give me a LogQL query that finds error messages from the payments app in the production namespace over the last hour.
Reliable output:
{namespace="production", app="payments"} |= "error"
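The “over the last hour” part of that prompt isn’t in the LogQL at all. In Grafana, the time picker supplies it. Outside Grafana, the time range is sent as request parameters. Here is a minimal sketch of building that request for Loki’s /loki/api/v1/query_range endpoint. It assumes a hypothetical base URL of http://loki:3100 and nanosecond Unix timestamps for start and end:

```python
import time
from urllib.parse import urlencode

NS_PER_SECOND = 1_000_000_000

def query_range_url(base_url: str, query: str, end_ns: int,
                    lookback_seconds: int = 3600, limit: int = 100) -> str:
    """Build a Loki query_range URL covering [end - lookback, end]."""
    params = {
        "query": query,  # the LogQL itself carries no time range
        "start": end_ns - lookback_seconds * NS_PER_SECOND,  # Unix epoch, nanoseconds
        "end": end_ns,
        "limit": limit,  # maximum number of log lines returned
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"

if __name__ == "__main__":
    # "Over the last hour" becomes start/end parameters, not part of the query.
    print(query_range_url("http://loki:3100",
                          '{namespace="production", app="payments"} |= "error"',
                          end_ns=time.time_ns()))
```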
Counting by level:
Count the rate of log entries per level for the web app over 5-minute windows.
Reliable output:
sum by (level)(rate({app="web"} | json | __error__="" [5m]))
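A metric query like this one returns a matrix rather than log lines. Each level value gets its own series, and each series holds [timestamp, value] pairs with the value encoded as a string. If you script against the API, a small sketch like this pulls out the newest rate per level. The response shape follows Loki’s query_range API, but the numbers in the sample are made up:

```python
def latest_rate_by_level(response: dict) -> dict[str, float]:
    """Map each series' level label to its most recent sample value."""
    data = response["data"]
    if data["resultType"] != "matrix":
        raise ValueError(f"expected a matrix result, got {data['resultType']}")
    out = {}
    for series in data["result"]:
        level = series["metric"].get("level", "<none>")  # series without a level label
        _ts, value = series["values"][-1]  # samples are in time order; take the newest
        out[level] = float(value)  # Loki encodes sample values as strings
    return out

# Made-up response in the query_range "matrix" shape.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"level": "error"}, "values": [[1700000000, "0.2"], [1700000300, "0.5"]]},
            {"metric": {"level": "info"}, "values": [[1700000000, "12"], [1700000300, "15.5"]]},
        ],
    },
}

print(latest_rate_by_level(sample))  # {'error': 0.5, 'info': 15.5}
```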
JSON parsing + extraction:
Show me request durations from the api app where the JSON duration_ms field is greater than 1000.
Reliable output:
{app="api"} | json | duration_ms > 1000
These are exactly the kinds of queries that take me 5 minutes to write from memory and 5 seconds to get from Claude. Worth it.
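For intuition about what | json | duration_ms > 1000 does line by line, here is a rough local emulation on flat JSON lines. It’s a sketch, not Loki’s implementation. This code simply skips lines that don’t parse or that lack a numeric duration_ms. Loki instead keeps those lines and attaches an __error__ label, which is why the counting query above filters on __error__="".

```python
import json

def slow_requests(lines: list[str], threshold_ms: float = 1000) -> list[dict]:
    """Rough local analogue of `| json | duration_ms > threshold` on flat JSON lines."""
    matches = []
    for line in lines:
        try:
            fields = json.loads(line)
        except json.JSONDecodeError:
            continue  # Loki would keep this line, tagged __error__="JSONParserErr"
        if not isinstance(fields, dict):
            continue
        try:
            duration = float(fields.get("duration_ms", ""))
        except (TypeError, ValueError):
            continue  # missing or non-numeric field: Loki tags it with __error__ instead
        if duration > threshold_ms:
            matches.append(fields)
    return matches

sample_lines = [
    '{"duration_ms": 1500, "path": "/pay"}',
    '{"duration_ms": 20, "path": "/health"}',
    "not json at all",
]
print(slow_requests(sample_lines))  # [{'duration_ms': 1500, 'path': '/pay'}]
```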
Where AI gets LogQL wrong
Confusing PromQL and LogQL syntax
The model has read more PromQL than LogQL, and sometimes it leaks. You’ll get a query like:
rate({app="web"}[5m]) by (level)
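That bare by (level) is a half-remembered PromQL idiom. In both languages, by attaches to an aggregation operator such as sum, not to rate(), so this query is invalid PromQL too (PromQL would write sum(rate(...)) by (level)). In LogQL you need the following, assuming level is a stream label. If it only exists inside the JSON, add | json as in the earlier counting example: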
sum by (level)(rate({app="web"}[5m]))
The Grafana editor catches this and tells you it’s a syntax error. But if you’re using logcli or the API directly, you might get a confusing error and not realize the issue is structural.
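If you pipe AI-generated queries into logcli or the API, a cheap pre-flight check can catch this shape before you’re staring at an error message. This is a heuristic sketch, not a LogQL parser. It blanks out string literals, finds each by/without that follows a closing paren, walks back to the matching open paren, and flags the function if it’s one that doesn’t take grouping. The NO_GROUPING set reflects my reading of the LogQL docs; unwrapped aggregations like quantile_over_time do accept by. Check the set against your Loki version:

```python
import re

# Range functions that take no by/without clause of their own (per my reading of
# the LogQL docs; verify for your Loki version). quantile_over_time etc. do accept it.
NO_GROUPING = {"rate", "count_over_time", "bytes_over_time", "bytes_rate",
               "absent_over_time", "sum_over_time", "rate_counter"}

# Double-quoted strings (with escapes) and backtick raw strings.
STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"|`[^`]*`')
GROUPING_RE = re.compile(r"\)\s*(by|without)\s*\(")

def misplaced_grouping(query: str) -> list[str]:
    """Return descriptions of range functions that have by/without attached directly."""
    # Blank out string contents so parens inside filters can't confuse the matching.
    masked = STRING_RE.sub(lambda m: '"' + " " * (len(m.group()) - 2) + '"', query)
    problems = []
    for m in GROUPING_RE.finditer(masked):
        depth, i = 0, m.start()  # m.start() is the ')' that closes the call before `by`
        while i >= 0:
            if masked[i] == ")":
                depth += 1
            elif masked[i] == "(":
                depth -= 1
                if depth == 0:
                    break
            i -= 1
        if i < 0:
            continue  # unbalanced parens; leave that to the real parser
        name = re.search(r"([A-Za-z_]+)\s*$", masked[:i])  # identifier before '('
        if name and name.group(1) in NO_GROUPING:
            problems.append(f"{name.group(1)}(...) {m.group(1)} (...)")
    return problems

print(misplaced_grouping('rate({app="web"}[5m]) by (level)'))       # ['rate(...) by (...)']
print(misplaced_grouping('sum by (level)(rate({app="web"}[5m]))'))  # []
```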
Using labels that aren’t indexed
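LogQL is fast when you filter on indexed labels (the ones in {}), because those decide which streams Loki reads at all. It’s slow when you filter on fields extracted after | json, because Loki has to fetch and parse every line in the selected streams before it can apply the filter. The model doesn’t know which of your labels are indexed; it’ll happily put high-cardinality fields in the curly braces.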
If you ask:
Find all requests from user 12345.
You might get:
{app="api", user_id="12345"}
If user_id is not a stream label (and it usually shouldn’t be, since it’s high cardinality), this query is still syntactically valid. It simply matches no streams, so Loki returns nothing and gives you no error to explain why. The correct query is:
{app="api"} | json | user_id="12345"
When you describe the query, tell the model which labels are in the stream selector vs which are JSON fields. Otherwise it guesses.
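If you template queries yourself, or want to sanity-check what the model produced, you can make that split explicit. Keep the list of real stream labels and route everything else into the pipeline. Here is a minimal sketch. build_query and quote are hypothetical helpers, and the code assumes every non-stream field comes from JSON via | json:

```python
def quote(value: str) -> str:
    """Quote a value as a LogQL double-quoted string."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_query(stream_labels: set[str], filters: dict[str, str]) -> str:
    """Put indexed labels in the selector and everything else after | json."""
    selector = [f"{k}={quote(v)}" for k, v in filters.items() if k in stream_labels]
    extracted = [f"{k}={quote(v)}" for k, v in filters.items() if k not in stream_labels]
    if not selector:
        raise ValueError("a LogQL stream selector needs at least one label matcher")
    query = "{" + ", ".join(selector) + "}"
    if extracted:
        query += " | json | " + " | ".join(extracted)
    return query

print(build_query({"app", "namespace"}, {"app": "api", "user_id": "12345"}))
# {app="api"} | json | user_id="12345"
```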
Inventing operators
LogQL has |=, !=, |~, !~ for line filtering. The model sometimes invents contains, like, or other operators that don’t exist. The query fails with a parse error.
Easy to catch — the parse error tells you exactly which token is wrong — but worth knowing as a class of failure.
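The real set is small enough to hold in your head. Below is a local emulation of the four line filters on plain strings. It uses Python’s re as a stand-in for the RE2 engine Loki uses; the two differ in some syntax. Anything else, such as contains or like, is rejected, much as Loki’s parser would reject it:

```python
import re

def line_filter(lines: list[str], op: str, arg: str) -> list[str]:
    """Emulate LogQL line filters on plain strings (Python re stands in for RE2)."""
    if op == "|=":  # line contains the substring
        return [line for line in lines if arg in line]
    if op == "!=":  # line does not contain the substring
        return [line for line in lines if arg not in line]
    if op == "|~":  # regex matches anywhere in the line (not anchored)
        return [line for line in lines if re.search(arg, line)]
    if op == "!~":  # regex matches nowhere in the line
        return [line for line in lines if not re.search(arg, line)]
    # e.g. "contains" or "like": not LogQL, so refuse rather than guess
    raise ValueError(f"{op!r} is not a LogQL line filter operator")

sample = ["GET /pay 500", "GET /health 200", "POST /pay 503"]
print(line_filter(sample, "|~", "50[03]$"))  # ['GET /pay 500', 'POST /pay 503']
```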
Confused unwrap behavior
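For metric queries over values inside the log lines (averaging a duration field, summing a bytes field, percentiles with quantile_over_time), you need | unwrap to turn the extracted field into the sample value. The model sometimes unwraps the wrong field or skips unwrap when it’s needed. Depending on the mistake, the query may fail, come back empty, or run normally and answer a different question. For example, rate() without unwrap counts log lines per second instead of summing the field. That last case looks like data but is really “wrong aggregation.”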
This one is harder to catch because the query executes. You have to read the result and notice it’s wrong.
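The difference is easiest to see with numbers. Over the same window, rate() on a plain log range counts lines per second, while rate() on an unwrapped range sums the extracted values per second. avg_over_time needs unwrap to have any values to average at all. Here is a local sketch of those three calculations over made-up samples. It mirrors the documented semantics, not Loki’s code:

```python
# Five log lines in one 5-minute window, each carrying duration_ms (made-up data).
WINDOW_SECONDS = 300
durations_ms = [120, 80, 950, 40, 310]

# rate({app="api"}[5m]): log lines per second; the field values are ignored.
line_rate = len(durations_ms) / WINDOW_SECONDS

# rate({app="api"} | json | unwrap duration_ms [5m]): sum of the values per second.
value_rate = sum(durations_ms) / WINDOW_SECONDS

# avg_over_time({app="api"} | json | unwrap duration_ms [5m]): mean of the values.
avg_duration = sum(durations_ms) / len(durations_ms)

print(f"{line_rate:.4f} lines/s, {value_rate:.2f} ms/s, avg {avg_duration:.0f} ms")
# 0.0167 lines/s, 5.00 ms/s, avg 300 ms
```

All three results look like plausible data, which is exactly why a missing or misplaced unwrap is easy to miss.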
A workflow for unfamiliar log shapes
When you’re investigating logs you don’t normally look at — different team’s service, vendor product, etc. — there’s a specific sequence I use:
Step 1: Get a sample
{app="unfamiliar-service"} | json
Run this against a small time window. Grafana shows you the parsed fields. Now you know what’s available.
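If you’d rather grab the sample from a script than from Grafana, the same step works on the raw lines in a query_range response. In that response, resultType is “streams” and each entry is a [nanosecond timestamp, line] pair. Here is a sketch that lists the field names the json parser would produce. It assumes the parser’s documented behavior: nested objects are flattened with _ and arrays are skipped. It ignores other details, such as how Loki sanitizes unusual characters in keys:

```python
import json

def flatten_keys(obj: dict, prefix: str = "") -> set[str]:
    """Field names as the json parser would flatten them (nested keys joined by _)."""
    keys = set()
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            keys |= flatten_keys(value, name)  # recurse into nested objects
        elif not isinstance(value, list):
            keys.add(name)  # scalars become fields; arrays are skipped
    return keys

def sample_fields(response: dict) -> set[str]:
    """Collect extracted field names across all lines in a Loki streams response."""
    fields = set()
    for stream in response["data"]["result"]:
        for _ts, line in stream["values"]:
            try:
                parsed = json.loads(line)
            except json.JSONDecodeError:
                continue  # non-JSON lines contribute no fields
            if isinstance(parsed, dict):
                fields |= flatten_keys(parsed)
    return fields
```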
Step 2: Show the AI the sample
Paste a couple of representative log lines (sanitized) into Claude with:
Here are 3 sample log lines from a service I don’t usually monitor. They’re JSON. Tell me what each field appears to mean and what would be good labels to filter on.
The model reads the structure and tells you which fields are useful. This takes 30 seconds and gives you a mental model of the log shape.
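For the “sanitized” part, a quick regex pass before pasting catches the obvious identifiers. Here is a minimal sketch. The patterns (emails, IPv4 addresses, bearer tokens) are examples I picked, not a complete PII policy, so extend them to cover what your logs actually carry:

```python
import re

# Illustrative redaction rules only; extend them for your own data.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<email>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ipv4>"),
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer <token>"),
]

def sanitize(line: str) -> str:
    """Apply each redaction rule in order and return the scrubbed line."""
    for pattern, replacement in REDACTIONS:
        line = pattern.sub(replacement, line)
    return line

print(sanitize('{"user": "jane@example.com", "ip": "10.1.2.3"}'))
# {"user": "<email>", "ip": "<ipv4>"}
```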
Step 3: Generate the query
Now you can ask for a specific query with confidence that the field names you give the model are real:
Generate a LogQL query that filters the unfamiliar-service app for entries where status_code is 500 or 503 and duration_ms is over 200.
The result will use the correct field names because you told the model what they are.
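One valid way to write that query is {app="unfamiliar-service"} | json | status_code=~"500|503" and duration_ms > 200. Label regex matches are anchored, so "500|503" matches exactly those two codes. Because you now have the real field list, you can also check the model’s output against it mechanically. Here is a heuristic sketch. It assumes a single | json stage, and it treats every identifier followed by a comparison operator after that stage as a field name:

```python
import re

STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"|`[^`]*`')
# An identifier followed by a comparison operator; longer operators listed first.
FILTER_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|==|>=|<=|=|>|<)")

def unknown_fields(query: str, known: set[str]) -> list[str]:
    """Field names used in label filters after `| json` that aren't in `known`."""
    masked = STRING_RE.sub('""', query)  # ignore text inside string literals
    start = masked.find("| json")
    if start == -1:
        return []
    used = [m.group(1) for m in FILTER_RE.finditer(masked, start)]
    # __error__ and similar internal labels come from Loki, not from the log lines.
    return [name for name in used if not name.startswith("__") and name not in known]

known = {"status_code", "duration_ms", "msg"}
print(unknown_fields('{app="x"} | json | status=~"5.." and duration_ms > 200', known))
# ['status']
```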
Step 4: Verify before alerting
If the query is going into an alert, run it against historical data and check the results. The model doesn’t know your baseline. A query that returns “no data” right now might match huge volumes at your normal daily peak, or vice versa.
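A cheap version of that check is to run the alert’s metric query as a range query over a recent week or so, then count how often it would have crossed your threshold. The sketch below assumes you’ve already pulled one series as Loki’s [timestamp, "value"] pairs. The helper names and the threshold are hypothetical. The longest consecutive run matters if the alert rule uses a for: duration:

```python
def breach_fraction(values: list[list], threshold: float) -> float:
    """Fraction of evaluation steps where the series exceeded `threshold`."""
    if not values:
        return 0.0  # no data for the whole window is itself worth investigating
    above = sum(1 for _ts, v in values if float(v) > threshold)
    return above / len(values)

def longest_breach(values: list[list], threshold: float) -> int:
    """Longest run of consecutive steps above `threshold`."""
    longest = current = 0
    for _ts, v in values:
        current = current + 1 if float(v) > threshold else 0
        longest = max(longest, current)
    return longest

history = [[1700000000, "0.1"], [1700000060, "0.4"], [1700000120, "2.5"], [1700000180, "0.2"]]
print(breach_fraction(history, threshold=1.0))  # 0.25
```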
A trap worth flagging
Loki’s query frontend caches results. The cache key includes the query text, so an edited query shouldn’t be answered from the previous query’s cache entry. Even so, when you’re iterating fast, it’s easy to end up reading output from an earlier run and conclude that the AI’s subtle change had no effect.
Fix: When iterating, confirm which query actually ran (Grafana’s Query inspector shows the request), and change the time range slightly between runs so each run is a fresh request.
Logs as a debugging input for AI
A separate but related use case: pasting logs into the AI to get help debugging. This works better than I expected for logs that have clear structure (JSON, logfmt) and worse for noisy unstructured logs.
The trick is to paste a window around the suspected issue — not the whole log file. Five minutes around the incident is usually plenty. The model spots patterns (“you have 47 OOM kills in this window, all on pods in the payments namespace”) that I’d miss manually.
But: keep the volume sane. A 5MB log paste degrades the model’s attention. If you have 10,000 lines, filter to the relevant subset first.
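Trimming to the window is easy to script if you pulled the logs through the API, since each entry comes back with a nanosecond timestamp. The sketch below assumes the query_range streams response shape. It merges all streams into one timeline, keeps entries within ±2.5 minutes of the suspected incident time, and caps the count before you paste. The window and the cap are arbitrary defaults to adjust:

```python
NS_PER_SECOND = 1_000_000_000

def incident_window(response: dict, incident_ns: int,
                    half_width_s: int = 150, max_lines: int = 2000) -> list[str]:
    """Log lines within ±half_width_s of incident_ns, oldest first, capped at max_lines."""
    lo = incident_ns - half_width_s * NS_PER_SECOND
    hi = incident_ns + half_width_s * NS_PER_SECOND
    entries = [
        (int(ts), line)
        for stream in response["data"]["result"]
        for ts, line in stream["values"]
        if lo <= int(ts) <= hi
    ]
    entries.sort()  # merge all streams into a single timeline
    return [line for _ts, line in entries[:max_lines]]
```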
The pattern that ties it together
Most of what I’ve described is the same shape: give the model a small amount of accurate context (sample logs, label names, time range), then let it generate the LogQL. The failures all come from skipping the context step and hoping the model can guess your schema.
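The context step can be as simple as a template. Here is a hypothetical helper (the name and wording are mine, not a fixed recipe). It packs the stream labels, JSON fields, time range and a few sanitized sample lines into one prompt, so the model doesn’t have to guess the schema:

```python
def logql_prompt(task: str, stream_labels: list[str], json_fields: list[str],
                 time_range: str, samples: list[str]) -> str:
    """Assemble a LogQL request that states the schema up front."""
    parts = [
        f"Write a LogQL query for: {task}",
        f"Stream labels (usable inside {{}}): {', '.join(stream_labels)}",
        f"JSON fields (only available after | json): {', '.join(json_fields)}",
        f"Time range: {time_range}",
        "Sample log lines:",
        *samples,
        "Use only these labels and fields.",
    ]
    return "\n".join(parts)

print(logql_prompt("5xx responses slower than 200 ms", ["app", "namespace"],
                   ["status_code", "duration_ms"], "last 1h",
                   ['{"status_code": 503, "duration_ms": 412}']))
```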
For prompts on Loki specifically, see the Loki log aggregation design and the Grafana logs panel patterns. For PromQL, the related PromQL query optimization prompt covers similar territory.