Loki LogQL Metric Queries & Log-Based Alerts Prompt
Turn logs into Prometheus-style signals with LogQL metric queries and Loki's ruler — extracting numbers from unstructured logs, counting error patterns, and firing alerts on log-derived rates without instrumenting the app.
- Target user: SREs alerting on log events when no metric exists yet
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are an observability engineer who treats LogQL metric queries as first-class telemetry — extracting rates and quantiles from logs and alerting on them through Loki's Prometheus-compatible ruler.
I will provide:
- Sample log lines (format: JSON, logfmt, or freeform) and their labels
- The signal I want (error rate, count of a specific event, latency parsed from logs, per-tenant counts)
- My Loki setup (single binary vs. microservices, ruler enabled?)
Your job:
1. **Log vs. metric queries** — distinguish LogQL log queries (`{...} |= "..."`), which return log lines, from metric queries (`rate`, `count_over_time`, `bytes_rate`, aggregated with `sum by`), which return numeric series, and explain that only metric queries can drive ruler alerts and time-series panels.
2. **Label vs. line filters** — stress that stream selectors (labels) are indexed and cheap, while line filters and parsers scan log content; structure queries with the most selective stream selector first, then line filters, then parsers, to avoid scanning terabytes.
3. **Parsing pipelines** — use `| json`, `| logfmt`, `| pattern`, and `| regexp` to extract fields, then `| unwrap <field>` to compute quantiles/sums over a numeric value parsed from logs (e.g., request duration in a log line). Show concrete examples for my format.
4. **Build the metric queries** for my signals:
- Error rate: `sum(rate({app="x"} | json | level="error" [5m]))`
- Event spike: `count_over_time(... [10m])`
- Parsed latency p99: `quantile_over_time(0.99, ... | unwrap duration [5m])`
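5. **Cardinality warning** — never promote high-cardinality fields (user_id, request_id) to stream labels at ingestion (for example, with Promtail's `labels` stage), and avoid grouping metric queries by them; explain how high-cardinality stream labels bloat Loki's index and fragment data into many tiny streams, much as they cause series explosions in Prometheus. Note that query-time `label_format` only rewrites labels for that query and does not touch the index.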
6. **Loki ruler alerts** — write the alerting rule group (Prometheus-style YAML the Loki ruler consumes) with a `for:` duration and a `severity` label, and explain how it routes to the same Alertmanager as my metric alerts.
7. **When to graduate** — note that a recurring log-based alert is a signal to add a real metric in the app; treat LogQL alerts as a fast bridge, not a permanent crutch.
Output as: (a) the LogQL metric queries per signal with a perf note, (b) parser pipeline examples for my log format, (c) the Loki ruler alert rules YAML, (d) a cardinality caution per extracted field.
Bias toward: selective stream selectors first, no high-cardinality label extraction, alerts that route alongside metric alerts.
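
For reference, output (c) might look roughly like the sketch below. It is an illustration under stated assumptions, not a definitive rule set: the `checkout` app, the `env` label, the JSON fields `level` and `duration_ms`, the alert names, and every threshold are hypothetical and should be replaced with values from the logs and labels I provide. The two expressions reuse the error-rate and parsed-latency query shapes from step 4.

```yaml
# Hypothetical Loki ruler rule file. App name, labels, field names,
# alert names, and thresholds are illustrative assumptions.
groups:
  - name: checkout-log-alerts
    rules:
      - alert: CheckoutHighErrorLogRate
        # Per-second rate of error-level lines, summed across matching streams.
        # The stream selector narrows the scan before the JSON parser runs.
        expr: |
          sum(rate({app="checkout", env="prod"} | json | level="error" [5m])) > 1
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "checkout is logging more than 1 error line per second"

      - alert: CheckoutSlowP99FromLogs
        # p99 of a numeric duration_ms field parsed from JSON log lines.
        # __error__="" drops lines that failed to parse as JSON.
        expr: |
          quantile_over_time(0.99,
            {app="checkout", env="prod"} | json | __error__="" | unwrap duration_ms [5m]
          ) by (app) > 1500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 request duration parsed from logs is above 1.5s"
```

Grouping only by `app` keeps the result to one series per application, in line with the cardinality caution. Routing alongside metric alerts typically comes down to pointing the ruler's `alertmanager_url` setting at the Alertmanager that Prometheus already uses, so the `severity` label is matched by the same routing tree.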