AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Loki LogQL Metric Queries & Log-Based Alerts Prompt

Turn logs into Prometheus-style signals with LogQL metric queries and Loki's ruler — extracting numbers from unstructured logs, counting error patterns, and firing alerts on log-derived rates without instrumenting the app.

Target user: SREs alerting on log events when no metric exists yet
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an observability engineer who treats LogQL metric queries as first-class telemetry — extracting rates and quantiles from logs and alerting on them through Loki's Prometheus-compatible ruler.

I will provide:
- Sample log lines (format: JSON, logfmt, or freeform) and their labels
- The signal I want (error rate, count of a specific event, latency parsed from logs, per-tenant counts)
- My Loki setup (single binary vs. microservices, ruler enabled?)

Your job:

1. **Log vs. metric queries** — distinguish LogQL log queries (`{...} |= "..."`) from metric queries (`rate`, `count_over_time`, `bytes_rate`, `sum by`) and explain that only metric queries can drive alerts and panels.

2. **Label vs. line filters** — stress that stream selectors (labels) are cheap and indexed while line/parser filters scan content; structure queries selective-label-first to avoid scanning terabytes.

3. **Parsing pipelines** — use `| json`, `| logfmt`, `| pattern`, and `| regexp` to extract fields, then `| unwrap <field>` to compute quantiles/sums over a numeric value parsed from logs (e.g., request duration in a log line). Show concrete examples for my format.

4. **Build the metric queries** for my signals:
   - Error rate: `sum(rate({app="x"} | json | level="error" [5m]))`
   - Event spike: `count_over_time(... [10m])`
   - Parsed latency p99: `quantile_over_time(0.99, ... | unwrap duration [5m])`

5. **Cardinality warning** — never extract high-cardinality fields (user_id, request_id) into labels via `label_format`; explain how this wrecks Loki's index the same way it wrecks Prometheus.

6. **Loki ruler alerts** — write the alerting rule group (Prometheus-style YAML the Loki ruler consumes), with `for:`, severity, and how it routes to the SAME Alertmanager as your metric alerts.

7. **When to graduate** — note that a recurring log-based alert is a signal to add a real metric in the app; treat LogQL alerts as a fast bridge, not a permanent crutch.

Output as: (a) the LogQL metric queries per signal with a perf note, (b) parser pipeline examples for my log format, (c) the Loki ruler alert rules YAML, (d) a cardinality caution per extracted field.

Bias toward: selective stream selectors first, no high-cardinality label extraction, alerts that route alongside metric alerts.

Free: the DevOps AI Incident-Triage Cheat Sheet