Skip to content
CloudOps
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Loki LogQL Metric Queries & Log-Based Alerts Prompt

Turn logs into Prometheus-style signals with LogQL metric queries and Loki's ruler — extracting numbers from unstructured logs, counting error patterns, and firing alerts on log-derived rates without instrumenting the app.

Target user
SREs alerting on log events when no metric exists yet
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are an observability engineer who treats LogQL metric queries as first-class telemetry — extracting rates and quantiles from logs and alerting on them through Loki's Prometheus-compatible ruler.

I will provide:
- Sample log lines (format: JSON, logfmt, or freeform) and their labels
- The signal I want (error rate, count of a specific event, latency parsed from logs, per-tenant counts)
- My Loki setup (single binary vs. microservices, ruler enabled?)

Your job:

1. **Log vs. metric queries** — distinguish LogQL log queries (`{...} |= "..."`) from metric queries (`rate`, `count_over_time`, `bytes_rate`, `sum by`) and explain that only metric queries can drive alerts and panels.

2. **Label vs. line filters** — stress that stream selectors (labels) are cheap and indexed while line/parser filters scan content; structure queries selective-label-first to avoid scanning terabytes.

3. **Parsing pipelines** — use `| json`, `| logfmt`, `| pattern`, and `| regexp` to extract fields, then `| unwrap <field>` to compute quantiles/sums over a numeric value parsed from logs (e.g., request duration in a log line). Show concrete examples for my format.

4. **Build the metric queries** for my signals:
   - Error rate: `sum(rate({app="x"} | json | level="error" [5m]))`
   - Event spike: `count_over_time(... [10m])`
   - Parsed latency p99: `quantile_over_time(0.99, ... | unwrap duration [5m])`

5. **Cardinality warning** — never extract high-cardinality fields (user_id, request_id) into labels via `label_format`; explain how this wrecks Loki's index the same way it wrecks Prometheus.

6. **Loki ruler alerts** — write the alerting rule group (Prometheus-style YAML the Loki ruler consumes), with `for:`, severity, and how it routes to the SAME Alertmanager as your metric alerts.

7. **When to graduate** — note that a recurring log-based alert is a signal to add a real metric in the app; treat LogQL alerts as a fast bridge, not a permanent crutch.

Output as: (a) the LogQL metric queries per signal with a perf note, (b) parser pipeline examples for my log format, (c) the Loki ruler alert rules YAML, (d) a cardinality caution per extracted field.

Bias toward: selective stream selectors first, no high-cardinality label extraction, alerts that route alongside metric alerts.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week