Grafana Loki + Prometheus Correlation Prompt
Correlate metrics and logs in Grafana — exemplars from Prometheus to traces, derived fields from Loki, jump from spike to log line.
- Target user: SREs debugging with metrics + logs
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has built dashboards correlating metrics with logs: click on a latency spike, see relevant logs immediately.

I will provide:
- Current setup (Prom + Loki versions)
- Use case
- Symptom (correlation not working)

Your job:

1. **Correlation patterns**:
   - **Time-aligned panels**: metric + log volume + logs on same dashboard
   - **Exemplars**: Prom links to traces from histogram buckets
   - **Derived fields**: Loki extracts traceID from logs, links to Tempo
   - **Split view (Explore)**: drag from metric panel to logs

2. **For exemplars**:
   - Prometheus must support exemplars
   - Apps emit metrics with exemplar (traceID)
   - Grafana renders dots on histogram heatmap

3. **For Loki derived fields**:
   - Regex match in log
   - Extract traceID
   - Link to Tempo datasource

4. **For panel-to-panel sync**:
   - Shared dashboard variable for time range
   - Click drills into specific service

5. **For Tempo / trace correlation**:
   - From metric: exemplar → trace
   - From log: derived field → trace
   - From trace: service graph

6. **For ad-hoc filters**:
   - Variable applied to all panels
   - Useful for narrowing investigation

7. **For Explore mode**:
   - Side-by-side metrics + logs
   - Time linked

8. **For data source UIDs**:
   - Cross-DS links need correct UID
   - DS provisioning sets these

Mark DESTRUCTIVE: removing exemplars from app (loses correlation), changing DS UID (breaks derived field links), overly aggressive derived field regex (false matches).

---

Setup: [DESCRIBE]
Use case: [DESCRIBE]
Symptom: [DESCRIBE]
Why this prompt works
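Correlation ties the observability signals together: a metric spike should lead straight to the logs and traces behind it. This prompt walks through the setup on each side (exemplars in Prometheus, derived fields in Loki, trace links in Tempo) so the model can identify which link in the chain is broken.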
How to use it
- App emits exemplars + traceID in logs.
- Prom stores exemplars.
- Loki derived fields extract traceID.
- Tempo serves traces.
Setup
Prometheus exemplar support
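# prometheus.yml
global:
  scrape_interval: 30s

# Exemplar storage (Prometheus 2.26+).
# max_exemplars sizes the in-memory circular buffer shared by all series.
storage:
  exemplars:
    max_exemplars: 1000000

# Exemplar storage is behind a feature flag; start Prometheus with:
prometheus --config.file=prometheus.yml --enable-feature=exemplar-storage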
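To confirm that Prometheus is actually storing exemplars, one option is to call its `query_exemplars` HTTP endpoint and look for `traceID` labels in the response. The following is a minimal sketch using only the Go standard library; the base URL `http://prometheus:9090`, the helper names, and the sample response body are assumptions for illustration, and the label name `traceID` follows the instrumentation example in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// exemplarResponse models the parts of the /api/v1/query_exemplars
// response that this sketch reads.
type exemplarResponse struct {
	Status string `json:"status"`
	Data   []struct {
		SeriesLabels map[string]string `json:"seriesLabels"`
		Exemplars    []struct {
			Labels    map[string]string `json:"labels"`
			Value     string            `json:"value"`
			Timestamp float64           `json:"timestamp"`
		} `json:"exemplars"`
	} `json:"data"`
}

// exemplarQueryURL builds the request URL. start and end are Unix seconds.
func exemplarQueryURL(base, query string, start, end int64) string {
	v := url.Values{}
	v.Set("query", query)
	v.Set("start", strconv.FormatInt(start, 10))
	v.Set("end", strconv.FormatInt(end, 10))
	return base + "/api/v1/query_exemplars?" + v.Encode()
}

// traceIDs returns every traceID label found in a response body.
func traceIDs(body []byte) ([]string, error) {
	var r exemplarResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("prometheus returned status %q", r.Status)
	}
	var ids []string
	for _, series := range r.Data {
		for _, ex := range series.Exemplars {
			if id, ok := ex.Labels["traceID"]; ok {
				ids = append(ids, id)
			}
		}
	}
	return ids, nil
}

func main() {
	// Hypothetical host; fetch this URL with any HTTP client.
	fmt.Println(exemplarQueryURL("http://prometheus:9090",
		"http_request_duration_seconds_bucket", 1700000000, 1700003600))

	// Hand-written sample body standing in for a real response.
	sample := []byte(`{"status":"success","data":[{"seriesLabels":{"method":"GET"},
		"exemplars":[{"labels":{"traceID":"4bf92f3577b34da6"},"value":"0.42","timestamp":1700000000.5}]}]}`)
	ids, err := traceIDs(sample)
	fmt.Println(ids, err)
}
```

An empty result for a time window in which the app was serving traffic suggests that either the feature flag is missing or the scrape target is not exposing exemplars.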
App instrumentation (Go example)
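import "github.com/prometheus/client_golang/prometheus"

// Latency histogram; register it once, e.g. prometheus.MustRegister(histogram).
histogram := prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "HTTP request latency in seconds.",
	Buckets: prometheus.DefBuckets,
}, []string{"method"})

// Observe with exemplar. duration is a time.Duration, traceID a string.
// Exemplars are only exposed in the OpenMetrics format, so serve /metrics
// with promhttp.HandlerOpts{EnableOpenMetrics: true}.
histogram.WithLabelValues("GET").(prometheus.ExemplarObserver).
	ObserveWithExemplar(duration.Seconds(), prometheus.Labels{"traceID": traceID})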
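To check the app side independently of Prometheus, it can help to look at the raw scrape output. In the OpenMetrics text format an exemplar is appended to a sample line after ` # `, as in `..._bucket{le="0.5"} 12 # {traceID="..."} 0.42 1700000000.123`. The sketch below scans such a body for `traceID` exemplars with the standard library only; the function name and sample body are hypothetical, and the regex is a rough approximation of the format, not a full OpenMetrics parser.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// exemplarRe matches the exemplar suffix of a sample line and captures
// the traceID label value. It is intentionally loose.
var exemplarRe = regexp.MustCompile(` # \{[^}]*traceID="([^"]+)"[^}]*\}`)

// scrapeTraceIDs returns the traceID of every exemplar in a scrape body.
func scrapeTraceIDs(body string) []string {
	var ids []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Lines starting with '#' are metadata (# TYPE, # HELP, # EOF).
		if strings.HasPrefix(line, "#") {
			continue
		}
		if m := exemplarRe.FindStringSubmatch(line); m != nil {
			ids = append(ids, m[1])
		}
	}
	return ids
}

func main() {
	// Hand-written sample of what an OpenMetrics scrape might contain.
	body := `# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",le="0.25"} 7
http_request_duration_seconds_bucket{method="GET",le="0.5"} 12 # {traceID="4bf92f3577b34da6"} 0.42 1700000000.123
http_request_duration_seconds_bucket{method="GET",le="+Inf"} 12
# EOF`
	fmt.Println(scrapeTraceIDs(body))
}
```

When fetching a real endpoint, request the OpenMetrics format explicitly (an `Accept: application/openmetrics-text` header); the classic Prometheus text format does not carry exemplars.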
Loki app: include traceID in logs
logger.Info("request processed",
	zap.String("traceID", traceID),
	zap.Duration("duration", duration))
Loki derived field
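datasources:
  - name: Loki
    type: loki
    # Fixed UID so other data sources (e.g. Tempo's tracesToLogs) can reference it.
    uid: loki-uid
    jsonData:
      derivedFields:
        - matcherRegex: 'traceID[=":\s]+(\w+)'
          name: TraceID
          # For an internal link, url is the query sent to the target data source.
          # $$ escapes $ so provisioning does not expand it as an environment variable.
          url: '$${__value.raw}'
          datasourceUid: tempo-uid
          urlDisplayLabel: "View Trace"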
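Before deploying a derived field, it is worth running the regex against a few representative log lines, including ones it should not match. The sketch below does that with Go's `regexp` package. Grafana evaluates the pattern in the browser rather than in Go, so this assumes the pattern uses only syntax both engines treat the same way (true for this one); the sample log lines are invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// derivedFieldRe mirrors the matcherRegex from the Loki data source config.
var derivedFieldRe = regexp.MustCompile(`traceID[=":\s]+(\w+)`)

// extractTraceID returns the first capture group, which is the value the
// derived field would hand to the linked data source.
func extractTraceID(line string) (string, bool) {
	m := derivedFieldRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	samples := []string{
		// JSON log line, as a zap JSON encoder might emit it.
		`{"level":"info","msg":"request processed","traceID":"4bf92f3577b34da6"}`,
		// logfmt-style line.
		`level=info msg="request processed" traceID=4bf92f3577b34da6`,
		// False match: the word after "traceID" is captured as if it were an ID.
		`level=warn msg="traceID missing from context"`,
		// No match at all.
		`level=info msg="health check ok"`,
	}
	for _, s := range samples {
		id, ok := extractTraceID(s)
		fmt.Printf("match=%v id=%q\n", ok, id)
	}
}
```

The third sample shows the kind of false match the prompt flags as destructive: the pattern accepts any word after `traceID`. If trace IDs are always hex strings of a known length, a tighter capture group such as `([0-9a-f]{16,32})` is one way to narrow it.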
Tempo datasource
  - name: Tempo
    type: tempo
    uid: tempo-uid
    url: http://tempo:3200
    jsonData:
      tracesToLogs:
        datasourceUid: loki-uid
        filterByTraceID: true
      tracesToMetrics:
        datasourceUid: prometheus-uid
      serviceMap:
        datasourceUid: prometheus-uid
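Because cross-data-source links break silently when a UID does not resolve, a quick consistency check can save debugging time. The sketch below takes the JSON returned by Grafana's data source list endpoint (`GET /api/datasources`) and reports derived fields whose `datasourceUid` matches no configured data source. The struct covers only the fields this check needs, the sample payload is hand-written, and the helper name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dataSource models the subset of a Grafana data source object used here.
type dataSource struct {
	UID      string `json:"uid"`
	Name     string `json:"name"`
	Type     string `json:"type"`
	JSONData struct {
		DerivedFields []struct {
			Name          string `json:"name"`
			DatasourceUID string `json:"datasourceUid"`
		} `json:"derivedFields"`
	} `json:"jsonData"`
}

// danglingDerivedFields lists derived fields that point at a UID which no
// data source in the payload has.
func danglingDerivedFields(body []byte) ([]string, error) {
	var sources []dataSource
	if err := json.Unmarshal(body, &sources); err != nil {
		return nil, err
	}
	known := make(map[string]bool, len(sources))
	for _, ds := range sources {
		known[ds.UID] = true
	}
	var problems []string
	for _, ds := range sources {
		for _, f := range ds.JSONData.DerivedFields {
			// An empty UID means an external URL link, which is fine.
			if f.DatasourceUID != "" && !known[f.DatasourceUID] {
				problems = append(problems, fmt.Sprintf(
					"%s: derived field %q points at unknown uid %q",
					ds.Name, f.Name, f.DatasourceUID))
			}
		}
	}
	return problems, nil
}

func main() {
	// Sample payload: the Tempo data source was provisioned with uid "tempo",
	// but the Loki derived field still references "tempo-uid".
	sample := []byte(`[
	  {"uid": "loki-uid", "name": "Loki", "type": "loki",
	   "jsonData": {"derivedFields": [{"name": "TraceID", "datasourceUid": "tempo-uid"}]}},
	  {"uid": "tempo", "name": "Tempo", "type": "tempo", "jsonData": {}}
	]`)
	problems, err := danglingDerivedFields(sample)
	fmt.Println(problems, err)
}
```

The same idea extends to `tracesToLogs`, `tracesToMetrics`, and `serviceMap` on the Tempo side; they are omitted here to keep the sketch short.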
Correlated dashboard layout
┌─────────────────────────────────────────┐
│ Service Selector: $service │
├─────────────────────────────────────────┤
│ Request Rate | Error Rate │
│ [time series] | [time series] │
├─────────────────────────────────────────┤
│ Latency Heatmap (with exemplar dots) │
│ [click exemplar → trace in Tempo] │
├─────────────────────────────────────────┤
│ Log Volume | Logs (filtered) │
│ [time series] | [logs panel] │
│ | [click TraceID │
│ | → trace in Tempo]│
└─────────────────────────────────────────┘
Common findings this catches
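- No exemplars visible → app not emitting them (or not serving the OpenMetrics format), or Prom not storing them.
- Derived field not linking → DS UID is wrong, or the regex does not match the log line.
- Click on exemplar goes nowhere → the Prometheus data source has no exemplar trace link (exemplarTraceIdDestinations) pointing at Tempo.
- Time skew between metric and log → NTP issues.
- High exemplar volume → exemplars sit in a fixed-size in-memory buffer in Prom; tune max_exemplars.
- Tempo can’t find trace → the trace aged out of retention, or sampling dropped it.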
- Multi-cluster correlation → cluster label propagation.
When to escalate
- App instrumentation rollout — engage app teams.
- Tempo / Loki scaling — engineering.
- Trace sampling design — coordinate.
Related prompts
- Grafana Logs Panel & Derived Fields Prompt: Use Grafana Logs panel: Loki queries, derived fields (link to traces), log volume panel, streaming logs.
- Grafana Tempo Distributed Tracing Prompt: Visualize traces in Grafana: Tempo data source, service graph, span metrics, trace search, OTLP integration.
- OpenTelemetry on Kubernetes Collector Design Prompt: Design and debug the OpenTelemetry Collector on Kubernetes: agent vs gateway, receivers/processors/exporters, sidecar vs DaemonSet, traces/metrics/logs pipelines.