AI for Prometheus & Monitoring

Prometheus Exemplars & Trace Correlation Prompt

Wire Prometheus exemplars end-to-end so a spike on a latency histogram links directly to the slow trace in Tempo — covering instrumentation, OpenMetrics exposition, storage, and Grafana exemplar links.

Target user
Engineers connecting metrics to traces for faster root-cause analysis of latency outliers
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are an observability engineer who has built metrics-to-traces correlation so on-call can click a p99 spike and land on the exact slow request.

I will provide:
- My instrumentation stack (Prometheus client library, language, OTel or native)
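- Histogram metrics where I care about outliers (and any summaries I may need to convert, since exemplars attach to histogram buckets and counters, not to summary quantiles)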
- Tracing backend (Tempo, Jaeger) and trace ID propagation
- Prometheus version and storage config

Your job:

1. **What exemplars are** — explain exemplars as sampled trace-id annotations on histogram buckets, and why they beat eyeballing dashboards next to a trace search. Clarify they ride the OpenMetrics exposition format, not classic Prometheus text format.

2. **Instrumentation** — show, for my language/client, how to attach an exemplar (trace_id + span_id) when observing a histogram, pulling the trace context from the active span. Cover the common mistake of recording exemplars without an active sampled span.

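3. **Exposition & scrape** — make sure the target serves OpenMetrics (Prometheus negotiates it with `Accept: application/openmetrics-text`) and enable exemplar storage on the Prometheus side (`--enable-feature=exemplar-storage`). Note that exemplar storage is a fixed-size in-memory circular buffer, and explain the eviction behavior: once the buffer is full, the oldest exemplars are overwritten.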

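4. **Storage sizing** — recommend a value for `storage.exemplars.max_exemplars` in the Prometheus config file (early releases used the `--storage.exemplars.exemplars-limit` flag instead) based on my series count, and explain the tradeoff of exemplar retention vs. memory.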

5. **Grafana linking** — configure the Prometheus data source's exemplar settings (`exemplarTraceIdDestinations`) with an internal link to the Tempo data source so the trace ID renders as a clickable jump. Show the data-source JSON.

6. **Sampling interplay** — reconcile head/tail trace sampling with exemplars: if the linked trace was sampled out, it was never stored and the exemplar link fails (typically a 404 or "trace not found"). Recommend a strategy (exemplar-aware sampling or always-sample on error/slow).

7. **Validation** — a query against `/api/v1/query_exemplars` for `<metric>` that returns exemplars in the API response, and a checklist to confirm exemplars appear on the panel and resolve to real traces.

Output as: (a) instrumentation code snippet for my stack, (b) Prometheus scrape/storage flags, (c) Grafana data-source exemplar config JSON, (d) a sampling-strategy recommendation, (e) an end-to-end smoke test.

Bias toward: working click-through over completeness; explicit handling of the "trace was sampled out" failure.
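
As a reference point for what step 2 of the prompt should produce, here is a minimal standard-library sketch of the "only attach an exemplar when the span is sampled" rule, plus the OpenMetrics line that results. The helper names `exemplar_labels` and `bucket_line` are hypothetical; in a real service the IDs and the sampled flag would likely come from the active span context (for example OpenTelemetry's `get_current_span().get_span_context()`), and the label dict would be handed to the client library's own exemplar parameter rather than formatted by hand.

```python
from typing import Optional

# OpenMetrics caps the combined length of exemplar label names and values.
MAX_EXEMPLAR_LABEL_CHARS = 128


def exemplar_labels(trace_id: int, span_id: int, sampled: bool) -> Optional[dict]:
    """Return exemplar labels for a sampled span, or None to skip the exemplar.

    Hypothetical helper: skipping unsampled spans avoids emitting links to
    traces that the tracing backend will never store.
    """
    if not sampled or trace_id == 0:
        return None
    labels = {
        "trace_id": format(trace_id, "032x"),  # 128-bit ID as 32 hex chars
        "span_id": format(span_id, "016x"),    # 64-bit ID as 16 hex chars
    }
    total = sum(len(k) + len(v) for k, v in labels.items())
    if total > MAX_EXEMPLAR_LABEL_CHARS:
        return None
    return labels


def bucket_line(name: str, le: str, count: int, labels: dict,
                value: float, timestamp: float) -> str:
    """Format one histogram bucket sample with an exemplar, OpenMetrics style."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f'{name}_bucket{{le="{le}"}} {count} # {{{label_str}}} {value} {timestamp}'
```

With the Python `prometheus_client` library, the returned dict would typically be passed as `histogram.observe(duration, exemplar=labels)`, and the observation recorded without an exemplar when the helper returns `None`.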
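
The eviction and sizing questions in steps 3 and 4 can be illustrated with a toy model. This sketch is not Prometheus's implementation: `ExemplarRing` is a hypothetical class that only mimics the "fixed-size buffer, oldest entry overwritten" behavior, and the 100-bytes-per-exemplar figure is a rough assumption for exemplars carrying only a trace ID, so measure your own workload before relying on it.

```python
from collections import deque

# Rough per-exemplar memory cost; an assumption, not a measured value.
BYTES_PER_EXEMPLAR = 100


class ExemplarRing:
    """Toy fixed-size buffer: once full, each new exemplar evicts the oldest."""

    def __init__(self, max_exemplars: int) -> None:
        self._buf = deque(maxlen=max_exemplars)

    def add(self, series: str, trace_id: str) -> None:
        self._buf.append((series, trace_id))

    def trace_ids(self) -> list:
        return [trace_id for _, trace_id in self._buf]


def estimated_memory_bytes(max_exemplars: int,
                           bytes_per_exemplar: int = BYTES_PER_EXEMPLAR) -> int:
    """Back-of-the-envelope memory estimate for a given buffer size."""
    return max_exemplars * bytes_per_exemplar
```

The practical consequence is that retention depends on ingest rate rather than wall-clock time: a busy series can push out exemplars from a quieter one, so a larger buffer buys a longer lookback at a roughly linear memory cost.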
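
For step 5, the data-source JSON usually comes down to one `exemplarTraceIdDestinations` entry that maps the exemplar label to a tracing data source UID. The sketch below builds such a document in Python. The function name, the URL `http://prometheus:9090`, and the UID `tempo` are placeholders, and the `name` field has to match whatever label the instrumentation actually emits (assumed here to be `trace_id`).

```python
import json


def prometheus_datasource(prom_url: str, tempo_uid: str,
                          exemplar_label: str = "trace_id") -> dict:
    """Build a Grafana Prometheus data-source definition with an exemplar link.

    Hypothetical helper: all values are illustrative placeholders.
    """
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "access": "proxy",
        "url": prom_url,
        "jsonData": {
            "httpMethod": "POST",
            "exemplarTraceIdDestinations": [
                # Internal link: the exemplar label value is opened as a
                # trace ID in the data source identified by datasourceUid.
                {"name": exemplar_label, "datasourceUid": tempo_uid}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(prometheus_datasource("http://prometheus:9090", "tempo"), indent=2))
```

A mismatch between `name` and the emitted exemplar label is a common reason for exemplar dots that show up on the panel without a clickable trace link.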
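
The validation in step 7 can start with a check against the Prometheus exemplar query endpoint before anything is inspected in Grafana. This is a minimal sketch, assuming the exemplar label is named `trace_id`; the two helper names and the base URL used in the usage note are hypothetical, and the HTTP call itself is left to whatever client you prefer.

```python
import json
from urllib.parse import urlencode


def exemplar_query_url(base_url: str, query: str, start: int, end: int) -> str:
    """Build a /api/v1/query_exemplars URL for a time range (Unix seconds)."""
    params = urlencode({"query": query, "start": start, "end": end})
    return f"{base_url.rstrip('/')}/api/v1/query_exemplars?{params}"


def trace_ids_from_response(body: str, label: str = "trace_id") -> list:
    """Extract trace IDs from a query_exemplars JSON response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    ids = []
    for series in payload.get("data", []):
        for exemplar in series.get("exemplars", []):
            trace_id = exemplar.get("labels", {}).get(label)
            if trace_id:
                ids.append(trace_id)
    return ids
```

An empty list from `trace_ids_from_response` suggests a problem upstream of Grafana (no exemplars exposed, OpenMetrics not negotiated, or exemplar storage disabled). A non-empty list whose IDs cannot be found in the tracing backend suggests the sampling issue described in step 6.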