Distributed Tracing With Grafana Tempo Alongside Prometheus
Metrics tell you something is slow; traces tell you where. Here's how to run Grafana Tempo next to Prometheus and use exemplars to jump from a latency spike to the exact trace.
- #prometheus
- #tempo
- #tracing
- #grafana
- #observability
- #sre
Prometheus is brilliant at telling you that checkout p99 jumped to two seconds. It is useless at telling you which downstream call ate that time. For years I closed that gap by grepping logs and guessing. Grafana Tempo, sitting next to Prometheus, closes it properly: you click a spike on a latency graph and land on the actual trace that produced it.
This is how I run Tempo as a companion to Prometheus, not a replacement for it.
Why Tempo and not “just more metrics”
You could add more histogram buckets and per-dependency timers forever, and you’d still be approximating. Traces are the ground truth of a single request’s path through your system. Tempo’s pitch is specifically that it’s cheap to operate: it looks traces up by ID and stores spans in object storage (S3/GCS), so you keep a lot of traces without a heavyweight index. That cost profile is what makes it realistic to keep every trace you decide is worth keeping and only pay to read the few you actually open.
The division of labor I aim for:
- Prometheus answers “is it bad, and how bad” across all requests (RED metrics).
- Tempo answers “for this slow request, where did the time go.”
- Exemplars are the bridge between them.
Getting spans in: the same Collector you already run
If you’ve deployed the OpenTelemetry Collector for metrics, traces are nearly free — it’s another pipeline in the same process:
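receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Both processors must be declared here before a pipeline can reference them.
  memory_limiter:
    check_interval: 1s
    limit_mib: 512        # assumed value; size it to the container's memory limit
  batch: {}

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true      # plaintext; fine inside a trusted network only

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]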
Your instrumented services emit OTLP spans, the Collector batches them, Tempo stores them. No separate agent, no second protocol to babysit.
Exemplars: the click that saves the night
The feature that makes this combo worth it is exemplars. An exemplar is a trace ID attached to a metric sample — so a point on your latency histogram knows about an example request that produced it. In Grafana, that turns into a little diamond on the graph you can click to open the trace.
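To get them, three things have to line up. Your instrumentation must attach exemplars to its histograms (support varies by SDK and language, so check yours). The scrape must use a format that can carry them: the classic Prometheus text format has no place for exemplars, so targets typically need to serve OpenMetrics. And Prometheus must store them. Enable the feature flag and size the exemplar buffer:
# prometheus startup flag
--enable-feature=exemplar-storage

# prometheus.yml
storage:
  exemplars:
    max_exemplars: 100000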
Then a histogram query in Grafana shows exemplar diamonds. The workflow becomes: spot the p99 spike in a Prometheus panel, click the highest exemplar diamond on it, read the trace in Tempo. Two clicks from “something is slow” to “this span took 1.8s waiting on the payments gRPC call.”
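The diamonds only become clickable once the Prometheus data source in Grafana knows which exemplar label holds the trace ID and which data source should open it. A minimal provisioning sketch, assuming the label is called trace_id, Tempo serves HTTP on port 3200, and the data source names and UIDs are ones I made up for illustration:

```yaml
# grafana/provisioning/datasources/tracing.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo                      # assumed UID, referenced below
    url: http://tempo:3200          # assumed Tempo HTTP address
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus:9090
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id            # exemplar label carrying the trace ID; may be traceID in your setup
          datasourceUid: tempo      # open the trace in the Tempo data source above
```

You still need to switch on the Exemplars toggle in the panel's query options, and the label name has to match whatever your instrumentation actually emits.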
Generate metrics from traces
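Tempo’s metrics-generator can produce RED metrics and a service graph directly from spans, then remote-write them into Prometheus. Prometheus only accepts those writes when it is started with --web.enable-remote-write-receiver:
# tempo.yaml
metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal   # local WAL directory; this path is an assumption
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true         # keep trace IDs attached to the generated metrics

# Processors are switched on through overrides, not an "enabled" key.
# This is the layout used by recent Tempo releases; older ones use the
# flat metrics_generator_processors key instead.
overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]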
This gives you traces_spanmetrics_latency and a service-graph view of who-calls-whom, derived from real traffic rather than a hand-drawn architecture diagram that’s six months stale. Query it like any other Prometheus metric:
histogram_quantile(0.99,
  sum by (le, service) (
    rate(traces_spanmetrics_latency_bucket[5m])
  )
)
Sampling: keep what’s interesting
You almost never want to store 100% of traces at scale. Tail sampling — deciding after a trace completes — lets you keep the ones that matter: anything with an error, anything slow, and a small baseline of normal traffic for context.
processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 500 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
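Tail sampling has to happen on a Collector that sees the whole trace, so it lives on the gateway tier, not the per-node agents. With more than one gateway replica, the agents also have to route spans by trace ID so that every span of a trace reaches the same replica. That’s the one topology constraint people trip on.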
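When the gateway tier runs more than one replica, "sees the whole trace" means every span of a trace must reach the same replica. One way to get that is the loadbalancing exporter from the Collector contrib distribution, configured on the agent tier. This is a sketch under assumptions: the headless service name otel-gateway-headless is hypothetical, and the traffic is plaintext inside the cluster.

```yaml
# Agent-tier Collector: route spans to gateways by trace ID
exporters:
  loadbalancing:
    routing_key: traceID            # all spans of one trace go to the same backend
    protocol:
      otlp:
        tls:
          insecure: true            # plaintext; assumes a trusted network
    resolver:
      dns:
        hostname: otel-gateway-headless   # hypothetical headless service for the gateway pods
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The gateways then run the tail_sampling processor shown above and export to Tempo. Note that when the set of gateway replicas changes, traces in flight can be split across replicas for a short window.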
What to actually alert on
Resist the urge to alert on traces directly. Alert on the metrics (Prometheus), and use traces to investigate the page. The healthy pattern:
- Prometheus alert fires on RED metrics — error rate or p99 latency over an SLO threshold.
- The alert links to a Grafana dashboard with exemplars enabled.
- The on-call clicks an exemplar and reads the offending trace.
That keeps your alerting deterministic and cheap while still giving you depth on demand. If you’re tuning where alerts go, our monitoring alert routing write-up pairs well with this.
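As a rough sketch of that pattern, here is a Prometheus rule file that pages on p99 latency and error rate computed from the span metrics. The service name checkout, the 2-second and 5% thresholds, the status_code label value, and the dashboard URL are all assumptions for illustration; swap in your own SLO numbers and labels.

```yaml
# rules/checkout-red.yml (hypothetical file name)
groups:
  - name: checkout-red
    rules:
      - alert: CheckoutP99LatencyHigh
        # p99 over 2s for 10 minutes; span-metrics latency is recorded in seconds
        expr: |
          histogram_quantile(0.99,
            sum by (le) (
              rate(traces_spanmetrics_latency_bucket{service="checkout"}[5m])
            )
          ) > 2
        for: 10m
        labels:
          severity: page
        annotations:
          summary: checkout p99 latency is above 2s
          dashboard: https://grafana.example.com/d/checkout-latency   # hypothetical dashboard with exemplars enabled
      - alert: CheckoutErrorRateHigh
        # share of spans with an error status; the label value is an assumption
        expr: |
          sum(rate(traces_spanmetrics_calls_total{service="checkout", status_code="STATUS_CODE_ERROR"}[5m]))
            /
          sum(rate(traces_spanmetrics_calls_total{service="checkout"}[5m]))
          > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: checkout error rate is above 5%
```

The dashboard annotation is what puts the on-call one click from the exemplar diamonds instead of hunting for the right panel.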
Retention and cost reality
Tempo stores in object storage, so retention is a setting, not a disk-sizing exercise: the compactor deletes blocks once they age past its configured retention, and a bucket lifecycle policy set a little longer makes a sensible backstop. I keep error and slow traces longer than baseline by routing them to different storage tiers where the backend supports it. Thirty days of “interesting” traces and a few days of everything else covers most debugging, and the bill stays sane.
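For the retention window itself, Tempo's compactor has its own setting that decides when blocks are deleted. A minimal sketch for the thirty-day figure above, assuming a single retention period for the whole instance:

```yaml
# tempo.yaml
compactor:
  compaction:
    block_retention: 720h   # 30 days; blocks older than this are removed by the compactor
```

If you also set a lifecycle rule on the bucket, keep it longer than block_retention so the bucket never deletes blocks that Tempo still expects to find.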
The bottom line
Tempo doesn’t replace Prometheus — it answers the question Prometheus can’t. Send spans through the Collector you already run, turn on exemplars so a metric spike is one click from its trace, generate RED metrics and a service graph from real traffic, and sample on the tail so you keep what’s worth keeping. For more on the metrics half of this stack, start at the Prometheus & Monitoring category.
Tracing configurations differ across Tempo and Collector versions. Validate against your deployment and the official Grafana docs before production use.