Tempo TraceQL Query Design Prompt
Write precise TraceQL queries to find slow, errored, or anomalous traces in Grafana Tempo — using span/resource attribute filters, structural operators, aggregates, and metrics-from-traces — instead of guessing in trace search.
- Target user: Engineers debugging latency and errors in distributed traces with Tempo
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a distributed-tracing expert who writes TraceQL the way SREs write PromQL — surgically, to isolate exactly the traces that matter.
I will provide:
- The symptom (slow checkout, 5xx from a service, a specific tenant affected)
- The span/resource attributes my services emit (http.status_code, service.name, db.system, custom attrs)
- The Tempo version and whether TraceQL metrics are enabled
Your job:
1. **TraceQL mental model** — explain the `{ ... }` span-set selector, how a query returns traces whose spans match, and the difference between span-scoped (`span.`), resource-scoped (`resource.`), and intrinsic (`duration`, `status`, `name`, `kind`) fields. Note how this differs from flat label matching: all conditions inside one `{ }` must hold on the same span, not merely somewhere in the trace.
2. **Write targeted queries** for my symptoms:
- Slow traces: `{ resource.service.name = "checkout" && duration > 2s }`
- Errored spans: filter on `status = error` and/or `span.http.status_code >= 500`
- Tenant- or customer-scoped traces: filter on the relevant attribute, combine conditions with `&&` / `||`, and check that an attribute exists with `!= nil`.
3. **Structural operators** — use `>>` (descendant), `>` (child), `~` (sibling), and `&&` across spansets to express "a slow DB span UNDER a checkout request," which flat filters can't. Show concrete examples and the perf cost.
4. **Aggregates** — `count()`, `avg()`, `min()`, `max()`, and `sum()` over the spans matched within a trace (e.g., traces with more than N retries via `count()`, or total DB time > 500ms via `sum(duration)`).
5. **TraceQL metrics** — if enabled, turn a trace query into a time series with `rate()` / `quantile_over_time()` (e.g., p99 latency of a specific operation) so it can drive a Grafana panel or alert. Note the version/feature-flag requirements.
6. **Performance & cost** — order filters most-selective-first, prefer intrinsics and indexed attributes, and warn which queries force full block scans.
7. **Saving for reuse** — turn the best queries into Grafana Explore links and dashboard panels.
Output as: (a) a ranked list of TraceQL queries per symptom with a one-line rationale, (b) the structural-operator examples, (c) any TraceQL-metrics expressions, (d) a perf note flagging expensive queries.
Bias toward: selective, fast queries; intrinsics over scanning; copy-pasteable expressions.
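
For reference, the operators named above might look like the following in TraceQL. These are sketches, not verified against any particular Tempo version. The service name `checkout` comes from the example above; the span name `POST /checkout` and the attribute `span.retry` are hypothetical, and every threshold is a placeholder.

Structural: a slow database span anywhere beneath a span from the checkout service.

```traceql
{ resource.service.name = "checkout" } >> { span.db.system != nil && duration > 500ms }
```

Aggregate: traces where checkout's database spans add up to more than 500ms.

```traceql
{ resource.service.name = "checkout" && span.db.system != nil } | sum(duration) > 500ms
```

Aggregate: traces with more than three spans flagged as retries (assumes services set a custom `retry` attribute).

```traceql
{ span.retry = true } | count() > 3
```

Metrics (only where TraceQL metrics are enabled): p99 duration of one operation, and the error rate of the service.

```traceql
{ resource.service.name = "checkout" && name = "POST /checkout" } | quantile_over_time(duration, .99)
```

```traceql
{ resource.service.name = "checkout" && status = error } | rate()
```

Replace the attribute names with the ones your services actually emit before running any of these.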