OpenTelemetry Tail Sampling Policy Design Prompt
Design an OpenTelemetry Collector tail-sampling policy that keeps every error and slow trace while cheaply down-sampling healthy traffic, without degrading the span metrics fed into Prometheus.
- Target user: Observability engineers controlling trace volume and cost
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are an OpenTelemetry Collector expert who designs tail-sampling pipelines that cut trace cost by 90% without losing the traces that matter.
I will provide:
- My current trace volume (spans/sec) and backend cost driver
- The Collector topology (agents, gateway, load balancing)
- Which traces I must never drop (errors, slow traces, specific routes, specific tenants)
- My retention/budget target
- Whether I also derive span metrics for Prometheus
Your job:
1. **Why tail over head**: Explain when each approach decides. Head sampling decides at trace start (cheap, but blind to the outcome); tail sampling buffers the whole trace and decides after seeing its latency and errors. State the buffering cost and the `decision_wait` tradeoff.
2. **Load-balancing prerequisite**: Explain that tail sampling requires all spans of a trace to land on the same Collector instance, so a `loadbalancing` exporter keyed on trace ID must run in the agent tier, in front of the gateway tier. Show the two-tier topology.
3. **Composite policy**: Write a `tail_sampling` processor config that combines keep-all on `status_code = ERROR`, keep-all on `latency > N ms`, keep-all on a specific attribute (tenant or route), and a `probabilistic` policy (e.g., 5%) for everything else, arranged so that any matching keep rule wins over the probabilistic fallback.
4. **Tune decision_wait and num_traces**: Relate `decision_wait` to my p99 trace duration (it must exceed it, or decisions are made on incomplete traces). Size `num_traces` to new traces/sec × `decision_wait`, and buffer memory to spans/sec × `decision_wait` × average span size.
5. **Span metrics independence**: Stress that the `spanmetrics` connector must run BEFORE sampling, so the RED metrics (rate, errors, duration) sent to Prometheus reflect 100% of traffic, not the sampled subset. Show the pipeline ordering.
6. **Validation**: Confirm that error traces survive at 100%, healthy traces hit the target rate, and Prometheus RED metrics are unaffected by sampling.
Output as:
(a) the two-tier Collector topology diagram in text;
(b) the full `tail_sampling` processor YAML with composite policies;
(c) the pipeline ordering showing spanmetrics before sampling;
(d) the sizing math for decision_wait and the buffer;
(e) the one mistake that silently drops error traces.
Bias toward: never dropping errors or slow traces, correct pipeline ordering so metrics stay complete, and realistic buffer sizing.
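For reference, the two-tier topology can be sketched as an agent-side Collector config. This is a minimal sketch, not a prescribed setup. The gateway hostnames (`otel-gateway-1`, `otel-gateway-2`) are hypothetical, and the static resolver stands in for the DNS or Kubernetes resolvers a real deployment would more likely use. `routing_key: traceID` is the `loadbalancing` exporter's default and is written out here only for clarity.

```yaml
# Tier 1 (agents): receive OTLP and route every span of a trace to one gateway.
#
#   apps -> agent (loadbalancing, routing_key: traceID) -+-> gateway-1 (tail_sampling) -> backend
#                                                         +-> gateway-2 (tail_sampling) -> backend
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  loadbalancing:
    routing_key: traceID          # spans sharing a trace ID go to the same gateway
    protocol:
      otlp:
        tls:
          insecure: true          # assumption: plaintext traffic inside the cluster
    resolver:
      static:
        hostnames:                # hypothetical gateway addresses
          - otel-gateway-1:4317
          - otel-gateway-2:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [loadbalancing]
```

Only the agent tier needs to route by trace ID. Each gateway then receives whole traces and can run `tail_sampling` locally.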
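A minimal sketch of the keep-rules-plus-fallback policy set for one gateway instance follows. The thresholds, the `tenant.id` attribute key, the tenant values, and the sizing numbers are placeholders, not recommendations. The policies are listed at the top level. At that level the processor keeps a trace if any policy votes to sample it, so the error, latency, and tenant rules always override the 5% probabilistic fallback.

```yaml
processors:
  tail_sampling:
    decision_wait: 30s                  # placeholder; must exceed p99 trace duration
    num_traces: 120000                  # ~ headroom (2) x new traces/sec at this gateway (2000) x decision_wait (30s)
    expected_new_traces_per_sec: 2000   # placeholder; helps the processor pre-size its data structures
    policies:
      # Keep every trace that contains a span with status ERROR.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep every trace slower than the threshold (placeholder: 2 s).
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 2000
      # Keep every trace for critical tenants (hypothetical key and values).
      - name: keep-critical-tenants
        type: string_attribute
        string_attribute:
          key: tenant.id
          values: [tenant-a, tenant-b]
      # Fallback: keep roughly 5% of traces, chosen by a hash of the trace ID.
      - name: baseline-5pct
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```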
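One way to enforce "spanmetrics before sampling" on a gateway is to feed the same OTLP receiver into two trace pipelines. One pipeline exports only to the `spanmetrics` connector and does no sampling. The other runs `tail_sampling` before exporting to the trace backend. A receiver listed in several pipelines fans its data out to each of them, so the metrics pipeline sees every span. This is a sketch. The backend address `trace-backend:4317`, the Prometheus port, and the histogram buckets are placeholders, and the `tail_sampling` block is cut down to two policies.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s]   # placeholder buckets
    dimensions:
      - name: http.route          # status.code is already a default dimension

processors:
  batch: {}
  tail_sampling:                  # abbreviated; use the full keep-rule policy set here
    decision_wait: 30s
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: baseline-5pct
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  otlp/backend:
    endpoint: trace-backend:4317  # hypothetical trace backend
    tls:
      insecure: true

service:
  pipelines:
    # Unsampled path: every span feeds the RED metrics.
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    # Sampled path: tail_sampling lives only here, so it cannot thin the metrics.
    traces/sampled:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/backend]
    # The spanmetrics connector acts as the receiver of the metrics pipeline.
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
```

If `tail_sampling` were placed in the same pipeline that exports to `spanmetrics`, the RED metrics would count only the sampled subset. That is the failure mode this layout avoids.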
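The sizing relations behind `decision_wait` and `num_traces` can be written out as follows. The symbols are defined like this:
- $R_s$ and $R_t$ are the span rate and new-trace rate arriving at one gateway. Because load balancing is by trace ID, these are roughly the cluster-wide rates divided by the number of gateways.
- $W$ is `decision_wait`.
- $d_{p99}$ is the p99 trace duration.
- $\bar{b}$ is the average in-memory size of a buffered span.
- $h$ is a headroom factor.

The worked numbers are illustrative assumptions, not measurements: 50,000 spans/sec per gateway, 25 spans per trace, $\bar{b} = 1$ KiB, $h = 2$, and $d_{p99}$ well under 30 s. $M$ covers only the span buffer and excludes Collector overhead. In-memory span size varies widely with attribute count.

```latex
\begin{aligned}
W &> d_{p99} \\
\texttt{num\_traces} &\ge h \, R_t \, W \\
M &\approx R_s \, W \, \bar{b} \\[4pt]
R_t &= 50{,}000 / 25 = 2{,}000~\text{traces/s}, \quad W = 30~\text{s}, \quad h = 2 \\
\texttt{num\_traces} &\ge 2 \times 2{,}000 \times 30 = 120{,}000 \\
M &\approx 50{,}000 \times 30 \times 1~\text{KiB} = 1{,}500{,}000~\text{KiB} \approx 1.43~\text{GiB}
\end{aligned}
```

Both `num_traces` and $M$ grow linearly in $W$. Raising `decision_wait` to cover longer traces therefore raises buffer memory proportionally.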