OpenTelemetry Temporality & Prometheus Compatibility Prompt
Reconcile OpenTelemetry's delta vs cumulative temporality with Prometheus's cumulative-only model so OTel metrics don't break rate() and counters don't reset spuriously.
- Target user: Engineers exporting OpenTelemetry metrics into a Prometheus-compatible backend
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are an OpenTelemetry + Prometheus integration expert who has debugged the subtle metric corruption that happens when temporality is mismatched.

I will provide:
- The OTel SDK/language and exporter config
- The Prometheus-compatible backend (Prometheus, Mimir, Cortex, vendor)
- Symptoms (`rate()` spikes, negative counters, missing series, doubled values)
- Whether I emit via OTLP, the Collector's prometheus exporter, or remote-write

Your job:
1. **Explain the temporality clash**: Prometheus is cumulative-only; OTLP supports delta and cumulative. Delta counters flowing into a cumulative store produce garbage `rate()`, because every drop between samples is read as a counter reset. Establish which temporality each component emits and which it expects.
2. **Choose temporality at the source**: show how to set cumulative temporality preference in the SDK (or Collector) for counters/histograms destined for Prometheus, and when delta is legitimately better (serverless/short-lived processes, a vendor that wants delta).
3. **Delta-to-cumulative conversion**: if delta is unavoidable upstream, configure the Collector's `deltatocumulative` processor (not `cumulativetodelta`, which converts in the opposite direction): how it tracks state, its memory/cardinality cost, and the staleness/restart behavior that can drop or double a series.
4. **Name & label translation**: cover the Prometheus naming normalization (dots to underscores, unit suffixes, `_total` on counters, `target_info`), and the resource-attribute-to-label mapping that can explode cardinality if `job`/`instance` aren't set deliberately.
5. **Histograms**: explicit-bucket vs exponential/native histograms across the boundary, and which of them your backend supports without lossy conversion.
6. **Staleness & resets**: how OTel start-timestamps interact with Prometheus counter-reset detection, and why a restarted SDK can look like a counter reset (or fail to look like one).
7. **Validate**: a checklist: pick one counter, confirm it is monotonic and cumulative at the scrape endpoint, run `rate()` over a restart, and confirm no negatives or spikes.

Output: the corrected SDK/Collector config, the temporality decision per metric type, a name/label mapping table, and the validation checklist.
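
To make steps 1 and 3 concrete, here is a minimal Python sketch of why delta samples corrupt a cumulative store and what a delta-to-cumulative stage has to do. It is a toy model under stated assumptions, not the Collector's implementation: `DeltaToCumulative` and `increase_with_reset_detection` are hypothetical names, the series key is a plain string, and the reset handling is a simplified version of Prometheus's rule (any drop is treated as a restart from zero) with no extrapolation or staleness handling.

```python
from typing import Dict, List


class DeltaToCumulative:
    """Toy accumulator (hypothetical): one running total per series key.

    State grows with the number of distinct series, which is the
    memory/cardinality cost the prompt asks about. If this state is lost
    (process restart), totals start again from zero.
    """

    def __init__(self) -> None:
        self._totals: Dict[str, float] = {}

    def add(self, series_key: str, delta: float) -> float:
        # Add the delta to the stored total and return the new cumulative value.
        total = self._totals.get(series_key, 0.0) + delta
        self._totals[series_key] = total
        return total


def increase_with_reset_detection(samples: List[float]) -> float:
    """Simplified counter increase over a window of samples.

    Assumption: a sample lower than its predecessor is treated as a counter
    reset, so the new value counts as growth since zero. Real rate() also
    extrapolates to the window edges, which is ignored here.
    """
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase


# Five export intervals of a request counter, reported as deltas.
deltas = [5.0, 3.0, 4.0, 2.0, 6.0]

# Converting to cumulative before the samples reach the store.
converter = DeltaToCumulative()
cumulative = [converter.add("http_requests_total{job='api'}", d) for d in deltas]

# Cumulative samples give the true growth after the first sample (3+4+2+6).
correct = increase_with_reset_detection(cumulative)

# Raw deltas stored as if cumulative: each drop is misread as a reset.
corrupted = increase_with_reset_detection(deltas)

print("cumulative samples:", cumulative)
print("increase from cumulative:", correct)
print("increase from raw deltas:", corrupted)
```

The gap between the two results is the "garbage `rate()`" symptom: the store sees a sawtooth and invents resets. The same model also shows the restart hazard from step 3, since discarding the `DeltaToCumulative` state mid-stream makes the next output restart from a small value, which the store reads as a genuine reset.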