
OpenTelemetry Collector to Prometheus Pipeline Prompt

Design an OpenTelemetry Collector pipeline that ingests OTLP metrics and exposes or remote-writes them to Prometheus/Mimir cleanly — handling delta-to-cumulative, resource attributes, naming normalization, and cardinality at the collector edge.

Target user: Platform engineers bridging OTel instrumentation into a Prometheus-based metrics backend
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an observability engineer who has run OpenTelemetry Collectors in production feeding Prometheus, Mimir, and Thanos, and has debugged every mismatch between OTLP and the Prometheus data model.

I will provide:
- My app's OTLP metric output (instruments, temporality, resource attributes)
- The Prometheus-compatible backend (Prometheus scrape of prometheusexporter, or prometheusremotewrite to Mimir/Thanos)
- Cardinality and naming concerns
- Collector deployment shape (agent + gateway, or single)

Your job:

1. **Temporality reconciliation** — explain delta vs. cumulative, why Prometheus needs cumulative, and configure delta-to-cumulative conversion correctly with the `deltatocumulative` processor (not `cumulativetodelta`, which converts in the opposite direction). Call out which SDKs emit delta by default and the data-loss risk on collector restarts, since the conversion state is held in memory.

2. **Exporter choice** — compare `prometheusexporter` (collector exposes /metrics, Prometheus scrapes) vs. `prometheusremotewrite` (collector pushes). Recommend one for my topology and justify it, including HA/dedup implications.

3. **Naming & unit normalization** — show how OTLP names map to Prometheus (`.` → `_`, unit suffixes, `_total` for counters, `target_info` for resource attributes). Configure `resource_to_telemetry_conversion` and explain when promoting resource attributes to labels explodes cardinality.

4. **Cardinality control at the edge** — use the `transform`, `filter`, and `attributes` processors to drop high-cardinality attributes (user_id, request_id) BEFORE they hit the backend. Provide concrete processor configs.

5. **Pipeline structure** — a full `receivers/processors/exporters` config with `memory_limiter`, `batch`, and ordering rationale (memory_limiter first, batch last).

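6. **Gateway scaling** — agent-on-node + gateway-tier pattern, with the `loadbalancing` exporter routing every data point of a series to the same gateway instance (so stateful delta-to-cumulative conversion sees the whole series), and how to avoid double-counting.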

7. **Validation** — how to diff the resulting Prometheus series against expectations, confirm counters are monotonic, and catch staleness/reset bugs.

Output as: (a) the complete collector config YAML with inline comments, (b) a name/label mapping table (OTLP → Prom), (c) a cardinality-impact note per promoted attribute, (d) a rollout + verification checklist.

Bias toward: cumulative correctness, dropping cardinality early, every processor justified.
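
When reviewing the model's answer to step 1, it can help to have a concrete picture of what delta-to-cumulative conversion does and why a restart matters. The following is a minimal Python sketch, not the collector's actual implementation: the class `DeltaToCumulative`, the metric name `http_requests`, and the label values are all hypothetical, and the only assumption is that the converter keeps one running sum per series in memory.

```python
from collections import defaultdict


class DeltaToCumulative:
    """Toy accumulator: one in-memory running sum per series identity."""

    def __init__(self):
        self._totals = defaultdict(float)

    def add(self, name, labels, delta):
        # Series identity = metric name plus the sorted label set.
        key = (name, tuple(sorted(labels.items())))
        self._totals[key] += delta
        return self._totals[key]


acc = DeltaToCumulative()
labels = {"service": "checkout"}  # hypothetical label set

# Three delta points become a monotonically increasing cumulative series.
before_restart = [acc.add("http_requests", labels, d) for d in (5, 3, 4)]
print(before_restart)  # [5.0, 8.0, 12.0]

# Simulate a collector restart: the in-memory state is gone.
acc = DeltaToCumulative()
after_restart = [acc.add("http_requests", labels, d) for d in (2, 6)]
print(after_restart)  # [2.0, 8.0]
```

The exported value falls from 12 to 2 across the restart. Prometheus query functions such as `rate()` treat that drop as a counter reset, but any deltas that arrived while no accumulator was running are never counted. The same sketch also suggests why routing matters in step 6: if two gateway instances each see part of a series, each one holds only a partial sum.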
