
OpenTelemetry Span Metrics Connector for RED Metrics Prompt

Configure the OpenTelemetry Collector spanmetrics connector to derive RED (rate, errors, duration) metrics from traces and export them to Prometheus without exploding cardinality.

Target user: Observability engineers bridging tracing and metrics in an OTel pipeline
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an observability engineer who has run the OpenTelemetry Collector spanmetrics connector in production and tamed the cardinality and naming issues it creates when feeding Prometheus.

I will provide:
- My current Collector config (receivers, processors, exporters, pipelines)
- The spans I'm receiving (services, key attributes, http/db/messaging)
- My Prometheus/remote-write target and any cardinality limits
- Goals (RED dashboards, service-graph, latency SLOs from traces)

Your job:

1. **Explain the connector model** — `spanmetrics` is a connector, not a processor: it is listed as an exporter in the traces pipeline and as a receiver in the metrics pipeline. Show the correct `connectors:` block and how it bridges the `traces` → `metrics` pipelines.

2. **Dimensions** — the connector already adds `service.name`, `span.name`, `span.kind`, and `status.code` by default; use `dimensions` (span attributes promoted to metric labels) deliberately to add only a small set of safe http/rpc attributes. Explain that every dimension multiplies cardinality by its number of distinct values.

3. **Histogram config** — set explicit latency buckets matching my SLOs, choose explicit vs exponential histograms, and explain the impact on Prometheus storage and `histogram_quantile`.

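4. **Cardinality guards** — drop high-cardinality attributes (user id, full URL, trace id) BEFORE the connector via a transform/attributes processor; cap the number of tracked series with `aggregation_cardinality_limit` (or `dimensions_cache_size` on older Collector releases, where it has not yet been deprecated); recommend a metrics-side `metric_relabel_configs` allowlist.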

5. **Naming & temporality** — set `namespace`, ensure the `calls`/`duration` metric names are Prometheus-friendly, and configure `aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE` so Prometheus reads the counters correctly (delta series are not valid Prometheus counters and will break `rate()`).

6. **Exemplars** — enable exemplars so the duration histogram links back to trace IDs; show the exporter setting and the Prometheus/Grafana side needed to surface them.

7. **Pipeline wiring** — give the complete `service.pipelines` showing the traces pipeline feeding the connector and the metrics pipeline exporting to Prometheus remote write.

8. **Validation** — the PromQL to compute request rate, error rate, and p99 from the generated metrics, plus how to confirm cardinality stayed bounded (a `count by (__name__)` over the connector's metric names, compared against my series budget).

Output as: (a) full Collector YAML (receivers/connectors/processors/exporters/pipelines), (b) the attribute-drop processor, (c) bucket + temporality settings with rationale, (d) RED PromQL queries, (e) a cardinality audit query.

Bias toward bounded cardinality and `rate()`-correct counters over capturing every attribute.
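
For orientation, here is a minimal sketch of the kind of Collector config the prompt asks for. It is not a definitive setup: the endpoints, the dropped attribute keys, the extra dimensions, the bucket bounds, and the `traces.span.metrics` namespace are all assumptions chosen for illustration, and option names can differ between Collector releases, so check them against the version you run.

```yaml
# Sketch only: endpoints, attribute keys, dimensions, and bucket bounds are assumptions.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Delete high-cardinality span attributes before the connector sees them.
  # Note: if this traces pipeline also exports to a tracing backend,
  # those attributes are removed from the exported spans as well.
  attributes/drop_high_cardinality:
    actions:
      - key: user.id
        action: delete
      - key: url.full
        action: delete
  batch: {}

connectors:
  spanmetrics:
    namespace: traces.span.metrics
    histogram:
      unit: s
      explicit:
        # Example bounds; align these with your own latency SLO thresholds.
        buckets: [50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s]
    # service.name, span.name, span.kind, and status.code are added by default;
    # list only a few bounded attributes here.
    dimensions:
      - name: http.request.method
      - name: http.response.status_code
    aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE
    exemplars:
      enabled: true
    metrics_flush_interval: 15s

exporters:
  prometheusremotewrite:
    # Hypothetical remote-write endpoint.
    endpoint: http://prometheus.example.internal:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/drop_high_cardinality, batch]
      exporters: [spanmetrics]   # the connector acts as an exporter here...
    metrics:
      receivers: [spanmetrics]   # ...and as a receiver here
      exporters: [prometheusremotewrite]
```

With these assumed settings, the series would typically show up in Prometheus as something like `traces_span_metrics_calls_total` and `traces_span_metrics_duration_seconds_bucket`, though the exact names depend on the Collector version and the exporter's suffix settings. Under that assumption, request rate would be `sum by (service_name) (rate(traces_span_metrics_calls_total[5m]))`, and p99 would be `histogram_quantile(0.99, sum by (le, service_name) (rate(traces_span_metrics_duration_seconds_bucket[5m])))`.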
