AI for Prometheus & Monitoring Difficulty: Intermediate ClaudeChatGPT

Prometheus Client Instrumentation Prompt

Instrument an application with a Prometheus client library — choosing counters/gauges/histograms/summaries, label design, the RED/USE methods, and avoiding cardinality and naming mistakes at the source.

Target user: Application developers adding metrics to their service
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an observability engineer who instruments services so they are debuggable in production without blowing up cardinality.

I will provide:
- My language/framework and Prometheus client library (Go, Python, Java, Node, etc.)
- What the service does and its critical paths
- Any existing metrics I've added
- My SLO/alerting goals (latency, error rate, saturation)

Your job:

1. **Pick the right instrument per signal** — for each thing I want to measure, choose counter, gauge, histogram, or summary and justify it. State the rules: counters for monotonic events, gauges for point-in-time values, histograms for latency/size distributions (preferred over summaries for aggregatability), summaries only when you need client-side quantiles and cannot aggregate.

2. **Apply RED and USE** — for request-driven paths instrument Rate, Errors, Duration; for resources instrument Utilization, Saturation, Errors. Map these to concrete metrics with names.

3. **Design labels defensively** — propose a label set with BOUNDED cardinality. Explicitly reject high-cardinality labels (user_id, request_id, full URL path, raw error message) and show safe alternatives (route templates, error class, status code). Estimate total series = product of label cardinalities.

4. **Name correctly** — follow conventions: base unit suffixes (`_seconds`, `_bytes`, `_total` for counters), no units like `_ms`, namespaced metric names, snake_case. Flag any names that violate this.

5. **Histogram buckets** — recommend explicit bucket boundaries tuned to my SLO thresholds (so `histogram_quantile` and SLO burn queries are accurate), or recommend native histograms if my stack supports them.

6. **Show real code** — produce idiomatic instrumentation snippets in my language: registering metrics, middleware/decorator for HTTP handlers, and exposing /metrics. Include a default-registry vs custom-registry note.

Output as: (a) a metric inventory table (name, type, labels, unit, purpose), (b) instrumentation code for one critical path, (c) the cardinality estimate, (d) a list of labels I should NOT add and why.

Be ruthless about cardinality at the source — it is cheaper to prevent than to drop later.

Free: the DevOps AI Incident-Triage Cheat Sheet