
Prometheus Metric Naming Conventions Prompt

Define and enforce a metric and label naming standard across teams — base units, suffixes, namespacing, label conventions, and a CI linter to keep new metrics consistent.

Target user: Observability platform owners standardizing metrics across services
Difficulty: Beginner
Tools: Claude, ChatGPT

The prompt

You are an observability platform lead establishing a metric naming standard that every team must follow so dashboards and alerts compose cleanly.

I will provide:
- A sample of existing metric and label names from a few services (the messier the better)
- The teams/services that will adopt the standard
- Any existing conventions or constraints

Your job:

1. **State the core rules** — Prometheus naming conventions in plain language:
- snake_case, ASCII, must match `[a-zA-Z_:][a-zA-Z0-9_:]*` (colons reserved for recording rules only)
- base units always (seconds not ms, bytes not MB, ratios 0-1 not percent)
- unit suffix matching the base unit (`_seconds`, `_bytes`, `_ratio`)
- `_total` suffix for counters, `_info` for info metrics, `_count`/`_sum`/`_bucket` reserved for histograms/summaries
- sensible namespace prefix per subsystem (e.g. `http_`, `db_`, `queue_`)

2. **Audit my samples** — go through each provided metric/label name, flag every violation (wrong unit, missing suffix, camelCase, unit in the name like `latency_ms`, counter without `_total`), and give the corrected name.

3. **Label conventions** — rules for label names (snake_case, no label names repeated inside the metric name, bounded cardinality, no use of the reserved `le`, `quantile`, or `__`-prefixed names), and a recommended common label set (env, service, region) applied through scrape-time target labels or `external_labels` rather than hard-coded per metric.

4. **Recording-rule namespace** — define the `level:metric:operations` colon convention so aggregated series are instantly recognizable and never collide with raw metric names.

5. **Enforce it** — provide a lightweight CI linter (regex rules, `promtool check metrics`, or a small script) that scans instrumented code or a /metrics snapshot and fails the build on violations. Include the exact rules it checks.

6. **Migration plan** — how to rename existing metrics without breaking dashboards/alerts (dual-emit during transition, recording-rule aliases, deprecation window).

Output as: (a) a one-page naming standard, (b) a before/after table for my samples, (c) the CI linter rules/script, (d) a safe-rename migration recipe.

Keep the standard short enough that engineers will actually read it.
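
For orientation, the kind of linter that step 5 asks for might look like the sketch below. It is not part of the prompt and not an official tool: the function names (`lint_metric`, `lint_snapshot`, `main`) and the `NON_BASE_UNITS` deny-list are hypothetical, and it assumes a classic text-format /metrics dump in which each `# TYPE` line carries the full metric name (including `_total` for counters). It only checks the name rules from step 1; label rules would need a proper exposition-format parser.

```python
import re
import sys

# Character rule quoted in the standard above.
NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
# Hypothetical deny-list of non-base unit tokens; extend to taste.
NON_BASE_UNITS = {"ms", "millis", "milliseconds", "kb", "mb", "gb", "percent", "pct"}
RESERVED_SUFFIXES = ("_count", "_sum", "_bucket")


def lint_metric(name, mtype):
    """Return a list of human-readable violations for one metric name."""
    problems = []
    if not NAME_RE.match(name):
        problems.append("name contains invalid characters")
    if ":" in name:
        problems.append("colons are reserved for recording rules")
    if name != name.lower():
        problems.append("name is not snake_case")
    # Compare each underscore-separated token against the deny-list.
    bad_units = NON_BASE_UNITS.intersection(name.lower().split("_"))
    if bad_units:
        problems.append("non-base unit in name: " + ", ".join(sorted(bad_units)))
    if mtype == "counter" and not name.endswith("_total"):
        problems.append("counter is missing the _total suffix")
    if mtype in ("counter", "gauge") and name.endswith(RESERVED_SUFFIXES):
        problems.append("suffix is reserved for histograms/summaries")
    return problems


def lint_snapshot(text):
    """Lint every '# TYPE <name> <type>' line of a text-format /metrics dump."""
    violations = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:2] == ["#", "TYPE"]:
            problems = lint_metric(parts[2], parts[3])
            if problems:
                violations[parts[2]] = problems
    return violations


def main(stream=sys.stdin):
    """Print violations and return a CI-friendly exit code (1 on any violation)."""
    found = lint_snapshot(stream.read())
    for name, problems in found.items():
        print(f"{name}: {'; '.join(problems)}")
    return 1 if found else 0
```

To wire this into a build, one option is to end the file with `sys.exit(main())` and pipe a saved /metrics snapshot into it. Token-based unit matching is deliberately crude and will miss units embedded in camelCase names, so treat the output as a first filter rather than a complete check.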
