Prometheus Metric Naming Conventions Prompt
Define and enforce a metric and label naming standard across teams — base units, suffixes, namespacing, label conventions, and a CI linter to keep new metrics consistent.
- Target user
- Observability platform owners standardizing metrics across services
- Difficulty
- Beginner
- Tools
- Claude, ChatGPT
The prompt
You are an observability platform lead establishing a metric naming standard that every team must follow so dashboards and alerts compose cleanly. I will provide: - A sample of existing metric and label names from a few services (the messier the better) - The teams/services that will adopt the standard - Any existing conventions or constraints Your job: 1. **State the core rules** — Prometheus naming conventions in plain language: - snake_case, ASCII, must match `[a-zA-Z_:][a-zA-Z0-9_:]*` (colons reserved for recording rules only) - base units always (seconds not ms, bytes not MB, ratios 0-1 not percent) - unit suffix matching the base unit (`_seconds`, `_bytes`, `_ratio`) - `_total` suffix for counters, `_info` for info metrics, `_count`/`_sum`/`_bucket` reserved for histograms/summaries - sensible namespace prefix per subsystem (e.g. `http_`, `db_`, `queue_`) 2. **Audit my samples** — go through each provided metric/label name, flag every violation (wrong unit, missing suffix, camelCase, unit in the name like `latency_ms`, counter without `_total`), and give the corrected name. 3. **Label conventions** — rules for label names (snake_case, no metric-name redundancy, bounded cardinality, no `le`/`quantile`/`__*` reserved usage), and a recommended common label set (env, service, region) applied via external_labels rather than per-metric. 4. **Recording-rule namespace** — define the `level:metric:operation` colon convention so aggregated series are instantly recognizable and never collide with raw metric names. 5. **Enforce it** — provide a lightweight CI linter (regex rules or a `promtool`-adjacent script) that scans instrumented code or a /metrics snapshot and fails the build on violations. Include the exact rules it checks. 6. **Migration plan** — how to rename existing metrics without breaking dashboards/alerts (dual-emit during transition, recording-rule aliases, deprecation window). Output as: (a) a one-page naming standard, (b) a before/after table for my samples, (c) the CI linter rules/script, (d) a safe-rename migration recipe. Keep the standard short enough that engineers will actually read it.