AI Instrumentation Review: Catching Label Explosions at Code Time

Every cardinality incident I’ve cleaned up had the same origin story: someone added a metric label that seemed reasonable in the pull request, and three weeks later Prometheus was eating 40GB of RAM and falling over. The fix at the TSDB level is always painful — drop rules, relabeling, sometimes data loss. The cheap fix is catching it at code review time, before the bad label ever ships. The problem is that a high-cardinality label looks completely innocent in a diff: user_id, request_id, email — all plausible, all catastrophic at scale. This is a perfect job for AI review, because spotting cardinality risk in instrumentation is pattern recognition, and the model has seen a million metric definitions. Here’s how I make it a reliable guard.

Why cardinality is a code-time problem

Prometheus cardinality is the number of unique label-value combinations, and it grows multiplicatively. A metric with a user_id label and ten thousand users is ten thousand series; add a region label and you’ve multiplied again. None of this is visible in the application diff that introduces it — the line requests.labels(user_id=uid).inc() looks harmless. By the time it’s visible in Prometheus, it’s already a production problem. So the only cheap place to catch it is in the review of the instrumentation code itself.
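To make the multiplication concrete, here is a toy sketch in plain Python. No Prometheus is involved. It counts distinct label-value combinations, and the TSDB stores each of those as a separate series. The user IDs and region names are made up. The two-label case assumes the worst case, where every user shows up in every region.

```python
from itertools import product


def count_series(label_sets):
    """Count distinct label-value combinations; Prometheus stores each one as its own series."""
    return len({tuple(sorted(labels.items())) for labels in label_sets})


# Hypothetical traffic: 10,000 distinct users and three regions.
users = [f"user-{i}" for i in range(10_000)]
regions = ["us-east", "eu-west", "ap-south"]

# One label: one series per distinct user.
one_label = [{"user_id": u} for u in users]
print(count_series(one_label))  # 10000

# Add region: worst case, every user is seen in every region.
two_labels = [{"user_id": u, "region": r} for u, r in product(users, regions)]
print(count_series(two_labels))  # 30000
```

Nothing in the instrumentation line changes between the two cases. Only the traffic does, and that is why the diff can't show you the problem.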

The label-bounding question

The core review question for any new label is: how many distinct values can this take? A label is safe if its value set is small and bounded — HTTP method, status code class, region name. It’s a bomb if its value set is unbounded or grows with traffic — user IDs, request IDs, full URLs, error messages. I ask the model to classify every new label this way:

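from prometheus_client import Counter

# Dangerous: unbounded, grows with every user
REQUESTS = Counter("api_requests_total", "...", ["user_id", "endpoint"])

# Safe: bounded value sets (use instead of the definition above, not alongside it;
# both share a metric name, so registering both raises an error)
REQUESTS = Counter("api_requests_total", "...", ["method", "status_class", "route"])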

The model flags user_id instantly as unbounded — that’s the fast-junior-engineer strength, instant recall of the convention. The judgment call it can’t make is whether route in your app is a bounded set of registered routes or the raw request path, which is unbounded. That distinction is the whole ballgame, and only you know it.
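A rough, name-based version of that first-pass classification might look like the sketch below. The hint lists are hypothetical and deliberately incomplete. Like the model, this code recognizes label shapes, not actual value sets. So anything ambiguous (route, endpoint, tenant_id) or unrecognized goes to a human instead of being passed.

```python
# Hypothetical hint lists; tune them to your own codebase's naming conventions.
UNBOUNDED_HINTS = (
    "user_id", "request_id", "session_id", "trace_id",
    "email", "url", "path", "message", "query",
)
BOUNDED_NAMES = {"method", "status_class", "status_code", "region"}
NEEDS_CONTEXT = {"route", "endpoint", "tenant_id", "handler"}


def classify_label(name: str) -> str:
    """Return 'unbounded', 'bounded', or 'needs-human' for a label name."""
    lowered = name.lower()
    # Ambiguous names depend on how the app fills them, so a person decides.
    if lowered in NEEDS_CONTEXT:
        return "needs-human"
    if lowered in BOUNDED_NAMES:
        return "bounded"
    if any(hint in lowered for hint in UNBOUNDED_HINTS):
        return "unbounded"
    # Unrecognized names default to human review rather than a silent pass.
    return "needs-human"


for label in ["user_id", "endpoint", "method", "status_class", "route"]:
    print(label, classify_label(label))
```

Run against the two label lists above, it marks user_id as unbounded and method and status_class as bounded. It sends endpoint and route to a human, which is exactly the call only you can make.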

Watch for the path-vs-route confusion

The single most common cardinality bomb is labeling with the raw request path instead of the route template. /users/12345/orders/98765 is a unique series; /users/{id}/orders/{id} is one. The diff often hides which one is being used behind a framework call. I specifically ask the model to determine whether the label captures the templated route or the literal path:

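from flask import request

# Bomb: literal path, one series per unique URL
REQUEST_TIME.labels(path=request.path).observe(elapsed)

# Safe: templated route, bounded by number of routes.
# request.url_rule is None when no rule matched (e.g. a 404), so fall back
# to a fixed placeholder instead of raising AttributeError.
route = request.url_rule.rule if request.url_rule is not None else "unmatched"
REQUEST_TIME.labels(route=route).observe(elapsed)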

If the model can’t tell from the diff which one it is, that’s a signal to pull in more context, not to wave it through. An unbounded path label is the most common way a healthy service quietly DoSes its own Prometheus.
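Sometimes the framework doesn't expose a route template. One fallback is to collapse ID-like path segments before using the path as a label. This is a heuristic sketch, not a substitute for the real route. The regex only treats all-digit segments, UUIDs, and long hex strings as IDs, so slugs such as /posts/my-first-post still leak through. The function name templatize_path is hypothetical.

```python
import re

# Segments that look like identifiers: all digits, a UUID, or a long hex string.
_ID_SEGMENT = re.compile(
    r"^(\d+"
    r"|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    r"|[0-9a-fA-F]{16,})$"
)


def templatize_path(path: str) -> str:
    """Replace ID-like path segments with '{id}' so the label value set stays small."""
    segments = path.split("/")
    return "/".join("{id}" if _ID_SEGMENT.match(seg) else seg for seg in segments)


print(templatize_path("/users/12345/orders/98765"))  # /users/{id}/orders/{id}
```

Prefer the framework's own route template whenever it exists. This kind of normalization only limits damage when that template isn't available.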

Pro Tip: For every new label, ask the AI to estimate the series count by multiplying the cardinality of each label dimension. A metric with 5 routes × 4 methods × 6 status classes is 120 series — fine. The moment one factor is “unbounded,” the product is unbounded, and the estimate makes that obvious in a way the diff never does.
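The estimate is just a product with one special case: an unbounded factor makes the whole product unbounded. Here is a minimal sketch. It assumes you (or the model) supply a cardinality per label and use None to mean unbounded. estimate_series is a hypothetical helper name.

```python
import math
from typing import Mapping, Optional


def estimate_series(dimensions: Mapping[str, Optional[int]]) -> float:
    """Multiply per-label cardinalities; None marks a label as unbounded.

    A metric with no labels is a single series, so the empty product is 1.
    """
    total = 1
    for cardinality in dimensions.values():
        if cardinality is None:
            # One unbounded factor makes the whole product unbounded.
            return math.inf
        total *= cardinality
    return total


print(estimate_series({"route": 5, "method": 4, "status_class": 6}))  # 120
print(estimate_series({"route": 5, "user_id": None}))  # inf
```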

Free-text labels are always wrong

Error messages, exception strings, and user-supplied input as label values are unbounded by definition, and they appear constantly because they feel useful for debugging. They belong in logs or traces, never in metric labels. I have the model flag any label whose value comes from an exception, a user input, or a string format:

import logging

log = logging.getLogger(__name__)

# Wrong: error message as label = unbounded cardinality
ERRORS.labels(message=str(exc)).inc()

# Right: bounded error type, details go to the log/trace
ERRORS.labels(error_type=type(exc).__name__).inc()
log.error("request failed", exc_info=exc)

This is a hard rule, and it’s one the model enforces reliably because the pattern is unambiguous. The nuance — “is error_type itself bounded?” — is where I still apply judgment.
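Because the free-text pattern is mechanical, a cheap static check can back up the model. The sketch below walks a Python file's AST. It flags .labels(...) arguments built with str(), repr(), an f-string, .format(), or %-formatting. It is only a sketch, and it inspects just the expression passed directly to .labels(). A value computed on an earlier line (msg = str(exc)) slips past it, and that is where the model and the human reviewer still matter. The function names are hypothetical.

```python
import ast
from typing import List, Optional, Tuple


def _free_text_reason(node: ast.AST) -> Optional[str]:
    """Return why a label value looks like free text, or None if it does not."""
    if isinstance(node, ast.JoinedStr):
        return "f-string"
    if (
        isinstance(node, ast.BinOp)
        and isinstance(node.op, ast.Mod)
        and isinstance(node.left, ast.Constant)
        and isinstance(node.left.value, str)
    ):
        return "%-formatting"
    if isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Name) and func.id in ("str", "repr"):
            return f"{func.id}() call"
        if isinstance(func, ast.Attribute) and func.attr == "format":
            return ".format() call"
    return None


def find_free_text_labels(source: str) -> List[Tuple[int, Optional[str], str]]:
    """Return (line, label, reason) for suspicious values passed directly to .labels(...)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        is_labels_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "labels"
        )
        if not is_labels_call:
            continue
        # Keyword style: .labels(message=str(exc))
        for kw in node.keywords:
            reason = _free_text_reason(kw.value)
            if reason:
                findings.append((node.lineno, kw.arg, reason))
        # Positional style: .labels("GET", str(exc))
        for i, arg in enumerate(node.args):
            reason = _free_text_reason(arg)
            if reason:
                findings.append((node.lineno, f"<positional {i}>", reason))
    return findings
```

Run on the two ERRORS lines from this section, it flags message=str(exc) and leaves error_type=type(exc).__name__ alone.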

Histograms multiply, so count the buckets

A subtle one: a histogram with N buckets and a label of cardinality C produces (N + 2) × C series. Every label combination gets its own N bucket series plus its own _sum and _count. Add a couple of labels to a histogram and the series count explodes much faster than it would for a counter. I make the model account for bucket count explicitly when reviewing histogram instrumentation, because the multiplication is easy to forget and brutal at scale. A histogram with 12 buckets across 50 routes is already 600 bucket series (700 with _sum and _count) before any other dimension.
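The arithmetic is worth writing down once. This sketch assumes the bucket count already includes the +Inf bucket. It counts one _bucket series per bucket, plus _sum and _count, for every label combination. It ignores any extra series a particular client library may add. histogram_series is a hypothetical helper.

```python
import math
from typing import Sequence


def histogram_series(buckets: int, label_cardinalities: Sequence[int]) -> int:
    """Estimate series for one histogram metric.

    Assumes `buckets` already counts the +Inf bucket. Each label combination gets
    one _bucket series per bucket, plus one _sum and one _count series.
    """
    combinations = math.prod(label_cardinalities)  # 1 when there are no labels
    return (buckets + 2) * combinations


print(histogram_series(12, [50]))  # 700
print(histogram_series(12, [50, 4]))  # 2800
```

The second call adds a single four-value method label. That quadruples the total, which is the "explodes faster than a counter" effect in numbers.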

Wire it into the review pipeline

Because instrumentation lives in application code, this review belongs in the normal code review flow, not in a separate monitoring process. Our code review dashboard runs the cardinality check as a standing instruction on any diff that touches metric definitions, so a bad label gets flagged before a human even opens the PR. The human then confirms the model’s bounded/unbounded classifications against actual app knowledge. I keep the cardinality-review prompt in the prompt workspace so it’s consistent across reviewers.
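The trigger for a standing instruction like this can be a cheap text match on the diff. Here is a minimal sketch. It assumes a Python codebase that uses prometheus_client-style constructors (Counter, Gauge, Histogram, Summary) and .labels() calls. The pattern and the touches_metrics name are hypothetical. False positives such as collections.Counter() are acceptable, because the check only decides whether the review runs.

```python
import re

# Hypothetical trigger: metric constructors or .labels() calls in added lines.
METRIC_PATTERN = re.compile(r"\b(?:Counter|Gauge|Histogram|Summary)\s*\(|\.labels\s*\(")


def touches_metrics(unified_diff: str) -> bool:
    """Return True if any added line in a unified diff defines or labels a metric."""
    for line in unified_diff.splitlines():
        # Added lines start with '+'; the '+++ b/file' header is not an added line.
        if line.startswith("+") and not line.startswith("+++"):
            if METRIC_PATTERN.search(line[1:]):
                return True
    return False
```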

The human owns the bounds

The pattern that makes this work: the AI is excellent at flagging candidate cardinality risks because it recognizes the shapes, but it cannot know the true cardinality of your specific labels — whether route is templated, whether tenant_id has ten values or ten million. So the model proposes the risk; a human who knows the system confirms or clears it. The output that ships must be explainable — “this label is bounded to our 30 registered routes” — not just “the AI said it was fine.” Catch it here and you never need the painful TSDB-level fixes documented across the monitoring guides. I draft and review instrumentation in Cursor and GitHub Copilot with the cardinality check always in play.
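One way to make "explainable" enforceable is a small, human-owned manifest of approved label bounds for the review to check against. This sketch rests on stated assumptions: the LabelBound type, the APPROVED_BOUNDS entries, and unexplained_labels are all hypothetical. The numbers stand in for whatever your own system actually guarantees.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LabelBound:
    """A human-approved bound for one label, plus the reason a reviewer can verify."""
    max_values: int
    justification: str


# Hypothetical manifest, checked into the repo and edited by people, not the model.
APPROVED_BOUNDS: Dict[str, LabelBound] = {
    "route": LabelBound(30, "templated from our 30 registered routes"),
    "method": LabelBound(7, "HTTP methods our router accepts"),
    "status_class": LabelBound(5, "1xx through 5xx"),
}


def unexplained_labels(
    labels: Iterable[str], approved: Dict[str, LabelBound] = APPROVED_BOUNDS
) -> List[str]:
    """Return the labels that have no approved bound or an empty justification."""
    return [
        name
        for name in labels
        if name not in approved or not approved[name].justification.strip()
    ]


print(unexplained_labels(["route", "method", "user_id"]))  # ['user_id']
```

The model can still propose entries, but a label only clears review once a person has written down its bound and why that bound holds.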

Conclusion

Cardinality bombs are born in application code and detonate in Prometheus weeks later, which makes code-time review the only cheap place to defuse them. AI is a strong cardinality reviewer because it recognizes unbounded-label patterns instantly — but it’s your knowledge of the system that turns its flags into decisions. Bound every label, kill free-text values, count histogram buckets, and never ship a label you can’t explain as bounded. More cardinality patterns are in the monitoring guides, and the review prompts are in the prompts library.
