Catching PromQL Unit Mistakes With AI Before They Mislead

The worst PromQL bug I ever shipped wasn’t a syntax error; those fail loudly. It was a unit mistake: a dashboard panel that divided bytes by megabytes and confidently displayed a number that was off by a factor of a million, which nobody questioned because it had units printed next to it. PromQL distinguishes scalars from vectors, but it has no concept of units. Every sample value is just a float, and the language will happily let you add a duration in seconds to a count of requests and return a meaningless number with total confidence. AI is surprisingly good at catching these, because reasoning about units from metric names is exactly the kind of pattern-matching it excels at. But it’s also capable of introducing them, so the relationship has to be adversarial.

Why unit bugs are the silent killer of PromQL

A syntax error throws. A unit error returns a plausible-looking number that’s wrong by a factor of 1000 or 8. There’s no exception, no red squiggle, just a graph that looks fine and a threshold that’s calibrated to nonsense. The Prometheus naming conventions exist precisely to make units visible through suffixes such as _bytes, _seconds, and _ratio (with _total marking a counter rather than a unit), but only if you read them, and under deadline pressure people stop reading them. That’s where an AI reviewer that never gets tired of reading suffixes becomes genuinely useful.

Have the AI annotate units before reviewing logic

My first move with any non-trivial query is to ask the model to annotate the unit of every term and the final result, before it says anything about whether the query is correct. This forces the unit reasoning into the open:

sum(rate(node_network_receive_bytes_total[5m])) * 8 / 1e6

The model should report: rate(..._bytes_total[5m]) is bytes per second, * 8 converts to bits per second, / 1e6 converts to megabits per second — result is Mbps. If the annotation matches my intent, great. If the query had dropped the * 8, the model’s annotation would say “MB per second,” and I’d immediately see the bits-versus-bytes confusion that produces network graphs off by a factor of eight.
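Written out, that annotation might look like the sketch below. The comments are the annotation and the query itself is unchanged; the "decimal mega" reading of 1e6 is my assumption about what the panel intends.

```promql
sum(
  rate(node_network_receive_bytes_total[5m])  # bytes per second, per series
)                                             # bytes per second, summed across series
* 8                                           # bits per second
/ 1e6                                         # megabits per second (decimal: 1 Mb = 1e6 bits)
```

Having one comment per term makes it hard to skip a conversion without leaving a visible gap in the chain.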

The classic bytes-vs-bits dashboard bug

Network metrics are reported in bytes; humans think in bits; the conversion is a factor of eight that’s trivial to forget. I have the model specifically check any network or throughput expression for the bits/bytes boundary:

# Wrong: labels the panel "Mbps" but math yields MB/s — 8x too low
sum(rate(node_network_transmit_bytes_total[5m])) / 1e6

# Right: explicit conversion, result genuinely in Mbps
sum(rate(node_network_transmit_bytes_total[5m])) * 8 / 1e6

This is the fast-junior-engineer pattern in its purest form: the model has seen the correct conversion thousands of times and flags the missing * 8 instantly, faster than a human re-deriving it. But I never trust the flag blindly — I check the metric suffix myself to confirm it really is _bytes.
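To make the factor of eight concrete, here is the same arithmetic with a literal in place of the metric. This is a sketch that assumes a link moving 125,000,000 bytes per second, which is a saturated 1 Gbps link. PromQL evaluates plain scalar expressions, so each line can be pasted into the expression browser on its own.

```promql
# Without the conversion: megabytes per second
125e6 / 1e6        # 125

# With the conversion: megabits per second
125e6 * 8 / 1e6    # 1000
```

A panel labeled "Mbps" that shows 125 for a saturated gigabit link is the symptom to look for.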

Seconds, milliseconds, and the histogram trap

Latency is where units go to die. Prometheus histograms report buckets in seconds by convention, but plenty of exporters report milliseconds, and the bucket boundaries are bare numbers with no unit attached. A histogram_quantile that returns 0.45 could be 450ms or 0.45ms depending on the metric’s native unit, and a threshold of > 0.5 means wildly different things:

# Is this 500ms or 0.5ms? Depends entirely on the metric's unit.
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5

I ask the model to state the native unit of the histogram based on its name and bucket boundaries, then confirm the threshold is in those same units. The _seconds suffix is the contract; if the exporter violates it, the model can’t know, so I verify against a real sample.
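One way to put the bucket boundaries in front of the model (and yourself) is to list the distinct le values the histogram exposes. This is a sketch using the metric name from the example above, and the reading of the output is a heuristic, not proof.

```promql
# One result per distinct bucket boundary; the le label values are the boundaries.
count by (le) (http_request_duration_seconds_bucket)
```

Boundaries such as 0.005, 0.01, 0.025 up to 10 (the default buckets in the Prometheus Go client) are plausible for seconds. Boundaries such as 5, 10, 25 up to 10000 on a metric named _seconds suggest the exporter may actually be recording milliseconds.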

Pro Tip: For any threshold comparison, ask the AI to convert both sides to plain English with units — “this fires when 95th-percentile latency exceeds 500 milliseconds.” If it can’t produce a clean sentence with consistent units on both sides, there’s a unit mismatch hiding in the expression. The sentence test catches what the math hides.
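As an illustration of the sentence test, suppose the histogram carried a millisecond suffix instead. The metric name below is hypothetical and not taken from any specific exporter.

```promql
# Sentence: "fires when 95th-percentile latency in milliseconds exceeds 0.5"
# The left side is in milliseconds, but the intended threshold was 500 ms: mismatch.
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_milliseconds_bucket[5m]))) > 0.5

# Sentence: "fires when 95th-percentile latency exceeds 500 milliseconds"
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_milliseconds_bucket[5m]))) > 500
```

The first sentence only reads cleanly if you really meant half a millisecond, which is the cue to go back and check the threshold.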

Ratios masquerading as percentages

A _ratio metric is between 0 and 1; a percentage is between 0 and 100. Mixing them produces thresholds off by 100x, and the AI both catches and causes this one. A common bug is comparing a ratio metric against 90:

# Wrong: ratio is 0-1, so this never fires
node_memory_ratio_used > 90

# Right: compare ratio against a ratio
node_memory_ratio_used > 0.90

When I ask the model to draft a memory alert, I explicitly tell it whether the source metric is a ratio or a percentage, because if I don’t, it guesses — and a guess about the value range of your metric is exactly the kind of thing that looks right and silently never fires.
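When the value range is in doubt, one option is to build the ratio from byte gauges whose units are not ambiguous. The sketch below assumes the Linux node_exporter gauges node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes are being scraped; the 0.90 threshold is only an example.

```promql
# bytes / bytes is dimensionless, so the left side is a ratio in the range 0-1
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.90

# Same condition as a percentage: convert explicitly, then compare against 90
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
```

Either form is fine as long as the conversion appears in the expression and both sides of the comparison use the same scale.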

Build unit-checking into review, not just authoring

Because these queries live in dashboards-as-code and alert rules in Git, I fold the unit annotation step into review rather than authoring alone. Our code review dashboard is where queries get a second pass, and “annotate every unit” is a standing review instruction. The discipline that matters is that the unit reasoning has to be written down and checkable — an output that can’t be explained in consistent units doesn’t ship. I keep my unit-annotation prompt in the prompt workspace so any query gets the same treatment.

The human still owns the metric’s true unit

Here’s the boundary the AI can’t cross: it reasons about units from names, but only you know what your exporter actually reports. A metric named latency_seconds that secretly emits milliseconds will fool the model exactly as it fools a human reading the name. So the model’s unit annotation is a hypothesis I verify against a real sample in the expression browser. The model is the tireless suffix-reader and conversion-checker; the engineer is the one who confirms the metric’s real-world unit. Draft the alerts on top of verified-unit queries with the Alert Rule Generator once the math is sound.
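One quick check against a real sample is to compute the mean observed latency and compare it with a request you have timed yourself. The sketch assumes latency_seconds is a histogram or summary that exposes _sum and _count series.

```promql
# Mean latency over the last 5 minutes, in whatever unit the exporter really records.
  sum(rate(latency_seconds_sum[5m]))
/
  sum(rate(latency_seconds_count[5m]))
```

If a request you timed at roughly 40 ms shows up here as about 0.04, the metric is in seconds as named. A value near 40 means it is emitting milliseconds despite the suffix.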

Conclusion

PromQL has no notion of units, which makes unit bugs silent and expensive, and AI is a genuinely strong unit reviewer because suffix-reasoning is its strength, as long as you keep the relationship adversarial. Make it annotate every unit before judging logic, check the bits/bytes and seconds/milliseconds boundaries explicitly, and verify the metric’s true unit against a real sample. More query-correctness patterns are in the monitoring guides, and reusable unit-review prompts are in the prompts library.