Cutting Cardinality at Ingest With vmagent Stream Aggregation

Cardinality is the cost center of every metrics stack, and the usual fixes are blunt: drop the metric, drop the label, or pay for more storage. VictoriaMetrics offers a sharper instrument — stream aggregation in vmagent, which collapses many high-cardinality input series into a few low-cardinality aggregates at ingest, statsd-style, before anything hits the TSDB. Instead of storing a per-request-id latency series and querying it later, you aggregate it into a per-route histogram on the way in. Done well, it can shrink stored series dramatically while preserving exactly the dimensions your dashboards and alerts actually use. Done carelessly, it silently produces wrong numbers or breaks an alert nobody notices until an incident.

What stream aggregation does

Stream aggregation runs in vmagent, between scrape and remote-write. You define rules that match input series, choose an output function, pick the labels to keep, and emit an aggregated series on an interval. A typical rule looks like this:

- match: 'http_request_duration_seconds_bucket'
  interval: 30s
  by: [route, le, status]      # keep what queries group by
  outputs: [total]             # counters -> total, NOT sum

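The match selects the noisy input, by keeps the dimensions you care about (every label not listed is dropped), and outputs chooses the aggregation. The original per-instance, per-id series collapse into one series per route/le/status combination. Two details are easy to miss: this match covers only the _bucket series, so _count and _sum need their own rule or a broader selector, and by default vmagent writes the aggregate under a suffixed name that encodes the interval, labels, and output, so consumers must query the new name unless the rule sets keep_metric_names: true.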
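To get a feel for the size of the collapse, here is a minimal Python sketch that projects a set of label sets onto the by labels and counts the distinct results. The routes, statuses, bucket bounds, and 50 pods are made-up illustration data, and the function name output_keys is hypothetical; it models only the grouping, not how vmagent implements it.

```python
from itertools import product


def output_keys(series, by):
    """Project each label set onto the `by` labels and return the distinct keys."""
    return {tuple(labels.get(name, "") for name in by) for labels in series}


# Illustration data only: 3 routes x 2 statuses x 4 buckets x 50 pods.
routes = ["/api", "/login", "/health"]
statuses = ["200", "500"]
buckets = ["0.1", "0.5", "1", "+Inf"]
pods = [f"pod-{i}" for i in range(50)]

raw = [
    {"route": r, "status": s, "le": le, "pod": p}
    for r, s, le, p in product(routes, statuses, buckets, pods)
]

# Dropping `pod` leaves one output series per route/le/status combination.
kept = output_keys(raw, by=["route", "le", "status"])
print(len(raw), "->", len(kept))  # 1200 -> 24
```

The reduction factor equals the number of distinct values of the dropped labels, which is why a per-pod or per-request-id label is such a rewarding target.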

The counter trap: total, not sum

The single most important detail: for counters, use total, not a plain sum (the output vmagent calls sum_samples). total is reset-aware: it tracks each input series separately and accounts for counter resets, so the output remains a valid counter that rate() can work with. A plain sum just adds up the sample values it sees, which produces wrong results whenever an input counter resets or vmagent restarts mid-window. This is a quiet bug: the aggregate looks fine in a demo and drifts in production. Map the function to the metric type deliberately:

# counters
outputs: [total]              # or increase for a delta per interval
# gauges
outputs: [avg, min, max, last]
# histograms
outputs: [total]             # on the _bucket series, keeping le
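To see why reset-awareness matters for counters, the following Python sketch compares a naive sum of current values with a reset-aware running total over the same samples. It is a simplified model written for this article, not vmagent’s implementation: the function names, the pod names, and the choice to treat the first sample of each series as a baseline are all assumptions.

```python
def naive_sum(ticks):
    """Add up the current value of every series at each tick."""
    return [sum(tick.values()) for tick in ticks]


def reset_aware_total(ticks):
    """Accumulate per-series deltas, treating a decrease as a counter reset."""
    last = {}
    total = 0.0
    out = []
    for tick in ticks:
        for series, value in tick.items():
            prev = last.get(series)
            if prev is None:
                delta = 0.0            # assumption: first sample is only a baseline
            elif value >= prev:
                delta = value - prev   # normal counter growth
            else:
                delta = value          # counter restarted from zero
            total += delta
            last[series] = value
        out.append(total)
    return out


# Illustration data: pod-a restarts before the third tick.
ticks = [
    {"pod-a": 100, "pod-b": 50},
    {"pod-a": 110, "pod-b": 60},
    {"pod-a": 5, "pod-b": 70},
    {"pod-a": 15, "pod-b": 80},
]

print(naive_sum(ticks))          # [150, 170, 75, 95]
print(reset_aware_total(ticks))  # [0.0, 20.0, 35.0, 55.0]
```

The naive sum falls from 170 to 75 when pod-a restarts, which rate() would read as a reset of the whole aggregate; the reset-aware total keeps increasing by the true number of events.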

Keep the labels your queries need

Every label you don’t list in by is gone from the stored aggregate forever. If an alert filters on a label you aggregated away, that alert can never fire again — and it fails open, silently. So the design step that matters most isn’t choosing the output function, it’s reconciling the by labels against every consumer:

# If an alert does this...
sum by (route) (rate(http_request_duration_seconds_count{status=~"5.."}[5m]))
# ...then the rule that aggregates _count MUST keep route AND status, or the alert breaks.

List the dashboards, alerts, and recording rules that touch the metric, extract every label they group by or filter on, and make sure all of them survive the aggregation. Anything no consumer references is fair game to collapse.
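The reconciliation can be partly mechanized. The Python sketch below pulls label names out of by (...) clauses and {...} selectors with regular expressions and reports any that are missing from a proposed by list. It is deliberately crude and is not a PromQL parser: it ignores without, on, ignoring, and group_left/group_right, so treat its output as a starting checklist. The function names and the second example query are hypothetical.

```python
import re

QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')             # string literals
BY_CLAUSE = re.compile(r"\bby\s*\(([^)]*)\)")         # by (a, b)
SELECTOR = re.compile(r"\{([^}]*)\}")                 # {a="x", b=~"y"}
MATCHER = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*(?:=~|!~|!=|=)")


def labels_used(query):
    """Return label names a query groups by or filters on (best effort)."""
    stripped = QUOTED.sub('""', query)  # avoid matching inside label values
    labels = set()
    for group in BY_CLAUSE.findall(stripped):
        labels.update(name.strip() for name in group.split(",") if name.strip())
    for selector in SELECTOR.findall(stripped):
        labels.update(MATCHER.findall(selector))
    labels.discard("__name__")
    return labels


def missing_labels(queries, by):
    """Labels referenced by any consumer query but absent from the `by` list."""
    needed = set()
    for query in queries:
        needed |= labels_used(query)
    return needed - set(by)


consumers = [
    'sum by (route) (rate(http_request_duration_seconds_count{status=~"5.."}[5m]))',
    "histogram_quantile(0.99, sum by (le, route) "
    "(rate(http_request_duration_seconds_bucket[5m])))",
]

print(sorted(missing_labels(consumers, by=["route", "le"])))  # ['status']
```

An empty result means no consumer in the list references a label the rule drops; it does not prove the list of consumers is complete.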

Designing rules with AI, then shadow-testing

Reconciling labels by hand across dozens of queries is exactly the kind of careful, mechanical task an assistant does well — provided you make it cross-check rather than guess:

Prompt: I want to stream-aggregate http_request_duration_seconds (high cardinality from a per-pod label). Here are the alerts and dashboards that use it. Tell me which output function to use, which labels I must keep, and write the vmagent stream aggregation rule. Flag anything that would break a consumer.

Response: Use total (it’s a counter histogram; sum would corrupt rates on restart). Your alerts group by route and filter on status, and the latency dashboard groups by le, so keep [route, status, le] and drop pod and instance. Run this in shadow first, keeping the raw input, and confirm the aggregated rate() matches the raw rate() before you drop the originals.

The model drafts the rule and the label analysis; you verify by running it in shadow mode. The pattern of using AI as a careful reviewer on cardinality work runs through the rest of the Prometheus and monitoring guides.

Roll it out as an irreversible change

Dropping the raw input is irreversible: once vmagent stops forwarding the per-series data to storage, it’s gone, with no backfill. So treat the rollout like any one-way door:

Deploy the aggregation rule but keep the raw input series, for example with vmagent’s -remoteWrite.streamAggr.keepInput flag.
Run both for a window long enough to cover your traffic patterns.
Diff: confirm rate() and quantiles over the aggregate match the raw series within tolerance.
Confirm every alert and dashboard still works against the aggregate.
Only then drop the raw input, and measure the series-count win.

# measure the win (the regex also catches suffixed aggregate names)
count({__name__=~"http_request_duration_seconds_bucket.*"})    # before vs after
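The diff step in the rollout can be sketched as a tolerance check. The Python below assumes you have already evaluated the same query against the raw and the aggregated series and collected the results into dictionaries keyed by label values; the function name diff_series, the 1% default tolerance, and the sample numbers are illustrative assumptions.

```python
import math


def diff_series(raw, agg, rel_tol=0.01, abs_tol=1e-9):
    """Return (key, raw_value, agg_value, reason) for every disagreement."""
    problems = []
    for key in sorted(set(raw) | set(agg)):
        r, a = raw.get(key), agg.get(key)
        if r is None or a is None:
            problems.append((key, r, a, "missing"))    # present on one side only
        elif not math.isclose(r, a, rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append((key, r, a, "mismatch"))   # outside tolerance
    return problems


# Illustration data: per-(route, status) request rates from each side.
raw_rates = {("/api", "200"): 120.0, ("/api", "500"): 3.0, ("/login", "200"): 40.0}
agg_rates = {("/api", "200"): 120.4, ("/api", "500"): 3.5}

for problem in diff_series(raw_rates, agg_rates):
    print(problem)
```

Here the /api 200 rate passes because it is within 1%, the /api 500 rate is flagged as a mismatch, and the /login series is flagged as missing from the aggregate. Pick the tolerance from what your alerts can absorb, since interval alignment alone introduces small differences.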

Skipping the shadow phase is how teams discover, weeks later, that their p99 dashboard was wrong the whole time and the only copy of the raw data is gone.

The bottom line

vmagent stream aggregation is the precision tool for cardinality: it collapses noise into exactly the aggregates you query, cutting series counts while keeping the signal that blunt drops throw away. The discipline is non-negotiable: use total for counters, keep every label a consumer touches, and shadow-test before dropping raw, because the change can’t be undone. For a structured way to design the rules against your real consumers, the stream aggregation prompt and the broader cardinality control prompt both start from your queries rather than guessing what’s safe to drop.
