DevOps AI ToolKit
AI for Prometheus & Monitoring

VictoriaMetrics vmagent Stream Aggregation Rules Design Prompt

Design vmagent stream aggregation rules that pre-aggregate high-cardinality metrics at ingest, cutting stored series while preserving the dimensions your queries need.

Target user: Engineers running VictoriaMetrics/vmagent who need ingest-time aggregation
Difficulty: Advanced
Tools: Claude, ChatGPT, Cursor

The prompt

You are a VictoriaMetrics operator who uses vmagent stream aggregation to collapse high-cardinality input series into low-cardinality aggregates at ingest time, statsd-style, without losing the dimensions queries actually use.

I will provide:
- The high-cardinality metric(s) and the labels causing the blowup (e.g. request_duration with a per-request id or a per-pod label): [METRICS + LABELS]
- The queries/dashboards/alerts that consume these metrics and the labels they group by: [CONSUMERS]
- Whether the raw per-series data has any value (debugging, billing) or is purely aggregate: [RAW VALUE]
- My setup (vmagent in front of single-node VM or cluster, remote_write topology): [TOPOLOGY]

Your job:

1. **Pick the right output function**: map the metric type to the aggregation output: counters -> total or increase; gauges -> avg, min, max, or last; existing histogram _bucket counters -> total with the le label kept; raw observations that need a distribution -> histogram_bucket or quantiles. Explain why total (not sum_samples) is correct for counters across input counter resets and vmagent restarts.

2. **Choose the by/without labels**: keep exactly the labels the consumers group by or filter on; drop the high-cardinality ones. Cross-check against the consumer queries so no dashboard or alert loses a label it depends on.

3. **Decide keep_metric_names and the interval**: choose the aggregation interval (matching or coarser than the scrape interval) and whether to keep the original metric names via keep_metric_names or accept the default suffixed output names, and explain the dedup implications.

4. **Decide whether to drop the raw series**: recommend whether to keep raw input alongside the aggregate (via vmagent's -remoteWrite.streamAggr.keepInput and -remoteWrite.streamAggr.dropInput flags; drop_input_labels only strips labels before aggregation and does not control this) based on the stated value of raw data. Default to keeping raw briefly during rollout, then dropping.

5. **Write the rules**: produce the stream aggregation config (YAML) with comments, and the query to measure series-before vs series-after.

Output as: (a) a table mapping each metric to its output function and by-labels with the consumer-safety column, (b) the runnable stream aggregation config, (c) the rollout plan (run in shadow keeping raw, validate aggregates match, then drop raw), (d) the before/after series-count query.

Never drop a label a consumer query groups by or filters on. Validate that aggregated counters use total (not sum_samples) so counter resets and restarts don't corrupt rates.

Why this prompt works

Stream aggregation is VictoriaMetrics’ answer to cardinality blowups, but it is dangerous if you treat it as a config toggle rather than a data transformation. It collapses many input series into a few aggregates at ingest, which is exactly what you want when a metric carries an unbounded label. The aggregation outputs are not interchangeable, though. For counters, using sum_samples instead of total silently produces wrong values, because sum_samples just adds up the raw sample values seen in each interval, while total accumulates per-series increases and is the output designed to be reset-aware. This prompt front-loads that distinction so you don’t ship an aggregate that looks plausible in a demo and drifts in production.
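To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is a simplified model for illustration, not vmagent's actual implementation: the function names, the pod names, and the sample values are all hypothetical, and the reset rule (any drop in value is treated as a restart from zero) is the usual convention for counters rather than something taken from this page.

```python
from typing import Dict, List


def reset_aware_increase(samples: List[float]) -> float:
    """Increase of one counter series, treating any drop as a reset to zero."""
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # If the value dropped, the process restarted: everything in `cur` is new.
        increase += cur - prev if cur >= prev else cur
    return increase


def aggregate_total(series: Dict[str, List[float]]) -> float:
    """Reset-aware aggregate: sum of per-series increases (simplified 'total')."""
    return sum(reset_aware_increase(s) for s in series.values())


def sum_of_samples(series: Dict[str, List[float]]) -> float:
    """Plain sum of raw sample values, which is not a counter at all."""
    return sum(sum(s) for s in series.values())


# Hypothetical scrapes of one counter from two pods.
series = {
    "pod-a": [100.0, 110.0, 120.0],
    "pod-b": [50.0, 60.0, 5.0],  # restarted between the 2nd and 3rd scrape
}

print(aggregate_total(series))  # 35.0  -> (10 + 10) + (10 + 5)
print(sum_of_samples(series))   # 445.0 -> 330 + 115
```

The reset-aware figure tracks the real number of events (35), while the plain sum grows with every scrape and has no relation to request volume.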

The harder part is choosing which labels to keep. Every label you drop is gone from the stored aggregate forever, so if an alert filters on a label you aggregated away, that alert quietly breaks. By forcing the model to cross-check the by/without label set against the actual consumer queries — the dashboards and alerts you supply — the prompt turns label selection from a guess into a reconciliation. Anything a consumer groups by or filters on must survive; anything else is fair game for collapse.
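The reconciliation described above can be sketched as a simple set check. This is an illustrative Python sketch, assuming you have already extracted the labels each consumer groups by or filters on into plain sets; the consumer names and labels are hypothetical, and it does not parse PromQL or MetricsQL for you.

```python
from typing import Dict, List, Set


def find_broken_consumers(
    kept_labels: Set[str], consumers: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Return, per consumer, the labels it uses that the aggregate would drop."""
    broken: Dict[str, List[str]] = {}
    for name, used_labels in consumers.items():
        missing = sorted(used_labels - kept_labels)
        if missing:
            broken[name] = missing
    return broken


# Hypothetical `by` list for the aggregation rule.
kept_labels = {"service", "status_code"}

# Hypothetical consumers and the labels they group by or filter on.
consumers = {
    "latency_dashboard": {"service"},
    "error_rate_alert": {"service", "status_code"},
    "per_region_alert": {"service", "region"},
}

print(find_broken_consumers(kept_labels, consumers))
# {'per_region_alert': ['region']}
```

An empty result means every consumer's labels survive the aggregation; any entry is a dashboard or alert that would quietly break.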

The rollout discipline is what makes this safe to actually run. Stream aggregation that drops raw input is irreversible for that data, so the prompt defaults to a shadow phase: keep the raw series, let the aggregates run alongside, confirm they match expectations, and only then drop the raw. Paired with a before/after series-count query, this is the AI-drafts, human-verifies pattern applied to an irreversible operation — you get a concrete config, but you prove the aggregates are correct against real data before you throw away the only copy of the originals.
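The shadow-phase checks can be expressed in a few lines. The Python sketch below assumes you have already queried two things yourself: the series counts before and after, and the same per-label value computed once from raw series and once from the aggregate. All numbers, names, and the 1% tolerance are hypothetical placeholders.

```python
import math
from typing import Dict


def series_reduction(before: int, after: int) -> float:
    """Fraction of series removed by aggregation (0.0 = none, 1.0 = all)."""
    if before <= 0:
        raise ValueError("before must be a positive series count")
    return 1.0 - after / before


def aggregates_match(
    from_raw: Dict[str, float],
    from_aggregate: Dict[str, float],
    rel_tol: float = 0.01,
) -> bool:
    """True if both results cover the same label values and agree within rel_tol."""
    if from_raw.keys() != from_aggregate.keys():
        return False  # a label value appeared or vanished: do not drop raw yet
    return all(
        math.isclose(from_raw[key], from_aggregate[key], rel_tol=rel_tol)
        for key in from_raw
    )


# Hypothetical request rates per service, from raw series vs. from the aggregate.
from_raw = {"checkout": 41.8, "search": 97.2}
from_aggregate = {"checkout": 41.9, "search": 97.0}

print(aggregates_match(from_raw, from_aggregate))  # True
print(round(series_reduction(120_000, 600), 3))    # 0.995
```

Small differences are expected because the aggregate is computed on interval boundaries, so a tolerance is used rather than exact equality; what tolerance is acceptable depends on your alerts.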
