Prometheus Federation vs Remote-Write: Which to Use and When

The first time I tried to federate a fleet of Prometheis, I did exactly the wrong thing: I pointed a central server at every child’s /federate endpoint and grabbed {__name__=~".+"}. It worked for about a week. Then the central instance fell over under a tidal wave of per-pod, per-endpoint, high-cardinality time series it was never meant to hold. The lesson stuck with me. Federation and remote-write both move metrics from many places to one place, but they are not interchangeable, and choosing the wrong one is expensive in both compute and on-call sanity.

This post is the guide I wish I’d had: when hierarchical federation is the right tool, when remote-write is, and how to write the match[] selectors that keep federation from eating your monitoring stack.

The two patterns, in one breath

Hierarchical federation is a pull. A central (“global”) Prometheus scrapes the special /federate HTTP endpoint on each child Prometheus, asking for a specific, named set of series. The child evaluates the match[] selectors and returns the most recent sample of each matching series: a point-in-time snapshot, not a history. You’re meant to pull aggregates, typically the outputs of recording rules.
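
To make the pull concrete, here is a minimal Python sketch (standard library only) of the request the global server effectively sends: an HTTP GET on /federate with one match[] parameter per selector. The helper names federate_url and fetch_federated are hypothetical. The host is the dc1 child from the scrape config used in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen


def federate_url(base: str, selectors: list[str]) -> str:
    """Build a /federate URL carrying one match[] parameter per selector."""
    # doseq=True repeats the key for every list element: match[]=...&match[]=...
    query = urlencode({"match[]": selectors}, doseq=True)
    return f"{base.rstrip('/')}/federate?{query}"


def fetch_federated(base: str, selectors: list[str], timeout: float = 10.0) -> str:
    """GET the child's current snapshot of matching series (text exposition format)."""
    with urlopen(federate_url(base, selectors), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = federate_url(
        "http://prometheus-dc1.internal:9090",
        ['{__name__=~"job:.*"}', '{__name__=~"cluster:.*"}'],
    )
    print(url)
    # Decoding the query string recovers the selectors exactly as written.
    print(parse_qs(urlsplit(url).query)["match[]"])
```

Running it only prints the encoded URL and the decoded selectors. fetch_federated makes a network call only when you point it at a reachable child.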

Remote-write is a push. Each Prometheus streams its samples, as they’re written to its local write-ahead log, to a remote endpoint that speaks the remote-write protocol: Thanos Receive, Mimir, Cortex, VictoriaMetrics, or a managed vendor. The central store holds the raw, full-resolution data.

The shorthand I use: federation moves summaries, remote-write moves everything. Get that distinction wrong and you’ve built an anti-pattern.

When federation is the right tool

Federation shines when you want a global view of pre-aggregated data without shipping the firehose. Concrete cases:

Cross-team or cross-cluster aggregates. Each team’s Prometheus computes job:request_errors:rate5m via recording rules. A global Prometheus federates only those job-level series to build a company-wide dashboard.
Global view of recording-rule outputs. You’ve already done the expensive aggregation on each child; federation just collects the answers.
A small, bounded set of series. Dozens or low hundreds per child, not tens of thousands.
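
One way to keep "dozens or low hundreds" honest is to count what a child actually returns. This sketch assumes you already have the body of a /federate response in hand (from curl, say) and counts sample lines in the text exposition format. The 300-series budget and the function names are hypothetical.

```python
from collections import Counter


def _sample_lines(exposition: str) -> list[str]:
    """Keep only sample lines; # HELP / # TYPE comments and blank lines are skipped."""
    return [
        line.strip()
        for line in exposition.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def count_series(exposition: str) -> int:
    """Each sample line in a /federate response is one series."""
    return len(_sample_lines(exposition))


def series_by_name(exposition: str) -> dict[str, int]:
    """Series count per metric name, to spot which selector blows the budget."""
    names = (line.split("{", 1)[0].split(" ", 1)[0] for line in _sample_lines(exposition))
    return dict(Counter(names))


def within_budget(exposition: str, budget: int = 300) -> bool:
    # 300 is a hypothetical per-child budget standing in for "low hundreds".
    return count_series(exposition) <= budget
```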

Here’s a real, conservative scrape_config on the global Prometheus that federates aggregated series from two datacenter children:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'
        - '{__name__=~"cluster:.*"}'
        - 'up{job="kubernetes-apiservers"}'
    static_configs:
      - targets:
          - 'prometheus-dc1.internal:9090'
          - 'prometheus-dc2.internal:9090'
        labels:
          federation_source: 'datacenter'

Two details matter enormously here.

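honor_labels: true tells the global server to keep the labels the child already set when they collide with its own target labels. Without it, Prometheus renames the colliding labels on each federated series to exported_job and exported_instance, then applies its own job="federate" and instance="<child address>". Queries and dashboards that group by job then stop seeing your aggregates under their real names. For federation this is almost always what you want.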
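
The label-collision rule is easy to misremember, so here is a simplified Python model of it. It assumes the documented behavior: with honor_labels: false, colliding scraped labels are renamed exported_<name> and the target's labels win. It ignores details such as empty-valued labels and the order in which multiple collisions are resolved. merge_labels is a hypothetical name.

```python
def merge_labels(scraped: dict[str, str], target: dict[str, str], honor_labels: bool) -> dict[str, str]:
    """Simplified model of how scraped labels and target labels are combined."""
    result = dict(scraped)
    if honor_labels:
        # Scraped (child-provided) labels win; target labels only fill gaps.
        for name, value in target.items():
            result.setdefault(name, value)
        return result

    conflicts: dict[str, str] = {}
    for name, value in target.items():
        if scraped.get(name):          # a non-empty scraped value collides
            conflicts[name] = scraped[name]
        result[name] = value           # the target label always wins
    for name, value in conflicts.items():
        new_name = "exported_" + name
        while result.get(new_name):    # keep prefixing while that name is taken
            new_name = "exported_" + new_name
        result[new_name] = value
    return result


if __name__ == "__main__":
    scraped = {"__name__": "job:http_requests:rate5m", "job": "api"}
    target = {
        "job": "federate",                           # the scrape job_name
        "instance": "prometheus-dc1.internal:9090",  # the child's address
        "federation_source": "datacenter",           # static label from the config
    }
    print(merge_labels(scraped, target, honor_labels=True))
    print(merge_labels(scraped, target, honor_labels=False))
```

With honor_labels: true the aggregate keeps job="api". With false it becomes job="federate" and the original value moves to exported_job.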

The match[] params are the contract. Each entry is an instant-vector selector. The child returns the union of every series matching any selector. By naming a recording-rule prefix convention (job:, cluster:) I’m asking only for aggregates I deliberately precomputed.
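
The union semantics can be sketched in a few lines. This is a toy model that supports only the = and =~ matchers. It uses Python's re.fullmatch to stand in for Prometheus's anchored RE2 matching, which is close enough for patterns this simple. The selectors mirror the scrape config above, and the function names are hypothetical.

```python
import re

# label name -> (operator, value); only "=" and "=~" are modelled here.
Selector = dict[str, tuple[str, str]]


def matches(series: dict[str, str], selector: Selector) -> bool:
    for label, (op, value) in selector.items():
        actual = series.get(label, "")  # a missing label behaves like ""
        if op == "=" and actual != value:
            return False
        if op == "=~" and re.fullmatch(value, actual) is None:  # fully anchored
            return False
    return True


def federate_select(series_list: list[dict[str, str]], selectors: list[Selector]) -> list[dict[str, str]]:
    """Union: a series is returned once if it matches ANY selector."""
    return [s for s in series_list if any(matches(s, sel) for sel in selectors)]


SELECTORS: list[Selector] = [
    {"__name__": ("=~", "job:.*")},
    {"__name__": ("=~", "cluster:.*")},
    {"__name__": ("=", "up"), "job": ("=", "kubernetes-apiservers")},
]
```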

Pro Tip: Adopt a recording-rule naming convention like level:metric:operation (e.g. job:http_requests:rate5m) and federate by prefix. Your match[] selectors stay short, your intent is self-documenting, and a stray raw metric can never silently sneak through a job:.* filter.
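
If you adopt the convention, it's cheap to lint for it in CI. Here is a minimal sketch, assuming every recorded name should have exactly three colon-separated parts. The regex and function names are hypothetical and not part of any Prometheus tool.

```python
import re

# level:metric:operations, e.g. job:http_requests:rate5m (three colon-separated parts).
RECORDING_RULE_NAME = re.compile(
    r"[a-zA-Z_][a-zA-Z0-9_]*:[a-zA-Z_][a-zA-Z0-9_]*:[a-zA-Z_][a-zA-Z0-9_]*"
)


def is_recording_rule_name(name: str) -> bool:
    return RECORDING_RULE_NAME.fullmatch(name) is not None


def lint_rule_names(names: list[str]) -> list[str]:
    """Return the recorded names that break the convention."""
    return [n for n in names if not is_recording_rule_name(n)]
```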

When federation is an anti-pattern

Do not federate raw, high-cardinality series. This is the mistake I opened with. A selector like this is a trap:

# DON'T DO THIS on a busy child
params:
  'match[]':
    - '{__name__=~".+"}'          # every series, the whole firehose
    - '{namespace="production"}'  # still thousands of raw series

/federate is a synchronous scrape. Pulling tens of thousands of series means a slow, memory-hungry HTTP response on the child and an ingestion spike on the parent, every interval. You also inherit a staleness and gap problem: federation samples on the parent’s scrape interval, so anything finer-grained than that interval is lost, and a single missed scrape leaves a hole no backfill will repair. Federation was never designed to be a durable copy of all your data.
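
A toy simulation makes the resolution loss easy to see. This sketch assumes a child scraping every 15s, a parent scraping /federate every 30s, and the default 5-minute lookback. At each parent scrape, /federate hands back only the latest sample per series, so the samples in between never reach the parent. federate_scrapes is a hypothetical helper, not Prometheus code.

```python
Sample = tuple[int, float]  # (timestamp in seconds, value)


def federate_scrapes(child: list[Sample], parent_times, lookback: int = 300) -> list[Sample]:
    """Samples the parent ingests: at each scrape, only the child's latest sample."""
    ingested: list[Sample] = []
    for t in parent_times:
        # /federate only looks back `lookback` seconds (Prometheus default: 5m).
        window = [s for s in child if t - lookback < s[0] <= t]
        if not window:
            continue
        latest = max(window)  # highest timestamp wins
        if not ingested or ingested[-1] != latest:  # re-scraping the same sample adds nothing
            ingested.append(latest)
    return ingested


if __name__ == "__main__":
    child = [(ts, float(ts)) for ts in range(0, 300, 15)]  # 15s child scrapes
    full = federate_scrapes(child, range(30, 301, 30))      # 30s parent scrapes
    gappy = federate_scrapes(child, [t for t in range(30, 301, 30) if t != 150])
    print(len(child), len(full), len(gappy))
```

Ten of the child's twenty samples survive. Dropping the parent's t=150 scrape silently loses one more, and nothing will ever backfill it.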

If what you actually need is the full raw dataset centralized for long-term storage, querying, or HA dedup, that’s a remote-write job, not a federation job.

When remote-write is the right tool

Reach for remote-write when you need:

Durable, long-term storage of full-resolution data beyond a single Prometheus’s local retention.
Centralized querying over raw series from many sources (Thanos, Mimir, Cortex, VictoriaMetrics).
High cardinality that no sane match[] could ever filter down.
Global aggregation across instances without the per-scrape staleness federation imposes.
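
Some back-of-envelope arithmetic shows why raw data belongs on the remote-write path: every active series produces one sample per scrape interval. The series counts below are hypothetical. They were chosen only to contrast a busy child with a handful of federated aggregates.

```python
SECONDS_PER_DAY = 86_400


def samples_per_second(active_series: int, interval_s: float) -> float:
    """Each active series yields one sample per scrape or rule-evaluation interval."""
    return active_series / interval_s


def samples_per_day(active_series: int, interval_s: float) -> float:
    return samples_per_second(active_series, interval_s) * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical numbers: a busy child vs. a small federated aggregate set.
    print(samples_per_second(300_000, 15))  # raw series scraped every 15s
    print(samples_per_day(300_000, 15))
    print(samples_per_second(200, 30))      # aggregates federated every 30s
```

At those assumed numbers a child pushes about 20,000 samples/s (roughly 1.7 billion a day). By contrast, 200 aggregates federated at 30s come to under 7 samples/s.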

A minimal remote-write block on each child looks like this:

remote_write:
  - url: 'https://mimir.internal/api/v1/push'
    queue_config:
      capacity: 10000
      max_shards: 50
      max_samples_per_send: 2000
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop

Even here, be selective: write_relabel_configs lets you drop noisy internal metrics before they’re shipped, saving bandwidth and central-store cost. Tuning the queue_config for reliable delivery under backpressure is its own subject. I go deep on it in Tuning Prometheus remote-write for reliable shipping.
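
To see what the drop rule above actually filters, here is a small Python model of action: drop. It assumes the default ; separator between source label values and models the fully anchored regex with re.fullmatch. drop_matching is a hypothetical name, and the metric list is illustrative.

```python
import re


def drop_matching(series_list, source_labels, regex, separator=";"):
    """Sketch of write_relabel_configs with action: drop.

    Source label values are joined with `separator` (Prometheus default ";")
    and tested against a fully anchored regex; matching series are not sent.
    """
    pattern = re.compile(regex)
    kept = []
    for series in series_list:
        value = separator.join(series.get(name, "") for name in source_labels)
        if pattern.fullmatch(value) is None:
            kept.append(series)
    return kept
```

go_goroutines survives this rule, because the pattern only matches names that start with go_gc_.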

A decision shortcut

Ask one question: am I moving an aggregate or the raw data?

Aggregate, small, bounded, recording-rule output, global dashboard → federation.
Raw, high-cardinality, durable storage, centralized query → remote-write.
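
Here is the same shortcut written as a tiny function, so the decision is explicit. It sketches the heuristic in this post and is not a rule from the Prometheus docs. The Need fields and the 300-series federation budget are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Need:
    is_aggregate: bool        # recording-rule output rather than raw series
    series_per_child: int     # how many series you expect to move per child
    needs_raw_history: bool   # long-term storage or centralized raw queries


def choose_transport(need: Need, federation_budget: int = 300) -> str:
    """Encode the shortcut: aggregates that stay small go via federation."""
    if need.needs_raw_history or not need.is_aggregate:
        return "remote-write"
    if need.series_per_child > federation_budget:
        # Too many series for a synchronous /federate scrape every interval.
        return "remote-write"
    return "federation"
```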

They also compose. A common mature setup uses remote-write to a central store for the raw data and federation for a tiny set of cross-cluster KPIs that drive a top-level executive dashboard. Different jobs, different tools.

Let AI draft the selectors, then review them

match[] selectors and recording-rule names are exactly the kind of fiddly, convention-bound YAML where an AI assistant earns its keep. Treat it like a fast junior engineer: great at producing a first draft quickly, not yet trusted to ship unreviewed. Ask a tool like Claude or Cursor to “write match[] selectors to federate only my job: and cluster: recording rules, excluding raw series,” and you’ll get a sensible starting point in seconds.

Then review it like the production change it is:

# AI-suggested selector — eyeball it before it ships
{__name__=~"(job|cluster):.*"}

Things I check by hand every time:

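Is the regex as tight as I think? PromQL regex matchers are fully anchored, so job:.* already means “starts with job:”. The real danger is a leading .* (which also catches myjob:errors:rate5m) or a stray .+ left in by accident, which pulls the whole firehose.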
Are honor_labels and the scrape interval set so aggregates keep their identity and resolution?
Will this selector still be bounded after the next team adds 200 new raw metrics?
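
It's worth checking the anchoring question with a scratch script rather than by eye. PromQL's =~ is fully anchored, which Python's re.fullmatch approximates for simple patterns like these (Go's RE2 and Python's re differ in some advanced syntax). The metric names are illustrative.

```python
import re

NAMES = [
    "job:http_requests:rate5m",
    "cluster:cpu_usage:ratio",
    "myjob:errors:rate5m",
    "http_requests_total",
]


def promql_regex_matches(pattern: str, value: str) -> bool:
    """PromQL =~ is anchored at both ends, so model it with re.fullmatch."""
    return re.fullmatch(pattern, value) is not None


def selected(pattern: str, names: list[str]) -> list[str]:
    return [n for n in names if promql_regex_matches(pattern, n)]


if __name__ == "__main__":
    for pattern in ["(job|cluster):.*", ".*job:.*", ".+"]:
        print(f"{pattern!r:22} -> {selected(pattern, NAMES)}")
```

Only the leading .* and the bare .+ widen the net. (job|cluster):.* never touches myjob:errors:rate5m, even though an unanchored re.search would.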

The point isn’t that AI gets it wrong; it’s that federation failures are silent until they’re catastrophic, so the output must be explainable and reviewable before it lands. If you’re also generating the alerting rules that sit on top of these federated aggregates, our free Alert Rule Generator produces reviewable PromQL alert definitions you can read line by line before committing. For reusable selector and rule prompts, the prompt packs collection has Prometheus-specific starters.

Wiring it to recording rules

Federation is only as good as what you federate. If your children are returning raw series, the fix usually isn’t a cleverer match[]; it’s adding the recording rules that produce the aggregates in the first place:

groups:
  - name: federation_aggregates
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      - record: cluster:cpu_usage:ratio
        expr: |
          sum by (cluster) (rate(container_cpu_usage_seconds_total[5m]))
          /
          sum by (cluster) (machine_cpu_cores)

Now the child holds tidy, low-cardinality series your {__name__=~"(job|cluster):.*"} selector can grab cheaply. If your recording rules aren’t pulling their weight yet, Prometheus recording rules that make queries fast covers how to design them. And for the HA-with-federation topology, see Prometheus high availability and federation done right.
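
Here is what that cardinality reduction looks like in miniature: a Python model of sum by (job) applied to hypothetical per-series rate values. It sketches the aggregation semantics and is not how the Prometheus engine evaluates rules.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Model of PromQL `sum by (label)`: drop every other label, add values per group."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical per-series rate(http_requests_total[5m]) values on one child.
RATES = [
    ({"job": "api", "instance": "api-0:8080", "path": "/login"}, 10.0),
    ({"job": "api", "instance": "api-1:8080", "path": "/login"}, 12.5),
    ({"job": "api", "instance": "api-1:8080", "path": "/search"}, 7.5),
    ({"job": "web", "instance": "web-0:8080", "path": "/"}, 4.0),
    ({"job": "web", "instance": "web-1:8080", "path": "/"}, 1.0),
]
```

Five raw series carrying instance and path labels collapse to two job-level series. That small set is what a job:.* selector then pulls.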

Conclusion

Federation and remote-write answer different questions. Federation pulls small, pre-aggregated summaries for a global view; remote-write pushes the raw firehose to a durable central store. The anti-pattern, the one that took down my central server, is using federation to copy everything. Name your recording rules by convention, keep match[] selectors bounded and tied to a recording-rule prefix, and let AI draft the YAML as long as you review it before it ships. Browse more in the Prometheus monitoring category when you’re ready to go deeper.