Scaling Prometheus Scraping: Functional Sharding, Hashmod, and Agent Mode

The first time a single Prometheus instance buckled under our scrape load, I did what most people do: I gave it more RAM. Then more CPU. Then a bigger disk. Six months later that same box was a 64 GB monster that took twenty minutes to replay its WAL after a restart, and I finally admitted the truth — I wasn’t running a monitoring system, I was running a single point of failure with a generous memory allocation. Vertical scaling buys you time. It does not buy you resilience. At some point you have to spread the scraping out, and Prometheus gives you two clean ways to do that plus a third deployment mode that changes the math entirely.

This post is about scaling scraping horizontally: functional sharding by job, hashmod-based scrape sharding, and Prometheus Agent Mode. I’ll show real config you can paste and adapt, and I’ll be honest about when each one is the right call.

When to Shard vs. When to Scale Up

Before reaching for sharding, be honest about whether you actually need it. A single modern Prometheus instance comfortably handles millions of active series and tens of thousands of targets. If yours is struggling at only a few hundred thousand series, your problem is probably label cardinality, not a lack of horizontal scale: fix the labels first.
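Finding the label that's doing the damage is mostly counting. A minimal sketch, assuming you've already exported series label sets as a list of dicts (the input shape and the helper name `label_cardinality` are hypothetical; Prometheus's own /api/v1/status/tsdb endpoint reports the top offenders directly):

```python
from collections import defaultdict


def label_cardinality(series: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Return (label_name, distinct_value_count) pairs, worst offender first."""
    values: dict[str, set[str]] = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    # Highest distinct-value count first; ties broken by name for stable output.
    return sorted(((name, len(vals)) for name, vals in values.items()),
                  key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical metric where someone put a user ID in a label.
    sample = [
        {"__name__": "http_requests_total", "route": "/checkout", "user_id": str(i)}
        for i in range(1000)
    ]
    print(label_cardinality(sample))
    # → [('user_id', 1000), ('__name__', 1), ('route', 1)]
```

A label whose distinct-value count tracks your user or request count is the one to drop or aggregate before you think about sharding.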

You should start sharding when you hit one of these walls:

- Memory pressure from active series. Ingestion RAM scales roughly with the number of active series. When the head block won’t fit in memory, no amount of disk helps.
- Scrapes running out of time. Each target’s scrape has to finish within scrape_timeout, which can’t exceed scrape_interval. When an instance is too starved for CPU or network to keep up, scrapes time out and you get gaps. That’s a problem sharding solves directly.
- Blast radius. One instance scraping everything means one restart blinds your entire fleet.

Vertical scaling is simpler and almost always the right first move. Shard when a single replica can no longer do the work in the time you have, or when losing it costs too much.

Pro Tip: Measure prometheus_target_interval_length_seconds and scrape_duration_seconds before you shard. If scrapes finish in 2s on a 60s interval, scrape throughput isn’t your bottleneck; look at series count and cardinality instead. Sharding a label explosion spreads it across more boxes without fixing it.
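To turn that measurement into a decision, compare each target's scrape duration with the interval. A small sketch, assuming you've exported the latest scrape_duration_seconds value per target into a dict keyed by instance; the helper name and the 0.5 warning ratio are illustrative choices, not Prometheus defaults:

```python
def scrape_headroom(durations: dict[str, float], interval_s: float,
                    warn_ratio: float = 0.5) -> tuple[float, list[str]]:
    """Return (worst duration/interval ratio, targets at or above warn_ratio).

    durations maps instance -> last scrape_duration_seconds value (assumed input).
    warn_ratio is a hypothetical threshold chosen for illustration.
    """
    if interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    ratios = {target: d / interval_s for target, d in durations.items()}
    worst = max(ratios.values(), default=0.0)
    slow = sorted(t for t, r in ratios.items() if r >= warn_ratio)
    return worst, slow


if __name__ == "__main__":
    worst, slow = scrape_headroom({"10.0.0.1:9100": 2.0, "10.0.0.2:8080": 45.0}, 60.0)
    print(worst, slow)  # → 0.75 ['10.0.0.2:8080']
```

If the worst ratio is tiny across the whole fleet, the scrape loop has plenty of headroom and the pressure is coming from somewhere else, usually series count.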

Functional Sharding: One Prometheus Per Job Family

The simplest and most readable form of sharding is functional: you split by what is being scraped. One Prometheus scrapes Kubernetes infrastructure, another scrapes application services, a third handles the database fleet. Each instance gets its own config with its own subset of jobs.

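# prometheus-infra.yml: infrastructure jobs (node_exporter, kube-state-metrics)
scrape_configs:
  - job_name: "node-exporter"
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # role: node points at the kubelet port; rewrite each target to
      # node_exporter's default port 9100 on the node's internal IP.
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        replacement: "${1}:9100"
        target_label: __address__
  - job_name: "kube-state-metrics"
    static_configs:
      - targets: ["kube-state-metrics:8080"]

# prometheus-apps.yml: application /metrics endpoints
scrape_configs:
  - job_name: "checkout-service"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # role: pod discovers every pod; keep only this service's pods.
      # Assumes pods carry an "app" label; match your own labeling scheme.
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: "checkout-service"
        action: keep
  - job_name: "payments-service"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: "payments-service"
        action: keep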

The advantages are obvious: each shard is independently reasoned about, owned by the team responsible for those targets, and trivially understandable. The downside is just as obvious — load is uneven. If your application tier has 90% of the targets, splitting infra into its own box doesn’t help the box that’s actually on fire. Functional sharding is great for ownership boundaries and bad for evenly distributing a single overloaded job.
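The unevenness is easy to quantify. A sketch with made-up target counts for the four jobs above, using target count as a rough proxy for scrape load (the function name `shard_load` and all numbers are illustrative):

```python
from collections import Counter


def shard_load(job_targets: dict[str, int], shard_of_job: dict[str, str]) -> dict[str, float]:
    """Fraction of all targets that each functional shard has to scrape."""
    totals: Counter = Counter()
    for job, count in job_targets.items():
        totals[shard_of_job[job]] += count
    grand_total = sum(totals.values())
    return {shard: n / grand_total for shard, n in totals.items()}


if __name__ == "__main__":
    targets = {"node-exporter": 50, "kube-state-metrics": 1,
               "checkout-service": 300, "payments-service": 149}
    shards = {"node-exporter": "infra", "kube-state-metrics": "infra",
              "checkout-service": "apps", "payments-service": "apps"}
    print(shard_load(targets, shards))  # → {'infra': 0.102, 'apps': 0.898}
```

Nothing in a functional split moves work from the apps box to the infra box; that takes a split within the job itself.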

Hashmod Scrape Sharding: Splitting One Job Across N Instances

When a single job is too big to scrape from one instance, functional sharding can’t help — you need to split that job’s targets across multiple identical Prometheus instances. This is where the hashmod relabel action earns its keep.

The trick: every Prometheus replica runs the same config, but each is assigned a different shard index. A hashmod relabel rule hashes a stable label (usually __address__) and takes the modulus by the total shard count; a keep rule then discards every target whose bucket doesn’t match this replica’s index. Every target lands on exactly one shard, deterministically.

scrape_configs:
  - job_name: "big-fleet"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Hash the target address into a numeric bucket.
      - source_labels: [__address__]
        modulus: 4              # total number of shards
        action: hashmod
        target_label: __tmp_hash
      # Keep only the targets that belong to THIS shard.
      - source_labels: [__tmp_hash]
        regex: "0"              # this replica's shard index (0..3)
        action: keep
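To see exactly what that pair computes, here's a Python model of it. As far as I can tell from Prometheus's relabel code, hashmod takes the MD5 of the source label value (multiple source labels are joined with ; first), reads the last 8 bytes as a big-endian unsigned integer, and takes the modulus. Treat this as a sketch for reasoning about assignments; the helper names are hypothetical:

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    """Model of the hashmod action: MD5 -> last 8 bytes as big-endian uint64 -> % modulus."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def targets_for_shard(addresses: list[str], shard: int, modulus: int) -> list[str]:
    """Model of hashmod + keep: the targets one replica would retain."""
    return [addr for addr in addresses if hashmod(addr, modulus) == shard]


if __name__ == "__main__":
    fleet = [f"10.0.0.{i}:8080" for i in range(20)]
    kept = [targets_for_shard(fleet, shard, 4) for shard in range(4)]
    # Each address is kept by exactly one of the four replicas.
    assert sorted(sum(kept, [])) == sorted(fleet)
    print([len(k) for k in kept])
```

Because the bucket depends only on the hashed string and the modulus, every replica computes the same answer independently; no coordination between shards is needed.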

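Replica 0 keeps targets where the hash mod 4 equals 0, replica 1 uses regex: "1", and so on through replica 3. You deploy four instances, change only that one regex value per instance, and your scrape load is split four ways. The split is roughly even by target count, because the hash spreads addresses uniformly, but it knows nothing about series: a target exporting ten times the series of its neighbors still lands entirely on one shard.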
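A quick way to see the difference between target balance and series balance, with a hashmod model that mirrors the relabel action (MD5, last 8 bytes, modulus, as I understand Prometheus's implementation). The fleet, its series counts, and the helper names are invented for illustration:

```python
import hashlib
from collections import Counter


def hashmod(value: str, modulus: int) -> int:
    # Model of the hashmod relabel action: MD5, last 8 bytes as big-endian uint64, % modulus.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def shard_balance(series_per_target: dict[str, int], modulus: int) -> tuple[Counter, Counter]:
    """Return (targets per shard, series per shard) for a hashmod split on the address."""
    targets: Counter = Counter()
    series: Counter = Counter()
    for address, n_series in series_per_target.items():
        shard = hashmod(address, modulus)
        targets[shard] += 1
        series[shard] += n_series
    return targets, series


if __name__ == "__main__":
    fleet = {f"10.1.{i // 256}.{i % 256}:8080": 1_000 for i in range(400)}
    fleet["10.9.9.9:8080"] = 500_000  # one hypothetical heavyweight target
    targets, series = shard_balance(fleet, 4)
    print(sorted(targets.items()))  # target count per shard
    print(sorted(series.items()))   # series per shard; one carries the heavyweight
```

If one shard runs hot despite balanced target counts, this is usually why, and the fix is on the exporter or the labels, not the modulus.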

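In practice you template that regex value. With a Kubernetes StatefulSet you can derive the shard index from the pod ordinal, the numeric suffix on each pod’s name. With the Prometheus Operator, the shards field on the Prometheus CR generates this exact hashmod + keep pair for you, with $(SHARD) in the keep regex and each shard’s index substituted in before Prometheus loads the config. Either way, the substitution has to happen before the file reaches Prometheus: Prometheus does not expand environment variables inside relabel rules, and a literal $(SHARD) regex matches nothing, silently dropping every target.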

# Templated: the shard index is rendered per replica before Prometheus loads the config
      - source_labels: [__tmp_hash]
        regex: "$(SHARD)"
        action: keep
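If you roll your own templating instead of using the Operator, the substitution is a few lines in whatever entrypoint renders the config. A sketch, assuming the pod name reaches the process (for example through the Downward API) and the template contains the literal $(SHARD) placeholder from the snippet above; `shard_from_pod_name` and `render` are hypothetical names:

```python
import re

TEMPLATE = """\
      - source_labels: [__tmp_hash]
        regex: "$(SHARD)"
        action: keep
"""


def shard_from_pod_name(pod_name: str) -> int:
    """StatefulSet pods are named <statefulset>-<ordinal>; return the ordinal."""
    match = re.fullmatch(r".+-(\d+)", pod_name)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return int(match.group(1))


def render(template: str, pod_name: str, total_shards: int) -> str:
    """Replace $(SHARD) with this replica's index, refusing out-of-range ordinals."""
    shard = shard_from_pod_name(pod_name)
    if shard >= total_shards:
        # An ordinal past the modulus would keep nothing: fail loudly instead.
        raise ValueError(f"ordinal {shard} >= total shards {total_shards}")
    return template.replace("$(SHARD)", str(shard))


if __name__ == "__main__":
    print(render(TEMPLATE, "prometheus-big-fleet-2", 4))
```

The range check earns its place: a fifth replica running under modulus: 4 would start cleanly and scrape nothing at all.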

A few things to get right. Use a label that’s present on every target and stable over time; __address__ is the standard choice. Don’t hash on something that churns, or targets will hop between shards and you’ll see series discontinuities. And remember each shard now holds only a fragment of the data: to query across the whole fleet you need either a query layer like Thanos Query fanning out over all the shards, or a central store like Mimir that every shard remote-writes into. Sharding the scrape means you must un-shard the read.
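The stability requirement is easy to demonstrate with a model of hashmod (MD5, last 8 bytes, modulus, as I understand Prometheus's implementation). A stable address maps to the same shard on every service-discovery refresh; a value that changes on every rollout, modeled here by a hypothetical revision suffix, scatters the same workload across shards:

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    # Model of the hashmod relabel action: MD5, last 8 bytes as big-endian uint64, % modulus.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def shard_history(values: list[str], modulus: int) -> list[int]:
    """Shard chosen for one target as its hashed label value evolves over time."""
    return [hashmod(v, modulus) for v in values]


if __name__ == "__main__":
    address = "10.0.3.17:8080"
    stable = shard_history([address] * 10, 4)  # same value on every SD refresh
    churny = shard_history([f"checkout-rev{i}" for i in range(10)], 4)  # hypothetical churning label
    print("stable:", stable)
    print("churny:", churny)
```

Every change in the churny list is a moment where one shard stops writing a series and another starts, which is exactly the discontinuity you see on dashboards.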

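Pro Tip: Give the hash label the __tmp_ prefix, as __tmp_hash does. Labels starting with __ are dropped automatically after target relabeling, so the hash bucket never reaches your stored time series. The __tmp prefix specifically is one Prometheus promises never to use itself, so a scratch label there can’t collide with one of its own. If you named it shard_hash, it would leak into every series the job produces.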
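Modeled in Python, the end of target relabeling looks roughly like this: instance defaults to __address__ if relabeling didn't set it, then every __-prefixed label is dropped. A simplified sketch with a hypothetical function name; the real pipeline does more, such as using __scheme__ and __metrics_path__ to build the scrape URL before they're discarded:

```python
def finalize_target_labels(labels: dict[str, str]) -> dict[str, str]:
    """Simplified model of which target labels survive into stored series."""
    result = dict(labels)
    # instance defaults to the scrape address when relabeling didn't set it.
    if "instance" not in result and "__address__" in result:
        result["instance"] = result["__address__"]
    # Labels with the reserved "__" prefix (including __tmp_*) are dropped.
    return {name: value for name, value in result.items() if not name.startswith("__")}


if __name__ == "__main__":
    after = finalize_target_labels({
        "__address__": "10.0.0.5:8080",
        "__tmp_hash": "2",   # scratch label: dropped
        "shard_hash": "2",   # badly named: survives onto every series
        "job": "big-fleet",
    })
    print(after)  # → {'shard_hash': '2', 'job': 'big-fleet', 'instance': '10.0.0.5:8080'}
```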

Agent Mode: Scrape-and-Forward, Nothing Else

There’s a third lever, and it’s the one I wish I’d reached for years earlier. Prometheus Agent Mode turns an instance into a stripped-down scraper that does service discovery, scrapes, applies relabeling, and immediately forwards everything via remote write to a central store. No local querying, no alerting, no long-term TSDB — just a small WAL for buffering during network blips.

That makes agents cheap and disposable, which is exactly what you want at the edge: one agent per cluster, per region, or per shard, all funneling into a central Mimir, Thanos Receive, or Cortex backend that owns the data, the queries, and the alert rules.

You enable it with a feature flag. On Prometheus 2.x that’s --enable-feature=agent, as below; Prometheus 3.0 replaced it with a dedicated --agent flag:

prometheus \
  --enable-feature=agent \
  --config.file=/etc/prometheus/agent.yml \
  --storage.agent.path=/var/lib/prometheus-agent

The config looks like a normal Prometheus config minus everything you can’t use in agent mode — no alerting, no rule_files. Just scrape and remote write:

global:
  scrape_interval: 30s
  external_labels:
    cluster: "us-east-1"
    shard: "0"

scrape_configs:
  - job_name: "big-fleet"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__address__]
        modulus: 4
        action: hashmod
        target_label: __tmp_hash
      - source_labels: [__tmp_hash]
        regex: "0"
        action: keep

remote_write:
  - url: "https://mimir.internal/api/v1/push"
    queue_config:
      max_shards: 50
      capacity: 10000
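A cheap pre-flight check before rolling configs out to agents: confirm the sections that agent mode can't use are absent, and that there's somewhere to send the data. A sketch operating on the config as a parsed dict (parse the YAML with whatever library you already use); `check_agent_config` is a hypothetical name and the checks are deliberately minimal:

```python
AGENT_UNSUPPORTED = ("alerting", "rule_files")  # sections agent mode has no use for


def check_agent_config(cfg: dict) -> list[str]:
    """Return human-readable problems with an agent-mode config (empty list = OK)."""
    problems = [f"'{key}' is not supported in agent mode"
                for key in AGENT_UNSUPPORTED if key in cfg]
    if not cfg.get("remote_write"):
        problems.append("no remote_write: an agent keeps nothing queryable locally")
    if not cfg.get("scrape_configs"):
        problems.append("no scrape_configs: the agent has nothing to scrape")
    return problems


if __name__ == "__main__":
    cfg = {
        "global": {"scrape_interval": "30s",
                   "external_labels": {"cluster": "us-east-1", "shard": "0"}},
        "scrape_configs": [{"job_name": "big-fleet"}],
        "remote_write": [{"url": "https://mimir.internal/api/v1/push"}],
    }
    print(check_agent_config(cfg))  # → []
```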

Notice you can combine agent mode with hashmod sharding — each agent is both a shard and a forwarder. That’s a powerful pattern: cheap edge agents split a huge job by hash, all writing into one central store that handles HA, retention, and querying. If you’re tuning that remote_write block, the queue settings matter more than people expect; I went deep on them in tuning Prometheus remote write for reliable shipping.
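The failure mode to guard against in that pattern is drift: an agent whose external shard label says one thing while its keep regex says another. Generating every agent's config from one function keeps them in lockstep. A sketch that builds the agent config above as a dict for each shard (dump it to YAML with your tool of choice; `agent_config` and its parameters are hypothetical):

```python
def agent_config(shard: int, total: int, cluster: str, push_url: str) -> dict:
    """Build one hashmod-sharded agent config; shard label and keep regex share one source."""
    if not 0 <= shard < total:
        raise ValueError(f"shard {shard} outside 0..{total - 1}")
    return {
        "global": {
            "scrape_interval": "30s",
            "external_labels": {"cluster": cluster, "shard": str(shard)},
        },
        "scrape_configs": [{
            "job_name": "big-fleet",
            "kubernetes_sd_configs": [{"role": "pod"}],
            "relabel_configs": [
                {"source_labels": ["__address__"], "modulus": total,
                 "action": "hashmod", "target_label": "__tmp_hash"},
                {"source_labels": ["__tmp_hash"], "regex": str(shard), "action": "keep"},
            ],
        }],
        "remote_write": [{
            "url": push_url,
            "queue_config": {"max_shards": 50, "capacity": 10000},
        }],
    }


if __name__ == "__main__":
    import json

    for i in range(4):
        cfg = agent_config(i, 4, "us-east-1", "https://mimir.internal/api/v1/push")
        print(json.dumps(cfg["global"]["external_labels"]))
```

Because the modulus and the keep regex both come from the same arguments, changing the shard count is a one-line change instead of four hand-edited files.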

Agent mode’s tradeoff is that the agent is now useless on its own. You can’t curl it for a graph during an incident; the data lives downstream. If your central store is down, agents buffer to their WAL and replay when it returns, but only as much backlog as their local disk can hold, and you’ve concentrated a lot of importance into that central tier. Make it highly available.
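The buffering behavior is worth having a mental model of. The toy below is not Prometheus's WAL (the real agent persists segments to disk and tracks remote-write progress); it only illustrates the shape of the guarantee: samples queue locally while the remote is down and are replayed when it returns, up to a fixed budget. The drop-oldest policy when the budget runs out is an assumption of the toy, not a claim about how Prometheus behaves:

```python
from collections import deque


class ToyForwarder:
    """Toy scrape-and-forward loop with a bounded local buffer (illustration only)."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently discards the oldest item when full (toy policy).
        self.buffer: deque = deque(maxlen=capacity)
        self.delivered: list = []

    def scrape(self, sample) -> None:
        self.buffer.append(sample)

    def flush(self, remote_up: bool) -> None:
        # Replay everything buffered, oldest first, but only if the remote accepts writes.
        while remote_up and self.buffer:
            self.delivered.append(self.buffer.popleft())


if __name__ == "__main__":
    agent = ToyForwarder(capacity=3)
    for t in range(5):  # central store is down for five scrapes
        agent.scrape(t)
        agent.flush(remote_up=False)
    agent.flush(remote_up=True)  # store comes back
    print(agent.delivered)  # → [2, 3, 4]
```

The lesson carries over even though the mechanics differ: an outage longer than the local budget loses data, so the central tier's availability is part of your scrape pipeline's reliability.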

The Operational Tradeoffs, Side by Side

None of these is free. Functional sharding is the easiest to operate and reason about but distributes load poorly. Hashmod sharding distributes targets evenly but fragments your data, so you need a query layer to stitch the shards back into a global view. Agent mode minimizes edge footprint and centralizes everything but makes the central store load-bearing for all querying and alerting.
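To make the read side concrete, here is the core of what a fan-out layer does for an instant query: ask every shard, then union the results. A naive sketch operating on the result shape of Prometheus's HTTP API (per shard, a list of {"metric": {...}, "value": [timestamp, "value"]} entries), with the HTTP calls left out; real query layers such as Thanos Query also handle HA-replica deduplication, partial responses, and range queries, none of which this does. The function name is hypothetical:

```python
def fan_out_merge(shard_results: list[list[dict]]) -> list[dict]:
    """Union instant-vector results from several shards into one global result.

    Series are keyed by their full label set. If two shards return the same
    series (for example, a target briefly scraped by two shards), the first wins.
    """
    merged: dict[tuple, dict] = {}
    for results in shard_results:
        for series in results:
            key = tuple(sorted(series["metric"].items()))
            merged.setdefault(key, series)
    return list(merged.values())


if __name__ == "__main__":
    shard0 = [{"metric": {"job": "big-fleet", "instance": "10.0.0.1:8080"}, "value": [1700000000, "1"]}]
    shard1 = [{"metric": {"job": "big-fleet", "instance": "10.0.0.2:8080"}, "value": [1700000000, "1"]}]
    print(len(fan_out_merge([shard0, shard1])))  # → 2
```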

In practice you mix them. A common mature setup: agent-mode instances at each cluster edge, hashmod-sharding the heaviest jobs, remote-writing into a central Mimir that does global query, dedup, and alerting. Your scrape config still wants careful relabeling regardless of which path you choose — if relabeling itself is fuzzy to you, the scrape config and relabeling deep dive is the prerequisite. And if you’re running redundant scrapers for HA rather than pure load splitting, that’s a different pattern covered in Prometheus high availability and federation done right.

Let AI Draft the Relabel Rules — Then Review Them

Hashmod relabeling is fiddly. The ordering of rules matters, the __tmp_ prefix matters, and a wrong regex silently drops all your targets instead of just the ones for the other shards. This is exactly the kind of mechanical-but-error-prone work where a model shines. I treat Claude or Cursor like a fast junior engineer: I describe the fleet and the shard count, it produces a relabel_configs block in seconds, and I review it line by line before it ever touches a config repo.

The review is non-negotiable. A junior engineer who’s quick but unsupervised ships subtle bugs, and so does an LLM. I check three things every time: is the hashed source label actually stable, does the modulus match the real replica count, and does the keep regex correctly select this shard. The output has to be explainable — if I can’t articulate why each rule exists, it doesn’t merge. For the alerting side that you’ve now centralized in your store, our free Alert Rule Generator drafts reviewable PromQL alert rules the same way: fast first draft, human sign-off before shipping. If you lean on AI for infra config regularly, the curated prompt packs save the rewriting.
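Those three checks are mechanical enough to automate as a pre-merge gate, which makes the human review faster rather than replacing it. A sketch operating on a parsed relabel_configs list for one rendered replica config (so the keep regex is already a concrete number, not $(SHARD)); the function name and messages are hypothetical:

```python
def review_hashmod(relabel_configs: list[dict], replicas: int, shard: int) -> list[str]:
    """Flag problems in a hashmod + keep pair; an empty list means all checks passed."""
    hashmods = [i for i, rule in enumerate(relabel_configs) if rule.get("action") == "hashmod"]
    if len(hashmods) != 1:
        return [f"expected exactly one hashmod rule, found {len(hashmods)}"]
    idx = hashmods[0]
    rule = relabel_configs[idx]
    issues = []
    # 1. Is the hashed source label stable? __address__ is the conventional choice.
    if rule.get("source_labels") != ["__address__"]:
        issues.append(f"hashing {rule.get('source_labels')}: confirm it never churns")
    # 2. Does the modulus match the real replica count?
    if rule.get("modulus") != replicas:
        issues.append(f"modulus {rule.get('modulus')} != replica count {replicas}")
    # The bucket should be a __tmp_ label so it is dropped after relabeling.
    bucket = rule.get("target_label", "")
    if not bucket.startswith("__tmp"):
        issues.append(f"target_label {bucket!r} should use the __tmp_ prefix")
    # 3. Does a later keep rule on the bucket select exactly this shard?
    keeps = [r for r in relabel_configs[idx + 1:]
             if r.get("action") == "keep" and r.get("source_labels") == [bucket]]
    if not keeps:
        issues.append("no keep rule on the hash bucket after the hashmod rule")
    elif keeps[0].get("regex") != str(shard):
        issues.append(f"keep regex {keeps[0].get('regex')!r} does not select shard {shard}")
    return issues


if __name__ == "__main__":
    rules = [
        {"source_labels": ["__address__"], "modulus": 4, "action": "hashmod", "target_label": "__tmp_hash"},
        {"source_labels": ["__tmp_hash"], "regex": "0", "action": "keep"},
    ]
    print(review_hashmod(rules, replicas=4, shard=0))  # → []
    print(review_hashmod(rules, replicas=5, shard=0))  # → ['modulus 4 != replica count 5']
```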

Wrapping Up

Scaling Prometheus scraping isn’t one decision, it’s a small menu: scale up until you can’t, shard functionally for clean ownership, shard by hashmod when one job outgrows one box, and reach for agent mode when you want cheap edge scrapers feeding a central store. The config is short and deterministic — which is precisely why it’s worth getting exactly right. Let AI write the first draft fast, then review it like the careful engineer you’d want on call at 3 a.m. More in the Prometheus monitoring category.
