Running Grafana Mimir at Scale: Multi-Tenant Metrics

I’ve stood up Grafana Mimir for a platform team serving dozens of internal tenants, and the first lesson is humbling: Mimir is not “Prometheus but bigger.” It’s a distributed system with a dozen microservices, an object-storage backend, and a hash ring, and if you deploy it like an app you’ll spend your first month debugging the ring instead of looking at metrics. This is what I’d tell my past self.

When you actually need Mimir

Mimir is for horizontal scale and multi-tenancy. Reach for it when:

You’re past what a single Prometheus or even Thanos can comfortably hold — tens of millions of active series and up.
You need hard tenant isolation: separate teams whose metrics, limits, and queries must not bleed into each other.
You want long retention on cheap object storage (S3, GCS, Azure Blob) rather than local disk.

If you’re a single team with a few million series, Mimir is overkill — you’ll spend more time operating it than it saves. Thanos or even a beefy VictoriaMetrics gets you there with less surface area.

The component map, simplified

Mimir splits the read and write paths across services. The ones you must understand:

distributor — receives remote_write, validates, and shards samples to ingesters via the hash ring.
ingester — holds recent samples in memory, flushes blocks to object storage.
store-gateway — serves historical blocks from object storage for queries.
querier / query-frontend — execute and accelerate PromQL; the frontend does splitting and caching.
compactor — merges and deduplicates blocks in object storage over time.
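
To make the distributor's sharding concrete, here is a toy hash ring in Python. It is a simplified sketch, not Mimir's implementation: the `ToyRing` class, the hash function, the token count, and the ingester names are all assumptions made for illustration, and the replication factor of 3 is just a parameter of the sketch. What it shows is the basic idea: a series hashes to a point on the ring and lands on the next few distinct ingesters.

```python
import bisect
import hashlib


def hash32(value: str) -> int:
    """Illustrative 32-bit hash; not the hash function Mimir uses."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class ToyRing:
    """Simplified token ring: every ingester owns several points on a circle."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self.tokens = sorted(
            (hash32(f"{name}/{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self.points = [point for point, _ in self.tokens]

    def owners(self, tenant, series, replication_factor=3):
        """Walk clockwise from the series' hash, collecting distinct ingesters."""
        start = bisect.bisect_right(self.points, hash32(f"{tenant}/{series}"))
        found = []
        for step in range(len(self.tokens)):
            _, name = self.tokens[(start + step) % len(self.tokens)]
            if name not in found:
                found.append(name)
            if len(found) == replication_factor:
                break
        return found


# Hypothetical ingester names and series, for illustration only.
ring = ToyRing(["ingester-0", "ingester-1", "ingester-2", "ingester-3"])
owners = ring.owners("team-payments", 'http_requests_total{job="api"}')
print(owners)  # three distinct ingesters, the same ones on every run
```

Because the tenant ID is part of the hashed key in this sketch, two tenants sending an identically named series do not necessarily land on the same ingesters.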

The trap is deploying all of these separately on day one. Don’t. Mimir ships a monolithic mode (-target=all) that runs every component in one binary. Start there, prove it works, and split into microservices only when a specific component becomes your bottleneck.

Multi-tenancy is the whole point

Every request to Mimir carries an X-Scope-OrgID header that names the tenant. This is how isolation works:

# Prometheus remote_write to Mimir as tenant "team-payments"
# (vmagent takes the same URL and header through its own flags, not this YAML)
remote_write:
  - url: http://mimir-distributor/api/v1/push
    headers:
      X-Scope-OrgID: team-payments

Without that header, Mimir rejects the write when multi-tenancy is on (multitenancy_enabled: true, which is the default; auth_enabled is the older Cortex name for the same switch). Each tenant gets its own series limits, retention, and query scope. Critically, Mimir does not do authentication itself; it trusts the header. You put an auth proxy (or your gateway) in front that sets X-Scope-OrgID based on a verified identity. Skipping that step means any client can impersonate any tenant.
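
The header handling inside such a proxy can be sketched in a few lines of Python. This is a minimal illustration, not a complete gateway: it assumes the identity has already been verified upstream (mTLS, a validated token, whatever you use), and the `TENANT_BY_IDENTITY` table, the identity names, and the `upstream_headers` function are hypothetical. The point it makes is that the tenant header is always derived from the identity and any client-supplied value is thrown away.

```python
TENANT_HEADER = "X-Scope-OrgID"

# Hypothetical mapping from an already-verified identity to a Mimir tenant.
TENANT_BY_IDENTITY = {
    "svc-payments-prometheus": "team-payments",
    "svc-analytics-prometheus": "team-analytics",
}


def upstream_headers(verified_identity, client_headers):
    """Return the headers to forward to Mimir for this request."""
    tenant = TENANT_BY_IDENTITY.get(verified_identity)
    if tenant is None:
        raise PermissionError(f"no tenant mapped for {verified_identity!r}")
    # Drop whatever tenant header the client sent (header names are
    # case-insensitive), then set it from the verified identity.
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != TENANT_HEADER.lower()
    }
    forwarded[TENANT_HEADER] = tenant
    return forwarded


# A client from payments trying to write as analytics gets overridden.
spoofed = {"x-scope-orgid": "team-analytics", "Content-Encoding": "snappy"}
headers = upstream_headers("svc-payments-prometheus", spoofed)
print(headers[TENANT_HEADER])
```

Failing closed on an unknown identity matters as much as the overwrite: a request that maps to no tenant should never reach the distributor.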

Per-tenant limits keep one team from sinking everyone

The most important config in a multi-tenant Mimir is limits. A single tenant with runaway cardinality can exhaust ingester memory for everyone if you let it:

# main config file: defaults applied to every tenant
limits:
  max_global_series_per_user: 1500000
  ingestion_rate: 50000
  ingestion_burst_size: 200000
  max_label_names_per_series: 30
  compactor_blocks_retention_period: 180d

# separate runtime config file (hot-reloadable): override a noisy tenant
overrides:
  team-analytics:
    max_global_series_per_user: 5000000
    ingestion_rate: 150000

The runtime overrides file is reloaded without a restart, which matters when a tenant pages you at 3am because they’re hitting a limit during an incident. You bump their ceiling, Mimir picks it up, no rollout.
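
One detail worth internalising is how an override combines with the defaults. The Python sketch below models the semantics as I understand them: a tenant's override replaces only the keys it names, and everything else falls through to the defaults. The dictionaries mirror the YAML above; the `effective_limits` function is a hypothetical helper for reasoning about the config, not something Mimir exposes.

```python
# Defaults from the limits block (values copied from the example config).
DEFAULT_LIMITS = {
    "max_global_series_per_user": 1_500_000,
    "ingestion_rate": 50_000,
    "ingestion_burst_size": 200_000,
    "max_label_names_per_series": 30,
}

# Per-tenant overrides, as they would appear in the runtime config file.
OVERRIDES = {
    "team-analytics": {
        "max_global_series_per_user": 5_000_000,
        "ingestion_rate": 150_000,
    },
}


def effective_limits(tenant):
    """Overlay a tenant's overrides on the defaults; unnamed keys inherit."""
    return {**DEFAULT_LIMITS, **OVERRIDES.get(tenant, {})}


analytics = effective_limits("team-analytics")
payments = effective_limits("team-payments")
print(analytics["ingestion_rate"], analytics["ingestion_burst_size"])
```

Note that team-analytics still inherits the default burst size, so when you raise a tenant's ingestion_rate, check whether the inherited ingestion_burst_size still leaves sensible headroom above it.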

Object storage is your durability story

Ingesters hold only recent data, in memory and in a write-ahead log on local disk. Everything meant to last lives in object storage:

common:
  storage:
    backend: s3
    s3:
      endpoint: s3.us-east-1.amazonaws.com
blocks_storage:
  s3:
    # blocks get a dedicated bucket; don't share it with ruler or alertmanager storage
    bucket_name: mimir-blocks

Two operational truths follow. First, the compactor is not optional — without it, object storage fills with un-compacted, un-deduplicated blocks and query performance rots. Treat a wedged compactor as a real incident. Second, your retention is enforced by the compactor deleting old blocks, so a stuck compactor also means retention silently stops working.
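
Turning "wedged compactor" into something a script can decide is mostly a matter of comparing a timestamp to a threshold. The Python sketch below assumes you have already fetched the time of the last successful compaction run (Mimir exposes it as cortex_compactor_last_successful_run_timestamp_seconds). The `compactor_is_stale` helper and the six-hour threshold are assumptions for illustration, so pick a threshold that matches how often your compactor normally completes a run.

```python
import time

# Hypothetical threshold; tune it to your normal compaction cadence.
MAX_AGE_SECONDS = 6 * 3600


def compactor_is_stale(last_success_ts, now=None, max_age_seconds=MAX_AGE_SECONDS):
    """True when the last successful compaction is older than the threshold."""
    now = time.time() if now is None else now
    return (now - last_success_ts) > max_age_seconds


# Example with fixed timestamps so the result does not depend on the clock.
last_success = 1_700_000_000
print(compactor_is_stale(last_success, now=last_success + 3600))      # False
print(compactor_is_stale(last_success, now=last_success + 7 * 3600))  # True
```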

Watching Mimir watch your metrics

Mimir exposes its own metrics, and you must monitor the monitor. The queries I keep on the on-call dashboard:

# ingester memory pressure - active series per ingester
cortex_ingester_memory_series

# samples being rejected, by reason (limits hit or validation)
sum by (reason) (rate(cortex_discarded_samples_total[5m]))

# compactor falling behind - seconds since the last successful run
time() - cortex_compactor_last_successful_run_timestamp_seconds

# query latency (p99) on the read path; adjust the route matcher to your setup
histogram_quantile(0.99,
  sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"prometheus_api_v1_query(_range)?"}[5m])))

If cortex_ingester_memory_series is climbing toward your configured limit and the compactor lag is growing, you’re heading for trouble before the pager knows it.
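
If you want one of these numbers in a script rather than on a dashboard, the request looks like any Prometheus instant query plus the tenant header. The Python sketch below only builds the request, so it runs without a cluster. It assumes Mimir's own metrics are stored under a tenant called "platform-monitoring" and that the read path is reachable at the base URL shown; both names are hypothetical, as is the `build_instant_query` helper. The /prometheus/api/v1/query path is Mimir's default Prometheus-compatible query endpoint.

```python
import urllib.parse
import urllib.request


def build_instant_query(base_url, tenant, promql):
    """Build (but do not send) an instant-query request scoped to one tenant."""
    params = urllib.parse.urlencode({"query": promql})
    return urllib.request.Request(
        f"{base_url}/prometheus/api/v1/query?{params}",
        headers={"X-Scope-OrgID": tenant},
    )


# Hypothetical base URL and tenant name.
req = build_instant_query(
    "http://mimir-query-frontend:8080",
    "platform-monitoring",
    "sum(cortex_ingester_memory_series)",
)
print(req.full_url)

# To actually send it (requires a reachable Mimir):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     body = resp.read()
```

The same tenant scoping applies to reads as to writes: a query carrying a different X-Scope-OrgID sees a different tenant's series, which is why the auth proxy needs to sit in front of the read path too.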

The pragmatic deployment order

Deploy monolithic mode with object storage and multitenancy_enabled: true.
Put an auth proxy in front that injects X-Scope-OrgID.
Set conservative per-tenant limits from day one.
Point one non-critical Prometheus at it via remote_write and live with it for a week.
Onboard tenants gradually, watching ingester memory and compactor lag.
Split into microservices only when you can name the component that’s the bottleneck.

Mimir is genuinely excellent at the scale it’s built for. It’s just not a starter system — earn your way into it.

For the storage trade-offs against the alternatives, our companion guides on long-term Prometheus storage live in the Prometheus and monitoring category, and our monitoring alert assistant helps keep the alert rules you run on top of Mimir from becoming noise.

Mimir config keys evolve between releases. Validate every option against the version you deploy rather than copying blindly.