Thanos Store Gateway Caching Tiers Explained

The Thanos Store Gateway is the component people blame most and understand least. It sits between your queriers and object storage, turning a long-range query against a year of blocks in S3 into something that returns before the dashboard times out. When it’s slow or OOMing, the instinct is to raise its memory limit and move on. That works right up until it doesn’t, because the Store Gateway’s behavior is governed by three distinct caches that fail in three distinct ways. Treat “Store Gateway is unhappy” as one problem and you’ll keep adding RAM to a component that needed a different cache, or sharding, all along.

The three tiers

Index-header. For every block in the bucket, the Store Gateway loads a small index-header (memory-mapped) so it knows which series and label values live in that block without downloading the whole index. Index-header memory scales with the number of blocks, which scales with retention. This is the cache that OOMs the gateway at startup, before it serves a single query, once your retention grows past what one instance can hold.
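To make “scales with retention” concrete, here is a rough Python sketch that estimates block count from retention. It assumes compacted blocks of up to 14 days, which we take to be the compactor’s default largest block range (check yours). It also assumes up to three resolutions from downsampling (raw, 5m, 1h). The function names are hypothetical, and the per-block header size is an input you measure, not a Thanos constant.

```python
import math


def estimate_block_count(retention_days: float,
                         streams: int,
                         max_block_range_days: float = 14.0,
                         resolutions: int = 3) -> int:
    """Order-of-magnitude count of compacted blocks in the bucket.

    Assumptions (check them against your compactor):
    - each stream (unique external-label set) ends up in blocks of at most
      max_block_range_days,
    - downsampling keeps up to `resolutions` copies of the data (raw, 5m, 1h).
    Recent, not-yet-compacted data adds more, smaller blocks on top of this.
    """
    blocks_per_stream = math.ceil(retention_days / max_block_range_days)
    return blocks_per_stream * streams * resolutions


def index_header_bytes(block_count: int, measured_bytes_per_block: int) -> int:
    """Index-header footprint grows linearly with block count.

    measured_bytes_per_block must come from your own gateway (for example,
    observed index-header memory divided by the number of loaded blocks).
    """
    return block_count * measured_bytes_per_block


# Same streams, doubled retention: roughly double the blocks, so roughly
# double the index-header memory.
one_year = estimate_block_count(retention_days=365, streams=4)
two_years = estimate_block_count(retention_days=730, streams=4)
print(one_year, two_years)  # 324 636
```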

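Index cache. When a query runs, the gateway resolves label matchers into postings and then into series. The index cache holds those postings and series entries so repeated lookups don’t go back to object storage. How much it needs scales with active label cardinality and with how many distinct series your queries touch. This is the tier implicated when the gateway OOMs under query load, especially when someone runs a wide, high-cardinality matcher: an in-memory index cache competes for the same heap as the queries pulling in all those postings and series.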
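Why a wide matcher hurts is easier to see in a toy model. The Python sketch below is a byte-bounded LRU cache. It is similar in spirit to a size-capped in-memory index cache, but it is not Thanos’s implementation, and the class, keys, and sizes are hypothetical. It shows one high-cardinality query evicting a dashboard’s hot postings.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """Toy size-capped LRU cache: evicts least-recently-used entries by bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.items: OrderedDict = OrderedDict()  # key -> entry size in bytes
        self.hits = 0
        self.misses = 0

    def lookup(self, key: str, size: int) -> bool:
        """Read-through lookup: True on a hit; on a miss, insert the entry."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size > self.max_bytes:
            return False  # entry can never fit; skip caching it
        while self.used + size > self.max_bytes:
            _, evicted_size = self.items.popitem(last=False)  # drop the LRU entry
            self.used -= evicted_size
        self.items[key] = size
        self.used += size
        return False


cache = ByteBoundedLRU(max_bytes=100)

# A dashboard's hot postings: five 10-byte entries, warmed into the cache.
hot = [f"postings:job=api:{i}" for i in range(5)]
for key in hot:
    cache.lookup(key, 10)

# One wide, high-cardinality matcher touches twenty new entries.
for i in range(20):
    cache.lookup(f"postings:pod=~.*:{i}", 10)

# The hot entries were all evicted, so the dashboard misses again.
print(sum(cache.lookup(key, 10) for key in hot))  # 0
```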

Caching bucket. Chunk data and other object-store responses (block listings, metadata files, chunk byte ranges) are expensive to fetch repeatedly. The caching bucket caches those fetches so a repeated query doesn’t go back to S3 every time. A cold caching bucket produces the classic “slow first query, fast second query” symptom, and an undersized one brings it back as soon as your working set is evicted. On S3-class backends, every miss is a real request you pay for.

Reading the symptom

This mapping is the whole game:

OOM at startup, before queries -> index-header memory; you have too many blocks for one gateway.
OOM under query load -> index cache; high-cardinality queries are filling it.
Slow first query, fast retry -> caching bucket miss; the data isn’t cached or the cache is too small.

Once you know which tier is at fault, the fix is obvious. Guess, and you’ll resize the wrong one.
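The mapping is small enough to write down as a triage helper. This is a hypothetical Python sketch that encodes the three rows above; the names are illustrative and do not come from any Thanos tool. Real incidents can show more than one symptom, so treat its answer as where to look first.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    INDEX_HEADER = "index-header"
    INDEX_CACHE = "index cache"
    CACHING_BUCKET = "caching bucket"


def suspect_tier(oom: bool, before_first_query: bool,
                 slow_first_fast_retry: bool) -> Optional[Tier]:
    """Map an observed Store Gateway symptom to the tier to look at first."""
    if oom and before_first_query:
        return Tier.INDEX_HEADER      # too many blocks for one gateway
    if oom:
        return Tier.INDEX_CACHE       # high-cardinality queries under load
    if slow_first_fast_retry:
        return Tier.CACHING_BUCKET    # cold or undersized caching bucket
    return None                       # none of the three signatures matched


print(suspect_tier(oom=True, before_first_query=True, slow_first_fast_retry=False))
```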

Config that maps to the tiers

A typical setup moves the index cache and caching bucket onto Memcached so they survive gateway restarts and can grow beyond a single gateway’s memory. Each gets its own config file:

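# index cache via Memcached (--index-cache.config-file)
type: MEMCACHED
config:
  addresses: ["memcached:11211"]
  # match memcached's -I item size limit (default 1m): run memcached with -I 16m,
  # or oversized items waste a round trip and are rejected
  max_item_size: 16MiB
  max_async_concurrency: 50

# caching bucket for chunks and object metadata (--store.caching-bucket.config-file)
type: MEMCACHED
config:
  addresses: ["memcached:11211"]
chunk_subrange_size: 16000
max_chunks_get_range_requests: 3
blocks_iter_ttl: 5m
metafile_max_size: 1MiB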
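chunk_subrange_size is easiest to read as arithmetic. Our understanding is that the caching bucket serves chunk reads in fixed-size subranges aligned to multiples of this value and caches each subrange separately. That would let overlapping reads from different queries share cache entries. The Python sketch below, with a hypothetical function name, shows how one byte-range read maps onto those subranges under that assumption.

```python
def aligned_subranges(offset: int, length: int,
                      subrange_size: int = 16000) -> list:
    """Split [offset, offset + length) into subrange_size-aligned pieces.

    A real implementation would also clamp the last piece to the object size.
    """
    if length <= 0:
        return []
    start = (offset // subrange_size) * subrange_size  # round down to a boundary
    end = offset + length
    pieces = []
    while start < end:
        pieces.append((start, start + subrange_size))
        start += subrange_size
    return pieces


# A 30,000-byte chunk read starting at byte 20,000.
pieces = aligned_subranges(20_000, 30_000)
print(pieces)  # [(16000, 32000), (32000, 48000), (48000, 64000)]
fetched = sum(end - start for start, end in pieces)
print(fetched)  # 48000
```

The trade-off: larger subranges mean fewer cache entries and fewer object-store requests per read, but more fetched bytes that the query never uses.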

And guardrails so a few heavy queries can’t take the gateway down:

# Store Gateway flags
--store.grpc.series-max-concurrency=20   # concurrent Series calls; extra calls wait for a slot
--store.grpc.touched-series-limit=0      # 0 = no limit; set a real cap for shared tenancy
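To see what those two guardrails protect against, here is a toy asyncio model in Python. It is not Thanos’s code, and the class and exception names are hypothetical. A semaphore caps concurrent Series calls, which is how we understand series-max-concurrency to behave: extra calls wait for a slot. A per-request check rejects a call that would touch more series than the cap, with 0 meaning unlimited.

```python
import asyncio


class SeriesLimitExceeded(Exception):
    """Raised when one request would touch more series than allowed."""


class SeriesGuardrails:
    """Toy model of a concurrency gate plus a per-request touched-series cap."""

    def __init__(self, max_concurrency: int, touched_series_limit: int) -> None:
        self._slots = asyncio.Semaphore(max_concurrency)
        self.touched_series_limit = touched_series_limit  # 0 = unlimited
        self.in_flight = 0
        self.peak_in_flight = 0

    async def series(self, touched: int) -> int:
        async with self._slots:  # calls beyond the cap wait here
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                limit = self.touched_series_limit
                if limit and touched > limit:
                    raise SeriesLimitExceeded(f"{touched} series > limit {limit}")
                await asyncio.sleep(0.01)  # stand-in for fetching postings and chunks
                return touched
            finally:
                self.in_flight -= 1


async def main() -> tuple:
    guard = SeriesGuardrails(max_concurrency=20, touched_series_limit=50_000)
    calls = [guard.series(1_000) for _ in range(100)]
    calls.append(guard.series(2_000_000))  # one runaway, high-cardinality query
    results = await asyncio.gather(*calls, return_exceptions=True)
    rejected = sum(isinstance(r, SeriesLimitExceeded) for r in results)
    return guard.peak_in_flight, rejected


print(asyncio.run(main()))  # (20, 1)
```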

The index cache and caching bucket can be sized generously because Memcached holds them outside the gateway process. The index-header cannot: it lives in the gateway’s own memory and is the hard ceiling.

Sizing without magic numbers

The honest way to size these tiers is a formula you fill in with your own measurements, not a constant copied off a forum. Roughly: index-header memory is block count times the per-block header size you observe, so it grows with retention; index-cache and caching-bucket capacity should track your active cardinality and working-set size. An assistant is good for turning your real numbers into a starting config:

Prompt: My Thanos bucket holds about 4 TB across 12 months, the Store Gateway OOMs at startup before serving queries, and I’m in-memory only. Tell me which cache tier this points to, whether to move to Memcached, and whether I need to shard. Show the reasoning, not a magic memory number.

Answer: Startup OOM before queries points at the index-header tier, which scales with block count. Memcached won’t help here, because index-headers live in the gateway’s own memory rather than in a cache you can offload. With 12 months of blocks you’ve likely outgrown a single instance. Shard the Store Gateway by time range (--min-time/--max-time) or by hashmod on __block_id (via --selector.relabel-config) so each instance loads only a slice of the headers, then move the index cache and caching bucket to Memcached for capacity and restart resilience.

That answer is correct and useful precisely because it refuses to invent a RAM figure. The model drafts the strategy; you verify by watching index-header memory and cache hit rate after the change. The broader pattern of using AI as a co-investigator on storage internals shows up across the Prometheus and monitoring guides.
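The index-header half of that reasoning can be written as a fill-in formula. The Python sketch below is a hypothetical helper, not a Thanos feature. You supply the block count, a per-block index-header size measured on your own gateway, and the memory limit, and it returns how many shards keep headers under budget. reserved_fraction is an assumed share of memory left for query processing and any in-memory cache; choose it from your own observations.

```python
import math


def shards_needed(block_count: int,
                  header_bytes_per_block: float,
                  memory_limit_bytes: float,
                  reserved_fraction: float = 0.5) -> int:
    """Smallest shard count that keeps each gateway's index-headers in budget.

    Assumes blocks spread roughly evenly across shards (time ranges of similar
    size, or hashmod over many blocks).
    """
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    header_budget = memory_limit_bytes * (1 - reserved_fraction)
    total_header_bytes = block_count * header_bytes_per_block
    return max(1, math.ceil(total_header_bytes / header_budget))


MiB, GiB = 2**20, 2**30
# Illustrative inputs only: replace with your own block count and measurements.
print(shards_needed(block_count=324,
                    header_bytes_per_block=50 * MiB,
                    memory_limit_bytes=8 * GiB))  # 4
```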

Shard before you scale up

The single most important takeaway: index-header memory scales with block count, and block count scales with retention, so there is no amount of RAM that makes a single Store Gateway hold an unbounded history. Vertical scaling buys you months; sharding by time or hashmod buys you years. Plan the shard before the OOM, not after, because the OOM tends to arrive during an incident when you’re already trying to query the past.
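For hashmod sharding, it helps to be able to predict which shard owns a given block. The Python sketch below makes two assumptions. First, each gateway selects its shard with a relabel config that applies hashmod to __block_id and then keeps one value of the resulting label. Second, Thanos applies Prometheus’s relabel hashmod semantics: MD5 of the label value, with the last 8 bytes read as a big-endian integer, modulo N. Verify both against your versions before relying on the sketch. The block IDs in the example are made up.

```python
import hashlib
from collections import Counter


def hashmod_shard(block_id: str, modulus: int) -> int:
    """Shard index for a block under an assumed relabel hashmod on __block_id.

    Mirrors Prometheus relabel hashmod as we understand it: MD5 of the source
    label value, last 8 bytes read as a big-endian uint64, modulo `modulus`.
    """
    digest = hashlib.md5(block_id.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


# Made-up block IDs standing in for real block ULIDs.
block_ids = [f"01HYPOTHETICAL{i:012d}" for i in range(300)]
per_shard = Counter(hashmod_shard(b, 3) for b in block_ids)
print(sorted(per_shard.items()))  # expect a roughly even split across 0, 1, 2
```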

After any sizing change, prove it. Watch the cache hit rate (a low caching-bucket hit rate means you’re still hammering S3), watch index-header memory at startup, and watch for OOMs under your heaviest real dashboards rather than synthetic queries. If you want a structured way to turn your numbers into a config and a sharding decision, the Store Gateway cache sizing prompt walks the formula, and the long-term storage selection prompt helps if you’re still deciding between Thanos and the alternatives.
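“Watch the cache hit rate” reduces to hits divided by requests over a window. The Python sketch below computes that from two snapshots of the gateway’s counters and treats a counter that went down as a restart reset, roughly as PromQL’s rate() does. The metric names in the comments are our assumption about current Thanos releases: thanos_store_index_cache_hits_total and _requests_total for the index cache, and thanos_store_bucket_cache_operation_hits_total and _requests_total for the caching bucket. Confirm them on your gateway’s /metrics endpoint.

```python
from typing import Optional


def counter_delta(before: float, after: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A drop means the process restarted and the counter reset to zero, so the
    best estimate of the increase is the new value itself.
    """
    return after if after < before else after - before


def hit_ratio(hits_before: float, hits_after: float,
              requests_before: float, requests_after: float) -> Optional[float]:
    """Cache hit ratio over the window, or None if there were no requests."""
    requests = counter_delta(requests_before, requests_after)
    if requests <= 0:
        return None
    return counter_delta(hits_before, hits_after) / requests


# Illustrative snapshots of, e.g., thanos_store_index_cache_hits_total and
# thanos_store_index_cache_requests_total taken some minutes apart.
# Rough PromQL equivalent:
#   sum(rate(thanos_store_index_cache_hits_total[5m]))
#     / sum(rate(thanos_store_index_cache_requests_total[5m]))
print(hit_ratio(hits_before=900, hits_after=1_500,
                requests_before=1_000, requests_after=2_000))  # 0.6
```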

The bottom line

The Store Gateway isn’t one black box — it’s three caches with three failure modes. Diagnose by symptom, push the index cache and caching bucket onto Memcached, keep a hard eye on index-header memory because that’s your real ceiling, and shard by time before retention forces your hand. Do that and long-range queries stay fast without the 3 a.m. OOM that takes your history offline exactly when you need it.