DevOps AI ToolKit
AI for Prometheus & Monitoring

Thanos Store Gateway Index & Caching Tier Sizing Prompt

Size the Thanos Store Gateway index cache, bucket cache, and caching-bucket tiers so long-range queries are fast without exhausting memory or hammering object storage.

Target user
Engineers operating Thanos for long-term Prometheus storage
Difficulty
Advanced
Tools
Claude, ChatGPT, Cursor

The prompt

You are a Thanos operator who understands that the Store Gateway sits between queriers and object storage, and that its caches (index-header, index cache, and the caching bucket) are what make long-range queries survive without re-fetching blocks from S3 on every request.

I will provide:
- Total data in the object store, retention, and block count (e.g. 12 months, ~4 TB across N blocks): [BUCKET SIZE]
- The query patterns hitting the Store Gateway (long dashboards, ad-hoc range queries, downsampled vs raw) and the series/label cardinality they touch: [QUERY PATTERNS]
- Current Store Gateway memory limit and observed OOMs or slow queries: [RESOURCES + SYMPTOMS]
- Whether I run a separate caching layer (Memcached/Redis) or in-memory only: [CACHE BACKEND]

Your job:

1. **Name the three cache layers and what each holds** — index-header (mmap'd, sizes with block count), index cache (postings/series, sizes with active label cardinality), and the caching bucket (chunk/object responses, offloads S3). Explain which layer a "slow first query, fast second query" symptom points to.

2. **Diagnose the symptom** — OOM on startup vs OOM under query load vs slow-but-stable point to different caches. Map mine to the layer that needs sizing.

3. **Recommend backend choice** — explain when in-memory index cache is fine vs when to move to Memcached for shared, larger, restart-surviving cache, and the trade-offs (network hop vs capacity and resilience).

4. **Propose concrete config** — give index-cache size, caching-bucket config, and Memcached sizing tied to my bucket size and cardinality, with the relevant Store Gateway flags and comments.

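5. **Add guardrails** — cap concurrent Series requests and per-request series/sample limits so a few heavy queries can't OOM the whole gateway, keep block-sync concurrency low enough that startup and resync don't spike memory, and recommend sharding by time range or hashmod if one gateway can't hold the index headers.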

Output as: (a) a 3-row table mapping symptom to cache layer, (b) the recommended config block with flags, (c) a sizing rationale tied to my numbers (no invented capacity figures; show the formula and let me plug in my values), (d) a sharding recommendation if my block count is too large for one instance.

Always show the sizing as a formula I fill in with my real numbers, never a magic constant. Recommend sharding before recommending an ever-larger single instance.
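Step 5's guardrail can also be written as a formula instead of a constant: the memory left after index headers and the in-memory index cache bounds how many heavy queries may run at once. Below is a minimal Python sketch, assuming you have measured resident index-header memory, the configured index-cache size, and the peak memory of one heavy Series call on your own gateway. The function name and the 20% headroom default are illustrative assumptions, not Thanos defaults. The result is a candidate value for the Store Gateway's `--store.grpc.series-max-concurrency` flag.

```python
import math


def max_safe_series_concurrency(
    mem_limit_bytes: int,
    index_header_bytes: int,
    index_cache_bytes: int,
    per_query_peak_bytes: int,
    headroom_fraction: float = 0.2,
) -> int:
    """Largest number of concurrent Series calls that fits in the memory limit.

    Every input is your own measurement. headroom_fraction is an assumed
    safety margin for Go runtime and GC overhead; tune it to what you observe.
    """
    if per_query_peak_bytes <= 0:
        raise ValueError("per_query_peak_bytes must be positive")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    usable = mem_limit_bytes * (1 - headroom_fraction)
    free_for_queries = usable - index_header_bytes - index_cache_bytes
    if free_for_queries < per_query_peak_bytes:
        # Not even one heavy query fits: shrink caches or shard, don't just raise the limit.
        return 0
    return math.floor(free_for_queries / per_query_peak_bytes)


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Placeholders only: replace with your measured values.
    print(max_safe_series_concurrency(16 * GiB, 4 * GiB, 2 * GiB, GiB // 2))
```

A result of 0 is the signal the prompt is designed to surface: the gateway needs sharding or smaller caches, not a bigger memory limit.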

Why this prompt works

The Thanos Store Gateway is the component people understand least and blame most, because its performance is dominated by three separate caches that fail in different ways. A “slow first query, fast on retry” symptom is a caching-bucket miss; an OOM at startup before any query runs is the index-header layer exhausting memory as block count grows; an OOM under load is the index cache filling with postings for high-cardinality label sets. Most troubleshooting flails because people treat “Store Gateway is slow/OOMing” as one problem. This prompt’s first job is to split the symptom across the three layers so the fix targets the actual bottleneck instead of just raising the memory limit and hoping.
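The three-way split can be captured as a small decision function. This is a minimal Python sketch of the heuristic described above, not something Thanos ships. `CacheLayer` and `layer_for_symptom` are hypothetical names. A symptom outside the three cases, such as a query that stays slow on every retry, deliberately returns `None` instead of guessing a layer.

```python
from enum import Enum
from typing import Optional


class CacheLayer(Enum):
    INDEX_HEADER = "index-header"
    INDEX_CACHE = "index cache"
    CACHING_BUCKET = "caching bucket"


def layer_for_symptom(
    oomed: bool, before_any_query: bool, fast_on_retry: bool
) -> Optional[CacheLayer]:
    """Map an observed Store Gateway symptom to the cache layer to size first."""
    if oomed and before_any_query:
        # OOM at startup: index headers outgrew memory as block count grew.
        return CacheLayer.INDEX_HEADER
    if oomed:
        # OOM under query load: postings/series for high-cardinality selectors.
        return CacheLayer.INDEX_CACHE
    if fast_on_retry:
        # Slow first query, fast second: the caching bucket missed, then filled.
        return CacheLayer.CACHING_BUCKET
    # Slow every time: outside this split, so profile before resizing anything.
    return None


if __name__ == "__main__":
    print(layer_for_symptom(oomed=True, before_any_query=True, fast_on_retry=False))
```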

It also refuses to hand you magic numbers. Cache sizing depends on your block count, retention, and label cardinality — values the model cannot know — so the prompt insists on a formula you fill in rather than a fabricated constant. That matters here because invented capacity figures are how teams end up with a Store Gateway that OOMs the day retention crosses some threshold nobody computed. By framing sizing as an explicit formula plus your real numbers, the answer stays honest and reusable as your data grows.
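Written out, a fill-in formula of this kind might look like the following. Every symbol is a value you measure on your own bucket or gateway, and the symbol names are illustrative, not Thanos terminology. N_blocks is your block count. s̄_header is the average index-header size, for example taken from the index-header files in the gateway's data directory. C_index is the in-memory index-cache size, which is roughly zero if the index cache lives in Memcached. k is the allowed Series concurrency, m_query is the peak memory of one heavy query, M_runtime is Go runtime overhead, and L_mem is the container memory limit. The first line assumes index headers are loaded eagerly; with lazy loading, only the headers currently in use count toward M_header.

```latex
\begin{aligned}
M_{\text{header}}  &\approx N_{\text{blocks}} \cdot \bar{s}_{\text{header}} \\
M_{\text{gateway}} &\approx M_{\text{header}} + C_{\text{index}} + k \cdot m_{\text{query}} + M_{\text{runtime}} \\
M_{\text{gateway}} &\le L_{\text{mem}}
\end{aligned}
```

Because N_blocks grows with retention, M_header is the term that eventually breaks the inequality. That growth is what the sharding guardrail exists to catch.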

The sharding guardrail is the strategic backbone. Index-header memory scales with block count, which scales with retention, so a single Store Gateway eventually cannot hold the headers no matter how much RAM you give it — the only real fix is sharding by time or hashmod. Forcing the model to recommend sharding before recommending a bigger instance steers you away from the dead-end of vertical scaling. Combined with cache-hit-rate measurement, this keeps the work verifiable: you apply the config, watch hit rate and memory, and confirm the gateway is healthy before pointing production queriers at it.
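The shard count follows from the same arithmetic, and hashmod sharding only needs a stable hash of each block's ID. Below is a minimal Python sketch. It assumes Prometheus-style `hashmod` relabeling (MD5 of the source value, with the last 8 bytes read as a big-endian integer, modulo the shard count) applied to the `__block_id` label through each gateway's `--selector.relabel-config`. Check that behavior against your Thanos and Prometheus versions before relying on exact assignments. `shards_needed`, `hashmod_shard`, and `assign_blocks` are hypothetical helper names, and the per-instance budget is your own number.

```python
import hashlib
import math
from typing import Dict, Iterable, List


def shards_needed(total_header_bytes: int, per_instance_header_budget: int) -> int:
    """Smallest number of gateways whose index headers each fit the budget."""
    if per_instance_header_budget <= 0:
        raise ValueError("per_instance_header_budget must be positive")
    return max(1, math.ceil(total_header_bytes / per_instance_header_budget))


def hashmod_shard(block_id: str, modulus: int) -> int:
    """Shard index for one block, mimicking Prometheus-style hashmod (assumption)."""
    digest = hashlib.md5(block_id.encode("utf-8")).digest()
    # Last 8 bytes of the 16-byte MD5 digest, read big-endian, then modulo.
    return int.from_bytes(digest[8:], "big") % modulus


def assign_blocks(block_ids: Iterable[str], modulus: int) -> Dict[int, List[str]]:
    """Group block IDs by the shard that would keep them."""
    shards: Dict[int, List[str]] = {i: [] for i in range(modulus)}
    for block_id in block_ids:
        shards[hashmod_shard(block_id, modulus)].append(block_id)
    return shards


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Placeholders: replace with your measured header total and per-gateway budget.
    n = shards_needed(total_header_bytes=30 * GiB, per_instance_header_budget=8 * GiB)
    # Hypothetical ULID-style block IDs; list your real ones from the bucket.
    blocks = ["01HZX0000000000000000000A1", "01HZX0000000000000000000B2"]
    for shard, ids in assign_blocks(blocks, n).items():
        print(shard, ids)
```

Time-range sharding is the alternative. Give each gateway a non-overlapping `--min-time`/`--max-time` window so that old, rarely queried ranges can run on smaller instances than the hot recent range.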
