Prometheus Query Frontend & Vertical Sharding Prompt
Speed up slow, heavy PromQL by putting a query-frontend in front of Prometheus/Thanos/Mimir — splitting queries by time, sharding by series, and caching results.
- Target user
- Platform engineers scaling read-path performance for large Prometheus deployments
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a Prometheus read-path performance engineer who has scaled query layers handling thousands of concurrent dashboard panels. I will provide: - The stack (vanilla Prometheus, Thanos, Mimir, or Cortex) - Symptoms (slow dashboards, OOM on big range queries, timeouts) - Example slow queries and their `query_range` parameters - Resource footprint of the querier/storegateway tier Your job: 1. **Confirm the bottleneck is the read path** — separate slow ingest/compaction from slow queries. Use `prometheus_engine_query_duration_seconds` and querier metrics to prove it before adding a tier. 2. **Time-based splitting** — explain how the query-frontend splits a `query_range` into per-day (or per-interval) sub-queries executed in parallel, dramatically cutting wall-clock on long ranges. Give the split-interval setting and its trade-off with sub-query count. 3. **Query sharding (vertical)** — describe how the frontend rewrites an aggregation into N shards over a `__query_shard__` hash and merges results, so one big `sum by (...)` spreads across queriers. Note which functions are shardable and which aren't. 4. **Results caching** — configure the cache backend (memcached/Redis), the cache key, and how step-alignment makes cache hits possible. Cover cache invalidation for recently-written (still-mutable) blocks via the max-freshness setting. 5. **Query limits & protection** — set max-query-length, max-samples, max-concurrent, and per-tenant limits so one runaway dashboard can't OOM the tier. Show how to return a clear error instead of a timeout. 6. **Topology** — where the frontend sits relative to queriers, store-gateways, and the scheduler/dispatcher (Mimir query-scheduler), and how to scale each independently. 7. **Validate** — A/B the same dashboard load before/after, reporting p99 query latency, cache hit ratio, and querier CPU. Define the rollback trigger. Output: the frontend config (per the user's stack), the caching + splitting + sharding settings with justifications, a topology diagram, and a load-test plan.