Protecting the Prometheus Read Path: max-samples, timeout, and Concurrency

Most teams harden the Prometheus write path obsessively — sample limits, target limits, cardinality controls, relabeling — and leave the read path wide open. Then, during an incident, someone runs a wide ad-hoc query across three months of high-cardinality data, the server’s memory spikes, and Prometheus gets OOMKilled. Monitoring goes dark for every team at once, at the precise moment everyone needs it. The read path can take down the server just as surely as the write path, and the guardrails that prevent it — query.max-samples, query.timeout, and query.max-concurrency — are flags many operators have never touched.

What the three flags do

Each guards against a different way a query can hurt the server, and crucially, each makes the offending query fail rather than crashing the process.

--query.max-samples caps the total number of samples a single query may load into memory at once. This is the primary defense against an OOM from a wide range query. Exceed it and the query returns an error; the server stays up.
--query.timeout bounds the wall-clock time any single query may run. A pathological query can’t hold resources indefinitely; it’s cut off and errors.
--query.max-concurrency caps how many queries execute simultaneously. A burst of heavy queries can’t collectively exhaust memory; excess queries queue instead of piling on.

prometheus \
  --query.max-samples=50000000 \
  --query.timeout=2m \
  --query.max-concurrency=20

The mental shift: these are guardrails that turn a server-killing query into a user-facing error. On a shared instance, that’s almost always the right trade.
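To make "user-facing error" concrete, here is a minimal Python sketch of how a client might tell which guardrail rejected a query. It assumes the standard HTTP API error envelope (status, errorType, error). The exact error wording is an assumption and may differ between versions, so check what your own server returns. The function name classify_query_error and the sample body are illustrative only.

```python
import json


def classify_query_error(body: str) -> str:
    """Guess which read-path guardrail rejected a query from an API response body."""
    payload = json.loads(body)
    if payload.get("status") != "error":
        return "ok"
    error_type = payload.get("errorType", "")
    message = payload.get("error", "").lower()
    # Assumption: the max-samples rejection mentions "too many samples".
    if "too many samples" in message:
        return "max-samples"
    # Assumption: a query cut off by query.timeout reports errorType "timeout".
    if error_type == "timeout":
        return "timeout"
    return "other"


# Illustrative body; the wording is an assumption, not captured from a real server.
sample_body = json.dumps({
    "status": "error",
    "errorType": "execution",
    "error": "query processing would load too many samples into memory in query execution",
})
print(classify_query_error(sample_body))
```

A dashboard proxy or alerting script could use a check like this to show "query too large, narrow the range" instead of a generic failure.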

Map the symptom to the flag

As with most tuning, start from the observed failure:

OOM tied to a specific dashboard or query -> query.max-samples is too high for the box (left unset, it defaults to 50,000,000); one query is loading enough samples to exhaust memory.
Everyone slows down when one person runs a big query -> query.max-concurrency is too high (the default is 20); concurrent heavy queries are contending for resources.
Queries hang for minutes before returning -> query.timeout has been raised too far (the default is 2m); slow queries hold resources until it expires.

Clamp all three blindly and you’ll break legitimate dashboards while fielding complaints. Tie the change to the symptom and it’s targeted.

Sizing without a magic number

query.max-samples is the one people most want a constant for, and it’s the one you most shouldn’t copy off a forum. The right value depends on your server’s memory, because each in-flight sample costs memory, and the cap exists to keep a single query’s working set well under available RAM. Reason about it as a formula — available query memory divided by the rough per-sample cost, with headroom — and plug in your real numbers. An assistant is good for the arithmetic, as long as you forbid it from inventing the per-sample constant:

Prompt: My shared Prometheus OOMs when someone loads a 30-day high-cardinality dashboard. The box has 32Gi. I want to set query.max-samples to protect it. Walk me through sizing it as a formula with my numbers, and tell me what else to set.

Response: Don’t copy a number; size it from memory. Reserve a portion of the 32Gi for query working set (leave plenty for TSDB head and scrape), divide by the approximate bytes-per-sample for your build, and that’s your max-samples ceiling. Start with a generous value, watch which queries trip it via the query log, then tighten. Also set query.timeout=2m and query.max-concurrency for your tenant count. And remember: these flags error the query, so for genuinely heavy workloads, put a query frontend in front rather than just clamping.

The model drafts the approach; you verify it by setting the limit generously, watching real traffic, and tightening. That staged rollout is the human half of the work, and the refusal to fabricate a bytes-per-sample constant is what keeps a memory-safety guardrail honest. The same discipline runs through the Prometheus and monitoring prompts.
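The formula described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name, the parameter names, and the idea of dividing the budget across concurrent heavy queries. The bytes-per-sample figure is deliberately a required input with no default, because it has to come from measuring your own build. The usage line uses toy round numbers only to show the arithmetic.

```python
def max_samples_ceiling(
    total_memory_bytes: int,
    query_memory_fraction: float,
    bytes_per_sample: float,
    headroom_fraction: float = 0.5,
    concurrent_heavy_queries: int = 1,
) -> int:
    """Rough ceiling for query.max-samples from a memory budget.

    bytes_per_sample has no default on purpose: measure it, do not borrow it.
    """
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be a measured, positive value")
    if not 0 < query_memory_fraction <= 1:
        raise ValueError("query_memory_fraction must be in (0, 1]")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    if concurrent_heavy_queries < 1:
        raise ValueError("concurrent_heavy_queries must be at least 1")

    # Memory set aside for queries; the rest stays with the TSDB head and scrape.
    query_budget = total_memory_bytes * query_memory_fraction
    # Keep a safety margin inside that budget.
    usable = query_budget * (1 - headroom_fraction)
    # Assumption: several heavy queries may run at once and share the budget.
    per_query = usable / concurrent_heavy_queries
    return int(per_query // bytes_per_sample)


# Toy numbers, chosen only so the arithmetic is easy to follow by hand:
# 1000 bytes * 0.5 = 500 budget, * (1 - 0.5) = 250 usable, / 10 bytes per sample.
toy_ceiling = max_samples_ceiling(
    total_memory_bytes=1000,
    query_memory_fraction=0.5,
    bytes_per_sample=10,
)
print(toy_ceiling)
```

Treat the result as an upper bound to start from, not a final setting; the query log tells you whether real traffic fits under it.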

Find the offending query, don’t just clamp it

A limit that errors a query tells you that something is expensive, not what. Enable the query log to find the actual culprit so you can fix the query — often it’s a missing recording rule or an unbounded matcher — rather than just capping the symptom:

# prometheus.yml
global:
  query_log_file: /prometheus/query.log

# Watch query cost over time (this metric is a summary, so read its quantile label directly)
prometheus_engine_query_duration_seconds{slice="inner_eval", quantile="0.99"}

The clamp protects the server today; the query log lets you fix the cause so the clamp stops tripping.
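Once the query log is on, a short script can rank queries by cumulative execution time. This is a minimal sketch that assumes the log is one JSON object per line, with the query text under params.query and timings under stats.timings.execTotalTime. Field names can vary by version, so inspect a few lines of your own log first. The function name rank_queries and the sample lines are illustrative.

```python
import json
from collections import defaultdict


def rank_queries(log_lines, top=5):
    """Return (query, count, total_seconds) tuples, most expensive first."""
    totals = defaultdict(lambda: {"count": 0, "seconds": 0.0})
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        query = entry.get("params", {}).get("query")
        if query is None:
            continue
        timings = entry.get("stats", {}).get("timings", {})
        totals[query]["count"] += 1
        totals[query]["seconds"] += float(timings.get("execTotalTime", 0.0))
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["seconds"], reverse=True)
    return [(q, v["count"], v["seconds"]) for q, v in ranked[:top]]


# Illustrative log lines, not captured from a real server.
sample_log = [
    '{"params": {"query": "up == 0"}, "stats": {"timings": {"execTotalTime": 0.25}}}',
    '{"params": {"query": "sum(rate(http_requests_total[30d]))"}, "stats": {"timings": {"execTotalTime": 0.5}}}',
    '{"params": {"query": "sum(rate(http_requests_total[30d]))"}, "stats": {"timings": {"execTotalTime": 0.25}}}',
    "not json",
]
for query, count, seconds in rank_queries(sample_log):
    print(f"{seconds:.2f}s over {count} runs: {query}")
```

To run it against a real file, pass an open file handle, for example rank_queries(open("/prometheus/query.log")). The queries at the top of the list are the first candidates for a recording rule or a tighter matcher.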

Roll out generous, then tighten

The surest way to anger a team is to apply tight limits on day one and break legitimate dashboards. Instead:

Set the limits generously.
Watch which real queries trip them (query log, prometheus_engine_* metrics).
Fix or optimize the genuinely-broken queries; offload heavy ones to a query frontend or recording rules.
Tighten the limits to a level that catches runaways but clears everything legitimate.
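The last step can be sketched as a small Python helper. It is an illustration under stated assumptions: you have already collected peak sample counts for queries you consider legitimate (however you measure them), and you have a memory-derived ceiling for the box. The function name tightened_limit, the 1.5 margin, and the numbers in the usage line are all hypothetical.

```python
def tightened_limit(legit_peak_samples, memory_ceiling, margin=1.5):
    """Suggest a max-samples value that clears legitimate queries with some margin.

    Returns (limit, needs_offload). needs_offload is True when legitimate
    queries want more than memory allows, which is a signal to move them to
    recording rules or a query frontend rather than raise the limit.
    """
    if not legit_peak_samples:
        raise ValueError("need at least one observed legitimate query")
    if margin < 1:
        raise ValueError("margin must be at least 1")
    wanted = int(max(legit_peak_samples) * margin)
    if wanted > memory_ceiling:
        # Never suggest more than the box can hold.
        return memory_ceiling, True
    return wanted, False


# Hypothetical observations and ceiling, for illustration only.
limit, needs_offload = tightened_limit([400, 1000], memory_ceiling=10000)
print(limit, needs_offload)
```

The design choice here is to cap at the memory ceiling and flag the conflict instead of silently raising the limit, which matches the rule that heavy workloads get offloaded rather than accommodated by a looser clamp.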

These flags are protection, not a performance feature. If a workload is legitimately heavy, the answer is a query frontend, vertical sharding, or recording rules — not a tighter clamp that just makes the dashboard fail.

The bottom line

A single runaway query can OOM a shared Prometheus and blind every team at once. query.max-samples, query.timeout, and query.max-concurrency convert that catastrophe into a recoverable, per-query error — but only if you size max-samples from your real memory instead of a borrowed constant, find the offending query rather than just clamping it, and roll out generous-then-tighten so you don’t break what works. For a structured way to size the flags and build the detection, the query resource-limit tuning prompt and the read-path protection prompt start from your hardware, not a magic number.
