Skip to content
DevOps AI ToolKit
Newsletter
All prompts
AI for Prometheus & Monitoring Difficulty: Advanced ClaudeChatGPTCursor

Prometheus query.max-samples, timeout & concurrency Tuning Prompt

Tune Prometheus server query limits (query.max-samples, query.timeout, query.max-concurrency) so heavy or runaway queries fail fast instead of OOMing the server.

Target user
Engineers protecting a shared Prometheus from expensive queries
Difficulty
Advanced
Tools
Claude, ChatGPT, Cursor

The prompt

You are a Prometheus operator who knows that the read path can OOM the whole server, and that query.max-samples, query.timeout, and query.max-concurrency are the server-side guardrails that make a runaway query fail fast instead of taking everyone down.

I will provide:
- The symptom (server OOM during certain dashboards, query timeouts, slowness for everyone when one user runs a big query): [SYMPTOM]
- Current flag values for query.max-samples, query.timeout, query.max-concurrency, query.lookback-delta (or "defaults"): [CURRENT FLAGS]
- Server resources (memory, cores) and active series scale: [RESOURCES]
- Whether this Prometheus is shared by many teams/dashboards or dedicated: [TENANCY]

Your job:

1. **Explain each flag and the failure it prevents**:
   - query.max-samples: caps total samples a single query may load into memory; the main defense against an OOM from a wide range query. Exceeding it errors the query, not the server.
   - query.timeout: bounds wall-clock per query so a slow query can't hold resources indefinitely.
   - query.max-concurrency: caps simultaneous queries so a burst can't collectively exhaust memory.

2. **Diagnose my symptom** — OOM on a specific dashboard points at max-samples too high (or absent); everyone-slow-when-one-user-queries points at concurrency; queries hanging points at timeout.

3. **Recommend values** — set max-samples relative to available memory (show the rough samples-to-bytes reasoning as a formula, not a magic number), a timeout that fails fast but allows legitimate heavy panels, and a concurrency cap for shared tenancy. Note these protect the server but return user-facing errors, so pair with query-frontend/sharding for genuinely heavy workloads.

4. **Add detection** — give the queries/log settings to find the offending queries (query log, prometheus_engine_query_duration_seconds, slow-query audit) so I fix the query, not just clamp the limit.

5. **Sequence the rollout** — recommend setting limits generously first, watching which queries trip them, then tightening, so you don't break legitimate dashboards on day one.

Output as: (a) a table of the three flags with the failure each prevents, (b) recommended values tied to my resources via a formula, (c) the detection queries/log config, (d) a staged rollout plan. No invented memory-per-sample constants — show the formula.

Frame these as guardrails that error individual queries, never as a performance fix for legitimately heavy workloads. Recommend finding and fixing the offending query alongside clamping the limit.

Why this prompt works

Most Prometheus operators harden the write path — scrape limits, cardinality controls, sample caps — and leave the read path wide open, which is how a single ad-hoc query during an incident OOMs the server and takes monitoring down for everyone exactly when they need it most. The three server flags query.max-samples, query.timeout, and query.max-concurrency are the guardrails for that read path, and their crucial property is that they make an offending query fail with an error rather than crash the process. This prompt centers that distinction so operators understand they are trading a user-facing query error for server survival, which is almost always the right trade on a shared instance.

The diagnostic mapping is what makes the advice actionable. Each symptom points at a different flag: an OOM tied to a specific dashboard implicates max-samples; everyone slowing down when one person runs a big query implicates concurrency; queries that hang forever implicate the timeout. Without that mapping, operators tend to clamp all three blindly and then spend a week fielding complaints about broken dashboards. By tying the fix to the observed failure, the prompt produces a targeted change.

It also refuses two tempting shortcuts. First, it insists on a formula for max-samples rather than a fabricated bytes-per-sample constant, because the right value depends on the server’s memory and nobody should copy a magic number into a memory-safety guardrail. Second, it frames these limits as protection, not performance — genuinely heavy workloads belong behind a query frontend or sharding, not under a tighter clamp. Combined with the requirement to find and fix the offending query via the query log and a staged rollout, this keeps the work honest and reversible: you discover what’s expensive, fix the cause, and only then tighten the safety net, all while watching real traffic instead of guessing.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 2,104 DevOps AI prompts
  • One practical workflow email per week