Grafana Error Guide: Prometheus 'too many outstanding requests' — Fix Query Concurrency Limits
Fix Prometheus 'too many outstanding requests' in Grafana: diagnose query concurrency limits, heavy dashboards, query sharding queues, and Thanos/Cortex frontend backpressure.
Overview
“too many outstanding requests” is backpressure from the query path: the Prometheus query engine or, often, a Thanos/Cortex/Mimir query frontend sitting in front of it. Prometheus limits how many queries execute concurrently (--query.max-concurrency, default 20) and makes the rest wait for a slot. When a request queue in that path is also full, new queries are rejected outright, and Grafana surfaces that rejection on the panel.
The literal error returned to Grafana:
too many outstanding requests
{"status":"error","errorType":"unavailable","error":"too many outstanding requests"}
This is not a timeout (504) and not a crash — it is the backend deliberately shedding load. A dashboard full of heavy panels refreshing at once, or several users hitting the same instance, can trip it.
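When scripting checks against the API, it can help to key on the message text rather than on the status code alone, since the code may differ between backends. The following is a minimal sketch; `classify_query_failure` is a hypothetical helper (not part of Grafana or Prometheus), and the status values passed in are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a failed query from its HTTP status and body.
# Assumption: load shedding is recognised by the message text quoted above.
classify_query_failure() {
  local status="$1" body="$2"
  case "$body" in
    *"too many outstanding requests"*) echo "load-shed"; return ;;
  esac
  case "$status" in
    504) echo "timeout" ;;
    *)   echo "other" ;;
  esac
}

# Illustrative inputs: a rejection body and a gateway timeout.
classify_query_failure 503 '{"status":"error","errorType":"unavailable","error":"too many outstanding requests"}'
classify_query_failure 504 'upstream request timeout'
```

The first call prints `load-shed` and the second prints `timeout`, which mirrors the distinction drawn above between deliberate rejection and a slow backend.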
Symptoms
- Multiple panels flash “too many outstanding requests”, often intermittently on refresh.
- The error correlates with dashboard auto-refresh intervals or many concurrent viewers.
- Lightweight queries succeed while heavy ones are rejected.
- With Thanos/Cortex/Mimir, the query-frontend rejects new requests when its queue of outstanding requests fills.
Common Root Causes
1. Query concurrency saturated
All --query.max-concurrency slots (default 20) are held by slow queries, so additional queries wait in the queue and are rejected once it fills.
2. Dashboards issue too many queries at once
A dashboard with dozens of panels and a short auto-refresh fires all queries simultaneously each cycle.
3. Expensive queries hold slots too long
High-cardinality rate()/histogram_quantile() over long ranges occupy concurrency slots for seconds each, starving everything else.
4. Thanos/Cortex/Mimir frontend queue full
The query-frontend’s per-tenant queue, sized by -querier.max-outstanding-requests-per-tenant, fills under load and rejects new requests.
5. Under-provisioned queriers
Too few querier replicas for the query volume; the fix is horizontal capacity, not a bigger queue.
Diagnostic Workflow
Step 1: Confirm the source in the Grafana log
sudo journalctl -u grafana-server --no-pager | grep -i "outstanding" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "outstanding" | tail -20
grep -i "too many outstanding" /var/log/grafana/grafana.log | tail -20
Step 2: Look at Prometheus/query-frontend logs and flags
kubectl logs sts/prometheus -n monitoring | grep -iE "concurrency|outstanding|queue" | tail -20
# Current concurrency flag (prints nothing when the flag is unset and the default of 20 applies)
ps aux | grep -oE -- '--query\.max-concurrency=[0-9]+'
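Inspecting the process list only helps when the flag was set explicitly and the process is visible from your shell. As an alternative sketch, the Prometheus HTTP API exposes the effective flag values at /api/v1/status/flags. The helper name `max_concurrency_from_flags` and the `localhost:9090` address are assumptions, and the sed expression assumes the compact single-line JSON the API normally returns.

```shell
#!/usr/bin/env bash
# Extract query.max-concurrency from the JSON returned by
# /api/v1/status/flags (read from stdin). Hypothetical helper name.
max_concurrency_from_flags() {
  sed -n 's/.*"query\.max-concurrency":"\([0-9][0-9]*\)".*/\1/p'
}

# Live usage (URL is an assumption; adjust to your setup):
#   curl -s http://localhost:9090/api/v1/status/flags | max_concurrency_from_flags

# Offline demonstration with a trimmed sample response:
printf '%s\n' '{"status":"success","data":{"query.max-concurrency":"20","query.timeout":"2m"}}' \
  | max_concurrency_from_flags
```

With the sample input this prints `20`. In Kubernetes, the same curl can be run through a port-forward to the Prometheus pod.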
For Thanos/Cortex/Mimir, check the frontend/querier logs for “too many outstanding requests” entries and note which tenant they belong to.
Step 3: Find the heavy queries holding slots
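# Engine evaluation latency by quantile (a summary across all queries, not per query)
prometheus_engine_query_duration_seconds{slice="inner_eval"}
# Time queries spend waiting for a concurrency slot
prometheus_engine_query_duration_seconds{slice="queue_time"}
# Individual slow queries: enable the query log (query_log_file in the global config)
# Concurrency in use vs. limit
prometheus_engine_queries
prometheus_engine_queries_concurrent_max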
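The two concurrency gauges are easiest to read as a ratio. The sketch below parses an instant-query response and checks it against a threshold. The helper names, the `localhost:9090` address, and the 0.8 threshold are all assumptions, and the parser assumes a single-series result in compact JSON.

```shell
#!/usr/bin/env bash
# Pull the sample value out of an instant-query response (read from stdin).
# Assumes one series in the result and compact single-line JSON.
instant_value() {
  sed -n 's/.*"value":\[[0-9.]*,"\([^"]*\)"].*/\1/p'
}

# Exit status 0 when the ratio is at or above the threshold.
is_saturated() {
  awk -v ratio="$1" -v limit="$2" 'BEGIN { exit !(ratio + 0 >= limit + 0) }'
}

# Live usage (URL and threshold are assumptions):
#   ratio=$(curl -s http://localhost:9090/api/v1/query \
#     --data-urlencode 'query=prometheus_engine_queries / prometheus_engine_queries_concurrent_max' \
#     | instant_value)

# Offline demonstration with a sample response:
sample='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"instance":"localhost:9090"},"value":[1700000000.5,"0.95"]}]}}'
ratio=$(printf '%s\n' "$sample" | instant_value)
if is_saturated "$ratio" 0.8; then
  echo "query engine saturated: ratio=$ratio"
fi
```

A ratio that sits near 1 during refresh bursts suggests the engine has no free slots at exactly the moments dashboards need them.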
Step 4: Reduce concurrent load from Grafana
- Lengthen dashboard auto-refresh (e.g. 30s → 5m) or disable it on heavy boards.
- Split monster dashboards into smaller ones so a single refresh fires fewer queries.
- Replace expensive per-panel PromQL with recording rules.
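As an illustration of the recording-rule approach, the sketch below writes a rule file that precomputes a per-endpoint p95 so panels can read one cheap series. The metric `http_request_duration_seconds_bucket`, the `endpoint` label, the rule name, and the file path are placeholders rather than names taken from any real setup.

```shell
#!/usr/bin/env bash
# Write a recording-rule file that precomputes a per-endpoint p95.
# Metric name, label, rule name and path are placeholders; substitute your own.
RULES_FILE="${RULES_FILE:-/tmp/latency.rules.yml}"

cat > "$RULES_FILE" <<'EOF'
groups:
  - name: latency_recording_rules
    interval: 30s
    rules:
      - record: endpoint:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
          )
EOF

# Validate the file if promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$RULES_FILE"
fi
echo "wrote $RULES_FILE"
```

Once the rule is loaded, a panel would query `endpoint:http_request_duration_seconds:p95_5m` directly instead of evaluating the histogram on every refresh.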
Step 5: Raise limits or add capacity (deliberately)
# Prometheus flag (raise concurrency if CPU headroom exists)
# --query.max-concurrency=40
# Cortex/Mimir query-frontend
# -querier.max-outstanding-requests-per-tenant=200
Raising the queue limit without CPU or querier headroom just turns rejections into timeouts, because queued queries wait longer than the backend can serve them. Add querier replicas when the backend is genuinely saturated.
Example Root Cause Analysis
An ops team reports panels intermittently showing “too many outstanding requests” every 10 seconds — exactly the dashboard’s auto-refresh interval. The dashboard has 40 panels, each running a histogram_quantile over per-endpoint buckets.
Checking Prometheus, prometheus_engine_queries sits pinned at the --query.max-concurrency limit of 20 during each refresh burst, and query durations show the p95 panels taking 2–4s each. Forty heavy queries firing at once compete for 20 slots: half start immediately, the other half must wait, and roughly a dozen of those are rejected.
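The burst in this example can be approximated with simple arithmetic. The sketch below is a rough model only: it assumes one query per panel, that every query takes the same time, and that queries run in full waves of at most the slot count. `burst_drain_seconds` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough model: queries run in waves of at most $slots, each wave lasting
# $per_query_s seconds. Prints the seconds needed to drain one refresh burst.
burst_drain_seconds() {
  local queries="$1" slots="$2" per_query_s="$3"
  local waves=$(( (queries + slots - 1) / slots ))   # ceiling division
  echo $(( waves * per_query_s ))
}

burst_drain_seconds 40 20 4   # slow end of the 2-4s range: prints 8
burst_drain_seconds 40 20 2   # fast end of the 2-4s range: prints 4
```

Under these assumptions a single viewer keeps all 20 slots busy for 4 to 8 seconds out of every 10-second refresh cycle, so a second viewer or any other query source leaves essentially no headroom.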
Fix: move the percentile math into recording rules so each panel reads one precomputed series, change auto-refresh from 10s to 1m, and split the board into two. Concurrency usage drops well under the limit and the rejections stop. Root cause: a heavy dashboard firing more concurrent expensive queries than the engine allows — a load problem, not a bug.
Prevention Best Practices
- Precompute expensive aggregations with recording rules so panels read cheap single series.
- Set conservative dashboard auto-refresh intervals; avoid 5–10s refresh on large boards.
- Keep dashboards focused — fewer panels per board means fewer simultaneous queries per refresh.
- Size --query.max-concurrency (and Thanos/Cortex/Mimir queues) to available CPU, and scale querier replicas horizontally under real load.
- Monitor prometheus_engine_queries vs. prometheus_engine_queries_concurrent_max to see saturation before users do.
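One possible way to monitor that ratio is an alerting rule. The sketch below writes one to a file. The alert name, the 0.8 threshold, the 5m hold time, the severity label, and the file path are all assumptions to tune for your environment, and short bursts that fall between scrapes may not be caught by a gauge-based alert like this.

```shell
#!/usr/bin/env bash
# Write an alerting rule on query-concurrency saturation.
# Alert name, threshold, duration and path are assumptions.
ALERT_FILE="${ALERT_FILE:-/tmp/query-saturation.rules.yml}"

cat > "$ALERT_FILE" <<'EOF'
groups:
  - name: query_saturation
    rules:
      - alert: PrometheusQueryConcurrencyHigh
        expr: prometheus_engine_queries / prometheus_engine_queries_concurrent_max > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Query concurrency above 80% of the limit on {{ $labels.instance }}"
EOF

# Validate the file if promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$ALERT_FILE"
fi
echo "wrote $ALERT_FILE"
```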
Quick Command Reference
# Grafana-side confirmation
sudo journalctl -u grafana-server | grep -i "outstanding" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "outstanding" | tail -20
# Backend logs and concurrency flag
kubectl logs sts/prometheus -n monitoring | grep -iE "concurrency|outstanding|queue" | tail -20
ps aux | grep -oE -- '--query\.max-concurrency=[0-9]+'
# Saturation metrics (PromQL)
# prometheus_engine_queries
# prometheus_engine_queries_concurrent_max
# prometheus_engine_query_duration_seconds{slice="queue_time"}
Conclusion
“too many outstanding requests” is intentional backpressure: the query engine is shedding load it cannot service concurrently. Treat it as a capacity/query-shape problem:
- Confirm it is the backend rejecting (not a 504 timeout) from the logs.
- Measure concurrency saturation and find the heavy queries holding slots.
- Reduce concurrent load from Grafana first — recording rules, longer refresh, smaller boards.
- Only then raise concurrency/queue limits, and add querier capacity when genuinely saturated.
Fixing the query shape and refresh cadence removes the pressure; raising limits alone just relocates it.