Grafana Error Guide: Prometheus 'too many outstanding requests' - Fix Query Concurrency Limits

Overview

“too many outstanding requests” is backpressure from the query path: the Prometheus query engine or, in clustered setups, a Thanos/Cortex/Mimir query frontend. Prometheus caps how many queries execute concurrently (--query.max-concurrency, default 20) and makes the rest wait for a free slot; Cortex-style query frontends also keep a bounded per-tenant queue of outstanding requests. When that queue is full as well, new queries are rejected outright, and Grafana surfaces the rejection on the panel.

The literal error returned to Grafana:

too many outstanding requests

{"status":"error","errorType":"unavailable","error":"too many outstanding requests"}

This is not a timeout (504) and not a crash — it is the backend deliberately shedding load. A dashboard full of heavy panels refreshing at once, or several users hitting the same instance, can trip it.
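To make the rejection-versus-timeout distinction concrete, here is a minimal sketch of a helper that labels an error body. The function name classify_response and the timeout patterns it matches are illustrative assumptions, not part of Grafana or Prometheus.

```shell
# classify_response BODY
# Hypothetical helper: label a Prometheus-style JSON error body as load
# shedding ("rejected"), a timeout, or anything else.
classify_response() {
  case "$1" in
    *"too many outstanding requests"*) echo "rejected" ;;
    *"timed out"*|*"timeout"*|*"context deadline exceeded"*) echo "timeout" ;;
    *) echo "other" ;;
  esac
}

# The body quoted above is classified as deliberate load shedding.
classify_response '{"status":"error","errorType":"unavailable","error":"too many outstanding requests"}'
```

A "rejected" result points at concurrency and queue limits; a "timeout" result points at slow queries or proxy timeouts instead.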

Symptoms

Multiple panels flash “too many outstanding requests”, often intermittently on refresh.
The error correlates with dashboard auto-refresh intervals or many concurrent viewers.
Lightweight queries succeed while heavy ones are rejected.
With Thanos/Cortex/Mimir, the query-frontend rejects new requests once its queue of outstanding requests is full.

Common Root Causes

1. Query concurrency saturated

--query.max-concurrency (default 20) is fully in use by slow queries, so additional queries queue and then get rejected.

2. Dashboards issue too many queries at once

A dashboard with dozens of panels and a short auto-refresh fires all queries simultaneously each cycle.

3. Expensive queries hold slots too long

High-cardinality rate()/histogram_quantile() over long ranges occupy concurrency slots for seconds each, starving everything else.

4. Thanos/Cortex/Mimir frontend queue full

The query-frontend’s per-tenant queue, bounded by -querier.max-outstanding-requests-per-tenant, fills under load, and the frontend rejects further requests.

5. Under-provisioned queriers

Too few querier replicas for the query volume; the fix is horizontal capacity, not a bigger queue.

Diagnostic Workflow

Step 1: Confirm the source in the Grafana log

# systemd install
sudo journalctl -u grafana-server --no-pager | grep -i "outstanding" | tail -20
# Kubernetes deployment
kubectl logs deploy/grafana -n monitoring | grep -i "outstanding" | tail -20
# File-based logging
grep -i "too many outstanding" /var/log/grafana/grafana.log | tail -20
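Counting rejections per minute makes it easier to see whether they line up with a dashboard's refresh cycle. The sketch below assumes Grafana's logfmt output with a t=<RFC 3339 timestamp> field on each line; the function name rejections_per_minute is hypothetical, and the pattern needs adjusting if your log format differs.

```shell
# rejections_per_minute
# Reads Grafana log lines on stdin and prints "<minute> <count>" for lines
# containing the rejection message.
# Assumption: each line carries a logfmt field like t=2024-05-01T12:00:10Z.
rejections_per_minute() {
  grep -i "too many outstanding" \
    | grep -oE 't=[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}' \
    | sed 's/^t=//' \
    | sort | uniq -c \
    | awk '{print $2, $1}'
}

# Usage (any of the log sources above can feed the function):
#   sudo journalctl -u grafana-server --no-pager | rejections_per_minute
#   kubectl logs deploy/grafana -n monitoring | rejections_per_minute
```

A steady count in every minute suggests a refresh-driven burst; isolated spikes suggest ad hoc heavy queries or a surge of viewers.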

Step 2: Look at Prometheus/query-frontend logs and flags

kubectl logs sts/prometheus -n monitoring | grep -iE "concurrency|outstanding|queue" | tail -20
# Current concurrency flag (prints nothing if the flag is unset and the default of 20 applies)
ps aux | grep -oE '\-\-query.max-concurrency=[0-9]+'

For Thanos/Cortex/Mimir, check the frontend/querier logs for too many outstanding requests per tenant.

Step 3: Find the heavy queries holding slots

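# Engine evaluation time by quantile (an aggregate summary, not a per-query list)
prometheus_engine_query_duration_seconds{slice="inner_eval"}
# Time queries spend waiting for a free concurrency slot
prometheus_engine_query_duration_seconds{slice="queue_time"}
# To identify individual slow queries, enable the Prometheus query log (query_log_file)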

# Concurrency in use vs. limit
prometheus_engine_queries
prometheus_engine_queries_concurrent_max
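The two gauges above can be read from the command line through the Prometheus HTTP API and turned into a single saturation figure. This is a sketch under assumptions: PROM_URL, prom_scalar, and saturation_pct are illustrative names, and curl and jq must be installed for the query helper.

```shell
# Assumption: Prometheus is reachable here; override PROM_URL as needed.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_scalar EXPR
# Run an instant query and print the value of the first returned sample.
prom_scalar() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1" \
    | jq -r '.data.result[0].value[1]'
}

# saturation_pct IN_USE LIMIT
# Print the integer percentage of concurrency slots in use.
saturation_pct() {
  awk -v used="$1" -v max="$2" \
    'BEGIN { if (max == 0) exit 1; printf "%d\n", used * 100 / max }'
}

# Usage (needs network access to Prometheus):
#   saturation_pct "$(prom_scalar prometheus_engine_queries)" \
#                  "$(prom_scalar prometheus_engine_queries_concurrent_max)"
```

Values that sit near 100 during refresh bursts indicate that every slot is taken and new queries are waiting.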

Step 4: Reduce concurrent load from Grafana

Lengthen dashboard auto-refresh (e.g. 30s → 5m) or disable it on heavy boards.
Split monster dashboards into smaller ones so a single refresh fires fewer queries.
Replace expensive per-panel PromQL with recording rules.
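As an illustration of the recording-rule approach, the sketch below writes a rule file that precomputes a p95 latency per endpoint. The metric http_request_duration_seconds_bucket, the endpoint label, the rule name, and the file path are all assumptions; substitute the histogram your panels actually query.

```shell
# Assumption: write the rule file to a scratch location first, then move it
# to a path listed under rule_files in prometheus.yml.
RULES_FILE="${RULES_FILE:-${TMPDIR:-/tmp}/latency_percentiles.rules.yml}"

cat > "$RULES_FILE" <<'EOF'
groups:
  - name: latency_percentiles
    interval: 1m
    rules:
      # Hypothetical metric and label names; adapt to your own histogram.
      - record: endpoint:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (endpoint, le) (rate(http_request_duration_seconds_bucket[5m]))
          )
EOF

# Validate the file when promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$RULES_FILE"
fi
```

A panel can then query endpoint:http_request_duration_seconds:p95_5m directly, which is a cheap series lookup instead of a histogram_quantile evaluation on every refresh.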

Step 5: Raise limits or add capacity (deliberately)

# Prometheus flag (raise concurrency if CPU headroom exists)
# --query.max-concurrency=40

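# Cortex query-frontend (Mimir names the equivalent flag
# -query-frontend.max-outstanding-requests-per-tenant; check the flag list for your version)
# -querier.max-outstanding-requests-per-tenant=200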

Raising the queue limit without CPU or querier headroom just turns the rejection into a timeout. Add querier replicas when the backend is genuinely saturated.

Example Root Cause Analysis

An ops team reports panels intermittently showing “too many outstanding requests” every 10 seconds — exactly the dashboard’s auto-refresh interval. The dashboard has 40 panels, each running a histogram_quantile over per-endpoint buckets.

Checking Prometheus, prometheus_engine_queries sits pinned at the max-concurrency of 20 during each refresh burst, and query durations show the p95 panels taking 2–4s each. Forty heavy queries firing at once overwhelm 20 slots, so a dozen get rejected.

Fix: move the percentile math into recording rules so each panel reads one precomputed series, change auto-refresh from 10s to 1m, and split the board into two. Concurrency usage drops well under the limit and the rejections stop. Root cause: a heavy dashboard firing more concurrent expensive queries than the engine allows — a load problem, not a bug.
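The arithmetic behind this example can be sketched with a rough model: a burst of queries runs in "waves" of at most max-concurrency at a time. The function names are hypothetical, and the model assumes every query takes the same time and ignores any queue limit, so treat the result as an estimate only.

```shell
# burst_waves QUERIES SLOTS
# Number of back-to-back waves needed when only SLOTS queries run at once
# (ceiling division).
burst_waves() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# burst_drain_seconds QUERIES SLOTS SECONDS_PER_QUERY
# Rough time to drain the burst, assuming uniform query duration.
burst_drain_seconds() {
  echo $(( $(burst_waves "$1" "$2") * $3 ))
}

burst_waves 40 20            # 2 waves for 40 panels on 20 slots
burst_drain_seconds 40 20 2  # 4 seconds at 2s per query
burst_drain_seconds 40 20 4  # 8 seconds at 4s per query
```

At 4s per query the burst needs about 8 of the 10 seconds between refreshes, and half of the panels have to wait for a slot, so any additional viewer or slower query pushes one cycle into the next.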

Prevention Best Practices

Precompute expensive aggregations with recording rules so panels read cheap single series.
Set conservative dashboard auto-refresh intervals; avoid 5–10s refresh on large boards.
Keep dashboards focused — fewer panels per board means fewer simultaneous queries per refresh.
Size --query.max-concurrency (and Thanos/Cortex/Mimir queues) to available CPU, and scale querier replicas horizontally under real load.
Monitor prometheus_engine_queries vs. prometheus_engine_queries_concurrent_max to see saturation before users do.
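The saturation check in the last item can be turned into an alert. The sketch below writes a minimal alerting rule; the alert name, the 80% threshold, the 5m lookback, the for duration, and the file path are assumptions to tune for your environment.

```shell
# Assumption: scratch location; move the file to a path listed under
# rule_files in prometheus.yml once reviewed.
ALERT_FILE="${ALERT_FILE:-${TMPDIR:-/tmp}/query_saturation.rules.yml}"

cat > "$ALERT_FILE" <<'EOF'
groups:
  - name: query_saturation
    rules:
      # max_over_time keeps short refresh bursts visible between evaluations.
      - alert: PrometheusQueryConcurrencyHigh
        expr: max_over_time(prometheus_engine_queries[5m]) / prometheus_engine_queries_concurrent_max > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Query concurrency above 80% of --query.max-concurrency"
EOF

# Validate the file when promtool is available on this machine.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$ALERT_FILE"
fi
```

Because the gauge is only sampled at scrape time, bursts shorter than the scrape interval can still be missed; a shorter scrape interval on Prometheus's self-scrape job narrows that gap.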

Quick Command Reference

# Grafana-side confirmation
sudo journalctl -u grafana-server | grep -i "outstanding" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "outstanding" | tail -20

# Backend logs and concurrency flag
kubectl logs sts/prometheus -n monitoring | grep -iE "concurrency|outstanding|queue" | tail -20
ps aux | grep -oE '\-\-query.max-concurrency=[0-9]+'

# Saturation metrics (PromQL)
# prometheus_engine_queries
# prometheus_engine_queries_concurrent_max
# prometheus_engine_query_duration_seconds{slice="inner_eval"}   (aggregate quantiles, not per-query)

Conclusion

“too many outstanding requests” is intentional backpressure: the query engine is shedding load it cannot service concurrently. Treat it as a capacity/query-shape problem:

Confirm it is the backend rejecting (not a 504 timeout) from the logs.
Measure concurrency saturation and find the heavy queries holding slots.
Reduce concurrent load from Grafana first — recording rules, longer refresh, smaller boards.
Only then raise concurrency/queue limits, and add querier capacity when genuinely saturated.

Fixing the query shape and refresh cadence removes the pressure; raising limits alone just relocates it.
