By James Joyner IV · 8 min read

Grafana Error Guide: Prometheus 'too many outstanding requests' — Fix Query Concurrency Limits

Fix Prometheus 'too many outstanding requests' in Grafana: diagnose query concurrency limits, heavy dashboards, query sharding queues, and Thanos/Cortex frontend backpressure.

  • #grafana
  • #troubleshooting
  • #errors
  • #prometheus

Overview

“too many outstanding requests” is backpressure from the Prometheus query engine (or a Thanos/Cortex/Mimir query frontend). Prometheus limits how many queries execute concurrently (--query.max-concurrency, default 20) and queues the rest. When the queue is also full, new queries are rejected outright — and Grafana surfaces that rejection on the panel.

The literal error returned to Grafana:

too many outstanding requests
{"status":"error","errorType":"unavailable","error":"too many outstanding requests"}

This is not a timeout (504) and not a crash — it is the backend deliberately shedding load. A dashboard full of heavy panels refreshing at once, or several users hitting the same instance, can trip it.
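To tell a load-shedding rejection apart from a timeout when testing by hand, a small helper like the one below can classify a response body. This is a minimal sketch: the function name is hypothetical, and the timeout strings it looks for are assumptions about what your backend returns, so adjust them to match your own logs.

```shell
# Classify an HTTP API response body read from stdin.
# Prints one of: rejected | timeout | other
# (classify_query_response is a hypothetical helper, not part of any tool.)
classify_query_response() {
  body=$(cat)
  case "$body" in
    # The literal rejection message quoted in this guide
    *"too many outstanding requests"*) echo "rejected" ;;
    # Assumed timeout wording; adjust to what your backend actually emits
    *"context deadline exceeded"*|*"timed out"*|*"timeout"*) echo "timeout" ;;
    *) echo "other" ;;
  esac
}

# Example (PROM_URL is a placeholder for your Prometheus or query-frontend URL):
#   curl -s "$PROM_URL/api/v1/query?query=up" | classify_query_response
```

Run it in a loop during a dashboard refresh burst; a mix of "other" and "rejected" points to load shedding rather than slow queries timing out.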

Symptoms

  • Multiple panels flash “too many outstanding requests”, often intermittently on refresh.
  • The error correlates with dashboard auto-refresh intervals or many concurrent viewers.
  • Lightweight queries succeed while heavy ones are rejected.
  • With Thanos/Cortex/Mimir, the query-frontend rejects requests once its in-flight queue fills.

Common Root Causes

1. Query concurrency saturated

All --query.max-concurrency slots (default 20) are occupied by slow queries, so additional queries queue and are then rejected.

2. Dashboards issue too many queries at once

A dashboard with dozens of panels and a short auto-refresh fires all queries simultaneously each cycle.

3. Expensive queries hold slots too long

High-cardinality rate()/histogram_quantile() over long ranges occupy concurrency slots for seconds each, starving everything else.

4. Thanos/Cortex/Mimir frontend queue full

The query-frontend’s per-tenant queue, capped by -querier.max-outstanding-requests-per-tenant, fills under load, and the frontend rejects further requests.

5. Under-provisioned queriers

Too few querier replicas for the query volume; the fix is horizontal capacity, not a bigger queue.

Diagnostic Workflow

Step 1: Confirm the source in the Grafana log

sudo journalctl -u grafana-server --no-pager | grep -i "outstanding" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "outstanding" | tail -20
grep -i "too many outstanding" /var/log/grafana/grafana.log | tail -20

Step 2: Look at Prometheus/query-frontend logs and flags

kubectl logs sts/prometheus -n monitoring | grep -iE "concurrency|outstanding|queue" | tail -20
# Current concurrency flag (run on the Prometheus host; no output means the default of 20 applies)
ps aux | grep -oE -- '--query\.max-concurrency[= ][0-9]+'

For Thanos/Cortex/Mimir, check the frontend/querier logs for too many outstanding requests per tenant.
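When Prometheus runs in a container, ps on your workstation will not see its flags. One alternative, sketched here, is to read the effective value from Prometheus' /api/v1/status/flags endpoint, which also reports defaults. The helper name and PROM_URL are placeholders, and the parsing assumes the flag value appears as a quoted number in the JSON.

```shell
# Extract the effective query.max-concurrency value from the JSON returned by
# /api/v1/status/flags (read from stdin). Uses grep only, so jq is not required.
# (extract_max_concurrency is a hypothetical helper name.)
extract_max_concurrency() {
  grep -oE '"query\.max-concurrency" *: *"[0-9]+"' | grep -oE '[0-9]+'
}

# Example (PROM_URL is a placeholder for your Prometheus base URL):
#   curl -s "$PROM_URL/api/v1/status/flags" | extract_max_concurrency
```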

Step 3: Find the heavy queries holding slots

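# Engine evaluation latency quantiles (a summary across all queries, not per query)
topk(10, prometheus_engine_query_duration_seconds{slice="inner_eval"})
# To identify individual slow queries, enable the query log (query_log_file in the global config)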
# Concurrency in use vs. limit
prometheus_engine_queries
prometheus_engine_queries_concurrent_max
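The two gauges above are easiest to read as a ratio; in PromQL that is prometheus_engine_queries / prometheus_engine_queries_concurrent_max. The sketch below does the same arithmetic in shell for values you have already fetched. The function name is hypothetical.

```shell
# Print concurrency saturation as an integer percentage.
# $1 = queries currently executing or waiting (prometheus_engine_queries)
# $2 = configured limit (prometheus_engine_queries_concurrent_max)
# (saturation_pct is a hypothetical helper name.)
saturation_pct() {
  in_use=$1
  limit=$2
  if [ "$limit" -le 0 ]; then
    echo "limit must be positive" >&2
    return 1
  fi
  echo $(( in_use * 100 / limit ))
}

# Example: 18 queries in flight against a limit of 20
#   saturation_pct 18 20
```

A value that sits at or near 100 during refresh bursts suggests the engine has no free slots at those moments.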

Step 4: Reduce concurrent load from Grafana

  • Lengthen dashboard auto-refresh (e.g. 30s → 5m) or disable it on heavy boards.
  • Split monster dashboards into smaller ones so a single refresh fires fewer queries.
  • Replace expensive per-panel PromQL with recording rules.
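As an illustration of the recording-rule approach, the sketch below writes a rule file that precomputes a p95 latency series. The metric name http_request_duration_seconds_bucket, the job and endpoint labels, the group name, and the 1m interval are all assumptions chosen for the example; substitute the expression your panels actually run.

```shell
# Write a recording-rule file that precomputes a p95 latency series.
# $1 = output path. Metric, label, and group names are hypothetical.
write_p95_recording_rule() {
  cat > "$1" <<'EOF'
groups:
  - name: dashboard_precompute
    interval: 1m
    rules:
      - record: job_endpoint:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, job, endpoint) (
              rate(http_request_duration_seconds_bucket[5m])
            )
          )
EOF
}

# Example: write the file, then validate it before loading it into Prometheus
#   write_p95_recording_rule ./p95.rules.yml && promtool check rules ./p95.rules.yml
```

The panel then queries job_endpoint:http_request_duration_seconds:p95_5m directly, which is a cheap series lookup rather than a histogram aggregation on every refresh.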

Step 5: Raise limits or add capacity (deliberately)

# Prometheus flag (raise concurrency if CPU headroom exists)
# --query.max-concurrency=40

# Cortex/Mimir query-frontend
# -querier.max-outstanding-requests-per-tenant=200

Raising the queue without CPU/querier headroom just moves the rejection to a timeout. Add querier replicas when the backend is genuinely saturated.

Example Root Cause Analysis

An ops team reports panels intermittently showing “too many outstanding requests” every 10 seconds — exactly the dashboard’s auto-refresh interval. The dashboard has 40 panels, each running a histogram_quantile over per-endpoint buckets.

In Prometheus, prometheus_engine_queries sits pinned at the concurrency limit of 20 during each refresh burst, and query durations show the p95 panels taking 2–4 s each. Forty heavy queries firing at once compete for 20 slots; the overflow queues behind them, and roughly a dozen are rejected before a slot frees up.
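The arithmetic behind this scenario can be sketched as follows. It is a deliberately simplified model that assumes queries run in full waves of equal duration and ignores how the queue actually orders work; the function name is hypothetical.

```shell
# Estimate how a refresh burst maps onto concurrency slots.
# $1 = queries per refresh, $2 = concurrency slots,
# $3 / $4 = min / max seconds per query
# (burst_estimate is a hypothetical helper; the wave model is a simplification.)
burst_estimate() {
  queries=$1; slots=$2; min_s=$3; max_s=$4
  # Queries that cannot get a slot immediately
  overflow=$(( queries > slots ? queries - slots : 0 ))
  # Number of full waves needed to run everything (ceiling division)
  waves=$(( (queries + slots - 1) / slots ))
  echo "overflow=$overflow waves=$waves drain=$(( waves * min_s ))-$(( waves * max_s ))s"
}

burst_estimate 40 20 2 4
# prints: overflow=20 waves=2 drain=4-8s
```

With a 10-second refresh, a burst that needs 4 to 8 seconds to drain leaves little slack, so any extra viewer or slow query pushes the overflow into rejection.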

Fix: move the percentile math into recording rules so each panel reads one precomputed series, change auto-refresh from 10s to 1m, and split the board into two. Concurrency usage drops well under the limit and the rejections stop. Root cause: a heavy dashboard firing more concurrent expensive queries than the engine allows — a load problem, not a bug.

Prevention Best Practices

  • Precompute expensive aggregations with recording rules so panels read cheap single series.
  • Set conservative dashboard auto-refresh intervals; avoid 5–10s refresh on large boards.
  • Keep dashboards focused — fewer panels per board means fewer simultaneous queries per refresh.
  • Size --query.max-concurrency (and Thanos/Cortex/Mimir queues) to available CPU, and scale querier replicas horizontally under real load.
  • Monitor prometheus_engine_queries vs. prometheus_engine_queries_concurrent_max to see saturation before users do.
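One way to act on the monitoring advice is an alerting rule on the saturation ratio. The sketch below writes such a rule file; the alert name, the 0.8 threshold, the 5m window, and the 10m hold time are assumptions to tune for your environment. It uses max_over_time because the gauge is sampled at scrape time and can miss short bursts.

```shell
# Write an alerting-rule file that fires when query concurrency stays high.
# $1 = output path. Alert name, threshold, and durations are hypothetical.
write_saturation_alert() {
  cat > "$1" <<'EOF'
groups:
  - name: query_concurrency
    rules:
      - alert: PrometheusQueryConcurrencyHigh
        expr: |
          max_over_time(prometheus_engine_queries[5m])
            / prometheus_engine_queries_concurrent_max > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Query concurrency above 80% of --query.max-concurrency"
EOF
}

# Example: write the file, then validate it
#   write_saturation_alert ./concurrency.rules.yml && promtool check rules ./concurrency.rules.yml
```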

Quick Command Reference

# Grafana-side confirmation
sudo journalctl -u grafana-server | grep -i "outstanding" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "outstanding" | tail -20

# Backend logs and concurrency flag
kubectl logs sts/prometheus -n monitoring | grep -iE "concurrency|outstanding|queue" | tail -20
ps aux | grep -oE -- '--query\.max-concurrency[= ][0-9]+'

# Saturation metrics (PromQL)
# prometheus_engine_queries
# prometheus_engine_queries_concurrent_max
# topk(10, prometheus_engine_query_duration_seconds)

Conclusion

“too many outstanding requests” is intentional backpressure: the query engine is shedding load it cannot service concurrently. Treat it as a capacity/query-shape problem:

  1. Confirm from the logs that the backend is rejecting queries (not returning a 504 timeout).
  2. Measure concurrency saturation and find the heavy queries holding slots.
  3. Reduce concurrent load from Grafana first — recording rules, longer refresh, smaller boards.
  4. Only then raise concurrency/queue limits, and add querier capacity when genuinely saturated.

Fixing the query shape and refresh cadence removes the pressure; raising limits alone just relocates it.
