Grafana Error Guide: 'context deadline exceeded' on Datasource Queries — Fix Query Timeouts
Fix Grafana 'context deadline exceeded': diagnose datasource query timeouts, slow backends, short dataproxy/query timeouts, high-cardinality PromQL, and network latency.
- #grafana
- #troubleshooting
- #errors
- #timeout
Overview
“context deadline exceeded” is Go’s way of saying an operation ran past its allotted time and its context was cancelled. In Grafana it appears when a datasource query — Prometheus, Loki, SQL, or any HTTP backend — does not complete before Grafana’s (or the backend’s) timeout fires.
The literal errors you will see on a panel or in the log:
context deadline exceeded
Post "http://prometheus:9090/api/v1/query_range": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
level=error logger=tsdb.prometheus msg="Query error" error="context deadline exceeded"
It is closely related to a datasource-proxy 504, but you will also see it inside SQL/Loki datasource errors and alerting evaluations. The root question is always the same: what took too long, and which timeout fired first?
Symptoms
- Panels fail with “context deadline exceeded”, often only on heavy queries or long ranges.
- Alert rules go to Error state with the same message during evaluation.
- The failure lands at a consistent boundary (30s, 60s) matching a configured timeout.
- Light queries on the same datasource succeed.
Common Root Causes
1. Backend query slower than the datasource/proxy timeout
The Prometheus/Loki/SQL query genuinely takes longer than [dataproxy] timeout (default 30s) or the datasource’s own timeout setting.
2. Short per-datasource timeout
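The Prometheus datasource’s “Query timeout” or its HTTP “Timeout” setting is set too low. “Scrape interval” is not a timeout (it sets the minimum query interval), and a SQL datasource’s “Max connection lifetime” controls how long pooled connections are reused, not how long a query may run.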
3. Alerting evaluation timeout
Unified alerting evaluates rules with its own timeout ([unified_alerting] evaluation_timeout, default 30s); a slow rule query trips context deadline exceeded during evaluation.
4. High-cardinality / long-range query
A rate() or histogram_quantile() over weeks of per-series data is simply expensive.
5. Network latency or an overloaded backend
A congested link or a CPU-starved Prometheus inflates round-trip time past the deadline.
Diagnostic Workflow
Step 1: Locate the error and which subsystem raised it
sudo journalctl -u grafana-server --no-pager | grep -i "deadline exceeded" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "deadline exceeded" | tail -20
grep -i "deadline exceeded" /var/log/grafana/grafana.log | tail -20
logger=tsdb.prometheus → panel query; logger=ngalert / alerting → alert evaluation.
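The logger-to-subsystem mapping above can be applied line by line with a small shell helper. This is a minimal sketch: the function name classify_logger is hypothetical, and it assumes logfmt-style lines in which the logger appears as logger=<name>.

```shell
# Hypothetical helper: map a Grafana log line to the subsystem that raised it.
# Assumes logfmt-style lines containing "logger=<name>".
classify_logger() {
  # Extract the value of the logger= field (empty if the field is absent).
  logger=$(printf '%s\n' "$1" | sed -n 's/.*logger=\([^ ]*\).*/\1/p')
  case "$logger" in
    tsdb.*)             echo "panel query" ;;
    ngalert*|alerting*) echo "alert evaluation" ;;
    "")                 echo "unknown" ;;
    *)                  echo "other ($logger)" ;;
  esac
}

# Example: count timeouts per subsystem in the default log file.
# grep -i "deadline exceeded" /var/log/grafana/grafana.log |
#   while IFS= read -r line; do classify_logger "$line"; done | sort | uniq -c
```

Counting by subsystem this way shows quickly whether panels or alert rules account for most of the timeouts.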
Step 2: Time the query against the backend directly
time curl -s -G "http://prometheus:9090/api/v1/query_range" \
--data-urlencode 'query=histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))' \
--data-urlencode "start=$(date -d '-24 hours' +%s)" \
--data-urlencode "end=$(date +%s)" \
--data-urlencode 'step=60' > /dev/null
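The wall-clock time tells you which is at fault: if it exceeds the configured timeout, the query is too slow; if it is comfortably below, a too-short timeout (or latency between Grafana and the backend) is the likelier cause.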
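To turn the measured time into a verdict, the comparison can be scripted. The sketch below is illustrative only: the function names, the result labels, and the 80% "marginal" threshold are assumptions, not Grafana behaviour.

```shell
# Hypothetical helpers for Step 2; names and the 80% threshold are illustrative.

# Run a command and print how many whole seconds it took.
elapsed_seconds() {
  start=$(date +%s)
  "$@" > /dev/null 2>&1 || true   # timing matters here, not the exit status
  end=$(date +%s)
  echo $((end - start))
}

# Compare a measured duration with a configured timeout (both in whole seconds).
compare_to_timeout() {
  elapsed=$1
  limit=$2
  if [ "$elapsed" -ge "$limit" ]; then
    echo "slow-query"        # the query itself outruns the deadline
  elif [ $((elapsed * 10)) -ge $((limit * 8)) ]; then
    echo "marginal"          # within 20% of the deadline
  else
    echo "within-deadline"   # query is fast; look for a shorter timeout elsewhere
  fi
}

# Example: time the Step 2 curl command, then compare against a 30s timeout.
# t=$(elapsed_seconds curl -s -G "http://prometheus:9090/api/v1/query_range" ...)
# compare_to_timeout "$t" 30
```

A "marginal" result suggests the query may pass on a quiet backend and fail under load, which matches the intermittent pattern described in the symptoms.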
Step 3: Check the effective timeouts
# grafana.ini
[dataproxy]
timeout = 60
[unified_alerting]
evaluation_timeout = 30s
Also check the datasource’s own timeout in its provisioning jsonData:
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    jsonData:
      timeInterval: "30s"
      queryTimeout: "60s"
      httpMethod: POST
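The grafana.ini values in this step can also be read with a script instead of by eye. The following is a minimal sketch, assuming a plain INI layout with [section] headers, key = value pairs, and ; or # comment lines; the function name ini_get is hypothetical. A key that is commented out or absent prints nothing, which means Grafana's built-in default applies.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of an INI file.
# Usage: ini_get FILE SECTION KEY
ini_get() {
  awk -v section="$2" -v key="$3" '
    /^[ \t]*[;#]/ { next }                 # skip comment lines
    /^[ \t]*\[/ {                          # section header: remember its name
      gsub(/^[ \t]*\[|\][ \t]*$/, "")
      current = $0
      next
    }
    current == section {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim whitespace around key
      gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim whitespace around value
      if (k == key) { print v; exit }
    }
  ' "$1"
}

# Example (the path is the common package default and may differ on your system):
# ini_get /etc/grafana/grafana.ini dataproxy timeout
# ini_get /etc/grafana/grafana.ini unified_alerting evaluation_timeout
```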
Step 4: Reduce cost or scale the backend
Prefer recording rules and a larger step before raising timeouts; a starved Prometheus needs CPU/memory, not a longer deadline.
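Choosing a larger step is simple arithmetic: divide the query range by the number of points you actually need and round up. The sketch below assumes whole-second steps; the function name step_for is hypothetical. Prometheus also rejects range queries that would return more than 11,000 points per series, so a small step over a long range can fail outright.

```shell
# Hypothetical helper: smallest whole-second step that keeps a range query
# at or below a target number of points per series.
step_for() {
  range_s=$1      # query range in seconds
  max_points=$2   # target points per series (e.g. the panel's Max data points)
  step=$(( (range_s + max_points - 1) / max_points ))   # ceiling division
  if [ "$step" -lt 1 ]; then
    step=1        # never go below a one-second step
  fi
  echo "$step"
}

# Example: a 7-day range drawn with at most 1000 points per series.
# step_for 604800 1000
```

Pass the result as the step parameter of query_range, or set the panel's Max data points so Grafana derives a comparable interval.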
Example Root Cause Analysis
An alert rule flips to Error every evaluation with:
logger=ngalert.eval msg="Failed to evaluate rule" error="context deadline exceeded"
The rule runs sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) across a very large cluster. Timing it directly shows ~35s. The unified alerting evaluation_timeout is the default 30s, so evaluation is cancelled before the query returns.
Fix: add a recording rule pod:container_cpu_usage:rate5m and point the alert at that precomputed series (sub-second), keeping evaluation_timeout at a sane 30s. The alert evaluates cleanly. Root cause: an expensive ad-hoc query in an alert rule exceeding the evaluation deadline — solved by precomputation, not by loosening the timeout.
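The recording rule from this example could be written out as follows. This is a sketch under stated assumptions: the group name pod_cpu, the helper name write_pod_cpu_rule, and the target file path are illustrative, and the promtool validation only runs if promtool is installed.

```shell
# Hypothetical helper: write the recording rule from the example to a rules file.
# The group name "pod_cpu" and the file path passed in are assumptions.
write_pod_cpu_rule() {
  cat > "$1" <<'EOF'
groups:
  - name: pod_cpu
    rules:
      - record: pod:container_cpu_usage:rate5m
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))
EOF
  # Validate the file when promtool is available; skip the check otherwise.
  if command -v promtool > /dev/null 2>&1; then
    promtool check rules "$1"
  fi
}

# Example:
# write_pod_cpu_rule /etc/prometheus/rules/pod_cpu.yml
```

Prometheus has to load the file through rule_files in its configuration before the series exists; the alert rule then queries pod:container_cpu_usage:rate5m instead of the raw aggregation.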
Prevention Best Practices
- Back expensive panel and alert queries with recording rules so evaluations and renders read cheap series.
- Set [dataproxy] timeout, datasource queryTimeout, and [unified_alerting] evaluation_timeout consistently and above real query times.
- Keep alert-rule queries lightweight; alerts should read precomputed series, not compute heavy aggregations at evaluation time.
- Right-size Prometheus/Loki so queries are not slow under normal load.
- Use a sensible step / Max data points so the backend is not asked for more points than the panel can show.
Quick Command Reference
# Find the error and which subsystem raised it
sudo journalctl -u grafana-server | grep -i "deadline exceeded" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -i "deadline exceeded" | tail -20
# Time the query directly against the backend
time curl -s -G "http://prometheus:9090/api/v1/query_range" \
--data-urlencode 'query=<interpolated query>' \
--data-urlencode "start=$(date -d '-24 hours' +%s)" \
--data-urlencode "end=$(date +%s)" \
--data-urlencode 'step=60' > /dev/null
# Effective timeouts (grafana.ini)
# [dataproxy] timeout = 60
# [unified_alerting] evaluation_timeout = 30s
Conclusion
“context deadline exceeded” means a query outran its deadline — the fix depends on which deadline and why:
- Read the log to see whether a panel query (tsdb.*) or an alert evaluation (ngalert) timed out.
- Time the query directly against the backend to separate “slow query” from “short timeout”.
- Prefer recording rules and larger steps over simply raising timeouts.
- Align [dataproxy] timeout, datasource queryTimeout, and evaluation_timeout, and give the backend enough CPU.
If the query is genuinely fast and only the timeout is short, raise it; otherwise make the query cheaper — a longer deadline only delays the failure.