Grafana Error Guide: '504 Gateway Timeout' from the Datasource Proxy

Overview

When a panel queries a datasource, Grafana proxies the request server-side through its datasource proxy (/api/datasources/proxy/...). If the backend does not answer within the proxy’s timeout — or an upstream reverse proxy times out first — Grafana surfaces a 504.

The literal errors you will see on the panel or in the network response:

504 Gateway Timeout

{"message":"Get \"http://prometheus:9090/api/v1/query_range\": context deadline exceeded"}

level=error logger=data-proxy-log msg="Proxy request failed" error="context deadline exceeded" status=504

The key distinction: a 504 means a proxy in the chain forwarded the request and then gave up waiting for the response. It does not mean the connection was refused or the response was invalid (typically a 502), nor that the request was unauthorized (401/403).

Symptoms

Heavy panels (long time range, high cardinality) show “504 Gateway Timeout”.
Lightweight panels on the same datasource load fine.
The failure is consistent at ~30s or ~60s — a timeout boundary.
Behind Nginx/Ingress, the 504 comes from the proxy layer, not Grafana itself.

Common Root Causes

1. Slow backend query exceeding `[dataproxy] timeout`

Grafana’s datasource proxy defaults to a 30-second timeout. A costly rate()/histogram_quantile() over weeks of high-cardinality data blows past it.

2. Upstream reverse proxy read timeout

Nginx (proxy_read_timeout), a Kubernetes Ingress, or a cloud load balancer times out before Grafana does, so the 504 originates upstream.

3. Backend under-resourced

Prometheus/Loki is CPU- or memory-starved and cannot service the query in time.

4. Time range / step too large

A query_range with a tiny step over a huge range forces the backend to compute an enormous number of points.
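The arithmetic behind this is simple enough to sketch. The helper below is illustrative only (the function name is made up for this guide); it counts the samples a query_range call evaluates per series, assuming evaluation at start, start+step, and so on up to end. Stock Prometheus also rejects range queries that would return more than 11,000 points per series, so a very small step can fail outright rather than time out.

```shell
# points_per_series START END STEP
# Prints how many evaluation points a range query produces for each series,
# assuming one point at START and one every STEP seconds up to END.
points_per_series() {
  echo $(( ($2 - $1) / $3 + 1 ))
}

points_per_series 0 604800 60   # 7 days at a 60s step -> 10081
points_per_series 0 604800 15   # 7 days at a 15s step -> 40321
```

Multiply the result by the number of series the selector matches to get a feel for the total work; a histogram with many label combinations multiplies quickly.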

5. Network path latency

A slow or congested link between Grafana and the datasource inflates round-trip time past the deadline.

Diagnostic Workflow

Step 1: Confirm where the 504 originates

Check the response Server/Via headers in devtools. Then read Grafana’s proxy log:

# systemd install
sudo journalctl -u grafana-server --no-pager | grep -iE "data-proxy|deadline|504" | tail -20
# Kubernetes (adjust the deployment name and namespace to your cluster)
kubectl logs deploy/grafana -n monitoring | grep -iE "proxy|deadline|504" | tail -20
# Plain log file
grep -iE "data-proxy|deadline exceeded" /var/log/grafana/grafana.log | tail -20

If Grafana logs context deadline exceeded, Grafana timed out. If Grafana has no error but the browser shows 504, an upstream proxy timed out.
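That decision can be scripted as a rough first pass. The sketch below assumes the log line format shown earlier in this guide (logger=data-proxy-log with "context deadline exceeded"); the function name is hypothetical and the pattern may need adjusting for your Grafana version or log format.

```shell
# classify_504 LOGFILE
# Prints "grafana" if the data proxy logged a deadline error in LOGFILE,
# otherwise "upstream" (only meaningful if the browser really showed a 504).
classify_504() {
  if grep -qiE 'data-proxy-log.*context deadline exceeded' "$1"; then
    echo "grafana"
  else
    echo "upstream"
  fi
}

# Usage (path is the default log location used above):
#   classify_504 /var/log/grafana/grafana.log
```

A result of "upstream" only narrows the search; confirm it against the reverse proxy's own error log before changing its timeouts.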

Step 2: Time the backend query directly

time curl -s -G "http://prometheus:9090/api/v1/query_range" \
  --data-urlencode 'query=histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))' \
  --data-urlencode "start=$(date -d '-7 days' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=60' > /dev/null

If this alone takes 40s, no Grafana setting saves you — the query is the problem.

Step 3: Raise the Grafana proxy timeout (if the query is legitimately heavy)

# grafana.ini (values are in seconds; restart Grafana after editing)
[dataproxy]
timeout = 120
dialTimeout = 10
keep_alive_seconds = 30
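For containerized Grafana, where editing grafana.ini is awkward, the same settings can usually be supplied as environment variables following Grafana's GF_<SECTION>_<KEY> naming convention. The sketch below mirrors the values above; the deployment name and namespace in the comment are the ones assumed elsewhere in this guide.

```shell
# Environment-variable equivalents of the [dataproxy] settings (seconds).
export GF_DATAPROXY_TIMEOUT=120
export GF_DATAPROXY_DIALTIMEOUT=10
export GF_DATAPROXY_KEEP_ALIVE_SECONDS=30

# On Kubernetes, one way to apply the main setting (triggers a rollout):
#   kubectl set env deploy/grafana -n monitoring GF_DATAPROXY_TIMEOUT=120
```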

Step 4: Raise the upstream reverse-proxy timeout too

# Kubernetes Ingress (nginx) annotations
nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
nginx.ingress.kubernetes.io/proxy-send-timeout: "120"

# nginx.conf (self-managed)
proxy_read_timeout 120s;
proxy_send_timeout 120s;

Every timeout in the chain must exceed the query time, or the shortest one wins.
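A quick way to sanity-check a chain is to find its smallest timeout and compare it with the measured query time. The helper and the numbers below are hypothetical: a Grafana timeout of 120s, an Ingress left at nginx's default proxy_read_timeout of 60s, and a load balancer at 120s.

```shell
# effective_timeout T1 [T2 ...]
# Prints the smallest timeout (whole seconds) in the chain.
effective_timeout() {
  min=$1
  for t in "$@"; do
    if [ "$t" -lt "$min" ]; then
      min=$t
    fi
  done
  echo "$min"
}

query_seconds=48                          # hypothetical measured query time
limit=$(effective_timeout 120 60 120)     # Grafana, Ingress, load balancer
if [ "$query_seconds" -ge "$limit" ]; then
  echo "a ${query_seconds}s query is cut off at ${limit}s"
else
  echo "a ${query_seconds}s query fits under the ${limit}s limit"
fi
```

Here the Ingress is the binding limit even though Grafana was raised to 120s, which is exactly the case where the 504 comes from the proxy layer.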

Step 5: Make the query cheaper

# Increase the step, and query the precomputed series from a recording rule
# instead of running histogram_quantile over raw buckets at query time
job:http_request_duration_seconds:p95
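A recording rule that produces that series might look like the sketch below. The record name and metric come from this guide; the group name, evaluation interval, file path, and the choice to aggregate by job and le are assumptions to adapt. The file still has to be listed under rule_files in prometheus.yml and Prometheus reloaded.

```shell
# Write a hypothetical rules file (path and group name are placeholders).
rules_file="${TMPDIR:-/tmp}/slo_latency_rules.yml"
cat > "$rules_file" <<'EOF'
groups:
  - name: slo_latency
    interval: 1m
    rules:
      - record: job:http_request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
EOF

# Validate the file if promtool is installed; skip silently otherwise.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules_file"
fi
```

The panel then selects job:http_request_duration_seconds:p95 directly, so Prometheus reads one precomputed sample per job per evaluation instead of aggregating every bucket at query time. The rule only has data from the moment it starts evaluating, so long ranges fill in gradually unless the history is backfilled.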

Example Root Cause Analysis

An SLO dashboard 504s only on the “Last 30 days” p95 latency panel; shorter ranges load. Grafana’s log:

logger=data-proxy-log msg="Proxy request failed" error="context deadline exceeded" status=504

Timing the interpolated query_range directly against Prometheus takes ~48 seconds — a histogram_quantile over 30 days of per-endpoint buckets at a 30s step. The default [dataproxy] timeout = 30 cuts it off.

Fix: precompute the percentile with a recording rule (job:http_request_duration_seconds:p95) so the panel reads a single cheap series. As a safety margin, also raise [dataproxy] timeout to 120 and set the Ingress proxy-read-timeout to match. The panel now loads in under a second. The root cause was an expensive query, not a misconfiguration; the timeout bump alone would only have masked it.

Prevention Best Practices

Use recording rules for expensive aggregations (percentiles, high-cardinality sums) so dashboards read precomputed series.
Align every timeout in the chain (backend → Grafana [dataproxy] timeout → reverse proxy/Ingress → LB) so none is shorter than a legitimate query.
Choose a step/Max data points sane for the range; more points than screen pixels is wasted backend work.
Right-size Prometheus/Loki CPU and memory; a starved backend times out under otherwise normal load.
Set a query timeout at the datasource level where the datasource supports one, and alert on slow queries so you catch drift before users do.
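To make the point about step and Max data points concrete, the sketch below computes a whole-second step that yields roughly a target number of points for a given range. It only illustrates the arithmetic; Grafana derives its own interval from the panel's Max data points and Min interval settings, and the function name here is made up.

```shell
# min_step RANGE_SECONDS MAX_POINTS
# Prints the range divided by MAX_POINTS, rounded up to a whole second.
min_step() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

min_step 2592000 1000   # 30 days on a panel about 1000 pixels wide -> 2592
```

A step of about 43 minutes is all a 30-day panel of that width can display; anything finer is computed by the backend and then discarded.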

Quick Command Reference

# Where did the 504 come from?
sudo journalctl -u grafana-server | grep -iE "data-proxy|deadline|504" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -iE "proxy|deadline" | tail -20

# Time the backend query directly
time curl -s -G "http://prometheus:9090/api/v1/query_range" \
  --data-urlencode 'query=<interpolated query>' \
  --data-urlencode "start=$(date -d '-7 days' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=60' > /dev/null

# Raise timeouts (grafana.ini + Ingress)
# [dataproxy] timeout = 120
# nginx.ingress.kubernetes.io/proxy-read-timeout: "120"

Conclusion

A datasource-proxy 504 means something in the request chain ran out of time. Fix it methodically:

Determine whether Grafana or an upstream proxy timed out (log vs. missing log + browser 504).
Time the interpolated query directly against the backend — that is the ground truth.
If the query is legitimately heavy, align all timeouts and, more importantly, make the query cheaper with recording rules and a sane step.
Right-size the backend so it is not the bottleneck.

Bumping timeouts hides the symptom; recording rules and sane query shapes remove the cause.
