Grafana Error Guide: '502 Bad Gateway' from the Datasource Proxy

Overview

A 502 from the Grafana datasource proxy means Grafana tried to reach the backend and the attempt failed before a valid response came back: the connection was refused, the hostname did not resolve, the TLS handshake failed, or the backend spoke an unexpected protocol. Unlike a 504 (backend too slow) or “context deadline exceeded” (request timed out), a 502 means Grafana never got a usable answer at all.

The literal errors you will see:

502 Bad Gateway

{"message":"Get \"http://prometheus:9090/api/v1/query\": dial tcp 10.0.0.9:9090: connect: connection refused"}

level=error logger=data-proxy-log msg="Proxy request failed" error="dial tcp: lookup prometheus on 10.96.0.10:53: no such host" status=502

The embedded Go error (connection refused, no such host, x509: certificate signed by unknown authority) tells you exactly which layer failed.
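The embedded error also names what Grafana was actually dialing, which is worth pulling out before anything else. A minimal sketch, assuming the message follows the two shapes shown above; the function name dial_target is hypothetical and not part of Grafana:

```shell
# dial_target MESSAGE
# Prints the address or hostname found in a Go dial error, if any.
# Assumption: the message looks like one of the two samples in this guide.
dial_target() {
  printf '%s\n' "$1" | sed -n \
    -e 's/.*dial tcp \([^ ]*\): connect:.*/\1/p' \
    -e 's/.*lookup \([^ ]*\) on \([^ :]*:[0-9]*\).*/\1 (resolver \2)/p'
}
```

If the printed address or name is not the one you expected, the datasource URL or DNS is the first place to look.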

Symptoms

Every panel on a datasource shows “502 Bad Gateway” immediately.
“Save & test” on the datasource fails with a dial/DNS/TLS error.
The datasource worked until a URL, port, DNS, or certificate change.
Other datasources on the same Grafana work fine.

Common Root Causes

1. Wrong datasource URL or port

The configured URL points at a stopped service, wrong port, or localhost inside a container that has no local backend.

2. Backend is down

Prometheus/Loki/the SQL server is not running, so the connection is refused.

3. DNS resolution failure

The datasource hostname does not resolve from Grafana’s network (missing service, wrong namespace FQDN in Kubernetes).

4. TLS handshake failure

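An https:// datasource URL that points at a backend with a self-signed certificate, or one issued by a CA Grafana does not trust, fails the handshake unless the CA is configured on the datasource or tlsSkipVerify is enabled.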

5. Reverse proxy in front of the backend returns 502

An Nginx/Ingress in front of Prometheus is itself returning 502 because its upstream is down.
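Two of the causes above (a loopback URL inside a container, and a short name that depends on namespace context) can be spotted from the URL string alone. The following is a rough sketch, not a full URL parser: lint_datasource_url is a hypothetical helper, and it does not handle URLs with embedded credentials.

```shell
# lint_datasource_url URL
# Prints "loopback-host", "short-name", or "ok".
# Assumption: URL has the form scheme://host[:port][/path], no user:pass@ part.
lint_datasource_url() {
  hostport=${1#*://}        # drop the scheme
  hostport=${hostport%%/*}  # drop the path
  case "$hostport" in
    localhost|localhost:*|127.[0-9]*|"[::1]"|"[::1]:"*)
      echo "loopback-host" ;;   # only valid if the backend runs next to Grafana
    *.*|"["*)
      echo "ok" ;;              # dotted name, IPv4 address, or IPv6 literal
    *)
      echo "short-name" ;;      # resolves only via the local search domain
  esac
}
```

A "short-name" result is not an error in itself; it only flags a URL that will break if Grafana and the backend stop sharing a namespace or search domain.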

Diagnostic Workflow

Step 1: Read the embedded dial error

# systemd install
sudo journalctl -u grafana-server --no-pager | grep -iE "data-proxy|dial|502|x509|no such host" | tail -20
# Kubernetes
kubectl logs deploy/grafana -n monitoring | grep -iE "proxy|dial|502|x509" | tail -20
# Plain log file
grep -iE "data-proxy|dial tcp|502" /var/log/grafana/grafana.log | tail -20

connection refused → backend down/wrong port; no such host → DNS; x509 → TLS/CA.
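The mapping above is simple enough to script when you are triaging many log lines. A minimal sketch; classify_proxy_error is a hypothetical name, and it only knows the three error strings covered in this guide:

```shell
# classify_proxy_error LOG_LINE
# Maps the embedded Go error to the layer that failed.
classify_proxy_error() {
  case "$1" in
    *"connection refused"*) echo "backend-down-or-wrong-port" ;;
    *"no such host"*)       echo "dns" ;;
    *"x509"*)               echo "tls-ca" ;;
    *)                      echo "unknown" ;;
  esac
}
```

Anything that comes back as "unknown" still needs to be read by hand; the sketch does not try to cover every Go network error.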

Step 2: Confirm the datasource URL

curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  http://localhost:3000/api/datasources | jq '.[] | {name, url, type, access}'

access must be proxy so that the Grafana server, not the browser, makes the request; the url must therefore resolve from Grafana’s network, not from your laptop.

Step 3: Test reachability from Grafana’s own network

# From the Grafana host/pod
curl -sv http://prometheus:9090/-/healthy
nslookup prometheus
nc -vz prometheus 9090
# In Kubernetes, exec into the Grafana pod
kubectl exec -it deploy/grafana -n monitoring -- wget -qO- http://prometheus:9090/-/healthy

Reproduce the exact failure (refused / no such host) from where Grafana runs.
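When the probe is run with curl, its exit code already says which layer failed, mirroring the Go errors in the Grafana log. A small sketch under that assumption; explain_curl_exit is a hypothetical helper, and the codes apply to curl only (wget uses different ones):

```shell
# explain_curl_exit CODE
# Translates the curl exit codes most relevant to a datasource 502.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable: curl received an HTTP response" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "connect: refused or unreachable (backend down or wrong port)" ;;
    28) echo "timeout: no answer within the time limit" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: certificate not trusted (unknown CA)" ;;
    *)  echo "other: see the curl man page for exit code $1" ;;
  esac
}

# Example, run from the Grafana host or pod:
#   curl -s -o /dev/null --max-time 5 http://prometheus:9090/-/healthy
#   explain_curl_exit $?
```

Note that exit code 0 only means an HTTP response arrived; check the status code separately if the backend sits behind a reverse proxy.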

Step 4: Fix TLS if the error is x509

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: https://prometheus.monitoring.svc:9090
    jsonData:
      # Keep certificate verification on and trust the CA supplied below
      tlsSkipVerify: false
      tlsAuthWithCACert: true
    secureJsonData:
      tlsCACert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----

In production, supply the backend’s CA rather than setting tlsSkipVerify: true, which disables certificate verification entirely.
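Before pasting a CA bundle into tlsCACert, it can be worth confirming that the file is at least a structurally complete PEM. The sketch below only counts BEGIN/END markers and does not validate the certificates themselves; pem_cert_count is a hypothetical helper name.

```shell
# pem_cert_count FILE
# Prints the number of certificates in a PEM bundle.
# Fails if the file is unreadable, empty of certificates, or truncated.
pem_cert_count() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  begins=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true)
  ends=$(grep -c -- '-----END CERTIFICATE-----' "$1" || true)
  if [ "$begins" -gt 0 ] && [ "$begins" -eq "$ends" ]; then
    echo "$begins"
  else
    echo "unbalanced or missing PEM markers in $1" >&2
    return 1
  fi
}
```

A passing result says nothing about whether the CA actually signed the backend's certificate; that still has to be confirmed with “Save & test”.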

Example Root Cause Analysis

After moving Prometheus to a new namespace, all Grafana panels 502. The log:

logger=data-proxy-log msg="Proxy request failed" error="dial tcp: lookup prometheus on 10.96.0.10:53: no such host" status=502

no such host is a DNS failure. The datasource URL was http://prometheus:9090, a short name that resolves only when Grafana and Prometheus share a namespace. After the move, Grafana in the monitoring namespace can no longer resolve it, because the service now lives in the metrics namespace.

Fix: set the datasource URL to the fully-qualified service DNS name, http://prometheus.metrics.svc.cluster.local:9090. Panels recover immediately. Root cause: a short Kubernetes service name that stopped resolving across namespaces. This is a DNS problem exposed by the migration, not a Grafana bug.
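The fully-qualified form follows a fixed pattern, so it can be generated rather than typed. A minimal sketch, assuming the cluster uses the default cluster.local domain and plain http; svc_url is a hypothetical helper:

```shell
# svc_url SERVICE NAMESPACE PORT [CLUSTER_DOMAIN]
# Builds a namespace-independent Kubernetes service URL.
# Assumption: the cluster domain is cluster.local unless given as $4.
svc_url() {
  printf 'http://%s.%s.svc.%s:%s\n' "$1" "$2" "${4:-cluster.local}" "$3"
}

# Example for the migration described above:
#   svc_url prometheus metrics 9090
```

If your cluster was installed with a custom DNS domain, pass it as the fourth argument instead of relying on the default.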

Prevention Best Practices

Always use access: proxy and fully-qualified backend URLs (svc.cluster.local in Kubernetes) so resolution does not depend on namespace context.
Provision datasources from files so URL/TLS changes are reviewed and versioned.
Mount the backend’s CA and avoid tlsSkipVerify: true in production.
Add a health check or synthetic probe on the datasource URL so a down backend pages someone before dashboards start returning 502.
After any network, namespace, or certificate change, re-run “Save & test” on affected datasources.
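The health-check item can start as a few lines of shell run from cron or a sidecar on Grafana's network. This is a sketch only: backend_healthy is a hypothetical name, the 5-second limit is an arbitrary choice, and the alerting action is left to whatever paging tool you already use.

```shell
# backend_healthy URL
# Succeeds only if URL answers with HTTP 200 within 5 seconds.
backend_healthy() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || return 1
  [ "$code" = "200" ]
}

# Example, using the Prometheus health endpoint from this guide:
#   backend_healthy http://prometheus.metrics.svc.cluster.local:9090/-/healthy \
#     || echo "ALERT: Prometheus unreachable from Grafana's network"
```

Running the probe from the same network as Grafana matters more than the probe itself, since a check from elsewhere can pass while Grafana still cannot connect.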

Quick Command Reference

# Embedded dial/DNS/TLS error
sudo journalctl -u grafana-server | grep -iE "data-proxy|dial|x509|no such host" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -iE "proxy|dial|x509" | tail -20

# What URL is configured?
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  http://localhost:3000/api/datasources | jq '.[] | {name, url, access}'

# Reachability from Grafana's network
kubectl exec -it deploy/grafana -n monitoring -- wget -qO- http://prometheus:9090/-/healthy
nslookup prometheus ; nc -vz prometheus 9090

Conclusion

A datasource-proxy 502 means Grafana reached out and got nothing usable back. The embedded Go error is the whole diagnosis:

Read connection refused vs. no such host vs. x509 from the proxy log.
Confirm the datasource url and access: proxy.
Reproduce reachability from Grafana’s own host/pod, not your workstation.
Fix the matching layer: start the backend, use the fully-qualified DNS name, or mount the CA.

Because the failure is at connect time, the fix is almost always a URL, DNS, or certificate correction rather than anything inside Grafana.
