Grafana Error Guide: '502 Bad Gateway' from the Datasource Proxy — Fix Unreachable Backends
Fix Grafana datasource proxy 502 Bad Gateway: diagnose unreachable backend, wrong datasource URL, TLS handshake failures, DNS errors, and connection-refused issues.
- #grafana
- #troubleshooting
- #errors
- #datasource
Overview
A 502 from the Grafana datasource proxy means Grafana tried to reach the backend and the connection failed or returned a bad response: the connection was refused, the hostname did not resolve, the TLS handshake failed, or the backend spoke an unexpected protocol. Unlike a 504 (backend too slow) or “context deadline exceeded” (timed out), a 502 means Grafana never got a usable answer at all.
The literal errors you will see:
502 Bad Gateway
{"message":"Get \"http://prometheus:9090/api/v1/query\": dial tcp 10.0.0.9:9090: connect: connection refused"}
level=error logger=data-proxy-log msg="Proxy request failed" error="dial tcp: lookup prometheus on 10.96.0.10:53: no such host" status=502
The embedded Go error (connection refused, no such host, x509: certificate signed by unknown authority) tells you exactly which layer failed.
Symptoms
- Every panel on a datasource shows “502 Bad Gateway” immediately.
- “Save & test” on the datasource fails with a dial/DNS/TLS error.
- The datasource worked until a URL, port, DNS, or certificate change.
- Other datasources on the same Grafana work fine.
Common Root Causes
1. Wrong datasource URL or port
The configured URL points at a stopped service, wrong port, or localhost inside a container that has no local backend.
2. Backend is down
Prometheus/Loki/the SQL server is not running, so the connection is refused.
3. DNS resolution failure
The datasource hostname does not resolve from Grafana’s network (missing service, wrong namespace FQDN in Kubernetes).
4. TLS handshake failure
An https:// URL pointing at a backend whose certificate is self-signed or issued by an unknown CA fails the handshake unless Grafana is given that CA or tlsSkipVerify is enabled.
5. Reverse proxy in front of the backend returns 502
An Nginx/Ingress in front of Prometheus is itself returning 502 because its upstream is down.
Diagnostic Workflow
Step 1: Read the embedded dial error
sudo journalctl -u grafana-server --no-pager | grep -iE "data-proxy|dial|502|x509|no such host" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -iE "proxy|dial|502|x509" | tail -20
grep -iE "data-proxy|dial tcp|502" /var/log/grafana/grafana.log | tail -20
connection refused → backend down/wrong port; no such host → DNS; x509 → TLS/CA.
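The mapping above can be sketched as a small shell helper that takes one log line and prints the layer to investigate. This is only an illustration: classify_proxy_error and the labels it prints are made-up names, not Grafana output, and the patterns cover only the three error strings quoted in this guide.

```shell
# Map the Go error embedded in a Grafana proxy log line to the failing layer.
# classify_proxy_error and its labels are hypothetical, for illustration only.
classify_proxy_error() {
  case "$1" in
    *"no such host"*)       echo "dns" ;;
    *"connection refused"*) echo "backend-down-or-wrong-port" ;;
    *"x509"*)               echo "tls" ;;
    *)                      echo "unknown" ;;
  esac
}

# Example use against the file log (path as used earlier in this guide):
# last=$(grep "Proxy request failed" /var/log/grafana/grafana.log | tail -1)
# classify_proxy_error "$last"
```

Anything reported as "unknown" still needs to be read by hand; the helper only shortcuts the three common cases.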
Step 2: Confirm the datasource URL
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
http://localhost:3000/api/datasources | jq '.[] | {name, url, type, access}'
access must be proxy for server-side reachability; the url must be resolvable from Grafana, not from your laptop.
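As a rough aid for reviewing the URLs returned by that API call, the sketch below extracts the host from a datasource URL and flags the two patterns this guide warns about: loopback addresses and unqualified short names. check_datasource_url is a hypothetical helper, and its parsing is deliberately simple (no IPv6 literals, no user:password@ prefixes).

```shell
# Classify the host part of a datasource URL.
# check_datasource_url is a hypothetical helper, not part of Grafana.
check_datasource_url() {
  ds_url=$1
  ds_host=${ds_url#*://}    # drop the scheme, if any
  ds_host=${ds_host%%/*}    # drop the path
  ds_host=${ds_host%%:*}    # drop the port
  case "$ds_host" in
    localhost|127.*) echo "loopback" ;;    # only works if the backend shares Grafana's network namespace
    *.*)             echo "dotted" ;;      # IP address or qualified name
    *)               echo "short-name" ;;  # depends on Grafana's DNS search path
  esac
}

# Example: feed it the URLs from the API response.
# curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" http://localhost:3000/api/datasources \
#   | jq -r '.[].url' | while read -r u; do echo "$u $(check_datasource_url "$u")"; done
```

A "loopback" or "short-name" result is not automatically wrong, but it is the first thing to question when the datasource returns 502.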
Step 3: Test reachability from Grafana’s own network
# From the Grafana host/pod
curl -sv http://prometheus:9090/-/healthy
nslookup prometheus
nc -vz prometheus 9090
# In Kubernetes, exec into the Grafana pod
kubectl exec -it deploy/grafana -n monitoring -- wget -qO- http://prometheus:9090/-/healthy
Reproduce the exact failure (refused / no such host) from where Grafana runs.
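The three checks can also be run in order so that the first failing layer is reported. The following is a minimal sketch, assuming getent, nc (with -z and -w support), and curl are available where Grafana runs, which is not guaranteed in every container image. probe_backend is a hypothetical name, and /-/healthy is the Prometheus health path used above.

```shell
# Probe a backend layer by layer: DNS, then TCP, then HTTP.
# Prints the first layer that fails ("dns", "tcp", "http") or "ok".
# probe_backend is a hypothetical helper; run it from the Grafana host or pod.
probe_backend() {
  pb_host=$1
  pb_port=$2
  pb_path=${3:-/-/healthy}
  if ! getent hosts "$pb_host" >/dev/null 2>&1; then
    echo "dns"; return 1
  fi
  if ! nc -z -w 3 "$pb_host" "$pb_port" >/dev/null 2>&1; then
    echo "tcp"; return 1
  fi
  if ! curl -fsS -m 5 -o /dev/null "http://$pb_host:$pb_port$pb_path" 2>/dev/null; then
    echo "http"; return 1
  fi
  echo "ok"
}

# Example: probe_backend prometheus 9090
```

A "dns" result corresponds to no such host in the Grafana log and "tcp" to connection refused. An "http" result means the port accepts connections but the health endpoint does not answer successfully, which points at a reverse proxy or a misbehaving backend.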
Step 4: Fix TLS if the error is x509
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: https://prometheus.monitoring.svc:9090
    jsonData:
      tlsSkipVerify: false
      tlsAuthWithCACert: true
    secureJsonData:
      tlsCACert: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
Use a proper CA over tlsSkipVerify: true in production.
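Before provisioning the CA, it can help to confirm that the CA file actually validates the backend's certificate. One way to sketch that check is with openssl s_client, assuming openssl is installed where you run it. verify_backend_tls and the CA path in the example are hypothetical.

```shell
# Return 0 if the backend's certificate chain verifies against the given CA file.
# verify_backend_tls is a hypothetical helper; it returns 2 if the CA file is unreadable.
verify_backend_tls() {
  vt_host=$1
  vt_port=$2
  vt_cafile=$3
  if [ ! -r "$vt_cafile" ]; then
    echo "CA file not readable: $vt_cafile" >&2
    return 2
  fi
  # -verify_return_error aborts the handshake on a verification failure,
  # so the exit status reflects whether the chain was accepted.
  openssl s_client -connect "$vt_host:$vt_port" -servername "$vt_host" \
    -CAfile "$vt_cafile" -verify_return_error </dev/null >/dev/null 2>&1
}

# Example (the CA path is an assumption):
# verify_backend_tls prometheus.monitoring.svc 9090 /etc/grafana/ca.pem && echo "CA accepted"
```

If this fails with the same CA you plan to paste into tlsCACert, the x509 error will likely persist in Grafana as well.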
Example Root Cause Analysis
After Prometheus was moved to a new namespace, every Grafana panel returned 502. The log:
logger=data-proxy-log msg="Proxy request failed" error="dial tcp: lookup prometheus on 10.96.0.10:53: no such host" status=502
no such host is a DNS failure. The datasource URL was http://prometheus:9090, which only resolved when Grafana and Prometheus shared a namespace. After the move, Grafana in monitoring cannot resolve the short name in the new metrics namespace.
Fix: set the datasource URL to the fully-qualified service DNS name, http://prometheus.metrics.svc.cluster.local:9090. Panels recover immediately. Root cause: a short Kubernetes service name that stopped resolving across namespaces. This was a DNS problem exposed by the migration, not a Grafana bug.
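The fully-qualified name follows the pattern service.namespace.svc.cluster-domain. The small sketch below builds such a URL; k8s_service_url is a hypothetical helper, and cluster.local is assumed as the cluster domain, as in the example above.

```shell
# Build a fully-qualified in-cluster URL for a Kubernetes Service.
# k8s_service_url is a hypothetical helper; the cluster domain defaults to cluster.local.
k8s_service_url() {
  ks_service=$1
  ks_namespace=$2
  ks_port=$3
  ks_domain=${4:-cluster.local}
  echo "http://$ks_service.$ks_namespace.svc.$ks_domain:$ks_port"
}

k8s_service_url prometheus metrics 9090
# prints http://prometheus.metrics.svc.cluster.local:9090
```

Clusters configured with a different domain would pass it as the fourth argument.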
Prevention Best Practices
- Always use access: proxy and fully-qualified backend URLs (svc.cluster.local in Kubernetes) so resolution does not depend on namespace context.
- Provision datasources from files so URL/TLS changes are reviewed and versioned.
- Mount the backend’s CA and avoid tlsSkipVerify: true in production.
- Add a health check / synthetic on the datasource URL so a down backend pages before dashboards 502.
- After any network, namespace, or certificate change, re-run “Save & test” on affected datasources.
Quick Command Reference
# Embedded dial/DNS/TLS error
sudo journalctl -u grafana-server | grep -iE "data-proxy|dial|x509|no such host" | tail -20
kubectl logs deploy/grafana -n monitoring | grep -iE "proxy|dial|x509" | tail -20
# What URL is configured?
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
http://localhost:3000/api/datasources | jq '.[] | {name, url, access}'
# Reachability from Grafana's network
kubectl exec -it deploy/grafana -n monitoring -- wget -qO- http://prometheus:9090/-/healthy
nslookup prometheus ; nc -vz prometheus 9090
Conclusion
A datasource-proxy 502 means Grafana reached out and got nothing usable back. The embedded Go error is the whole diagnosis:
- Read connection refused vs. no such host vs. x509 from the proxy log.
- Confirm the datasource url and access: proxy.
- Reproduce reachability from Grafana’s own host/pod, not your workstation.
- Fix the matching layer: start the backend, use FQDN DNS, or mount the CA.
Because the failure is at connect time, the fix is almost always a URL, DNS, or certificate correction rather than anything inside Grafana.