Grafana Error Guide: 'connect: connection refused'

Overview

A panel or the datasource “Save & test” button fails with a connection error. It looks as if the browser cannot reach the datasource, but that is almost never the case, because Grafana proxies datasource queries server-side:

dial tcp 10.0.0.5:9090: connect: connection refused

The full proxied message often includes the target URL:

{"message":"Get \"http://prometheus:9090/api/v1/query?query=up\": dial tcp: connect: connection refused"}

The key insight: unless a datasource is configured with Browser access mode (rare, and deprecated for most types), the request originates from the Grafana server or pod, not the user’s laptop. So connection refused means the Grafana process could not open a TCP connection to the datasource address. The usual culprits are a wrong URL (especially localhost inside a container), a datasource that is down, the wrong port, an incorrect Kubernetes Service DNS name, or a NetworkPolicy or firewall blocking egress. Related errors such as Bad Gateway and context deadline exceeded come from the same proxy layer.

Symptoms

“Save & test” on the datasource page reports connection refused or Bad Gateway.
Every panel on that datasource shows the same error; other datasources work fine.
The error names an IP/host and port: dial tcp 10.0.0.5:9090.
context deadline exceeded appears when the host resolves but never answers (firewall/NetworkPolicy).
It works from your laptop’s browser but fails in Grafana — proof it is server-side.

Common Root Causes

1. `localhost` in the datasource URL inside a container/pod

http://localhost:9090 means “localhost of the Grafana container”, not the host or another pod. This is the number-one container pitfall.

# WRONG inside k8s: localhost is the grafana pod itself
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090      # unreachable from grafana pod

Get "http://localhost:9090/api/v1/query": dial tcp 127.0.0.1:9090: connect: connection refused

Use the Kubernetes Service DNS name instead: http://prometheus-server.monitoring.svc.cluster.local:80.
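As a small illustration of how that name is put together, the sketch below assembles a Service URL from its parts. The function name svc_url is hypothetical, and cluster.local is assumed as the cluster domain (the Kubernetes default, but configurable per cluster).

```shell
# Build an in-cluster URL for a Kubernetes Service.
# Usage: svc_url <service> <namespace> <service-port> [cluster-domain]
svc_url() {
  svc=$1
  ns=$2
  port=$3
  domain=${4:-cluster.local}   # assumption: default cluster domain
  printf 'http://%s.%s.svc.%s:%s\n' "$svc" "$ns" "$domain" "$port"
}
```

Calling svc_url prometheus-server monitoring 80 prints the URL recommended above. The port argument is the Service port, not the container port.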

2. Datasource down or wrong port

The target is up on a different port than configured, or the pod is crash-looping.

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server.monitoring.svc:9090   # WRONG: 9090 is the container port; the Service listens on 80

dial tcp 10.96.4.12:9090: connect: connection refused

The Service port is 80, targeting container port 9090; Grafana must dial the Service port, not the container port.
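One way to see both numbers side by side is to ask the API server for the Service's port mapping. This is a minimal sketch; the function name service_ports is hypothetical, and it assumes kubectl is already pointed at the right cluster.

```shell
# Print "<service port> -> <target port>" for each port of a Service.
# Usage: service_ports <service> <namespace>
service_ports() {
  kubectl get svc "$1" -n "$2" \
    -o jsonpath='{range .spec.ports[*]}{.port}{" -> "}{.targetPort}{"\n"}{end}'
}
```

For the Service described above, service_ports prometheus-server monitoring would be expected to show 80 on the left; that left-hand value is what belongs in the datasource URL. The right-hand value may be a number or a named port.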

3. NetworkPolicy or firewall blocking egress

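The destination address is valid, but the packets are silently dropped, so the connection attempt hangs instead of being refused. This tends to surface as context deadline exceeded rather than an immediate refusal.

# default-deny NetworkPolicy with no egress allow for grafana
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-default-deny-egress   # illustrative name
spec:
  podSelector: { matchLabels: { app: grafana } }
  policyTypes: [Egress]
  egress: []      # nothing allowed out, DNS lookups included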

Post "http://prometheus:9090/api/v1/query": context deadline exceeded
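A possible egress allow for this case is sketched below as a shell function that prints a manifest. Everything not in the text above is an assumption: the policy name, the monitoring namespace for both workloads, the app: prometheus pod label, and the function name grafana_egress_policy. The rule names port 9090 because NetworkPolicy is typically enforced against the pod's own port rather than the Service port, though this depends on the CNI plugin. DNS on port 53 is allowed to any destination so that name resolution keeps working.

```shell
# Print a NetworkPolicy that lets Grafana pods reach Prometheus and DNS.
grafana_egress_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-allow-prometheus   # hypothetical name
  namespace: monitoring            # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: grafana
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: prometheus      # assumed label on the Prometheus pods
      ports:
        - protocol: TCP
          port: 9090               # pod (container) port, not the Service port
    - ports:                       # DNS lookups, any destination
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

After reviewing the output, it can be applied with grafana_egress_policy | kubectl apply -f -.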

4. Access mode confusion (Server vs Browser)

If someone switched a datasource to Browser access, the browser (not Grafana) must reach the URL — an internal service name your laptop cannot resolve.

Get "http://prometheus.monitoring.svc:9090/...": dial tcp: lookup prometheus.monitoring.svc: no such host

Set access: proxy (Server) so Grafana handles the request from inside the cluster.

Diagnostic Workflow

Step 1 — Read the message: which host/port and which error? connection refused = nothing listening there; no such host = DNS; context deadline exceeded = blocked/slow.
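The mapping in Step 1 can be written down as a small lookup, which is handy when triaging log lines in bulk. This is only a sketch of the three cases named above; classify_proxy_error is a hypothetical helper, not part of Grafana.

```shell
# Map a Grafana proxy error message to the likely failure class.
# Usage: classify_proxy_error "<error message>"
classify_proxy_error() {
  case "$1" in
    *"connection refused"*)        echo "refused: nothing listening at that address and port" ;;
    *"no such host"*)              echo "dns: hostname did not resolve" ;;
    *"context deadline exceeded"*) echo "timeout: traffic blocked or backend too slow" ;;
    *)                             echo "unknown: read the full message" ;;
  esac
}
```

For example, classify_proxy_error "dial tcp 10.0.0.5:9090: connect: connection refused" reports the refused case.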

Step 2 — Test from the Grafana host itself, not your laptop. This is the whole ballgame:

kubectl exec deploy/grafana -n monitoring -- \
  wget -qO- http://prometheus-server.monitoring.svc:80/-/healthy
kubectl exec deploy/grafana -n monitoring -- \
  nc -vz prometheus-server.monitoring.svc 80

Step 3 — Inspect the provisioned datasource URL:

kubectl exec deploy/grafana -n monitoring -- \
  cat /etc/grafana/provisioning/datasources/datasources.yaml | grep -A2 url

Step 4 — Query the Grafana API for the datasource and run its health check:

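curl -s -u "admin:$GRAFANA_PW" http://localhost:3000/api/datasources | jq '.[].url'
# replace <uid> with the datasource UID; the quotes stop the shell treating < and > as redirections
curl -s -u "admin:$GRAFANA_PW" "http://localhost:3000/api/datasources/uid/<uid>/health" | jq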
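For scripted use, the health call can be wrapped so that its exit code reflects the result. The sketch below assumes the response is a JSON object with a status field equal to "OK" on success; check that shape against your Grafana version. The function name datasource_healthy is hypothetical, and the grep match is a rough check chosen to avoid extra dependencies (where jq is installed, jq -e '.status == "OK"' is stricter).

```shell
# Exit 0 if the datasource health check reports OK, non-zero otherwise.
# Usage: datasource_healthy <grafana-base-url> <datasource-uid>
datasource_healthy() {
  # curl failing outright (Grafana itself unreachable) is reported as 2
  body=$(curl -s -u "admin:$GRAFANA_PW" "$1/api/datasources/uid/$2/health") || return 2
  # show the raw response on stderr so the message is visible in logs
  printf '%s\n' "$body" >&2
  # assumption: a healthy datasource returns "status": "OK"
  printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"OK"'
}
```

A pipeline step could then run datasource_healthy http://localhost:3000 "$DS_UID" and fail the job on a non-zero exit, where DS_UID is whatever variable holds the UID.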

Step 5 — Check Grafana logs for the proxy error and confirm dataproxy timeouts:

journalctl -u grafana-server -n 200 --no-pager | grep -i "connection refused"
kubectl logs deploy/grafana -n monitoring | grep -i "tsdb.*proxy\|dial tcp"

Relevant grafana.ini:

[dataproxy]
timeout = 30
keep_alive_seconds = 30
dialTimeout = 10

Example Root Cause Analysis

A team migrates Grafana into Kubernetes. Locally it worked with url: http://localhost:9090; in the cluster every Prometheus panel fails with dial tcp 127.0.0.1:9090: connect: connection refused.

They first assume Prometheus is down, but kubectl get pods shows it healthy and /-/healthy returns 200 when curled from a Prometheus pod. The clue is the IP in the error: 127.0.0.1. That is loopback inside the Grafana pod — there is no Prometheus there.

They exec into the Grafana pod and confirm the fix path:

kubectl exec deploy/grafana -n monitoring -- nc -vz prometheus-server.monitoring.svc 80
# succeeds

They update the provisioned datasource URL to http://prometheus-server.monitoring.svc:80, restart Grafana, and “Save & test” turns green. Root cause: localhost resolves to the Grafana container itself, not the Prometheus service. Always test connectivity from the Grafana host and treat the IP/hostname in the error as the primary clue.
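Because the dialed address carried the diagnosis here, it can help to pull it out of the message mechanically. The following is a minimal sketch with hypothetical helper names; it handles IPv4 addresses and hostnames only, and prints nothing when the message has no address after dial tcp.

```shell
# Extract the "host:port" that Grafana tried to dial from a proxy error.
# Usage: dial_target "<error message>"
dial_target() {
  printf '%s\n' "$1" | sed -n 's/.*dial tcp \([^ :]*:[0-9][0-9]*\): .*/\1/p'
}

# Succeed if the dialed address is loopback, i.e. the Grafana pod itself.
is_loopback_target() {
  case "$(dial_target "$1")" in
    127.*|localhost:*) return 0 ;;
    *)                 return 1 ;;
  esac
}
```

Applied to the error in this example, dial_target yields 127.0.0.1:9090 and is_loopback_target succeeds, which points at the datasource URL rather than at Prometheus.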

Prevention Best Practices

Never use localhost in a datasource URL inside containers — use the Service FQDN and the Service port.
Keep datasources on access: proxy (Server) unless you have a specific reason for Browser mode.
Add explicit egress rules for the Grafana pod when running default-deny NetworkPolicies.
Validate provisioned datasources with the /api/datasources/uid/<uid>/health endpoint in CI.
Tune [dataproxy] timeout and dialTimeout so genuinely slow backends fail fast and legibly.
Standardize datasource URLs and ports in your provisioning repo.

Quick Command Reference

# Test connectivity FROM the Grafana pod (the request's real origin)
kubectl exec deploy/grafana -n monitoring -- nc -vz prometheus-server.monitoring.svc 80
kubectl exec deploy/grafana -n monitoring -- wget -qO- http://prometheus-server.monitoring.svc:80/-/healthy

# What URL is Grafana actually using?
curl -s -u "admin:$GRAFANA_PW" http://localhost:3000/api/datasources | jq '.[].url'

# Run the datasource health check
curl -s -u "admin:$GRAFANA_PW" "http://localhost:3000/api/datasources/uid/<uid>/health" | jq

# Grafana-side proxy errors
journalctl -u grafana-server -n 200 --no-pager | grep -i "connection refused"
kubectl logs deploy/grafana -n monitoring | grep -i "dial tcp"

Conclusion

Top root causes, in order of likelihood:

localhost in the datasource URL inside a container/pod — it points at Grafana itself; use the Service DNS name.
Datasource down or wrong port — dial the Service port, and confirm the backend pod is healthy.
NetworkPolicy or firewall blocking egress — usually shows as context deadline exceeded; add an egress allow.
Access mode set to Browser instead of Server (proxy) — internal names the browser cannot resolve; switch to access: proxy.
Remember the request is server-side — always test connectivity from the Grafana host, not your laptop.
