Grafana Error Guide: 'failed to send notification'

Overview

In Grafana unified alerting, a firing alert is routed by a notification policy to a contact point, which delivers the message through a notifier (email, Slack, PagerDuty, webhook, etc.). When the notifier itself fails — bad SMTP settings, an expired Slack webhook, blocked egress — Grafana logs a send failure and the alert never reaches a human, even though the rule is correctly firing.

Typical messages in the Grafana log or UI:

failed to send notification to Email
Failed to send alert notifications
failed to send notification to Email: dial tcp smtp:587: connect: connection refused
failed to send notification to Slack: webhook: Unexpected status code 403
Unexpected status code 404 ... "no_active_hooks"

This is one of the most dangerous classes of alerting failure because the rule list looks healthy (rules evaluate and fire as expected) while notifications silently disappear.

Symptoms

Alerts fire in Alerting → Alert rules but no email/Slack/PagerDuty message arrives.
journalctl -u grafana-server prints Failed to send alert notifications.
The Test button on a contact point returns an error toast.
Grafana's notification metrics (such as grafana_alerting_notification_latency) or the Alertmanager receiver status show failed sends.
Some contact points work (Slack) while others (Email) fail — pointing at a per-notifier config issue.

Common Root Causes

1. SMTP not enabled or misconfigured

Grafana email notifications require the [smtp] block. If enabled = false (the default) or the host is wrong, delivery fails immediately.

# /etc/grafana/grafana.ini
[smtp]
enabled = true
host = smtp.gmail.com:587
user = alerts@example.com
password = """app-password-here"""
from_address = alerts@example.com
from_name = Grafana
skip_verify = false

level=error msg="Failed to send notification" receiver=Email err="failed to send notification to Email: dial tcp smtp:587: connect: connection refused"

dial tcp smtp:587: connect: connection refused usually means that host still points at a placeholder such as smtp, or that nothing is listening on that port.
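Before restarting Grafana, the [smtp] section can be sanity-checked from the shell. The following is a minimal sketch, not a Grafana tool: smtp_ini_check is a hypothetical helper that reads ini content on stdin and only understands plain key = value lines. It ignores GF_SMTP_* environment variables, which override the file.

```shell
# smtp_ini_check: hypothetical helper (not shipped with Grafana).
# Reads grafana.ini content on stdin and reports on the [smtp] section.
smtp_ini_check() {
  awk '
    # Track whether the current section header is [smtp].
    /^[ \t]*\[/ { in_smtp = ($0 ~ /^[ \t]*\[smtp\][ \t]*$/); next }
    # Skip commented-out lines inside the section.
    in_smtp && /^[ \t]*[#;]/ { next }
    in_smtp && /=/ {
      key = $0; sub(/=.*/, "", key); gsub(/[ \t]/, "", key)
      val = $0; sub(/^[^=]*=/, "", val); gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "enabled") enabled = val
      if (key == "host") host = val
    }
    END {
      if (enabled != "true") { print "smtp disabled"; exit 1 }
      if (host == "") { print "smtp host not set"; exit 1 }
      print "smtp enabled, host=" host
    }
  '
}
```

Usage, assuming the default config path: smtp_ini_check < /etc/grafana/grafana.ini. A non-zero exit status means the section is disabled or has no host.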

2. Expired or wrong Slack webhook / bot token

A revoked Incoming Webhook typically returns 403; a deleted channel or disabled app typically returns 404 with no_active_hooks.

# /etc/grafana/provisioning/alerting/contactpoints.yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-oncall
    receivers:
      - uid: slack01
        type: slack
        settings:
          url: https://hooks.slack.com/services/T000/B000/XXXX

failed to send notification to Slack: webhook: Unexpected status code 403
failed to send notification to Slack: Unexpected status code 404: {"ok":false,"error":"no_active_hooks"}
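To tell a Slack-side problem from a Grafana-side one, it can help to post to the webhook directly and interpret the status code. The sketch below is illustrative: slack_status_hint is a hypothetical helper, SLACK_WEBHOOK_URL is an assumed environment variable, and the 403/404 mapping simply mirrors the two cases described above.

```shell
# slack_status_hint: hypothetical helper that maps an HTTP status code
# to a likely cause. The mapping is a rough guide, not a Slack reference.
slack_status_hint() {
  case "$1" in
    200) echo "webhook accepted the message" ;;
    403) echo "webhook revoked or forbidden: create a new webhook" ;;
    404) echo "webhook not found or app disabled: check for no_active_hooks" ;;
    000) echo "no HTTP response: check DNS, egress, or TLS" ;;
    *)   echo "unexpected status $1: inspect the response body" ;;
  esac
}

# Usage (needs network access; SLACK_WEBHOOK_URL is an assumed variable):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#     -H 'Content-Type: application/json' \
#     -d '{"text":"Grafana webhook test"}' "$SLACK_WEBHOOK_URL")
#   slack_status_hint "$code"
```

curl reports 000 when it receives no HTTP response at all, which points at the network path and not at the webhook itself.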

3. Network egress blocked / TLS failure

In Kubernetes, a default-deny NetworkPolicy or missing egress to the internet blocks the notifier. TLS interception can also break the SMTP handshake.

failed to send notification to Slack: Post "https://hooks.slack.com/...": dial tcp: i/o timeout
failed to send notification to Email: tls: failed to verify certificate: x509: certificate signed by unknown authority

4. Notification policy not routing (no matching label) or silenced

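Even a correctly configured contact point delivers nothing if no notification policy routes the alert to it, or if an active silence suppresses the alert. An alert whose labels match no specific route falls through to the default policy, which may point at a different contact point than the one you expect.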

level=info msg="Notification skipped" reason="silenced" alertname="High CPU"

Diagnostic Workflow

Step 1 — Test the contact point directly.

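# Trigger a test notification for a receiver.
# Depending on the Grafana version, the body may need the full integration
# config (grafana_managed_receiver_configs), not only the receiver name.
curl -s -u "admin:$GRAFANA_PW" -X POST \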
  -H "Content-Type: application/json" \
  http://localhost:3000/api/alertmanager/grafana/config/api/v1/receivers/test \
  -d '{"receivers":[{"name":"slack-oncall"}]}'

Step 2 — List configured contact points.

curl -s -u "admin:$GRAFANA_PW" \
  http://localhost:3000/api/v1/provisioning/contact-points | jq '.[] | {name, type}'

Step 3 — Grep logs for notification failures.

journalctl -u grafana-server --since "20 min ago" | grep -iE "notif|Failed to send"
kubectl logs deploy/grafana -n monitoring | grep -i "send notification"

Step 4 — Verify SMTP reachability from the Grafana host/pod.

# From the Grafana pod
kubectl exec -it deploy/grafana -n monitoring -- sh -c "nc -vz smtp.gmail.com 587"

Step 5 — Check for active silences and route matching.

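# The endpoint returns all silences; keep only the active ones
curl -s -u "admin:$GRAFANA_PW" \
  http://localhost:3000/api/alertmanager/grafana/api/v2/silences | jq '.[] | select(.status.state == "active") | {matchers, status}'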

Example Root Cause Analysis

A team reported “alerts stopped emailing us on Tuesday.” The rules were firing (visible in state history), so the alerting engine was healthy — narrowing this to the contact point.

Hitting the Test endpoint for the Email receiver returned:

failed to send notification to Email: dial tcp 10.0.0.5:587: i/o timeout

The SMTP host had been reachable the week before, so config drift was suspected. git log on the provisioning repo showed a new NetworkPolicy, merged on Monday, that switched the monitoring namespace to default-deny egress. Grafana could still reach the in-cluster Slack proxy (allowed) but not the external SMTP relay at 10.0.0.5:587.

Fix: they added an egress rule permitting the Grafana pod to reach the SMTP relay CIDR on TCP 587, redeployed, and the test succeeded. They also added a synthetic heartbeat alert that emails every hour so a broken email path is caught within 60 minutes instead of during an incident.
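An egress rule of the kind described in this fix could look roughly like the output of the sketch below. This is an assumption-laden illustration, not the team's actual manifest: render_smtp_egress_policy is a hypothetical helper, and the policy name and the app.kubernetes.io/name: grafana pod label are placeholders that must match your deployment.

```shell
# render_smtp_egress_policy: hypothetical helper that prints a NetworkPolicy
# manifest allowing egress from Grafana pods to an SMTP relay.
# Args: $1 = namespace, $2 = relay CIDR, $3 = TCP port.
render_smtp_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-allow-smtp-egress   # placeholder name
  namespace: $1
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: grafana   # assumed pod label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: $2
      ports:
        - protocol: TCP
          port: $3
EOF
}
```

Usage: render_smtp_egress_policy monitoring 10.0.0.5/32 587 | kubectl apply -f -. NetworkPolicies are additive, so this allow rule can coexist with a default-deny policy. If the relay is addressed by hostname, DNS egress must be allowed as well.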

Prevention Best Practices

Add a heartbeat / DeadMansSwitch alert per contact point so silent delivery failures surface fast.
Store contact points in /etc/grafana/provisioning/alerting/contactpoints.yaml and rotate secrets via env vars, not inline.
Explicitly test each contact point after any SMTP, Slack, or NetworkPolicy change.
In Kubernetes, document required egress (SMTP 587/465, hooks.slack.com 443, PagerDuty 443) in the NetworkPolicy.
Audit notification policies so that the default (catch-all) policy points at a monitored contact point; never rely solely on label-matched routes.
Review active silences during handoff; stale silences hide real alerts.


Quick Command Reference

# Test a contact point
curl -s -u "admin:$GRAFANA_PW" -X POST -H "Content-Type: application/json" \
  http://localhost:3000/api/alertmanager/grafana/config/api/v1/receivers/test \
  -d '{"receivers":[{"name":"slack-oncall"}]}'

# List contact points
curl -s -u "admin:$GRAFANA_PW" http://localhost:3000/api/v1/provisioning/contact-points | jq '.[].name'

# Tail notification errors
journalctl -u grafana-server -f | grep -i "send notification"
kubectl logs -f deploy/grafana -n monitoring | grep -i "notif"

# Check reachability of SMTP from the pod
kubectl exec -it deploy/grafana -n monitoring -- sh -c "nc -vz smtp.gmail.com 587"

# List active silences
curl -s -u "admin:$GRAFANA_PW" \
  http://localhost:3000/api/alertmanager/grafana/api/v2/silences | jq '.[] | select(.status.state == "active") | .matchers'

Conclusion

The top root causes of failed to send notification in Grafana alerting:

SMTP not enabled/misconfigured — [smtp] enabled=false or wrong host (connection refused, dial tcp smtp:587).
Expired Slack webhook/token — HTTP 403 or 404 no_active_hooks.
Blocked egress or TLS failure — NetworkPolicy/firewall or cert issues (i/o timeout, x509).
Notification policy not routing — no matching label route reaches the contact point.
Active silence — an existing silence suppresses the alert before delivery.
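For quick triage, the sample messages in this guide can be bucketed by cause. The sketch below is a rough aid under stated assumptions: classify_send_failure is a hypothetical helper, its patterns are taken only from the example messages in this guide, and real logs may need more cases. A routing problem produces no send error, so it cannot be detected this way.

```shell
# classify_send_failure: hypothetical triage helper. Patterns come from the
# sample messages in this guide and are not an exhaustive list.
classify_send_failure() {
  case "$1" in
    *"connection refused"*)
      echo "host-or-port" ;;
    *"status code 403"*|*"status code 404"*|*no_active_hooks*)
      echo "slack-webhook" ;;
    *"i/o timeout"*|*x509*)
      echo "egress-or-tls" ;;
    *silenced*)
      echo "silenced" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage: count failure classes seen in the last 20 minutes.
#   journalctl -u grafana-server --since "20 min ago" \
#     | grep -i "send notification" \
#     | while IFS= read -r line; do classify_send_failure "$line"; done \
#     | sort | uniq -c
```

The first matching pattern wins, so a line that mentions both a refused connection and a timeout is reported as host-or-port.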
