RabbitMQ Error Guide: 'epmd error for host ... nxdomain' Node Resolution Failure
Fix the epmd error for host nxdomain/address: DNS and /etc/hosts, epmd on port 4369, short vs long node names, and firewall rules.
Exact Error Message
When a RabbitMQ command or a clustering operation cannot reach another node, the Erlang Port Mapper Daemon (epmd) reports that it cannot resolve or connect to the target host. The output typically looks like this:
Error: unable to perform an operation on node 'rabbit@node2'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
attempted to contact: ['rabbit@node2']
rabbit@node2:
* epmd error for host node2: nxdomain (non-existing domain)
You may instead see the connectivity variant of the same problem:
* epmd error for host node2: address (cannot connect to host/port)
The nxdomain form means the hostname node2 could not be resolved to an IP address. The address form means the name resolved, but epmd on port 4369 could not be reached.
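As a minimal sketch of that distinction, the helper below sorts a diagnostic message into one of the two variants by matching the text quoted above. The function name classify_epmd_error and its output labels are hypothetical, chosen only for illustration; they are not part of any RabbitMQ tool.

```bash
# classify_epmd_error TEXT (hypothetical helper, not a RabbitMQ command)
# Prints "resolution" for the nxdomain variant, "connectivity" for the
# address variant, and "unknown" for anything else.
classify_epmd_error() {
    case "$1" in
        *nxdomain*)                      echo "resolution" ;;
        *"cannot connect to host/port"*) echo "connectivity" ;;
        *)                               echo "unknown" ;;
    esac
}
```

One way to use it is to pass the captured CLI output as a single argument, for example classify_epmd_error "$(rabbitmq-diagnostics ping -n rabbit@node2 2>&1)".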
What the Error Means
RabbitMQ runs on the Erlang runtime, and Erlang nodes find each other through epmd, a small daemon that maps node names to TCP ports. Before any node can talk to rabbit@node2, two things must happen: the hostname node2 must resolve to an IP address, and epmd on that IP must answer on port 4369 so the caller can learn which distribution port the rabbit node listens on.
nxdomain is a resolution failure: the resolver was asked for node2 and found no such name, either in DNS or in /etc/hosts. address (cannot connect to host/port) means resolution worked but the TCP connection to epmd failed, usually because epmd is not running, the node is down, or a firewall is blocking port 4369.
Because the node name is baked into RabbitMQ’s identity (rabbit@node2), even a single host that cannot be resolved will block cluster joins, CLI commands targeting that node, and inter-node replication.
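The two preconditions can be checked in the same order the runtime needs them. The sketch below is an illustration under assumptions: the function names are hypothetical, getent and the coreutils timeout command are assumed to be installed, and the TCP probe relies on bash's /dev/tcp feature. getent consults the system resolver, which may not match a customized Erlang resolver configuration.

```bash
# node_host NODE: print the host part of an Erlang node name (text after "@").
node_host() {
    echo "${1#*@}"
}

# resolves HOST: succeed if the system resolver (DNS or /etc/hosts) knows HOST.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# epmd_reachable HOST: succeed if a TCP connection to port 4369 opens within
# 3 seconds. Uses bash's /dev/tcp, so bash must be installed.
epmd_reachable() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/4369"' _ "$1" 2>/dev/null
}

# check_node NODE: report which of the two preconditions fails, if any.
# Returns 1 for a resolution failure and 2 for an unreachable epmd.
check_node() {
    host=$(node_host "$1")
    if ! resolves "$host"; then
        echo "$host: does not resolve (expect the nxdomain variant)"
        return 1
    fi
    if ! epmd_reachable "$host"; then
        echo "$host: resolves, but epmd on 4369 is unreachable (expect the address variant)"
        return 2
    fi
    echo "$host: resolves and epmd answers on 4369"
}
```

Calling check_node rabbit@node2 from the node that reports the error shows which half of the path is broken.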
Common Causes
- Missing or incorrect DNS or /etc/hosts entry. The short hostname node2 is not in DNS and not mapped in /etc/hosts, producing nxdomain.
- epmd not running on the target node. If the RabbitMQ node never started cleanly, epmd may not be listening, yielding the address variant.
- Firewall blocking port 4369 (or 25672). Security groups, firewalld, ufw, or a cloud network ACL drop the epmd or distribution port.
- Short vs long node name mismatch. One node uses short names (rabbit@node2) while another uses fully qualified names (rabbit@node2.example.com). The names do not match, so resolution targets a host that does not exist as configured.
- Stale or wrong hostname. The machine’s hostname changed after RabbitMQ was provisioned, so the recorded node name no longer resolves.
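The short vs long mismatch is easy to spot mechanically, because a long name has a dot in its host part and a short name does not. The following sketch assumes you already have the list of node names; name_kind and names_consistent are hypothetical helper names.

```bash
# name_kind NODE: print "long" if the host part contains a dot, else "short".
name_kind() {
    case "${1#*@}" in
        *.*) echo "long" ;;
        *)   echo "short" ;;
    esac
}

# names_consistent NODE...: succeed only if every node name is the same kind.
names_consistent() {
    first=$(name_kind "$1")
    for node in "$@"; do
        [ "$(name_kind "$node")" = "$first" ] || return 1
    done
}
```

For example, names_consistent rabbit@node1 rabbit@node2.example.com || echo "mixed short and long names" flags a cluster definition that mixes the two forms.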
How to Reproduce the Error
The simplest reproduction is to attempt a cluster status check against a node whose hostname is not resolvable. On a fresh node where node2 is unknown to DNS and absent from /etc/hosts:
# Ping the peer by its full node name
rabbitmq-diagnostics ping -n rabbit@node2
If node2 is not resolvable, the ping fails with epmd error for host node2: nxdomain. To reproduce the address variant, ensure node2 resolves but stop epmd or block port 4369 with a firewall rule, then run the same ping. The TCP connection to epmd will time out or be refused.
Diagnostic Commands
Work outward from the local node to DNS, then to epmd and the firewall. All of the following are read-only.
# 1. Confirm the local node's identity and that it is alive
rabbitmq-diagnostics status
rabbitmq-diagnostics ping
# 2. Check what the cluster currently believes about membership
rabbitmqctl cluster_status
# 3. Verify hostname resolution for the failing peer
getent hosts node2
host node2
nslookup node2
dig +short node2
A healthy getent hosts returns a line like:
192.168.10.22 node2 node2.example.com
An nxdomain problem returns nothing from getent hosts, and dig (run without +short, which suppresses the header) reports NXDOMAIN in its status field:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 41233
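To pull just the status out of the dig header in a script, a small filter is enough. This is a sketch: dns_status is a hypothetical name, and it assumes the header format shown above. Keep in mind that dig queries DNS only and does not read /etc/hosts, so getent hosts remains the better indicator of what the system resolver will return.

```bash
# dns_status: read dig output on stdin and print the value of the "status:"
# field from the header line (for example NOERROR or NXDOMAIN).
dns_status() {
    sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1
}
```

Typical use would be dig node2 | dns_status.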
# 4. Ask epmd which Erlang nodes are registered and confirm it is listening (run on the local node, then on the peer)
epmd -names
ss -lntp | grep 4369
Typical healthy epmd -names output:
epmd: up and running on port 4369 with data:
name rabbit at port 25672
The ss check should show epmd bound to 4369:
LISTEN 0 128 0.0.0.0:4369 0.0.0.0:* users:(("epmd",pid=812,fd=3))
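If you want a script to confirm that the rabbit node is registered and learn its distribution port, the epmd -names output can be parsed directly. The sketch below assumes the output format shown above; epmd_port is a hypothetical helper name.

```bash
# epmd_port NAME: read `epmd -names` output on stdin and print the port that
# NAME is registered on. Exits non-zero if NAME is not registered.
epmd_port() {
    awk -v name="$1" '
        $1 == "name" && $2 == name && $3 == "at" && $4 == "port" {
            print $5
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, epmd -names | epmd_port rabbit prints the registered port, and a non-zero exit status means no node named rabbit is registered with epmd on that host.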
# 5. Confirm cookie sources match (a related but distinct failure)
rabbitmq-diagnostics erlang_cookie_sources
# 6. Review startup logs for resolution and distribution errors
journalctl -u rabbitmq-server --since "30 min ago" --no-pager
Step-by-Step Resolution
1. Decide on a naming scheme and stick to it. Check rabbitmq-diagnostics status on every node. Either all nodes use short names (rabbit@node2) or all use long names (rabbit@node2.example.com). Mixing them is a common root cause. The scheme is controlled by the RABBITMQ_USE_LONGNAME environment variable (written as USE_LONGNAME in rabbitmq-env.conf, which omits the RABBITMQ_ prefix); keep it consistent across the cluster.
2. Make every node name resolvable. For DNS-managed environments, add A records for each node. For smaller or static clusters, add matching /etc/hosts entries on every node so each name maps to the correct IP. Re-run getent hosts node2 until it returns the expected address.
3. Confirm epmd is listening. Run epmd -names and ss -lntp | grep 4369 on the target node. If epmd is absent, the RabbitMQ node is most likely not running; check journalctl -u rabbitmq-server for the underlying startup failure (epmd is launched automatically when a node boots).
4. Open the required ports. Inter-node traffic needs 4369 (epmd) and 25672 (distribution) by default, plus 5672/15672 for AMQP and management. Verify firewall rules and cloud security groups allow these between cluster members.
5. Re-test the path. Run rabbitmq-diagnostics ping -n rabbit@node2, then rabbitmqctl cluster_status. A clean ping plus a complete Running Nodes list confirms the resolution and epmd path is healthy.
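For clusters that rely on /etc/hosts, the "make every node name resolvable" step can be checked on each machine with a short script. The sketch below is read-only and only reports names that have no entry; missing_hosts is a hypothetical helper, and the node names passed to it are whatever your cluster uses.

```bash
# missing_hosts FILE NAME...: print each NAME that has no entry in FILE.
# Comments are ignored; the first field of each line is the IP address and
# the remaining fields are host names.
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        awk -v want="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file" || echo "$name"
    done
}
```

Running missing_hosts /etc/hosts node1 node2 node3 on every member lists the names that still need an entry; empty output means all of them are mapped.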
Prevention and Best Practices
- Standardize node names at provisioning time. Pin short vs long names in configuration management so no node drifts from the cluster convention.
- Manage /etc/hosts or DNS as code. Treat name resolution as part of the cluster definition, not a manual step, so a rebuilt node is reachable immediately.
- Lock hostnames. Avoid DHCP-driven hostname changes on RabbitMQ hosts; a changed hostname silently breaks the recorded node name.
- Document the port matrix. Keep 4369 and 25672 open between members and tested by a health check that runs rabbitmq-diagnostics ping against every peer.
- Monitor epmd reachability. Alert when epmd -names or a cross-node ping fails, so resolution problems surface before they cause partition cascades.
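A peer health check of the kind described above could look like the following sketch. It assumes the peer list is supplied by your configuration management; check_peers and the RABBIT_PING override variable are hypothetical names introduced here, and the only RabbitMQ command used is rabbitmq-diagnostics ping -n.

```bash
# Command used to ping a node; override RABBIT_PING to substitute another tool
# (hypothetical variable, useful for testing without a broker installed).
RABBIT_PING=${RABBIT_PING:-rabbitmq-diagnostics}

# check_peers NODE...: ping every node, print one line per failure, and
# return non-zero if any ping failed.
check_peers() {
    failed=0
    for node in "$@"; do
        if ! "$RABBIT_PING" ping -n "$node" >/dev/null 2>&1; then
            echo "UNREACHABLE $node"
            failed=1
        fi
    done
    return "$failed"
}
```

Run from cron or a monitoring agent, check_peers rabbit@node1 rabbit@node2 rabbit@node3 gives an exit status that an alerting rule can act on.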
Related Errors
If resolution succeeds but the join still fails, you may hit an Erlang cookie mismatch, which surfaces as an authentication failure because the .erlang.cookie value differs between nodes; verify it with rabbitmq-diagnostics erlang_cookie_sources. During automated bootstrap you may see “Could not auto-cluster” when a seed node is unreachable for the same DNS or epmd reasons described here. Finally, after a full cluster restart you may encounter timeout_waiting_for_tables, where a node blocks waiting for its Mnesia peers to come back; that is a startup-ordering issue rather than a name-resolution one, but it often appears alongside connectivity faults.
Frequently Asked Questions
What is the difference between the nxdomain and address variants of this error?
nxdomain means the hostname could not be resolved to an IP at all, so it is a DNS or /etc/hosts problem. address (cannot connect to host/port) means the name resolved correctly but epmd on port 4369 did not answer, pointing to a stopped node or a firewall.
Which ports must be open for epmd and clustering to work?
Port 4369 for epmd and 25672 for inter-node Erlang distribution are the two essentials for the resolution and join path. AMQP (5672) and management (15672) are separate and not the cause of this specific error.
How do I tell whether my cluster uses short or long node names?
Run rabbitmq-diagnostics status and read the node name. If it is rabbit@node2 it is short; if it is rabbit@node2.example.com it is long. Every node in the cluster must use the same form, which is governed by RABBITMQ_USE_LONGNAME.
Why does epmd -names show nothing on the target node?
An empty epmd -names usually means the RabbitMQ node never started successfully, so epmd has no registered name to report. Check journalctl -u rabbitmq-server for the real startup failure rather than treating epmd itself as broken.
Can a hostname change after install cause this error?
Yes. RabbitMQ records its node name from the hostname at first boot. If the hostname later changes, the old name no longer resolves and you get nxdomain until DNS or /etc/hosts is updated to match the configured node name.