RabbitMQ Error Guide: 'home node ... is down' Classic Queue Unavailable
Fix RabbitMQ 'home node is down' errors: classic queue unavailable because its home/leader node is offline, queue shown as down in management, and HA fixes.
- #rabbitmq
- #troubleshooting
- #errors
- #clustering
Exact Error Message
A classic queue becomes unavailable when its home node — the single node that hosts the queue’s master process — is down. RabbitMQ reports the queue as down and rejects operations against it:
Channel error on connection <0.6611.0> (10.0.6.88:52310 -> 10.0.4.22:5672, vhost: '/', user: 'app'):
operation basic.consume caused a channel exception not_found:
home node 'rabbit@node2' of durable queue 'invoices' in vhost '/' is down or inaccessible
In the management UI and CLI the queue shows a down state:
rabbitmqctl list_queues name node state
invoices rabbit@node2 down
billing rabbit@node1 running
What the Error Means
A classic (non-mirrored) queue lives on exactly one node — its home node, chosen when the queue was declared. All of that queue’s messages and its master process exist only there. Other nodes in the cluster know the queue’s metadata (so they can route to it) but cannot serve it themselves.
When the home node goes offline, the cluster still remembers the durable queue exists, but it has no live process to handle publishes, consumes, or even passive declares. Connecting to a surviving node and touching the queue therefore yields home node ... is down or inaccessible, and the queue is marked down. The queue and its data are not deleted — they are simply stranded until the home node returns.
Common Causes
1. The home node crashed or was shut down
A hardware failure, OOM kill, manual rabbitmqctl stop, or an unfinished restart takes the home node offline, stranding every classic queue homed there.
2. Network partition cut off the home node
From the perspective of clients on the majority side, the home node is unreachable, so its queues read as down even though that node may still be running in isolation.
3. Rolling upgrade or restart without HA
During a rolling restart, each node is briefly down. Classic non-replicated queues homed on the node being restarted are unavailable for that window because there is no replica to take over.
4. Queue homed on an ephemeral/spot node
Running RabbitMQ on autoscaling or spot instances and homing durable queues on a node that gets reclaimed leaves those queues down whenever the node disappears.
5. Durability mistaken for availability
A durable queue survives a node restart, but it becomes available again only once its home node is back. Teams sometimes expect another node to serve it in the meantime; classic queues never fail over without mirroring.
How to Reproduce the Error
On a multi-node cluster, home a classic queue on one node, stop that node, then access the queue from a survivor.
# Confirm the queue's home node
rabbitmqctl list_queues name node state | grep invoices
invoices rabbit@node2 running
# With node2 stopped, query/consume from node1
rabbitmqctl -n rabbit@node1 list_queues name node state | grep invoices
invoices rabbit@node2 down
rabbitmqadmin --host node1 get queue=invoices
*** Error: 404 NOT_FOUND - home node 'rabbit@node2' of durable queue 'invoices' in vhost '/' is down or inaccessible
Diagnostic Commands
# Which node homes the queue, and is it running or down?
rabbitmqctl list_queues name node state durable messages | grep <QUEUE>
# Cluster membership and which nodes are actually up
rabbitmqctl cluster_status
rabbitmq-diagnostics cluster_status
# Is the suspected home node reachable from a survivor?
rabbitmq-diagnostics ping -n rabbit@node2
rabbitmq-diagnostics check_running -n rabbit@node2
# All queues currently in a down state across the cluster
rabbitmqctl list_queues name node state | grep -i down
# Any partition reported between nodes?
rabbitmq-diagnostics cluster_status | grep -A5 -i partition
# Broker log entries about the down home node
journalctl -u rabbitmq-server --since "30 min ago" | grep -iE "home node|nodedown|down or inaccessible|partition"
Comparing the queue’s node from list_queues against the running-nodes list in cluster_status confirms the diagnosis in one step.
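The comparison described above can be scripted. The following is a minimal sketch, not an official tool: the function name queues_on_down_nodes is made up for illustration, the running node names are assumed to be copied by hand from cluster_status, and queue names are assumed to contain no whitespace.

```shell
# Print queues whose home node is not in the list of running nodes.
# stdin: "name node state" lines, for example from
#        rabbitmqctl -q list_queues name node state
# args:  names of the nodes currently running (read from cluster_status)
# queues_on_down_nodes is a hypothetical helper, not part of RabbitMQ.
queues_on_down_nodes() {
    running="$*"
    awk -v running="$running" '
        BEGIN {
            # Build a lookup table of running node names.
            n = split(running, nodes, " ")
            for (i = 1; i <= n; i++) up[nodes[i]] = 1
        }
        # Skip a table header line if one is present.
        $1 == "name" && $2 == "node" { next }
        # Report any queue whose home node is not in the lookup table.
        NF >= 2 && !($2 in up) { print $1 " is homed on " $2 " (not running)" }
    '
}
```

A typical call, assuming node1 and node3 are the survivors, would be: rabbitmqctl -q list_queues name node state | queues_on_down_nodes rabbit@node1 rabbit@node3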
Step-by-Step Resolution
Step 1: Confirm the home node is the problem
rabbitmqctl list_queues name node state | grep <QUEUE>
rabbitmqctl cluster_status
If the queue’s node is absent from running nodes and its state is down, the home node being offline is the root cause.
Step 2: Distinguish a crash from a partition
rabbitmq-diagnostics cluster_status | grep -A5 -i partition
rabbitmq-diagnostics ping -n <HOME_NODE>
A reported partition needs partition resolution; an unreachable, non-partitioned node needs to be restarted.
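The decision in this step can be written down as a small lookup. This is only a sketch of the rule stated above: the function name next_action and its return strings are invented for illustration, and the partition flag is assumed to be set by the operator after reading the cluster_status output.

```shell
# Map two observations to the next action.
#   $1 = "yes" if cluster_status reports a partition, otherwise "no"
#   $2 = "yes" if the home node answers ping, otherwise "no"
# next_action and its output strings are hypothetical names.
next_action() {
    case "$1:$2" in
        yes:*)  echo "resolve-partition" ;;    # partition reported
        no:no)  echo "restart-home-node" ;;    # node unreachable, no partition
        no:yes) echo "recheck-queue-state" ;;  # node is up; look again
        *)      echo "unknown-input"; return 2 ;;
    esac
}

# Example of gathering the second observation (the first is set by hand):
#   reachable=no
#   rabbitmq-diagnostics ping -n "$HOME_NODE" >/dev/null 2>&1 && reachable=yes
#   next_action "$partition" "$reachable"
```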
Step 3: Restart the home node
Start rabbitmq-server on the down node. Once it rejoins, its classic queues return to running and become available. Verify:
rabbitmq-diagnostics check_running -n <HOME_NODE>
rabbitmqctl list_queues name node state | grep <QUEUE>
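Because a restarted node can take a while to rejoin, the verification may need to be repeated. The sketch below polls until the queue reports running or a retry limit is hit. The function name wait_for_queue_running and the LIST_CMD override are assumptions made for illustration (LIST_CMD exists only so the loop can be exercised without a broker), and the defaults of 30 attempts and 2 seconds are arbitrary.

```shell
# Poll until the named queue reports state "running", or give up.
#   $1 = queue name
#   $2 = maximum attempts (default 30)
#   $3 = seconds to sleep between attempts (default 2)
# LIST_CMD is a hypothetical override for the command that prints
# "name node state" lines; it defaults to rabbitmqctl.
wait_for_queue_running() {
    queue=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Extract the state column for the requested queue only.
        state=$(${LIST_CMD:-rabbitmqctl -q list_queues name node state} 2>/dev/null |
            awk -v q="$queue" '$1 == q { print $3; exit }')
        if [ "$state" = "running" ]; then
            echo "$queue is running"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$queue still not running after $attempts attempts" >&2
    return 1
}
```

For example, wait_for_queue_running invoices 60 5 would check every 5 seconds for up to 5 minutes.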
Step 4: If the node is gone permanently
A classic non-replicated queue cannot be recovered without its home node, so its messages are lost. Once the cluster is stable, recreate the queue (preferably as a quorum queue) and, after confirming the dead node will not return, remove it from the cluster with rabbitmqctl forget_cluster_node <HOME_NODE>, run from a surviving node.
Step 5: Migrate to replicated queues so this cannot recur
Convert critical classic queues to quorum queues so the queue keeps a leader on a surviving node when one fails. This requires draining and recreating the queue with x-queue-type=quorum.
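As a rough illustration of the recreate step, the sketch below prints the commands instead of running them, so they can be reviewed first. It assumes the queue has already been drained, that the classic Python rabbitmqadmin used elsewhere in this guide is installed (newer rabbitmqadmin releases use a different syntax), and that the default vhost is "/". The function name print_quorum_recreate is hypothetical.

```shell
# Print (do not run) the commands that recreate a drained classic queue
# as a quorum queue.
#   $1 = queue name
#   $2 = vhost (default "/")
# print_quorum_recreate is a hypothetical helper name.
print_quorum_recreate() {
    queue=$1
    vhost=${2:-/}
    # The queue type is fixed at declaration time, so the old queue must be
    # deleted before its name can be reused with x-queue-type=quorum.
    printf 'rabbitmqadmin --vhost=%s delete queue name=%s\n' "$vhost" "$queue"
    printf "rabbitmqadmin --vhost=%s declare queue name=%s durable=true arguments='{\"x-queue-type\":\"quorum\"}'\n" "$vhost" "$queue"
}
```

Running print_quorum_recreate invoices prints two lines that can be pasted into a shell once consumers and publishers are paused. Deleting the queue discards any messages still in it, which is why draining comes first.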
Step 6: Verify availability
rabbitmqctl list_queues name node state messages | grep <QUEUE>
A running state on a live node confirms clients can publish and consume again.
Prevention and Best Practices
- Use quorum queues for anything that must survive a node outage — classic queues are pinned to one node with no failover.
- Run a 3+ node cluster with leaders/queues balanced so one node’s loss strands as few queues as possible.
- Configure cluster_partition_handling (pause_minority recommended) to make partition behavior predictable.
- During rolling restarts, drain or relocate classic queues first, or accept the downtime window per node.
- Do not home durable queues on spot/ephemeral instances; place replicas across stable nodes and availability zones.
- Monitor for down queue states and node-down events so you react before clients pile up errors.
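The monitoring point can be approximated with a small check suitable for cron or a monitoring agent. This is a sketch under stated assumptions: the function name check_down_queues and the "DOWN:" message format are invented here, and the input is assumed to be the three-column output of list_queues name node state.

```shell
# Report queues whose state column is "down".
# stdin: "name node state" lines, for example from
#        rabbitmqctl -q list_queues name node state
# Exit status: 0 when no queue is down, 1 otherwise, so a scheduler or
# agent can alert on a non-zero exit.
# check_down_queues is a hypothetical helper name.
check_down_queues() {
    awk '
        $3 == "down" { down++; print "DOWN: " $1 " (home node " $2 ")" }
        END { exit (down > 0) }
    '
}
```

A typical call would be: rabbitmqctl -q list_queues name node state | check_down_queues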
Related Errors
- {error, queue_process_is_stopped}: the internal-error form of the same node-down condition seen on operations like delete.
- quorum queue ... no leader elected: the quorum-queue equivalent when a majority of replicas is unavailable.
- Mnesia network partition / split-brain: the partition that frequently renders a home node inaccessible.
- NOT_FOUND - no queue: appears if the queue was removed while its home node was down, versus merely down.
- CONNECTION_FORCED: clients dropped when the home node itself shuts down.
Frequently Asked Questions
Will another node serve my classic queue while the home node is down? No. Classic non-mirrored queues have no replica, so no other node can serve them. Only the home node can, once it is back.
Are my messages lost? Durable messages on a durable queue survive a home-node restart and return when the node rejoins. They are only lost if the home node is destroyed permanently.
Why does the queue still appear in listings if it is down?
The cluster retains the queue’s metadata everywhere, so it is visible, but its state is down and it cannot be used until the home node returns.
How do I get automatic failover? Use quorum queues (classic mirroring is deprecated). A quorum queue elects a new leader on a surviving node automatically when a node fails.
Is restarting the whole cluster necessary? No — only the down home node needs to come back. Restarting healthy nodes risks stranding their queues too. See the RabbitMQ guides for HA migration steps.