RabbitMQ Error Guide: 'home node ... is down' Classic Queue Unavailable

Exact Error Message

A classic queue becomes unavailable when its home node — the single node that hosts the queue’s master process — is down. RabbitMQ reports the queue as down and rejects operations against it:

Channel error on connection <0.6611.0> (10.0.6.88:52310 -> 10.0.4.22:5672, vhost: '/', user: 'app'):
operation basic.consume caused a channel exception not_found:
home node 'rabbit@node2' of durable queue 'invoices' in vhost '/' is down or inaccessible

In the management UI and CLI the queue shows a down state:

rabbitmqctl list_queues name node state
invoices   rabbit@node2   down
billing    rabbit@node1   running

What the Error Means

A classic (non-mirrored) queue lives on exactly one node — its home node, chosen when the queue was declared. All of that queue’s messages and its master process exist only there. Other nodes in the cluster know the queue’s metadata (so they can route to it) but cannot serve it themselves.

When the home node goes offline, the cluster still remembers the durable queue exists, but it has no live process to handle publishes, consumes, or even passive declares. Connecting to a surviving node and touching the queue therefore yields home node ... is down or inaccessible, and the queue is marked down. The queue and its data are not deleted — they are simply stranded until the home node returns.

Common Causes

1. The home node crashed or was shut down

A hardware failure, OOM kill, manual rabbitmqctl stop, or an unfinished restart takes the home node offline, stranding every classic queue homed there.

2. Network partition cut off the home node

From the perspective of clients on the majority side, the home node is unreachable, so its queues read as down even though that node may still be running in isolation.

3. Rolling upgrade or restart without HA

During a rolling restart, each node is briefly down. Classic non-replicated queues homed on the node being restarted are unavailable for that window because there is no replica to take over.

4. Queue homed on an ephemeral/spot node

Running RabbitMQ on autoscaling or spot instances and homing durable queues on a node that gets reclaimed leaves those queues down whenever the node disappears.

5. Durable queue, transient messages misunderstanding

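Durability protects the queue definition and its persistent messages across a restart of the home node; it does not make the queue available anywhere else. A durable queue is usable again only once its home node is back, and any transient messages it held are gone by then. Teams sometimes expect another node to serve it, but classic queues never fail over without mirroring.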

How to Reproduce the Error

On a multi-node cluster, home a classic queue on one node, stop that node, then access the queue from a survivor.

# Confirm the queue's home node
rabbitmqctl list_queues name node state | grep invoices

invoices  rabbit@node2  running

# Stop the home node (run this on node2)
rabbitmqctl stop

# With node2 stopped, query/consume from node1
rabbitmqctl -n rabbit@node1 list_queues name node state | grep invoices
rabbitmqadmin --host node1 get queue=invoices

invoices  rabbit@node2  down
*** Error: 404 NOT_FOUND - home node 'rabbit@node2' of durable queue 'invoices' in vhost '/' is down or inaccessible

Diagnostic Commands

# Which node homes the queue, and is it running or down?
rabbitmqctl list_queues name node state durable messages | grep <QUEUE>

# Cluster membership and which nodes are actually up
rabbitmqctl cluster_status
rabbitmq-diagnostics cluster_status

# Is the suspected home node reachable from a survivor?
rabbitmq-diagnostics ping -n rabbit@node2
rabbitmq-diagnostics check_running -n rabbit@node2

# All queues currently in a down state across the cluster
rabbitmqctl list_queues name node state | grep -i down

# Any partition reported between nodes?
rabbitmq-diagnostics cluster_status | grep -A5 -i partition

# Broker log entries about the down home node
journalctl -u rabbitmq-server --since "30 min ago" | grep -iE "home node|nodedown|down or inaccessible|partition"

Comparing the queue’s node from list_queues against the running-nodes list in cluster_status confirms the diagnosis in one step.
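That comparison can be scripted. The following is a minimal sketch rather than an official tool: stranded_queues is a hypothetical helper name, it assumes the queue listing arrives on stdin in the three-column name, node, state layout shown in this guide, and it expects the running nodes (copied from cluster_status) as arguments.

```shell
# stranded_queues RUNNING_NODE...
# Reads "name node state" rows on stdin and prints every queue whose home
# node is NOT among the running nodes passed as arguments.
# (Hypothetical helper; assumes queue names contain no whitespace.)
stranded_queues() {
  local running=" $* "          # pad with spaces so whole names can be matched
  local name node state
  while read -r name node state || [ -n "$name" ]; do
    [ -n "$name" ] || continue  # skip blank lines
    case "$running" in
      *" $node "*) ;;           # home node is running: queue is reachable
      *) printf '%s\t%s\n' "$name" "$node" ;;
    esac
  done
}

# Example (node names are placeholders):
# rabbitmqctl list_queues name node state | stranded_queues rabbit@node1 rabbit@node3
```

Any informational or header lines that your rabbitmqctl version prints would be treated as rows, so filter them out before piping the listing in.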

Step-by-Step Resolution

Step 1: Confirm the home node is the problem

rabbitmqctl list_queues name node state | grep <QUEUE>
rabbitmqctl cluster_status

If the queue’s node is absent from running nodes and its state is down, the home node being offline is the root cause.

Step 2: Distinguish a crash from a partition

rabbitmq-diagnostics cluster_status | grep -A5 -i partition
rabbitmq-diagnostics ping -n <HOME_NODE>

A reported partition needs partition resolution; an unreachable, non-partitioned node needs to be restarted.

Step 3: Restart the home node

Start rabbitmq-server on the down node. Once it rejoins, its classic queues return to running and become available. Verify:

rabbitmq-diagnostics check_running -n <HOME_NODE>
rabbitmqctl list_queues name node state | grep <QUEUE>
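A node can take a while to boot and rejoin, so a single check right after the restart may fail spuriously. One way to handle that is to poll the check until it passes. The sketch below is a generic retry helper, not part of RabbitMQ: wait_until is a hypothetical name, and the attempt count and delay in the example are arbitrary.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: poll the home node for up to about a minute (placeholder values)
# wait_until 30 2 rabbitmq-diagnostics check_running -n rabbit@node2
```

A bounded number of attempts keeps the script from hanging forever if the node never comes back, which is the signal to move on to Step 4.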

Step 4: If the node is gone permanently

A classic non-replicated queue cannot be recovered without its home node, so its messages are lost. Once the cluster is stable, recreate the queue (preferably as a quorum queue) and remove the dead node from the cluster with rabbitmqctl forget_cluster_node after confirming it will not return.

Step 5: Migrate to replicated queues so this cannot recur

Convert critical classic queues to quorum queues so the queue keeps a leader on a surviving node when one fails. This requires draining and recreating the queue with x-queue-type=quorum.
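A queue's type is fixed at declaration, which is why the queue has to be recreated rather than altered. As a rough sketch of the recreate step, the helper below prints a request body for declaring a durable quorum queue through the management HTTP API. The function name, host, port, credentials variables, and the invoices.quorum queue name are all assumptions for illustration; adapt them to your environment.

```shell
# quorum_queue_body
# Prints a JSON body for declaring a quorum queue. Quorum queues must be
# durable, so durable is fixed to true here. (Hypothetical helper.)
quorum_queue_body() {
  printf '%s\n' '{"durable":true,"auto_delete":false,"arguments":{"x-queue-type":"quorum"}}'
}

# Example: declare the queue in vhost "/" (URL-encoded as %2F).
# Host, port, credentials and queue name are placeholders.
# curl -u "$RABBIT_USER:$RABBIT_PASS" -X PUT \
#   -H 'content-type: application/json' \
#   -d "$(quorum_queue_body)" \
#   "http://node1:15672/api/queues/%2F/invoices.quorum"
```

Declaring under a new name lets publishers and consumers be switched over before the old classic queue is drained and deleted; reusing the original name requires deleting the classic queue first.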

Step 6: Verify availability

rabbitmqctl list_queues name node state messages | grep <QUEUE>

A running state on a live node confirms clients can publish and consume again.

Prevention and Best Practices

Use quorum queues for anything that must survive a node outage — classic queues are pinned to one node with no failover.
Run a 3+ node cluster with leaders/queues balanced so one node’s loss strands as few queues as possible.
Configure cluster_partition_handling (pause_minority recommended) to make partition behavior predictable.
During rolling restarts, drain or relocate classic queues first, or accept the downtime window per node.
Do not home durable queues on spot/ephemeral instances; place replicas across stable nodes and availability zones.
Monitor for down queue states and node-down events so you react before clients pile up errors.
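For the monitoring point, a check that exits non-zero when any queue is down can be wired into most alerting systems. This is a minimal sketch under the same assumption as the rest of this guide, namely that the listing has name, node, and state columns in that order; check_down_queues is a hypothetical name.

```shell
# check_down_queues
# Reads "name node state" rows on stdin, prints one line per down queue,
# and returns 1 if at least one queue is down (0 otherwise).
# (Hypothetical helper; assumes queue names contain no whitespace.)
check_down_queues() {
  awk '
    tolower($3) == "down" {
      printf "DOWN %s (home node %s)\n", $1, $2
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Example: run from cron or a monitoring agent
# rabbitmqctl list_queues name node state | check_down_queues || echo "alert: queues down"
```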

Related Errors

{error, queue_process_is_stopped} — the internal-error form of the same node-down condition seen on operations like delete.
quorum queue ... no leader elected — the quorum-queue equivalent when a majority of replicas is unavailable.
Mnesia network partition / split-brain — the partition that frequently renders a home node inaccessible.
NOT_FOUND - no queue — appears if the queue was removed while its home node was down, versus merely down.
CONNECTION_FORCED — clients dropped when the home node itself shuts down.

Frequently Asked Questions

Will another node serve my classic queue while the home node is down? No. Classic non-mirrored queues have no replica, so no other node can serve them. Only the home node can, once it is back.

Are my messages lost? Durable messages on a durable queue survive a home-node restart and return when the node rejoins. They are only lost if the home node is destroyed permanently.

Why does the queue still appear in listings if it is down? The cluster retains the queue’s metadata everywhere, so it is visible, but its state is down and it cannot be used until the home node returns.

How do I get automatic failover? Use quorum queues (classic mirroring is deprecated and was removed in RabbitMQ 4.0). A quorum queue elects a new leader on a surviving node automatically when a node fails, as long as a majority of its replicas remains online.

Is restarting the whole cluster necessary? No — only the down home node needs to come back. Restarting healthy nodes risks stranding their queues too. See the RabbitMQ guides for HA migration steps.
