RabbitMQ Error Guide: 'cannot reach majority' Quorum Queue Lost Quorum
Fix RabbitMQ quorum queues that lost majority: diagnose 'cannot reach majority', under-replicated members, and recover queues after losing over half the nodes.
- #rabbitmq
- #troubleshooting
- #errors
- #quorum-queues
Exact Error Message
When a quorum queue no longer has a majority of its members online, it cannot elect a leader or commit writes, and the broker reports that it cannot reach a majority:
[warning] <0.1142.0> quorum queue 'orders' in vhost '/' cannot reach majority
of nodes [rabbit@mq-01, rabbit@mq-02, rabbit@mq-03]; only rabbit@mq-03 is online
[error] <0.1142.0> failed to recover quorum queue 'orders': not enough members online
[warning] <0.1190.0> queue 'orders': replica on rabbit@mq-03 is not synchronised
with the leader; waiting for quorum
Clients publishing to or consuming from the queue hang and then receive operation timeouts; rabbitmq-diagnostics quorum_status shows fewer online members than required for a majority.
What the Error Means
A quorum queue of N members needs a majority, floor(N/2) + 1, online to elect a leader and commit operations. For a 3-member queue that is 2 members; for a 5-member queue, 3. If fewer than that are online (e.g., 2 of 3 nodes lost), the surviving members cannot form a majority, so they refuse to elect a leader or accept writes. This is Raft working as designed: it would rather become unavailable than risk split-brain or data divergence.
“Cannot reach majority,” “waiting for quorum,” and “replica not synchronised” all mean the queue does not have enough healthy, caught-up members to make progress. Unlike a Ra command timeout (members slow but a majority present), this is genuine quorum loss: the queue is unavailable until enough members return or you intervene.
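The majority rule can be checked with a few lines of arithmetic. This is a minimal sketch of the floor(N/2) + 1 formula only; the function names are illustrative and are not part of any RabbitMQ API.

```python
def majority(members: int) -> int:
    """Smallest number of online members that forms a majority."""
    if members < 1:
        raise ValueError("a queue needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be offline while a majority remains."""
    return members - majority(members)


def has_quorum(members: int, online: int) -> bool:
    """True when the online count reaches the majority."""
    return online >= majority(members)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that a 4-member queue tolerates no more failures than a 3-member one, which is why odd membership sizes are the usual choice.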
Common Causes
- More than half the member nodes are down. Two of three nodes crashed, were stopped, or lost network — leaving a minority that cannot reach majority.
- A network partition isolated the members. If no side of the partition holds a majority of the members, no side can elect a leader (cluster_status shows a partition).
- Members were never spread correctly. A 3-member queue with two members on the same host/AZ loses majority when that host/AZ fails.
- A permanently destroyed node. A node was terminated (cloud instance lost) and never removed from the queue’s membership, so the queue counts a member that will never return.
- Followers fell too far behind and are not synchronised. Replicas exist but have not caught up to the leader, so they do not count toward a healthy majority.
- Rolling restart taken too aggressively. Restarting members faster than they re-sync drops the online count below majority mid-operation.
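The placement cause above lends itself to a quick check before an outage happens. The sketch below assumes you can build a mapping from member node to failure domain yourself; the zone labels (az-a, az-b, az-c) and the function name are hypothetical.

```python
from collections import Counter


def zones_that_break_quorum(placement: dict[str, str]) -> list[str]:
    """Return failure domains whose loss alone leaves the queue below majority.

    placement maps each member node name to its host or availability zone.
    """
    total = len(placement)
    needed = total // 2 + 1
    per_zone = Counter(placement.values())
    return sorted(zone for zone, count in per_zone.items() if total - count < needed)


# Two members share az-a, so losing az-a leaves 1 of 3 members online.
bad = {"rabbit@mq-01": "az-a", "rabbit@mq-02": "az-a", "rabbit@mq-03": "az-b"}
# One member per zone: any single zone failure still leaves 2 of 3.
good = {"rabbit@mq-01": "az-a", "rabbit@mq-02": "az-b", "rabbit@mq-03": "az-c"}

print(zones_that_break_quorum(bad))   # ['az-a']
print(zones_that_break_quorum(good))  # []
```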
How to Reproduce the Error
Create a 3-member quorum queue, then stop two of its three nodes:
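# Python (pika) sketch; the hostname mq-03 and the message body are placeholders
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host='mq-03'))
ch = conn.channel()
ch.confirm_delivery()  # publisher confirms make the stall visible to the client
ch.queue_declare(queue='orders', durable=True,
                 arguments={'x-queue-type': 'quorum'})  # auto-placed across 3 nodes
# stop two members from another shell, then press Enter:
#   rabbitmqctl -n rabbit@mq-01 stop_app
#   rabbitmqctl -n rabbit@mq-02 stop_app
input('Stop mq-01 and mq-02, then press Enter: ')
# now only rabbit@mq-03 (a minority of 3) is online:
ch.basic_publish(exchange='', routing_key='orders', body=b'test')  # stalls waiting for a confirm; queue cannot reach majority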
rabbitmq-diagnostics quorum_status orders will show one online member out of three — below the majority of two.
Diagnostic Commands
# Which quorum queues are under-replicated / missing majority?
rabbitmqctl list_queues name type members online leader | grep -i quorum | sort
# Detailed per-queue Raft membership and online state
rabbitmq-diagnostics quorum_status orders
# Cluster node status and any network partition
rabbitmqctl cluster_status
# Confirm which nodes are actually reachable/running
rabbitmq-diagnostics ping -n rabbit@mq-01
rabbitmq-diagnostics check_running
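# Find queues with too few online members (threshold 2 assumes 3-member queues).
# 'online' is a node list, so count the '@' signs in the tab-separated third column.
rabbitmqctl -q list_queues name leader online --no-table-headers | awk -F'\t' 'gsub(/@/, "@", $3) < 2'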
# Majority-loss / recovery messages in the log
journalctl -u rabbitmq-server --since "30 min ago" | grep -iE 'majority|quorum|not synchronised|failed to recover'
In quorum_status, compare the number of online members against the majority for the membership size. Online < majority is the definitive signal of quorum loss; cluster_status then tells you whether nodes are down or partitioned.
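The comparison of online members against the majority can be automated for every queue at once. The sketch below is a hedged example: it assumes the output of `rabbitmqctl -q list_queues name members online --no-table-headers` is tab-separated, with node lists rendered like `[rabbit@mq-01, rabbit@mq-02]`. Verify that format on your RabbitMQ version before relying on it. The `payments` queue and the function names are illustrative.

```python
def parse_node_list(field: str) -> list[str]:
    """Turn '[rabbit@a, rabbit@b]' into ['rabbit@a', 'rabbit@b']."""
    inner = field.strip().strip("[]")
    return [node.strip() for node in inner.split(",") if node.strip()]


def queues_without_majority(rows: str) -> list[str]:
    """Return names of queues whose online members are below majority."""
    lost = []
    for line in rows.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and rows without exactly three columns
        name, members, online = parts
        member_count = len(parse_node_list(members))
        online_count = len(parse_node_list(online))
        if member_count and online_count < member_count // 2 + 1:
            lost.append(name)
    return lost


sample = (
    "orders\t[rabbit@mq-01, rabbit@mq-02, rabbit@mq-03]\t[rabbit@mq-03]\n"
    "payments\t[rabbit@mq-01, rabbit@mq-02, rabbit@mq-03]\t[rabbit@mq-01, rabbit@mq-03]\n"
)
print(queues_without_majority(sample))  # ['orders']
```

Counting list entries rather than comparing against a fixed number keeps the check correct for queues of any membership size.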
Step-by-Step Resolution
- Confirm quorum is actually lost, not just slow. Run rabbitmq-diagnostics quorum_status <queue>. If online members are below the majority, this is quorum loss (this guide). If a majority is online but operations time out, it is a Ra command-timeout problem instead.
- Bring the down members back; this is the clean fix. Restart the stopped nodes (rabbitmqctl start_app on each) or recover the failed hosts. Once a majority is online and caught up, the queue elects a leader and resumes automatically. Always prefer restoring nodes over forcing recovery.
- Resolve any network partition. If cluster_status shows a partition, fix the network and follow your partition-handling strategy. An isolated minority cannot proceed until the partition heals.
- For permanently lost members, shrink the membership. If a node is gone for good, remove it from the queue so the remaining members can form a majority. Forget the dead cluster node, then grow/restore replicas:
# remove the permanently dead node from the cluster
rabbitmqctl forget_cluster_node rabbit@mq-02
- Re-balance replicas after recovery. Once enough nodes are healthy, add replicas back (for example with rabbitmq-queues add_member or rabbitmq-queues grow) so the queue regains its target redundancy and is not left at minimum membership.
- As a last resort, force recovery (data-loss risk). If a majority can never be restored and the queue is stuck, RabbitMQ provides operator commands to force a minority to recover. This can lose uncommitted (and possibly committed-but-unreplicated) data; use it only when restoring members is impossible, and document the decision.
- Verify. Re-run quorum_status; a stable leader and online members at or above majority confirm the queue is available again.
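The verification step can be scripted as a bounded poll. This is a minimal sketch under stated assumptions: `get_online_count` is a hypothetical callable that you supply (for example, one that wraps the quorum_status output), and the timeout and interval values are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_majority(
    get_online_count: Callable[[], int],
    members: int,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until online members reach majority; False if the timeout expires."""
    needed = members // 2 + 1
    deadline = clock() + timeout_s
    while True:
        if get_online_count() >= needed:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated probe: one member online twice, then a second member returns.
readings = iter([1, 1, 2])
print(wait_for_majority(lambda: next(readings), members=3, sleep=lambda s: None))  # True
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A return value of False means a majority never appeared within the window, which points back to the earlier steps rather than to forcing recovery straight away.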
Prevention and Best Practices
- Use an odd number of members (3 or 5) so majority math is unambiguous and tolerates 1 or 2 failures respectively.
- Spread members across distinct hosts and availability zones so no single failure removes a majority.
- Restore failed nodes promptly; do not run for long at minimum membership where one more failure loses quorum.
- During rolling restarts, wait for each member to re-synchronise before restarting the next.
- Promptly run forget_cluster_node for permanently destroyed nodes so dead members do not count against majority.
- Alert on quorum queues where online < members (under-replicated) before they reach actual quorum loss.
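The rolling-restart rule above reduces to one question: does the queue keep a majority with this node offline? The sketch below illustrates that check for a single queue; the function name is hypothetical. RabbitMQ ships a health check with a similar purpose, `rabbitmq-diagnostics check_if_node_is_quorum_critical`, which is the better choice in production.

```python
def safe_to_restart(node: str, members: set[str], online: set[str]) -> bool:
    """True if the queue keeps a majority with `node` taken offline."""
    needed = len(members) // 2 + 1
    remaining = (online & members) - {node}
    return len(remaining) >= needed


members = {"rabbit@mq-01", "rabbit@mq-02", "rabbit@mq-03"}

# All three online: taking one down still leaves 2 of 3.
print(safe_to_restart("rabbit@mq-01", members, members))  # True

# mq-02 has not rejoined yet: restarting mq-01 would leave only mq-03.
print(safe_to_restart("rabbit@mq-01", members, {"rabbit@mq-01", "rabbit@mq-03"}))  # False
```

This counts online members only; it does not check whether a restarted member has finished re-synchronising, so waiting for catch-up remains a separate step.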
Related Errors
- quorum Ra command timeout: members slow but a majority present — a latency problem rather than quorum loss.
- quorum queue no leader elected: the symptom you see when majority loss prevents any leader from being chosen.
- publisher nack received: publishes to a queue without majority are nacked or time out.
- consumer cancelled notification: a leader change or unavailable queue can cancel attached consumers.
Frequently Asked Questions
How many members can a quorum queue lose?
A queue with N members tolerates losing floor((N-1)/2): a 3-member queue survives 1 failure, a 5-member queue survives 2. Lose more and it cannot reach majority.
Why does RabbitMQ refuse to operate with a minority?
To prevent split-brain. A minority cannot safely accept writes without risking divergence from the rest of the membership, so Raft makes the queue unavailable instead.
What is the safest way to recover?
Bring the down members back online. Once a majority is present and caught up, the queue elects a leader and resumes with no data loss.
When should I force recovery from a minority?
Only when a majority can never be restored (e.g., nodes permanently destroyed). Forcing recovery from a minority risks data loss and should be a documented last resort.
How do I avoid this entirely?
Use 3 or 5 members spread across hosts/AZs, restore failures quickly, remove dead nodes from membership, and alert on under-replicated queues before they lose quorum.