AI for RabbitMQ · By James Joyner IV · 9 min read

RabbitMQ Error Guide: 'cannot reach majority' Quorum Queue Lost Quorum

Fix RabbitMQ quorum queues that lost majority: diagnose 'cannot reach majority', under-replicated members, and recover queues after losing over half the nodes.

  • #rabbitmq
  • #troubleshooting
  • #errors
  • #quorum-queues

Exact Error Message

When a quorum queue loses more than half its members, it can no longer elect a leader or commit writes, and the broker reports it cannot reach a majority:

[warning] <0.1142.0> quorum queue 'orders' in vhost '/' cannot reach majority
 of nodes [rabbit@mq-01, rabbit@mq-02, rabbit@mq-03]; only rabbit@mq-03 is online
[error] <0.1142.0> failed to recover quorum queue 'orders': not enough members online
[warning] <0.1190.0> queue 'orders': replica on rabbit@mq-03 is not synchronised
 with the leader; waiting for quorum

Clients publishing to or consuming from the queue hang and then receive operation timeouts; rabbitmq-diagnostics quorum_status shows fewer online members than required for a majority.

What the Error Means

A quorum queue of N members needs a majority, floor(N/2) + 1, online to elect a leader and commit operations. For a 3-member queue that is 2 members; for a 5-member queue, 3. If fewer than that are online (e.g., 2 of 3 nodes lost), the surviving minority cannot form a majority, so it refuses to elect a leader or accept writes. This is Raft working correctly: it would rather become unavailable than risk split-brain or data divergence.
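
The majority arithmetic can be checked with a short sketch. The helper names `majority` and `has_quorum` are illustrative only and are not part of RabbitMQ.

```python
def majority(members: int) -> int:
    """Smallest number of online members that forms a Raft majority."""
    if members < 1:
        raise ValueError("a queue needs at least one member")
    return members // 2 + 1


def has_quorum(members: int, online: int) -> bool:
    """True when enough members are online to elect a leader and commit."""
    return online >= majority(members)


for n in (3, 4, 5):
    tolerated = n - majority(n)  # failures the queue can survive
    print(f"{n} members: majority={majority(n)}, tolerates {tolerated} failure(s)")
```

Note that a 4-member queue needs 3 members online, so it tolerates only one failure, the same as a 3-member queue.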

“Cannot reach majority,” “waiting for quorum,” and “replica not synchronised” all mean the queue does not have enough healthy, caught-up members to make progress. Unlike a Ra command timeout (members slow but a majority present), this is genuine quorum loss: the queue is unavailable until enough members return or you intervene.

Common Causes

  • More than half the member nodes are down. Two of three nodes crashed, were stopped, or lost network — leaving a minority that cannot reach majority.
  • A network partition isolated the members. Each side has a minority, so neither can elect a leader (cluster_status shows a partition).
  • Members were never spread correctly. A 3-member queue with two members on the same host/AZ loses majority when that host/AZ fails.
  • A permanently destroyed node. A node was terminated (cloud instance lost) and never removed from the queue’s membership, so the queue counts a member that will never return.
  • Followers fell too far behind and are not synchronised. Replicas exist but have not caught up to the leader, so they do not count toward a healthy majority.
  • Rolling restart taken too aggressively. Restarting members faster than they re-sync drops the online count below majority mid-operation.
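
The placement cause above can be checked ahead of time. The following is a minimal sketch, assuming you can map each member node to a failure domain (host or AZ); the function name and the sample placement are hypothetical.

```python
from collections import Counter


def zones_that_break_quorum(member_zones: dict[str, str]) -> list[str]:
    """Return zones whose failure alone would leave the queue without a majority.

    member_zones maps each member node to its failure domain (host or AZ).
    """
    total = len(member_zones)
    needed = total // 2 + 1  # Raft majority
    per_zone = Counter(member_zones.values())
    return sorted(zone for zone, count in per_zone.items()
                  if total - count < needed)


# Hypothetical placement: two of three members share an AZ.
risky = {"rabbit@mq-01": "az-a", "rabbit@mq-02": "az-a", "rabbit@mq-03": "az-b"}
print(zones_that_break_quorum(risky))
```

An empty result means no single failure domain holds enough members to take the queue below majority.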

How to Reproduce the Error

Create a 3-member quorum queue, then stop two of its three nodes:

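import pika  # Python AMQP client, used here as an example; any client works

# assumes the surviving node is reachable at hostname mq-03
conn = pika.BlockingConnection(pika.ConnectionParameters(host='mq-03'))
ch = conn.channel()
ch.confirm_delivery()  # wait for publisher confirms so the stall is visible
ch.queue_declare(queue='orders', durable=True,
                 arguments={'x-queue-type': 'quorum'})  # auto-placed across 3 nodes
input('Stop two members, then press Enter: ')
#   rabbitmqctl -n rabbit@mq-01 stop_app
#   rabbitmqctl -n rabbit@mq-02 stop_app
# now only rabbit@mq-03 (a minority of 3) is online:
ch.basic_publish(exchange='', routing_key='orders', body=b'test')  # hangs; queue cannot reach majority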

rabbitmq-diagnostics quorum_status orders will show one online member out of three — below the majority of two.

Diagnostic Commands

# Which quorum queues are under-replicated / missing majority?
rabbitmqctl list_queues name type members online leader | grep -i quorum | sort

# Detailed per-queue Raft membership and online state
rabbitmq-diagnostics quorum_status orders

# Cluster node status and any network partition
rabbitmqctl cluster_status

# Confirm which nodes are actually reachable/running
rabbitmq-diagnostics ping -n rabbit@mq-01
rabbitmq-diagnostics check_running

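# Find queues with no leader or too few online members.
# 'online' is a list of node names, so count its entries (2 = majority of 3;
# the list formatting is assumed and may vary by version)
rabbitmqctl list_queues --quiet --no-table-headers name leader online | awk -F'\t' '{ s = $3; n = gsub(/@/, "", s) } $2 == "" || n < 2'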

# Majority-loss / recovery messages in the log
journalctl -u rabbitmq-server --since "30 min ago" | grep -iE 'majority|quorum|not synchronised|failed to recover'

In quorum_status, compare the number of online members against the majority for the membership size. Online < majority is the definitive signal of quorum loss; cluster_status then tells you whether nodes are down or partitioned.
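
The comparison of online members against majority can be automated. This is a rough sketch that assumes the listing is tab-separated rows of name, members, and online, with node lists formatted like `[rabbit@mq-01, rabbit@mq-02]`; the real output format may differ by version, and `find_quorum_loss` and the sample rows are hypothetical.

```python
import re

NODE = re.compile(r"[\w.-]+@[\w.-]+")  # matches node names such as rabbit@mq-01


def find_quorum_loss(listing: str) -> list[tuple[str, int, int]]:
    """Return (queue, online, majority) for queues with online < majority.

    Expects tab-separated rows of: name, members, online.
    """
    lost = []
    for row in listing.splitlines():
        cols = row.split("\t")
        if len(cols) != 3:
            continue  # skip unrelated lines
        name, members, online = cols
        n_members = len(NODE.findall(members))
        n_online = len(NODE.findall(online))
        if n_members == 0:
            continue  # header row, or not a quorum queue
        needed = n_members // 2 + 1
        if n_online < needed:
            lost.append((name, n_online, needed))
    return lost


sample = (
    "orders\t[rabbit@mq-01, rabbit@mq-02, rabbit@mq-03]\t[rabbit@mq-03]\n"
    "audit\t[rabbit@mq-01, rabbit@mq-02, rabbit@mq-03]\t[rabbit@mq-01, rabbit@mq-03]\n"
)
print(find_quorum_loss(sample))
```

In the sample, only `orders` is reported: it has one member online where two are needed, while `audit` is under-replicated but still has a majority.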

Step-by-Step Resolution

  1. Confirm quorum is actually lost, not just slow. Run rabbitmq-diagnostics quorum_status <queue>. If online members are below the majority, this is quorum loss (this guide). If a majority is online but operations time out, it is a Ra command-timeout problem instead.

  2. Bring the down members back — this is the clean fix. Restart the stopped nodes (rabbitmqctl start_app on each) or recover the failed hosts. Once a majority is online and caught up, the queue elects a leader and resumes automatically. Always prefer restoring nodes over forcing recovery.

  3. Resolve any network partition. If cluster_status shows a partition, fix the network and follow your partition-handling strategy. Each isolated minority cannot proceed until the partition heals.

  4. For permanently lost members, shrink the membership. If a node is gone for good, remove it from the queue so the remaining members can form a majority. Forget the dead cluster node, then grow/restore replicas:

# remove the permanently dead node from the cluster
rabbitmqctl forget_cluster_node rabbit@mq-02
# if the queue still lists the dead node as a member, remove it from that queue
rabbitmq-queues delete_member --vhost / orders rabbit@mq-02

  5. Re-balance replicas after recovery. Once enough nodes are healthy, add replicas back so the queue regains its target redundancy and is not left at minimum membership.

  6. As a last resort, force recovery (data-loss risk). If a majority can never be restored and the queue is stuck, RabbitMQ provides operator commands to force a minority to recover. This can lose uncommitted (and possibly committed-but-unreplicated) data. Use it only when restoring members is impossible, and document it.

  7. Verify. Re-run quorum_status; a stable leader and online members at or above majority confirm the queue is available again.
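
The decision order in the steps above can be summarised as a small triage function. This is only a sketch of the reasoning, not a RabbitMQ API: the `Step` names, the `next_step` function, and its inputs (which you would gather from quorum_status and cluster_status) are all hypothetical.

```python
from enum import Enum


class Step(Enum):
    HEALTHY = "no action needed"
    RA_TIMEOUT = "majority online but slow: investigate as a Ra command timeout"
    REMOVE_DEAD_MEMBER = "remove permanently lost members, then add replicas back"
    HEAL_PARTITION = "fix the network partition"
    RESTORE_MEMBERS = "restart or recover the down members"
    FORCE_RECOVERY = "last resort: force recovery and accept possible data loss"


def next_step(members, online, gone_for_good=frozenset(),
              partitioned=False, timing_out=False):
    """Pick the next action for one queue, following the order of the steps above."""
    members, online, gone = set(members), set(online), set(gone_for_good)
    needed = len(members) // 2 + 1
    if len(online) >= needed:            # quorum is present
        if timing_out:
            return Step.RA_TIMEOUT
        if gone & members:               # dead node still counted as a member
            return Step.REMOVE_DEAD_MEMBER
        if len(online) < len(members):   # available but under-replicated
            return Step.RESTORE_MEMBERS
        return Step.HEALTHY
    if partitioned:                      # quorum lost because of a partition
        return Step.HEAL_PARTITION
    if len(members - gone) >= needed:    # enough members can still come back
        return Step.RESTORE_MEMBERS
    return Step.FORCE_RECOVERY           # a majority can never be restored


nodes = ["rabbit@mq-01", "rabbit@mq-02", "rabbit@mq-03"]
print(next_step(nodes, ["rabbit@mq-03"]).value)
```

Forced recovery is only reached when the members that could still return are too few to form a majority, which matches the guidance to prefer restoring nodes.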

Prevention and Best Practices

  • Use an odd number of members (3 or 5) so that the majority is unambiguous and the queue tolerates 1 or 2 failures, respectively.
  • Spread members across distinct hosts and availability zones so no single failure removes a majority.
  • Restore failed nodes promptly; do not run for long at minimum membership where one more failure loses quorum.
  • During rolling restarts, wait for each member to re-synchronise before restarting the next.
  • Promptly run forget_cluster_node for permanently destroyed nodes so dead members do not count against majority.
  • Alert on quorum queues where online < members (under-replicated) before they reach actual quorum loss.
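
The rolling-restart practice can be expressed as a gate between restarts. This is a minimal sketch: `restart` and `all_members_online` are hypothetical caller-supplied hooks (for example, wrappers around your service manager and a membership check), and the timeout values are arbitrary.

```python
import time


def rolling_restart(nodes, restart, all_members_online,
                    timeout=300.0, poll=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Restart nodes one at a time, waiting for re-sync between restarts.

    restart(node) restarts one node. all_members_online() should return True
    only when every quorum queue has all of its members online again.
    """
    for node in nodes:
        restart(node)
        deadline = clock() + timeout
        while not all_members_online():
            if clock() >= deadline:
                raise TimeoutError(f"members did not re-sync after restarting {node}")
            sleep(poll)


# Demo with stub hooks that report an immediate re-sync.
restarted = []
rolling_restart(["rabbit@mq-01", "rabbit@mq-02", "rabbit@mq-03"],
                restart=restarted.append,
                all_members_online=lambda: True)
print(restarted)
```

Raising on timeout stops the rollout before a second node is taken down, so the online count never drops below majority because of the restart itself.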

Related Errors

  • quorum Ra command timeout: members slow but a majority present; a latency problem rather than quorum loss.
  • quorum queue no leader elected: the symptom you see when majority loss prevents any leader from being chosen.
  • publisher nack received: publishes to a queue without majority are nacked or time out.
  • consumer cancelled notification: a leader change or unavailable queue can cancel attached consumers.

Frequently Asked Questions

How many members can a quorum queue lose? A queue with N members tolerates losing floor((N-1)/2): a 3-member queue survives 1 failure, a 5-member queue survives 2. Lose more and it cannot reach majority.

Why does RabbitMQ refuse to operate with a minority? To prevent split-brain. A minority cannot safely accept writes without risking divergence from the rest of the membership, so Raft makes the queue unavailable instead.

What is the safest way to recover? Bring the down members back online. Once a majority is present and caught up, the queue elects a leader and resumes with no data loss.

When should I force recovery from a minority? Only when a majority can never be restored (e.g., nodes permanently destroyed). Forcing recovery from a minority risks data loss and should be a documented last resort.

How do I avoid this entirely? Use 3 or 5 members spread across hosts/AZs, restore failures quickly, remove dead nodes from membership, and alert on under-replicated queues before they lose quorum.
