By James Joyner IV · 9 min read

RabbitMQ Error Guide: 'quorum queue ... no leader elected' Raft Quorum Loss

Fix RabbitMQ quorum queues with no leader: down replicas, lost majority, partitioned Ra members, and stalled Raft leader elections.

  • #rabbitmq
  • #troubleshooting
  • #errors
  • #quorum-queues

Exact Error Message

When a quorum queue cannot elect a leader, you typically see a mix of broker-side Ra log lines and client-side timeouts. A representative snippet from /var/log/rabbitmq/:

2026-06-27 09:14:02.118 [warning] <0.812.0> ra: leader election for cluster %2F_orders pending, no quorum available
2026-06-27 09:14:05.451 [warning] <0.812.0> ra: cluster %2F_orders has no elected leader, candidate rabbit@node1 timed out waiting for votes
2026-06-27 09:14:05.452 [info]    <0.812.0> ra: %2F_orders members rabbit@node1 up, rabbit@node2 down, rabbit@node3 down
2026-06-27 09:14:09.880 [error]   <0.901.0> Channel error on connection <0.760.0> (10.0.3.21:53344 -> 10.0.3.10:5672):
operation queue.declare caused a channel exception not_found:
queue 'orders' in vhost '/' has no leader (no quorum), retry later

A publishing application logs something closer to this:

Error: operation basic.publish (or queue.declare) timed out
  reason: {noproc, ...} / timeout
  queue: orders  vhost: /
  note: quorum queue has no leader elected — cluster cannot reach majority

The unifying symptom: a quorum queue is declared and known to the cluster, but no node is currently the Raft leader, so declares, publishes, and consumes fail with timeout, noproc, or no leader.

What the Error Means

Quorum queues are built on Ra, RabbitMQ’s implementation of the Raft consensus protocol. Every quorum queue is a small Raft cluster whose members are replicas spread across broker nodes. All writes (publishes, acks, consumer state) go through an elected leader, and the leader only commits an entry once a majority of members have acknowledged it.

A leader can only be elected if a majority (quorum) of the members are online and can talk to each other. If too many members are down or partitioned away, the surviving members can never gather enough votes. They stay as candidates, repeatedly time out their election rounds, and emit the no quorum available / no elected leader warnings above. Until quorum returns, the queue is genuinely unavailable, not just slow.

The majority math is unforgiving and worth memorizing:

  • A 3-member quorum queue needs 2 online. It tolerates 1 member down.
  • A 5-member quorum queue needs 3 online. It tolerates 2 members down.
  • A 2-member queue needs both — it tolerates 0 failures, which is why even replica counts are a bad idea.

Lose the majority and there is no leader, so the queue stops accepting and delivering messages until enough members rejoin. Already-committed messages are safe on disk; the queue is simply paused, not lost.
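The majority rule above is small enough to sketch in shell. The helper names below (quorum_majority, quorum_tolerated, quorum_available) are hypothetical and not part of any RabbitMQ tool; they only encode the floor(N/2)+1 rule described in this section.

```shell
#!/bin/sh
# Hypothetical helpers that encode the Raft majority rule; not RabbitMQ commands.

# Votes needed to elect a leader among N members: floor(N / 2) + 1.
quorum_majority() {
    echo $(( $1 / 2 + 1 ))
}

# Member failures an N-member queue can absorb while keeping a majority.
quorum_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Succeeds (exit 0) when ONLINE members out of TOTAL still form a majority.
quorum_available() {
    total=$1
    online=$2
    [ "$online" -ge "$(quorum_majority "$total")" ]
}
```

For example, quorum_majority 3 prints 2, and quorum_tolerated 4 prints 1, the same tolerance as a 3-member queue, which is why the fourth replica buys nothing.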

Common Causes

  • More than half the replicas are down. The most common cause: in a 3-member queue, two of the three broker nodes hosting replicas have stopped or crashed.
  • Network partition between Ra members. The nodes are running but cannot reach each other, typically because a security group, firewall, or overlay-network change blocks inter-node traffic (the Erlang distribution port, 25672 by default, or epmd on 4369). Each side sees the others as down.
  • A member node never finished starting. After a rolling restart or a node replacement, one broker is stuck booting (repeated timeout_waiting_for_tables, a full disk, or a feature-flag mismatch), so it never rejoins the Raft group.
  • Membership grown or shrunk incorrectly. Adding members without them syncing, or removing a healthy member while another is already down, can drop the online count below majority.
  • Asymmetric clustering. A node was reset or re-clustered and its view of members no longer matches its peers, so votes never converge.

How to Reproduce the Error

Reproduce only in a disposable test cluster, never production.

  1. Stand up a 3-node cluster (rabbit@node1, rabbit@node2, rabbit@node3).
  2. Declare a quorum queue with replicas on all three nodes (x-queue-type: quorum).
  3. Confirm a leader exists and publish a few messages successfully.
  4. Stop two of the three nodes (systemctl stop rabbitmq-server on node2 and node3).
  5. Publish to the queue from a client. The surviving node1 has only 1 of 3 votes — no majority — so leader election stalls and your publish/declare times out with no leader / no quorum.

Bringing node2 (or node3) back online restores the 2-of-3 majority, an election completes, and the queue resumes.

Diagnostic Commands

All commands below are read-only. Start by confirming which queues lack a leader.

# List quorum queues with their type, leader, members, and online members
rabbitmqctl list_queues name type leader members online

A healthy queue shows a non-empty leader and online matching members. The broken one shows an empty leader:

name    type    leader        members                                    online
orders  quorum  (empty)       rabbit@node1 rabbit@node2 rabbit@node3      rabbit@node1
events  quorum  rabbit@node1  rabbit@node1 rabbit@node2 rabbit@node3      rabbit@node1 rabbit@node2 rabbit@node3
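On a cluster with many queues, scanning that table by eye gets tedious. The filter below is a minimal sketch that prints only the quorum queues with no leader. It assumes tab-separated rows of name, type, and leader, and that a missing leader appears as an empty field or as ''; both are assumptions to verify against your RabbitMQ version. The function name leaderless_quorum_queues is hypothetical.

```shell
#!/bin/sh
# Reads tab-separated "name<TAB>type<TAB>leader" rows on stdin and prints the
# names of quorum queues whose leader column is empty.
# Assumption: an absent leader shows up as an empty field or as ''.
leaderless_quorum_queues() {
    awk -F '\t' -v empty="''" \
        '$2 == "quorum" && ($3 == "" || $3 == empty) { print $1 }'
}

# Example (needs a running broker, default vhost):
#   rabbitmqctl list_queues --quiet --no-table-headers name type leader \
#       | leaderless_quorum_queues
```

Because the function only reads stdin, it can be tried against a saved copy of the command output before being pointed at a live broker.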

Inspect the Raft state of the affected queue directly:

# Detailed Raft/quorum state for one queue
rabbitmq-diagnostics quorum_status orders

# The same check through the queue-management CLI
rabbitmq-queues quorum_status orders

Status of quorum queue orders on node rabbit@node1 ...
Node             Raft State   Membership   Term   Match Index
rabbit@node1     candidate    voter        14     2087
rabbit@node2     unknown      voter        -      -
rabbit@node3     unknown      voter        -      -

Two voters unknown against one candidate confirms the majority is gone. Then check the cluster and node health:

# Is this node hosting queues that are one failure away from losing quorum?
rabbitmq-diagnostics check_if_node_is_quorum_critical

# Overall cluster membership, partitions, and running nodes
rabbitmqctl cluster_status

# Per-node health, listeners, and alarms
rabbitmq-diagnostics status

Finally, read the logs for the election churn and any partition notices:

# Last 200 lines of the broker's journal (add -f to follow it live)
journalctl -u rabbitmq-server -n 200 --no-pager

# Election and partition lines in the file log
grep -E "ra:|no quorum|leader election|partition" /var/log/rabbitmq/rabbit@node1.log

Step-by-Step Resolution

  1. Confirm it is a quorum problem, not a single crash. Run rabbitmqctl list_queues name type leader members online and rabbitmq-diagnostics quorum_status <queue>. An empty leader plus fewer online members than majority means lost quorum.

  2. Find the missing members. Use rabbitmqctl cluster_status to see which nodes are down or listed under network partitions. Cross-check with rabbitmq-diagnostics status on each reachable node.

  3. Bring the down members back — this is the real fix. Restart the stopped brokers (systemctl start rabbitmq-server). As soon as a majority is online, Ra completes an election automatically; the leader reappears and the queue resumes. No data is lost.

  4. If it is a partition, restore connectivity. Fix the firewall/security-group rule blocking the Erlang distribution port (25672) between members. Once the members can see each other again, quorum returns on its own.

  5. If a member is stuck booting, resolve the underlying cause (disk space, timeout_waiting_for_tables, feature-flag mismatch) before trying anything membership-related. A half-started node will not vote.

  6. Only if a node is permanently gone should you touch membership. RabbitMQ provides rabbitmq-queues add_member (one queue) and rabbitmq-queues grow (every matching queue) to place a fresh replica on a healthy node, and rabbitmq-queues delete_member or rabbitmq-queues shrink to remove replicas from a node. These membership changes are themselves committed through the Raft log, so they need a working majority and cannot rescue a queue that has already lost it. When a queue is stuck without quorum and the lost nodes are unrecoverable, force_shrink_member_to_current_member can reduce the membership to the single surviving node so it becomes a one-member majority and elects itself leader; confirm that your RabbitMQ version provides it before relying on it. Treat it as a last resort: it drops the other members from the queue, changes the queue’s fault tolerance, and should be followed by growing replicas back to an odd count once the cluster is healthy.

  7. Verify recovery. Re-run rabbitmq-diagnostics quorum_status <queue> and confirm a single leader with the expected number of voter members online, then test a publish and consume.
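The verification step can be scripted as a simple poll. The sketch below waits until a queue reports a leader or gives up after a fixed number of attempts. wait_for_leader is a hypothetical helper, and it assumes that rabbitmqctl list_queues prints tab-separated rows with an empty (or '') leader field while no leader exists.

```shell
#!/bin/sh
# Hypothetical helper: poll until a quorum queue reports a leader.
# Usage: wait_for_leader <queue> [attempts] [interval_seconds]
# Queries the default vhost; add -p <vhost> to the rabbitmqctl call otherwise.
wait_for_leader() {
    queue=$1
    attempts=${2:-30}    # number of polls before giving up
    interval=${3:-2}     # seconds to sleep between polls
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Pick the leader column for the requested queue only.
        leader=$(rabbitmqctl list_queues --quiet --no-table-headers name leader 2>/dev/null |
            awk -F '\t' -v q="$queue" '$1 == q { print $2 }')
        if [ -n "$leader" ] && [ "$leader" != "''" ]; then
            echo "queue $queue: leader is $leader"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "queue $queue: still no leader after $attempts polls" >&2
    return 1
}
```

A call such as wait_for_leader orders 30 2 right after restarting the missing brokers gives a clear pass or fail before you move on to the publish and consume test.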

If a quorum incident is escalating, our incident response workflow can help structure triage and comms while you bring members back.

Prevention and Best Practices

  • Always use an odd number of replicas (3 or 5). Even counts add cost without improving fault tolerance.
  • Spread members across failure domains (racks, AZs) so a single zone outage cannot remove the majority at once.
  • Restart nodes one at a time during maintenance, and wait for rabbitmq-diagnostics quorum_status to show all members back online before touching the next node.
  • Alert on check_if_node_is_quorum_critical so you know when a queue is one failure away from going dark.
  • Monitor the Erlang distribution port and inter-node latency; partitions, not crashes, cause the trickiest quorum loss.
  • Keep cluster sizes and queue replica counts in sync so adding capacity does not accidentally leave queues under-replicated.
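The one-node-at-a-time rule can be enforced with a small gate in a maintenance script. This is a sketch, assuming rabbitmq-diagnostics check_if_node_is_quorum_critical exits non-zero while stopping the local node would cost some queue its majority. The function name wait_until_safe_to_stop and the RABBITMQ_DIAGNOSTICS_BIN override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-restart gate: block until this node can be stopped without
# taking any quorum queue below majority.
# Usage: wait_until_safe_to_stop [attempts] [interval_seconds]
wait_until_safe_to_stop() {
    attempts=${1:-60}
    interval=${2:-5}
    # Hypothetical override so the binary path can be swapped (or stubbed).
    diag=${RABBITMQ_DIAGNOSTICS_BIN:-rabbitmq-diagnostics}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Assumed to exit 0 only when the node is not quorum critical.
        if "$diag" check_if_node_is_quorum_critical >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Example: only stop the broker once the gate passes.
#   wait_until_safe_to_stop && systemctl stop rabbitmq-server
```

Running the gate on each node before stopping it keeps a rolling restart from removing a second member while the first is still catching up.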


Related Errors

  • Mnesia network partition / partitions in cluster_status. Classic (Mnesia-backed) metadata partitions can coexist with Ra quorum loss; both stem from the same broken inter-node connectivity.
  • timeout_waiting_for_tables on boot. A node that logs this never finishes starting, so it cannot rejoin a Raft group — a frequent reason a member stays missing and quorum cannot recover.
  • node down / nodedown rabbit@nodeX. Seen when a peer is unreachable; enough of these against one queue is exactly what drops it below majority.

Frequently Asked Questions

Why does my quorum queue stop working when only one node is down? It should not, if the queue has 3 members — 3 tolerates 1 failure. If a single node-down breaks it, the queue likely has only 2 members (so it tolerates 0), or another member was already unhealthy. Check members and online with list_queues.

Are my messages lost when there is no leader? No. Committed messages are safely replicated on disk. The queue is paused, not deleted. Once a majority of members return, the leader is re-elected and the queue resumes exactly where it left off.

Will the leader come back automatically? Yes, as long as a majority of members rejoin and can communicate. The surviving members keep retrying elections, so once quorum is met a new leader is chosen with no manual command needed.

When should I use force_shrink_member_to_current_member? Only when the other members are permanently lost and the queue is stuck without quorum. It collapses membership to the surviving node so it self-elects. Afterward, grow replicas back to an odd number to restore fault tolerance.

How do I avoid this entirely? Use 3 or 5 replicas across separate failure domains, restart nodes one at a time, and alert on check_if_node_is_quorum_critical so you act before the majority is at risk.
