DevOps AI ToolKit
AI for RabbitMQ · By James Joyner IV · 9 min read

RabbitMQ Error Guide: 'Error: operation ... timed out' (rabbitmqctl RPC Timeout)

Fix RabbitMQ operation timed out errors from rabbitmqctl and cluster ops: overloaded nodes, slow internal RPC, long-running queries, and the --timeout flag.

  • #rabbitmq
  • #troubleshooting
  • #errors
  • #cli

Exact Error Message

A CLI command or cluster operation aborts because an internal RPC to the broker did not return in time:

Error: operation list_queues on node rabbit@node1 timed out.
Timeout value used: 60000 ms. Some queues may not have responded in time.

Error:
{badrpc,timeout}

Error: operation list_connections on node rabbit@node1 timed out.
Timeout value used: 60000 ms.

Cluster operations show a similar shape:

Error: operation await_online_nodes on node rabbit@node2 timed out.
Timeout value used: 300000 ms.

And in the broker log you may see the matching internal call timing out:

2026-06-29 17:31:09.884 [warning] <0.4521.0> rabbit_mgmt or core query exceeded deadline:
{timeout,{gen_server,call,[<0.812.0>,emit_info_all,60000]}}

What the Error Means

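rabbitmqctl and rabbitmq-diagnostics (like the other rabbitmq-* CLI tools) are thin clients: they connect to the broker's Erlang node and make a remote procedure call (RPC) that runs the real work inside the broker, then wait for a reply up to a timeout. operation ... timed out / {badrpc,timeout} means the broker did not answer within that window. The command itself is fine; the node was too busy, or the specific query was too expensive, to respond in time. rabbitmqadmin works differently: it talks to the management plugin's HTTP API, so its slow requests fail as HTTP API errors or client-side request timeouts rather than {badrpc,timeout}.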

This is distinct from heartbeat timeouts (which close AMQP client connections) and consumer acknowledgement timeouts (which close the channel and requeue its unacknowledged deliveries). No message delivery is involved here: it is a control-plane call from your shell to the broker that exceeded its deadline. The default for most listing commands is 60 seconds; cluster-coordination commands use longer defaults.
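Because this failure has a recognisable shape, automation can tell it apart from other CLI errors. The following is a minimal bash sketch, assuming the message formats quoted at the top of this guide; those are examples rather than a documented output contract, so verify them against your RabbitMQ version. classify_cli_output is a hypothetical helper name.

```shell
# Sketch: classify captured rabbitmqctl / rabbitmq-diagnostics error output.
# Assumes the "timed out ... Timeout value used: N" and {badrpc,timeout}
# shapes shown in this guide. Prints:
#   "rpc_timeout <ms>"     when a "Timeout value used:" line is present
#   "rpc_timeout unknown"  for a bare {badrpc,timeout}
#   "other"                for anything else
classify_cli_output() {
  local out ms
  out=$(cat)                        # read the whole captured output from stdin
  case $out in
    *"timed out"*"Timeout value used:"*)
      # Take the first number after "Timeout value used:" (reported in ms).
      ms=$(printf '%s\n' "$out" |
        sed -n 's/.*Timeout value used: \([0-9][0-9]*\).*/\1/p' | head -n 1)
      echo "rpc_timeout ${ms:-unknown}" ;;
    *"{badrpc,timeout}"*)
      echo "rpc_timeout unknown" ;;
    *)
      echo "other" ;;
  esac
}
```

For example: rabbitmqctl list_queues name messages 2>&1 | classify_cli_output. A script can then back off or raise the deadline only when the result starts with rpc_timeout.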

Common Causes

  • Overloaded node. High CPU, memory pressure, or GC pauses keep the broker from servicing the RPC promptly.
  • Very large topology. list_queues/list_connections must gather state for tens of thousands of objects, which can take longer than the deadline.
  • Expensive columns requested. Asking for computed fields (e.g. messages, memory) forces per-object work that is far slower than fetching names alone.
  • A blocked or partitioned node. A node in a network partition or stuck in startup cannot answer cluster RPCs.
  • Resource alarm active. A memory/disk alarm throttles the node and slows control operations.
  • Timeout set too low. A short explicit --timeout/-t makes a normal-but-slow query fail.
  • Mnesia/cluster contention. Cluster-wide operations wait on a slow or unreachable peer.

How to Reproduce the Error

Force a heavy query on a large topology with a tight timeout:

# create a large number of queues first (disposable test broker only; legacy rabbitmqadmin syntax)
for i in $(seq 1 50000); do rabbitmqadmin declare queue name="q-$i"; done

# request expensive per-queue columns with a 1-second deadline
rabbitmqctl list_queues name messages memory consumers --timeout 1
# Error: operation list_queues on node rabbit@node1 timed out. Timeout value used: 1000 ms.

The same command with only name and the default timeout usually succeeds, showing that the cost of the query, not the command itself, is the problem.

Diagnostic Commands

# Is the node even responsive at the control plane? (short, cheap calls)
rabbitmq-diagnostics ping
rabbitmq-diagnostics check_running

# Is the node under load or in an alarm/partition state?
rabbitmq-diagnostics status | grep -iE 'Memory|Alarms|Uptime'
rabbitmq-diagnostics cluster_status
# Look for lines such as:
#   Alarms: [memory]            <-- a resource alarm slows control ops
#   Network Partitions: (none)

# Run the cheapest possible version of the failing query first
rabbitmqctl list_queues name --timeout 120

# Then add columns one at a time to find the expensive one
rabbitmqctl list_queues name messages --timeout 120

# Pull RPC/timeout evidence from the log
sudo grep -iE 'timed out|badrpc|gen_server,call.*timeout' \
  "/var/log/rabbitmq/rabbit@$(hostname -s).log" | tail -15

# Watch live node load (schedulers, memory, busy processes)
rabbitmq-diagnostics observer --interval 5

If ping and check_running are instant but list_queues name messages is slow, the cost is the query. If even ping lags, the node itself is overloaded or unhealthy.
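The triage rule above, together with the column-by-column narrowing from the diagnostic commands, can be scripted. This is a sketch, not a tested tool: it simplifies "slow" to "did not finish within the --timeout you pass", and PING_CMD, RABBITMQCTL and the function names are hypothetical hooks that default to the real commands so the logic can be exercised without a broker.

```shell
# Hypothetical override hooks; defaults are the real CLI commands.
PING_CMD=${PING_CMD:-"rabbitmq-diagnostics ping"}
RABBITMQCTL=${RABBITMQCTL:-rabbitmqctl}

# Returns 2 if even the cheap ping fails (a node problem), 0 otherwise.
node_answers_ping() {
  $PING_CMD >/dev/null 2>&1 || return 2   # word-splitting of PING_CMD is intended
}

# Starts from "name" and adds list_queues columns one at a time; prints the
# first column whose addition makes the call fail within the deadline.
# Usage: find_expensive_column <timeout_seconds> <column>...
find_expensive_column() {
  local timeout_s=$1 cols="" col
  shift
  for col in name "$@"; do
    cols="$cols $col"
    # $cols is left unquoted on purpose: each column must be its own argument
    if ! $RABBITMQCTL list_queues $cols --timeout "$timeout_s" >/dev/null 2>&1; then
      echo "$col"
      return 0
    fi
  done
  return 1                                # every column set answered in time
}

# Combined triage: check the node first, then narrow down the query.
# Exit status: 2 = node problem, 1 = query too expensive, 0 = all fine.
triage_rpc_timeout() {
  local timeout_s=$1 slow
  shift
  if ! node_answers_ping; then
    echo "node: even ping fails; check load, alarms and partitions"
    return 2
  fi
  if slow=$(find_expensive_column "$timeout_s" "$@"); then
    echo "query: adding '$slow' pushes list_queues past ${timeout_s}s"
    return 1
  fi
  echo "ok: node and query both answered in time"
}
```

For example, triage_rpc_timeout 120 messages memory consumers. Because columns are added cumulatively, the reported column is the one that tipped the query over the deadline, which may reflect the combined cost rather than that column alone.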

Step-by-Step Resolution

  1. Check node health first. Run rabbitmq-diagnostics ping and status. If even ping is slow, the problem is node load, an alarm, or a partition, not the command.

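  2. Clear any resource alarm. An Alarms: [memory] or [disk] entry throttles the node; free memory or disk space so control operations speed up. If the limit itself is too conservative, raise it with rabbitmqctl set_vm_memory_high_watermark or rabbitmqctl set_disk_free_limit, and persist the new value in rabbitmq.conf, because values set this way are lost on node restart.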

  3. Make the query cheaper. Request only the columns you need. name is nearly free, while messages and memory are expensive:

    rabbitmqctl list_queues name messages --timeout 120

    Avoid pulling every computed column across a huge topology in one call.

  4. Raise the timeout for genuinely large clusters. Use a longer deadline when the work legitimately takes time:

    rabbitmqctl list_queues name messages --timeout 300

    --timeout 0 waits indefinitely; prefer a generous finite value.

  5. Target the right node. Run the command on (or with -n) the node that owns the data, and avoid querying a node that is mid-startup or partitioned.

  6. Resolve partitions/peer issues for cluster ops. If cluster_status shows a partition or an unreachable node, fix that first — await_online_nodes and similar will keep timing out until the peer returns.

  7. Prefer the metrics API/Prometheus for routine large reads. For dashboards over big topologies, scrape rabbitmq_prometheus (port 15692) instead of repeatedly running expensive list_* RPCs.
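Step 4 can be automated so that the deadline grows only when a shorter one actually fails, instead of jumping straight to an unbounded wait. A minimal sketch; run_with_escalating_timeout is a hypothetical helper, and it retries on any non-zero exit, so a mistyped column would be retried too. A real script should first confirm the failure is an RPC timeout.

```shell
# Sketch: retry a command with progressively longer --timeout values (seconds).
# Usage: run_with_escalating_timeout "60 120 300" rabbitmqctl list_queues name messages
run_with_escalating_timeout() {
  local timeouts=$1 t
  shift
  for t in $timeouts; do            # unquoted on purpose: one pass per value
    if "$@" --timeout "$t"; then
      return 0                      # succeeded within this deadline
    fi
    echo "failed with --timeout $t; trying a longer deadline" >&2
  done
  return 1                          # still failing at the largest finite deadline
}
```

Keeping the list finite follows the advice to prefer a generous but bounded value.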

Verify by re-running the cheapest form of the command and confirming it returns well within the timeout.

Prevention and Best Practices

  • Always request only the columns you need in list_* commands; name-only queries rarely time out.
  • Set explicit, realistic --timeout values in automation sized to your topology rather than relying on the 60s default.
  • Monitor node memory, disk, and alarm state; a node in alarm slows every control operation and produces these timeouts.
  • Use rabbitmq_prometheus for ongoing metrics so you are not running heavy CLI RPCs on a schedule.
  • Keep clusters partition-free and nodes with CPU headroom so RPCs return promptly.
  • In scripts, check rabbitmq-diagnostics check_running before issuing expensive queries, and back off rather than retrying immediately on {badrpc,timeout}.
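The scripting advice above (check the node before an expensive query, then back off rather than retry immediately) can be sketched in bash. The starting delay of 5 seconds, the doubling factor, and the names guarded_query, CHECK_CMD and BACKOFF_SLEEP are hypothetical choices for illustration; CHECK_CMD defaults to rabbitmq-diagnostics check_running.

```shell
# Hypothetical override hooks; BACKOFF_SLEEP exists so tests can skip real sleeps.
CHECK_CMD=${CHECK_CMD:-"rabbitmq-diagnostics check_running"}
BACKOFF_SLEEP=${BACKOFF_SLEEP:-sleep}

# Usage: guarded_query <attempts> <command> [args...]
guarded_query() {
  local attempts=$1 delay=5 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    # Run the expensive command only if the node passes the cheap check.
    if $CHECK_CMD >/dev/null 2>&1 && "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i failed; backing off ${delay}s" >&2
      $BACKOFF_SLEEP "$delay"
      delay=$((delay * 2))          # exponential backoff: 5s, 10s, 20s, ...
    fi
    i=$((i + 1))
  done
  return 1                          # gave up after the last attempt
}
```

For example, guarded_query 4 rabbitmqctl list_queues name messages --timeout 120 makes at most four attempts and waits 5, 10 and 20 seconds between them.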

Related Errors

  • CRASH REPORT … gen_server terminated: the same {timeout,{gen_server,call,...}} reason seen from inside the broker.
  • statistics database could not be contacted — the management-API analogue of an overloaded query path.
  • missed heartbeats timeout — an AMQP connection timeout, a different mechanism from this control-plane RPC timeout.
  • timeout waiting for tables / mnesia overloaded — startup and cluster-coordination timeouts with their own causes.

See the broader RabbitMQ guides.

Frequently Asked Questions

Is operation ... timed out the same as a heartbeat timeout? No. Heartbeat timeouts close AMQP client connections (port 5672 by default). This is a control-plane RPC from rabbitmqctl to the broker that exceeded its deadline.

Should I just raise --timeout? Raise it for legitimately large clusters, but first check whether the node is in an alarm or you are requesting expensive columns — those are the usual real causes.

Why does list_queues name work but list_queues name messages time out? messages (like memory) requires per-queue computation; over a large topology that work can exceed the deadline, while name alone is cheap.

What does {badrpc,timeout} mean? The Erlang RPC from the CLI to the broker node did not return in time — the node was too busy or unhealthy to answer.

How should I monitor large clusters without triggering this? Scrape rabbitmq_prometheus on port 15692 instead of repeatedly running expensive list_* commands.
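To use the Prometheus endpoint from a shell instead of running list_* RPCs, its text output can be reduced with awk. A minimal sketch under stated assumptions: metric_sum is a hypothetical helper, samples are assumed to carry no timestamp column (so the value is the last field), and rabbitmq_queues is only an example metric name; check which names your RabbitMQ version exposes.

```shell
# Sketch: sum all samples of one metric from Prometheus text format on stdin.
# Labels are ignored, so per-queue samples are added together.
# Exit status 1 if the metric is not present at all.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                     # skip HELP and TYPE comment lines
    {
      m = $1
      sub(/[{].*/, "", m)             # drop the label set, keep the metric name
      if (m == name) { sum += $NF; found = 1 }
    }
    END { if (found) print sum + 0; else exit 1 }
  '
}
```

For example: curl -s http://localhost:15692/metrics | metric_sum rabbitmq_queues.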
