RabbitMQ Troubleshooting Toolkit
Diagnose RPC timeouts, queue backlogs, missed heartbeats, memory/disk alarms, and cluster partitions — with queue-level runbooks and prompts.
Top RabbitMQ errors
Start with the most common production issues and troubleshooting paths.
RabbitMQ RPC timeout (oslo.messaging)
MessagingTimeout and missed heartbeats across Nova, Cinder, Neutron, and Heat.
NOT_FOUND - no exchange
Fix RabbitMQ publish failures: 404 NOT_FOUND on basic.publish, returned messages, channel closed on publish, and missing publis…
{socket_error, epipe}
Fix RabbitMQ epipe / broken pipe errors: trace writes to a closed socket from slow consumers, vanished clients, and network dro…
CHANNEL_ERROR - expected channel.open
Fix RabbitMQ CHANNEL_ERROR and 'channel closed' exceptions: using a closed channel, unexpected frames, protocol violations, and…
Node rabbit@host is down
Fix RabbitMQ 'Node rabbit@host is down' and 'not responding' errors: a crashed beam, stopped service, or blocked distribution p…
{socket_error, econnreset}
Fix RabbitMQ econnreset / connection reset by peer: trace LB and proxy idle timeouts, client crashes, and firewall resets that…
consumer cancelled
Fix RabbitMQ consumer cancel notifications: diagnose why basic.cancel is pushed to clients when a queue is deleted, its node fa…
consumer_timeout
Fix RabbitMQ consumer_timeout: diagnose 'delivery acknowledgement timed out' channel closures from long processing or unacked m…
Best RabbitMQ prompts
Use these prompts to turn symptoms, logs, and config into a structured troubleshooting plan.
RabbitMQ Alternate Exchange & Unroutable Message Design
Design handling for unroutable and rejected messages using alternate exchanges, mandatory-flag returns, and a catch-all topology so messages that match no binding are captured instead of silently dropped.
RabbitMQ Backup & Disaster Recovery Design
Design a RabbitMQ backup and DR strategy covering definitions export, message durability assumptions, cross-region replication options, and a tested recovery runbook with realistic RPO/RTO.
RabbitMQ Cluster Capacity & Sizing Review
Right-size a RabbitMQ cluster's node count, memory/disk headroom, file descriptors, and Erlang scheduler settings against measured publish/consume rates and queue depth before scaling or a traffic event.
RabbitMQ Heartbeat & Connection Churn Triage
Diagnose missed-heartbeat disconnects, connection/channel churn, and 'connection_closed_abruptly' noise by correlating client timeouts, proxy idle limits, and broker heartbeat settings.
Free RabbitMQ tools
Validate, troubleshoot, or analyze your configuration before production changes.
RabbitMQ RPC Timeout runbook
Cluster health, queue depth, and a service-restart decision tree — with a downloadable pack.
Open the runbook
RabbitMQ runbook
Use a repeatable checklist for production troubleshooting.
A checklist for a broker that’s backing up, partitioned, or alarmed.
1. Check cluster status and partitions (rabbitmqctl cluster_status)
2. Inspect queues, consumers, and message backlog
3. Review memory and disk-free alarms
4. Check connection/consumer churn and blocked connections
5. Validate service dependencies and heartbeats
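
The checklist above could be scripted roughly as follows. This is a minimal sketch, not the runbook's own tooling: the names CHECKLIST, parse_queue_rows, flag_backlogged, and run_checklist are hypothetical, the 1000-message threshold is an arbitrary placeholder, and the mapping of each step to a CLI command is an assumption (only rabbitmqctl cluster_status is named in the checklist itself). The parser expects tab-separated name, messages, and consumers columns and skips any row that does not fit, since header output can differ between RabbitMQ versions.

```python
import subprocess

# Assumed mapping of checklist steps to CLI commands. Only the first command
# is named in the checklist; the rest are illustrative choices.
CHECKLIST = [
    ("cluster status and partitions", ["rabbitmqctl", "cluster_status"]),
    ("queues, consumers, backlog",
     ["rabbitmqctl", "list_queues", "name", "messages", "consumers"]),
    ("memory and disk alarms", ["rabbitmq-diagnostics", "alarms"]),
    ("connection state (blocked/blocking)",
     ["rabbitmqctl", "list_connections", "name", "state"]),
    ("negotiated heartbeat timeouts",
     ["rabbitmqctl", "list_connections", "name", "timeout"]),
]


def parse_queue_rows(text):
    """Parse tab-separated 'name, messages, consumers' rows.

    Rows whose second and third fields are not integers (for example a
    header line) are skipped rather than treated as errors.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        rows.append((parts[0], int(parts[1]), int(parts[2])))
    return rows


def flag_backlogged(rows, max_messages=1000):
    """Return (queue, reason) pairs worth a closer look.

    max_messages is a hypothetical threshold; tune it to the workload.
    """
    flagged = []
    for name, messages, consumers in rows:
        if messages > 0 and consumers == 0:
            flagged.append((name, "no consumers"))
        elif messages > max_messages:
            flagged.append((name, "backlog"))
    return flagged


def run_checklist(runner=subprocess.run):
    """Run each checklist command and collect (return code, stdout) per step.

    runner is injectable so the function can be exercised without a broker.
    """
    results = {}
    for label, cmd in CHECKLIST:
        proc = runner(cmd, capture_output=True, text=True)
        results[label] = (proc.returncode, proc.stdout)
    return results
```

On a broker node, run_checklist() gathers the raw output for each step, and flag_backlogged(parse_queue_rows(output)) narrows step 2 to queues that hold messages with no consumer attached or that exceed the chosen depth. Keeping the parsing separate from the command execution makes it possible to test the backlog logic against saved output.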