RabbitMQ Error Guide: 'CRASH REPORT ... gen_server terminated' Reading Erlang Crashes
Read RabbitMQ Erlang crash reports: decode gen_server terminated and supervisor reports with noproc, badmatch, function_clause, case_clause, and badarg reasons.
- #rabbitmq
- #troubleshooting
- #errors
- #erlang
Exact Error Message
RabbitMQ runs on the Erlang VM, so internal failures surface as Erlang CRASH REPORT and SUPERVISOR REPORT entries rather than tidy English. A typical pair looks like this:
2026-06-29 16:08:41.220 [error] <0.1731.0> CRASH REPORT Process <0.1731.0> with 0 neighbours
exited with reason: {{badmatch,{error,not_found}},
[{rabbit_amqqueue,with,2,[{file,"rabbit_amqqueue.erl"},{line,540}]},
{rabbit_channel,handle_method,3,[{file,"rabbit_channel.erl"},{line,1284}]},
{gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
in gen_server2:terminate/3 line 1183
2026-06-29 16:08:41.222 [error] <0.1729.0> SUPERVISOR REPORT Supervisor:
{<0.1729.0>,rabbit_channel_sup}. Context: child_terminated.
Reason: {{badmatch,{error,not_found}},...}. Offender: id=channel,pid=<0.1731.0>.
Other common values in the reason position include noproc, function_clause, case_clause, badarg, and {timeout, {gen_server,call,[...]}}.
What the Error Means
A CRASH REPORT says an Erlang process died and prints why (the reason) and where (the stacktrace). A SUPERVISOR REPORT says the dead process’s supervisor noticed and reacted (usually by restarting the child). The two almost always appear together: the crash is the cause, the supervisor entry is the effect.
The reason is the most important part. It is an Erlang term whose head names the failure class:
- noproc: a call was sent to a process that no longer exists (a race or an already-dead dependency).
- {badmatch, V}: a = pattern match failed because the value was V (e.g. matching ok but getting {error, not_found}).
- function_clause: no function clause matched the arguments (unexpected/malformed input).
- case_clause: a case had no branch for the actual value.
- badarg: a built-in function got an argument of the wrong type (bad binary, bad ETS table, etc.).
- {timeout, {gen_server, call, [...]}}: a synchronous call to another process timed out (the callee was overloaded or stuck).
The stacktrace below the reason lists {Module, Function, Arity, [{file,...},{line,...}]} frames, newest first. The top frame is where it actually broke; the frames below show how it got there.
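To make the reason and top-frame reading concrete, here is a minimal Python sketch that pulls the reason head and the newest stack frame out of a reason term. The function names and the regular expressions are assumptions for illustration; they handle the frame shape shown in the example above and are not a full Erlang term parser.

```python
import re

# One stack frame in the shape {Module,Function,Arity,[{file,"..."},{line,N}]}.
FRAME_RE = re.compile(
    r'\{(\w+),(\w+),(\d+),\[\{file,"([^"]+)"\},\{line,(\d+)\}\]\}'
)

def reason_head(reason):
    """Return the first atom in the reason term, e.g. 'badmatch' or 'noproc'."""
    match = re.search(r"[a-z]\w*", reason)
    return match.group(0) if match else "unknown"

def top_frame(reason):
    """Return (module, function, arity, line) for the newest frame, or None."""
    match = FRAME_RE.search(reason)
    if match is None:
        return None
    module, function, arity, _file, line = match.groups()
    return module, function, int(arity), int(line)

# Reason text taken from the example crash report (first two frames only).
sample = '''{{badmatch,{error,not_found}},
[{rabbit_amqqueue,with,2,[{file,"rabbit_amqqueue.erl"},{line,540}]},
{rabbit_channel,handle_method,3,[{file,"rabbit_channel.erl"},{line,1284}]}]}'''

print(reason_head(sample))  # badmatch
print(top_frame(sample))    # ('rabbit_amqqueue', 'with', 2, 540)
```

Because frames are listed newest first, the first regex match is the failure site; the remaining matches would be its callers.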
Common Causes
- A resource vanished mid-operation (noproc / {badmatch,{error,not_found}}): a queue, connection, or process was deleted while another process referenced it.
- Unexpected input from a client or plugin (function_clause / case_clause): a malformed frame, header, or argument the code did not anticipate.
- Corrupt or wrong-type data (badarg): a damaged message store entry, a bad ETS/Mnesia read, or an invalid binary.
- Overloaded internal dependency ({timeout,{gen_server,call,...}}): a busy queue or metrics process not answering in time.
- A bug or version mismatch: a plugin compiled against a different broker version triggering undef (a call to a function that does not exist) or function_clause.
- Cascading supervisor restarts: repeated child crashes hitting the restart intensity and taking down the supervisor.
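The last cause is easier to reason about with the rule written out. An OTP supervisor is configured with an intensity and a period: if more than intensity restarts happen within period seconds, the supervisor gives up and terminates itself. The sketch below models that rule over a list of restart timestamps. The function name and the numbers in the usage lines are hypothetical, and the exact boundary handling is an approximation of what OTP does.

```python
from collections import deque

def exceeds_restart_intensity(restart_times, max_restarts, period_seconds):
    """Return True if more than max_restarts restarts fall inside one period.

    restart_times must be ascending timestamps in seconds.
    """
    window = deque()
    for t in restart_times:
        window.append(t)
        # Forget restarts older than the period, measured from the newest one.
        while t - window[0] > period_seconds:
            window.popleft()
        if len(window) > max_restarts:
            return True
    return False

# Hypothetical limits: at most 2 restarts per 5 seconds.
print(exceeds_restart_intensity([0, 1, 2], 2, 5))    # True
print(exceeds_restart_intensity([0, 10, 20], 2, 5))  # False
```

The same number of crashes is harmless when spread out and fatal when bunched, which is why the crash rate matters more than the raw count.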
How to Reproduce the Error
The cleanest reproducible case is a noproc/not_found race: operate on a queue that is being deleted concurrently.
# terminal 1: repeatedly delete and recreate a queue
while true; do
  rabbitmqadmin delete queue name=racy
  rabbitmqadmin declare queue name=racy
done
# terminal 2: repeatedly publish to it
while true; do rabbitmqadmin publish routing_key=racy payload=hi; done
A channel occasionally references the queue in the instant it does not exist, and a {badmatch,{error,not_found}} or noproc crash report appears for that channel.
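The shape of this race can be shown with a small Python analogy. This is not RabbitMQ code: the registry, lookup, strict_use, and tolerant_use names are all hypothetical, and the dictionary merely stands in for the broker's queue lookup. The point is the difference between asserting a successful lookup and handling the missing case.

```python
# Toy registry standing in for a queue lookup (hypothetical, not RabbitMQ code).
queues = {"racy": "queue-pid"}

def lookup(name):
    """Return a tagged result, mirroring Erlang's {ok, Q} / {error, not_found}."""
    if name in queues:
        return ("ok", queues[name])
    return ("error", "not_found")

def strict_use(name):
    """Analogue of `{ok, Q} = lookup(Name)`: any other result is fatal."""
    result = lookup(name)
    if result[0] != "ok":
        raise RuntimeError(f"badmatch: {result}")
    return result[1]

def tolerant_use(name):
    """Handle the missing-queue case instead of crashing."""
    tag, value = lookup(name)
    return value if tag == "ok" else None

strict_use("racy")           # queue exists, so the strict match succeeds
del queues["racy"]           # the other terminal deletes the queue
print(tolerant_use("racy"))  # None
# strict_use("racy") would now raise, the analogue of the badmatch crash.
```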
Diagnostic Commands
# List the most recent crash and supervisor report header lines together
sudo grep -iE 'CRASH REPORT|SUPERVISOR REPORT' \
/var/log/rabbitmq/rabbit@$(hostname -s).log | tail -30
# Read the full multi-line reason and stacktrace for one crash
sudo grep -nA15 'CRASH REPORT' \
/var/log/rabbitmq/rabbit@$(hostname -s).log | tail -40
# How often are processes crashing? Count over the current log
sudo grep -c 'CRASH REPORT' /var/log/rabbitmq/rabbit@$(hostname -s).log
# Is a supervisor hitting its restart limit (reached_max_restart_intensity)?
sudo grep -i 'reached_max_restart_intensity' \
/var/log/rabbitmq/rabbit@$(hostname -s).log
# Confirm broker, Erlang/OTP, and plugin versions match expectations
rabbitmq-diagnostics status | grep -iE 'RabbitMQ version|Erlang'
rabbitmq-plugins list -e
Read the reason first, then the top stacktrace frame — {rabbit_amqqueue,with,2,...,line 540} above tells you the failure happened while looking up a queue, which combined with {badmatch,{error,not_found}} pinpoints a missing queue.
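A plain count of CRASH REPORT lines does not say which failure class dominates. As a rough sketch, the Python below tallies crash reports by reason head. The function name is an assumption, and the regular expression relies on the reason following the text "exited with reason:" as in the example report; other log formats would need a different pattern.

```python
import re
from collections import Counter

def count_crash_reasons(log_text):
    """Tally crash reports by the first atom of their exit reason."""
    counts = Counter()
    # Skip any opening braces, then capture the first lowercase atom.
    for match in re.finditer(r"exited with reason:\s*\{*([a-z]\w*)", log_text):
        counts[match.group(1)] += 1
    return counts

# The header lines of the example crash report, repeated to simulate two crashes.
entry = (
    "2026-06-29 16:08:41.220 [error] <0.1731.0> CRASH REPORT "
    "Process <0.1731.0> with 0 neighbours\n"
    "exited with reason: {{badmatch,{error,not_found}},\n"
)
print(count_crash_reasons(entry * 2))  # Counter({'badmatch': 2})
```

In practice log_text would be the contents of the node's log file, read with the same permissions the grep commands above need.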
Step-by-Step Resolution
1. Pair the reports. For each SUPERVISOR REPORT, find the CRASH REPORT whose process PID matches the Offender pid (<0.1731.0> in the example above); that crash is the real cause.
2. Decode the reason head. Map it to a class: noproc / not_found is a missing dependency; badmatch / case_clause / function_clause is unexpected data; badarg is wrong-type or corrupt data; {timeout,...} is an overloaded callee.
3. Read the top stacktrace frame. The first {Module,Function,Arity,...} is where it broke. The module name (rabbit_channel, rabbit_amqqueue, rabbit_mgmt_db, a plugin module) localises the subsystem.
4. Act on the class:
   - noproc/not_found: expected during deletes/restarts if rare and the supervisor restarts the child; confirm it is not a flood. If constant, fix the client that references deleted objects.
   - function_clause/case_clause: trace the offending client or plugin sending unexpected input; capture the frame before the crash.
   - badarg: suspect corruption; check the message store/Mnesia and recent disk errors.
   - timeout: investigate the overloaded callee (busy queue, stats DB) rather than the crashing process.
5. Check restart intensity. If you see reached_max_restart_intensity, the crashes are frequent enough to take down a supervisor; treat it as an outage, not noise.
6. Verify versions when the reason is undef / function_clause in a plugin: a plugin built for another broker/OTP version must be rebuilt or replaced.
7. Confirm resolution by watching the crash count after the fix:
   watch -n 5 "sudo grep -c 'CRASH REPORT' /var/log/rabbitmq/rabbit@$(hostname -s).log"
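The pairing and decoding steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names, the class labels, and the regular expressions are mine, they expect the log layout shown in the example, and a PID that is reused later in a long log would overwrite the earlier crash.

```python
import re

# A crash report: the PID, then everything after "exited with reason:" up to
# the next timestamped line (or the end of the text).
CRASH_RE = re.compile(
    r"CRASH REPORT Process (<[\d.]+>).*?exited with reason:\s*(.*?)"
    r"(?=^\d{4}-\d{2}-\d{2} |\Z)",
    re.DOTALL | re.MULTILINE,
)
OFFENDER_RE = re.compile(r"Offender: id=(\w+),pid=(<[\d.]+>)")

# Checked in order, so a badmatch on {error,not_found} counts as a missing dependency.
CLASSES = [
    (("noproc", "not_found"), "missing dependency"),
    (("timeout",), "overloaded callee"),
    (("badarg",), "wrong-type or corrupt data"),
    (("badmatch", "case_clause", "function_clause"), "unexpected data"),
]

def classify(reason):
    # Only inspect the text before the first '[' so stack frames are ignored.
    head = reason.split("[", 1)[0]
    for markers, label in CLASSES:
        if any(marker in head for marker in markers):
            return label
    return "unclassified"

def pair_reports(log_text):
    """Return (child id, pid, failure class) for each supervisor report."""
    crashes = {pid: reason.strip() for pid, reason in CRASH_RE.findall(log_text)}
    pairs = []
    for child_id, pid in OFFENDER_RE.findall(log_text):
        reason = crashes.get(pid)
        label = classify(reason) if reason else "no matching crash report"
        pairs.append((child_id, pid, label))
    return pairs

sample_log = '''2026-06-29 16:08:41.220 [error] <0.1731.0> CRASH REPORT Process <0.1731.0> with 0 neighbours
exited with reason: {{badmatch,{error,not_found}},
[{rabbit_amqqueue,with,2,[{file,"rabbit_amqqueue.erl"},{line,540}]},
{rabbit_channel,handle_method,3,[{file,"rabbit_channel.erl"},{line,1284}]},
{gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
in gen_server2:terminate/3 line 1183
2026-06-29 16:08:41.222 [error] <0.1729.0> SUPERVISOR REPORT Supervisor:
{<0.1729.0>,rabbit_channel_sup}. Context: child_terminated.
Reason: {{badmatch,{error,not_found}},...}. Offender: id=channel,pid=<0.1731.0>.
'''

print(pair_reports(sample_log))
# [('channel', '<0.1731.0>', 'missing dependency')]
```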
Prevention and Best Practices
- Treat a steady stream of CRASH REPORT lines as a signal, not background noise: alert on the rate and on reached_max_restart_intensity.
- Make clients tolerant of races: handle queues/connections disappearing instead of assuming they persist, which removes most noproc / not_found crashes.
- Keep plugins built against the exact broker/OTP version you run; mismatches are a top source of undef / function_clause.
- Capture and ship the full multi-line reason and stacktrace (not just the first line) so the top frame is always available for diagnosis.
- Watch disk health and the message store for badarg / corruption-class crashes before they spread.
- Keep nodes with CPU/memory headroom so {timeout,{gen_server,call,...}} crashes from overloaded internal processes do not occur.
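Shipping the full multi-line report means grouping continuation lines with the timestamped line that starts each entry. The Python sketch below shows one way to do that, assuming every entry begins with a timestamp in the format used in the example log; the function name and the pattern are illustrative and not tied to any particular log shipper.

```python
import re

# An entry starts with a timestamp such as "2026-06-29 16:08:41".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def group_entries(lines):
    """Join continuation lines onto the timestamped line that precedes them."""
    entries, current = [], []
    for line in lines:
        if ENTRY_START.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line.rstrip("\n"))
    if current:
        entries.append("\n".join(current))
    return entries

# Excerpt of the example report pair (stack frames omitted for brevity).
lines = [
    "2026-06-29 16:08:41.220 [error] <0.1731.0> CRASH REPORT Process <0.1731.0> with 0 neighbours",
    "exited with reason: {{badmatch,{error,not_found}},",
    "2026-06-29 16:08:41.222 [error] <0.1729.0> SUPERVISOR REPORT Supervisor:",
    "{<0.1729.0>,rabbit_channel_sup}. Context: child_terminated.",
]
entries = group_entries(lines)
print(len(entries))  # 2
```

Each grouped entry then carries its reason and stacktrace with it, so an alert built on the first line still has the top frame attached.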
Related Errors
- operation timed out (RPC): the {timeout,{gen_server,call,...}} reason seen from the rabbitmqctl side.
- timeout waiting for tables / mnesia overloaded: startup-time crash patterns with their own reasons.
- statistics database could not be contacted: a rabbit_mgmt_db timeout that can also appear as a crash reason.
- shovel / federation worker terminated: plugin workers whose shutdown reasons follow the same tuple format.
More in the RabbitMQ troubleshooting series.
Frequently Asked Questions
Which line of a crash report matters most?
The reason tuple (the failure class) and the top stacktrace frame (where it broke). Together they localise the bug.
Are crash reports always a problem?
Not necessarily. A rare noproc/not_found during deletes that the supervisor restarts is often benign. A high or rising rate, or reached_max_restart_intensity, is a real problem.
What does {badmatch,{error,not_found}} mean?
Code expected a success value but received {error,not_found} — typically a referenced queue/resource that did not exist at that moment.
How do I read the stacktrace order?
Newest first. The first {Module,Function,Arity,[{file,...},{line,...}]} frame is the immediate failure site; lower frames are the callers.
A plugin module appears in the crash: what now?
Suspect a version mismatch or plugin bug. Rebuild the plugin for your exact RabbitMQ/Erlang version or disable it to confirm.