AI for RabbitMQ · By James Joyner IV · 9 min read

RabbitMQ Error Guide: 'CRASH REPORT ... gen_server terminated': Reading Erlang Crashes

Read RabbitMQ Erlang crash reports: decode gen_server terminated and supervisor reports with noproc, badmatch, function_clause, case_clause, and badarg reasons.

  • #rabbitmq
  • #troubleshooting
  • #errors
  • #erlang

Exact Error Message

RabbitMQ runs on the Erlang VM, so internal failures surface as Erlang CRASH REPORT and SUPERVISOR REPORT entries rather than tidy English. A typical pair looks like this:

2026-06-29 16:08:41.220 [error] <0.1731.0> CRASH REPORT Process <0.1731.0> with 0 neighbours
exited with reason: {{badmatch,{error,not_found}},
  [{rabbit_amqqueue,with,2,[{file,"rabbit_amqqueue.erl"},{line,540}]},
   {rabbit_channel,handle_method,3,[{file,"rabbit_channel.erl"},{line,1284}]},
   {gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]},
   {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
in gen_server2:terminate/3 line 1183

2026-06-29 16:08:41.222 [error] <0.1729.0> SUPERVISOR REPORT Supervisor:
{<0.1729.0>,rabbit_channel_sup}. Context: child_terminated.
Reason: {{badmatch,{error,not_found}},...}. Offender: id=channel,pid=<0.1731.0>.

Other common reasons in the first tuple include noproc, function_clause, case_clause, badarg, and {timeout, {gen_server,call,[...]}}.

What the Error Means

A CRASH REPORT says an Erlang process died and prints why (the reason) and where (the stacktrace). A SUPERVISOR REPORT says the dead process’s supervisor noticed and reacted (usually by restarting the child). The two almost always appear together: the crash is the cause, the supervisor entry is the effect.

The reason is the most important part. It is an Erlang term whose head names the failure class:

  • noproc — sent a call to a process that no longer exists (a race or an already-dead dependency).
  • {badmatch, V} — a pattern match using the = operator failed because the right-hand value was V (e.g. the code matched on ok but got {error, not_found}).
  • function_clause — no function clause matched the arguments (unexpected/malformed input).
  • case_clause — a case had no branch for the actual value.
  • badarg — a built-in function got an argument of the wrong type (bad binary, bad ETS table, etc.).
  • {timeout, {gen_server, call, [...]}} — a synchronous call to another process timed out (the callee was overloaded or stuck).
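
The mapping from reason head to failure class can be sketched as a small shell helper. This is a minimal illustration, not part of RabbitMQ: the function names reason_head and classify_reason and the class labels are assumptions made for this example, and the input is the reason text copied from a crash report.

```shell
# reason_head REASON_TEXT
# Hypothetical helper: prints the first atom of an Erlang exit reason,
# e.g. "badmatch" for "{{badmatch,{error,not_found}},".
reason_head() {
  # Drop leading braces and spaces, then cut at the first non-atom character.
  printf '%s\n' "$1" | sed -E 's/^[{ ]*//; s/[^a-z_0-9].*$//'
}

# classify_reason REASON_TEXT
# Hypothetical helper: prints a coarse class label for the reason.
# The labels are illustrative wording, not RabbitMQ terminology.
classify_reason() {
  # not_found anywhere in the reason is treated as a missing dependency,
  # even when the head is badmatch.
  case "$1" in
    *not_found*) echo "missing dependency"; return ;;
  esac
  case "$(reason_head "$1")" in
    noproc)                               echo "missing dependency" ;;
    badmatch|function_clause|case_clause) echo "unexpected data" ;;
    badarg)                               echo "wrong-type or corrupt data" ;;
    timeout)                              echo "overloaded callee" ;;
    *)                                    echo "unclassified" ;;
  esac
}
```

For example, classify_reason '{{badmatch,{error,not_found}},' prints missing dependency, because the not_found check runs before the head is examined.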

The stacktrace below the reason lists {Module, Function, Arity, [{file,...},{line,...}]} frames, newest first. The top frame is where it actually broke; the frames below show how it got there.
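
To make the frame layout concrete, here is a rough sketch that pulls frames out of a crash report on standard input and prints the top one in a compact form. The function names frames and top_frame are hypothetical. The pattern assumes plain lowercase atoms and a numeric arity, as in the example report; some reasons (function_clause, for instance) can carry an argument list in place of the arity in the first frame, which this sketch would skip.

```shell
# frames: read crash-report text on stdin and print one stack frame per line.
# Hypothetical helper; matches only frames shaped like
# {module,function,arity,[{file,"..."},{line,N}]}.
frames() {
  grep -oE '\{[a-z_0-9]+,[a-z_0-9]+,[0-9]+,\[\{file,"[^"]+"\},\{line,[0-9]+\}\]\}'
}

# top_frame: print the first (newest) frame as module:function/arity (file:line).
top_frame() {
  frames | head -n 1 |
    sed -E 's#^\{([^,]+),([^,]+),([0-9]+),\[\{file,"([^"]+)"\},\{line,([0-9]+)\}\]\}$#\1:\2/\3 (\4:\5)#'
}
```

Piping the example crash report from the top of this guide into top_frame should print rabbit_amqqueue:with/2 (rabbit_amqqueue.erl:540).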

Common Causes

  • A resource vanished mid-operation (noproc/{badmatch,{error,not_found}}) — a queue, connection, or process was deleted while another process referenced it.
  • Unexpected input from a client or plugin (function_clause/case_clause) — a malformed frame, header, or argument the code did not anticipate.
  • Corrupt or wrong-type data (badarg) — a damaged message store entry, a bad ETS/Mnesia read, or an invalid binary.
  • Overloaded internal dependency ({timeout,{gen_server,call,...}}) — a busy queue or metrics process not answering in time.
  • A bug or version mismatch — a plugin compiled against a different broker version triggering undef/function_clause.
  • Cascading supervisor restarts — repeated child crashes hitting the restart intensity and taking down the supervisor.

How to Reproduce the Error

The cleanest reproducible case is a noproc/not_found race: operate on a queue that is being deleted concurrently.

# terminal 1: repeatedly delete and recreate a queue
while true; do
  rabbitmqadmin delete queue name=racy
  rabbitmqadmin declare queue name=racy
done

# terminal 2: repeatedly publish to it
while true; do rabbitmqadmin publish routing_key=racy payload=hi; done

A channel occasionally references the queue in the instant it does not exist, and a {badmatch,{error,not_found}} or noproc crash report appears for that channel.

Diagnostic Commands

# List the 30 most recent crash and supervisor report lines together
sudo grep -iE 'CRASH REPORT|SUPERVISOR REPORT' \
  /var/log/rabbitmq/rabbit@$(hostname -s).log | tail -30

# Read the full multi-line reason and stacktrace for one crash
sudo grep -nA15 'CRASH REPORT' \
  /var/log/rabbitmq/rabbit@$(hostname -s).log | tail -40

# How often are processes crashing? Count over the current log
sudo grep -c 'CRASH REPORT' /var/log/rabbitmq/rabbit@$(hostname -s).log

# Is a supervisor hitting its restart limit (reached_max_restart_intensity)?
sudo grep -i 'reached_max_restart_intensity' \
  /var/log/rabbitmq/rabbit@$(hostname -s).log

# Confirm broker, Erlang/OTP, and plugin versions match expectations
rabbitmq-diagnostics status | grep -iE 'RabbitMQ version|Erlang'
rabbitmq-plugins list -e

Read the reason first, then the top stacktrace frame. In the example above, {rabbit_amqqueue,with,2,...,line 540} tells you the failure happened during a queue lookup; combined with {badmatch,{error,not_found}}, that pinpoints a missing queue.
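
When a log holds many crashes, a per-reason tally shows which failure class dominates. The following is a sketch under assumptions: crash_reason_counts is a hypothetical name, and it relies on each report containing the text "exited with reason:" followed by the reason, as in the example at the top of this guide.

```shell
# crash_reason_counts: read a RabbitMQ log on stdin and print
# "COUNT REASON_HEAD" lines, most frequent first.
# Hypothetical helper; assumes the "exited with reason:" marker shown above.
crash_reason_counts() {
  awk '
    /exited with reason:/ {
      reason = $0
      sub(/.*exited with reason:[ ]*/, "", reason)  # keep text after the marker
      sub(/^[{ ]+/, "", reason)                     # drop leading braces and spaces
      sub(/[^a-z_0-9].*$/, "", reason)              # keep only the first atom
      if (reason == "") reason = "unknown"
      count[reason]++
    }
    END { for (r in count) printf "%d %s\n", count[r], r }
  ' | sort -rn
}
```

A possible invocation, using the log path from the commands above, is sudo cat /var/log/rabbitmq/rabbit@$(hostname -s).log | crash_reason_counts.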

Step-by-Step Resolution

  1. Pair the reports. For each SUPERVISOR REPORT, find the CRASH REPORT whose process PID equals the Offender pid (<0.1731.0> in the example above); that crash is the real cause.

  2. Decode the reason head. Map it to a class: noproc/not_found is a missing dependency; badmatch/case_clause/function_clause is unexpected data; badarg is wrong-type/corrupt data; {timeout,...} is an overloaded callee.

  3. Read the top stacktrace frame. The first {Module,Function,Arity,...} is where it broke. The module name (rabbit_channel, rabbit_amqqueue, rabbit_mgmt_db, a plugin module) localises the subsystem.

  4. Act on the class:

    • noproc/not_found: expected during deletes/restarts if rare and the supervisor restarts the child — confirm it is not a flood. If constant, fix the client that references deleted objects.
    • function_clause/case_clause: trace the offending client or plugin sending unexpected input; capture the frame before the crash.
    • badarg: suspect corruption — check the message store/Mnesia and recent disk errors.
    • timeout: investigate the overloaded callee (busy queue, stats DB) rather than the crashing process.
  5. Check restart intensity. If you see reached_max_restart_intensity, the crashes are frequent enough to take down a supervisor; treat it as an outage, not noise.

  6. Verify versions when the reason is undef/function_clause in a plugin — a plugin built for another broker/OTP version must be rebuilt or replaced.

  7. Confirm resolution by watching the crash count after the fix:

    watch -n 5 "sudo grep -c 'CRASH REPORT' /var/log/rabbitmq/rabbit@$(hostname -s).log"
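
Step 1, pairing each supervisor report with its crash, can be roughed out with awk. This is only a sketch: pair_reports is a hypothetical name, it assumes the "CRASH REPORT Process <PID>" and "Offender: ... pid=<PID>" wording shown in the example, and it keeps only the last crash line seen for a given PID.

```shell
# pair_reports: read a RabbitMQ log on stdin and, for every supervisor
# Offender pid, report the line number of the matching crash report.
# Hypothetical helper; line numbers refer to the input stream.
pair_reports() {
  awk '
    /CRASH REPORT Process </ {
      pid = $0
      sub(/.*CRASH REPORT Process </, "", pid)  # strip everything before the PID
      sub(/>.*/, "", pid)                       # strip everything after it
      crashed[pid] = NR
    }
    /Offender:.*pid=</ {
      pid = $0
      sub(/.*pid=</, "", pid)
      sub(/>.*/, "", pid)
      offenders[++n] = pid
      offender_line[n] = NR
    }
    END {
      for (i = 1; i <= n; i++) {
        pid = offenders[i]
        if (pid in crashed)
          printf "<%s> supervisor line %d <- crash line %d\n", pid, offender_line[i], crashed[pid]
        else
          printf "<%s> supervisor line %d <- no crash report found\n", pid, offender_line[i]
      }
    }
  '
}
```

An offender with no matching crash report usually means the crash was logged elsewhere (a rotated log file, for example) or the child exited without producing one, so it is worth widening the search before drawing conclusions.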

Prevention and Best Practices

  • Treat a steady stream of CRASH REPORT lines as a signal, not background noise — alert on the rate and on reached_max_restart_intensity.
  • Make clients tolerant of races: handle queues/connections disappearing instead of assuming they persist, which removes most noproc/not_found crashes.
  • Keep plugins built against the exact broker/OTP version you run; mismatches are a top source of undef/function_clause.
  • Capture and ship the full multi-line reason and stacktrace (not just the first line) so the top frame is always available for diagnosis.
  • Watch disk health and the message store for badarg/corruption-class crashes before they spread.
  • Keep nodes with CPU/memory headroom so {timeout,{gen_server,call,...}} crashes from overloaded internal processes do not occur.
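
Alerting on the crash rate could start from something as simple as a per-minute count. The sketch below assumes every CRASH REPORT line begins with a "YYYY-MM-DD HH:MM" timestamp, as in the example log; crash_bursts and its threshold argument are hypothetical, and a real alert would normally live in your log pipeline or monitoring system rather than a shell function.

```shell
# crash_bursts [THRESHOLD]: read a RabbitMQ log on stdin and print each
# minute whose CRASH REPORT count exceeds THRESHOLD (default 5).
# Hypothetical helper; assumes lines start with "YYYY-MM-DD HH:MM".
crash_bursts() {
  threshold="${1:-5}"
  awk -v limit="$threshold" '
    # The first 16 characters are the date plus hour and minute.
    /CRASH REPORT/ { per_minute[substr($0, 1, 16)]++ }
    END {
      for (m in per_minute)
        if (per_minute[m] > limit + 0) printf "%s %d\n", m, per_minute[m]
    }
  ' | sort
}
```

Each output line is a minute followed by its crash count; empty output means no minute exceeded the threshold.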

Related Errors

  • operation timed out (RPC) — the {timeout,{gen_server,call,...}} reason seen from the rabbitmqctl side.
  • timeout waiting for tables / mnesia overloaded — startup-time crash patterns with their own reasons.
  • statistics database could not be contacted — a rabbit_mgmt_db timeout that can also appear as a crash reason.
  • shovel / federation worker terminated — plugin workers whose shutdown reasons follow the same tuple format.

More in the RabbitMQ troubleshooting series.

Frequently Asked Questions

Which line of a crash report matters most? The reason tuple (the failure class) and the top stacktrace frame (where it broke). Together they localise the bug.

Are crash reports always a problem? Not necessarily. A rare noproc/not_found during deletes that the supervisor restarts is often benign. A high or rising rate, or reached_max_restart_intensity, is a real problem.

What does {badmatch,{error,not_found}} mean? Code expected a success value but received {error,not_found} — typically a referenced queue/resource that did not exist at that moment.

How do I read the stacktrace order? Newest first. The first {Module,Function,Arity,[{file,...},{line,...}]} frame is the immediate failure site; lower frames are the callers.

A plugin module appears in the crash — what now? Suspect a version mismatch or plugin bug. Rebuild the plugin for your exact RabbitMQ/Erlang version or disable it to confirm.
