DevOps AI ToolKit
AI for RabbitMQ · By James Joyner IV · 9 min read

RabbitMQ Error Guide: 'Discarding message in an old incarnation' Stale Node Reference

Fix 'Discarding message in an old incarnation' in RabbitMQ: node restarts, partition recovery, stale process references, and mirrored queue leftovers.

Exact Error Message

You will typically find this in the RabbitMQ log on a node that recently restarted or rejoined a cluster:

=WARNING REPORT==== 24-Jun-2026::14:02:17.913445 ===
Discarding message {'$gen_cast',{gm,{publish,<<131,...>>}}} from <0.812.0>
to <0.0.0> in an old incarnation (3) of this node (4)

You may also see variants with a different payload, such as a synchronous $gen_call to a supervisor rather than a gm cast:

=WARNING REPORT==== 24-Jun-2026::14:02:18.004221 ===
Discarding message {'$gen_call',{<0.991.0>,#Ref<0.0.1.4471>},which_children}
from <0.812.0> to <0.0.0> in an old incarnation (3) of this node (4)

In a journalctl view of rabbitmq-server the same line is collapsed onto one row:

Jun 24 14:02:17 node1 rabbitmq-server[2471]: Discarding message {'$gen_cast',{gm,...}} from <0.812.0> to <0.0.0> in an old incarnation (3) of this node (4)

The numbers in parentheses are the key signal: (3) is the old incarnation the message was addressed to, and (4) is the current incarnation of rabbit@node1 after its latest boot.
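To pull those two numbers out of a log line programmatically, a one-line sed filter is enough. This is a sketch that assumes the exact "in an old incarnation (N) of this node (M)" wording shown above. The function name incarnations is hypothetical.

```shell
# Hypothetical helper: print "old current" incarnation pairs found on stdin.
# Assumes the "in an old incarnation (N) of this node (M)" wording shown above.
incarnations() {
  sed -nE 's/.*in an old incarnation \(([0-9]+)\) of this node \(([0-9]+)\).*/\1 \2/p'
}

# Example: the last line of a WARNING REPORT carries both numbers.
echo 'to <0.0.0> in an old incarnation (3) of this node (4)' | incarnations
```

Pipe a log file or journalctl output through it. Each matching line yields one pair, here 3 4.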

What the Error Means

Every time an Erlang node (a RabbitMQ broker) boots, it is assigned an incarnation number, which Erlang calls the node’s creation. It is a boot generation marker that changes on every start. Peers in the cluster hold references to your node’s processes (pids), and each reference is tagged with the incarnation that was current when it was created.

When your node restarts, its incarnation changes (here, from 3 to 4). A peer that has not yet noticed the restart may still send a message addressed to a process from incarnation 3. The Erlang distribution layer on the restarted node recognizes that the message targets a stale incarnation and refuses to deliver it. Delivering it could misroute it to an unrelated process that happens to reuse the same pid. Instead, the Erlang runtime underneath RabbitMQ logs a WARNING and drops the message.

This is almost always a transient WARNING, not a fatal error. The stale reference never becomes valid again. Instead, the peer stops using it once it notices the restart and picks up references to the new incarnation’s processes, usually within seconds. A handful of these lines clustered around a single restart timestamp is expected and benign.

The distinction that matters: a small burst right after a known restart is harmless. A continuous stream of these warnings, or repeating bursts every few minutes, signals a flapping node or an unhealed partition — the node keeps restarting or peers keep losing and regaining contact. That is what you should investigate.
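You can make that distinction mechanical by grouping warning timestamps into bursts separated by a quiet gap. The sketch below assumes you have already extracted one Unix timestamp per warning, sorted ascending. The 60-second gap and the function names are hypothetical choices, not RabbitMQ settings.

```shell
# Hypothetical helper: count bursts in sorted Unix timestamps read from stdin.
# A new burst starts when the gap since the previous warning exceeds $1
# seconds (default 60, an arbitrary value to tune).
count_bursts() {
  awk -v gap="${1:-60}" '
    NF == 0 { next }                        # ignore blank lines
    !seen || $1 - prev > gap { bursts++ }   # first warning or a long silence
    { prev = $1; seen = 1 }
    END { print bursts + 0 }
  '
}

# Turn the burst count into a rough verdict.
classify() {
  local n
  n="$(count_bursts "${1:-60}")"
  if [ "$n" -eq 0 ]; then
    echo "no warnings"
  elif [ "$n" -eq 1 ]; then
    echo "single burst: likely a one-off restart"
  else
    echo "$n bursts: possible flapping node"
  fi
}

printf '%s\n' 1000 1002 1005 | classify
printf '%s\n' 1000 1002 1400 1800 | classify
```

On systemd hosts, journalctl -o short-unix prefixes each entry with a Unix timestamp. A pipeline such as journalctl -u rabbitmq-server -o short-unix --no-pager | grep 'old incarnation' | awk '{print int($1)}' | classify could therefore feed it, provided your journal actually captures these warnings.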

Common Causes

  • Node restart or crash-recovery. The most common trigger. The node bumped its incarnation; a peer briefly used the old reference. One burst, then silence.
  • Network partition healing. After a partition resolves, peers reconnect, and some still send messages to pids they cached before the split. If the node those pids belong to was restarted in the meantime, the messages are addressed to its old incarnation.
  • Mirrored queue / GM leftovers. The Guaranteed Multicast (gm) group that backs classic mirrored queues keeps member references. A member that restarted leaves stale gm casts in flight from surviving mirrors.
  • Stale process references in $gen_call / $gen_cast. Long-lived gen_server links across nodes (queue masters, slaves, federation links) cache pids that no longer point at a live incarnation.
  • Flapping node. Repeated OOM kills, liveness-probe restarts in Kubernetes, or a crash loop cause the incarnation to climb rapidly, producing a steady drip of these warnings.
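For the flapping-node case, count the distinct current incarnation values in a recent slice of the log. One value means one boot; several mean several boots. The sketch below assumes the same warning wording as above, and the function name is hypothetical.

```shell
# Hypothetical helper: list the distinct "of this node (M)" values on stdin.
# Several different values in a short window suggest repeated restarts.
distinct_current_incarnations() {
  sed -nE 's/.*of this node \(([0-9]+)\).*/\1/p' | sort -n | uniq
}

printf '%s\n' \
  'x in an old incarnation (3) of this node (4)' \
  'y in an old incarnation (3) of this node (4)' \
  'z in an old incarnation (4) of this node (5)' |
  distinct_current_incarnations
```

For example, tail -n 5000 /var/log/rabbitmq/rabbit@node1.log | distinct_current_incarnations | wc -l gives a rough boot count for that slice.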

How to Reproduce the Error

You can deliberately produce a clean, single-burst occurrence in a lab cluster. Do this only in a non-production environment.

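# On a three-node lab cluster, gracefully restart one node (here node2) and watch its own log.
# (Run the restart in your orchestrator / systemd; the restarted node is the one that logs the warning.)
# -F keeps following the file by name if it is recreated during the restart.
tail -F /var/log/rabbitmq/rabbit@node2.log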

When node2 comes back, it boots with a new incarnation. Peers such as node1 still hold references to node2’s previous incarnation through mirrored queues or cluster links, and they send a few messages to those stale pids. As a result, node2 logs the Discarding message ... in an old incarnation warning for a few seconds and then settles. Restarting a node that hosts classic mirrored queue mirrors makes the gm variant appear most reliably.
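To measure how long the reproduced burst lasted, you can pull the first and last report timestamps that precede an "old incarnation" line. This sketch assumes the classic =WARNING REPORT==== <timestamp> === header format shown at the top of this guide; newer RabbitMQ log formats differ. The function name is hypothetical.

```shell
# Hypothetical helper: print the first and last WARNING REPORT timestamps
# whose report body mentions "old incarnation". Reads files or stdin.
burst_window() {
  awk '
    # Any "=XXX REPORT==== <time> ===" header starts a new report;
    # keep its timestamp only when it is a WARNING report.
    /^=[A-Z]+ REPORT==== / { ts = ($1 == "=WARNING") ? $3 : "" }
    /old incarnation/ && ts != "" {
      if (first == "") first = ts
      last = ts
    }
    END { if (first != "") print first, last }
  ' "$@"
}

burst_window <<'EOF'
=WARNING REPORT==== 24-Jun-2026::14:02:17.913445 ===
Discarding message x from <0.812.0>
to <0.0.0> in an old incarnation (3) of this node (4)
=WARNING REPORT==== 24-Jun-2026::14:02:18.004221 ===
from <0.812.0> to <0.0.0> in an old incarnation (3) of this node (4)
EOF
```

Against the real file, run burst_window /var/log/rabbitmq/rabbit@node2.log after the node settles.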

Diagnostic Commands

Start by confirming the cluster is healthy and identifying which node restarted and when. All of these are read-only.

# Overall cluster view: running nodes, partitions, and alarms
rabbitmqctl cluster_status
# Broker status, including uptime and the running applications
rabbitmq-diagnostics status
# Quick liveness check; non-zero exit means the node is not running
rabbitmq-diagnostics check_running
# When did the host (and likely the broker) last boot?
uptime
who -b
# RabbitMQ service restart history with timestamps
journalctl -u rabbitmq-server --since "1 hour ago" --no-pager
# Count the discard warnings to tell a one-off burst from a flap
grep -c "old incarnation" /var/log/rabbitmq/rabbit@node1.log
# See the surrounding context and timestamps of the warnings
grep "old incarnation" /var/log/rabbitmq/rabbit@node1.log | tail -n 20
# Queue type and state; any queue not in the running state deserves a closer look
rabbitmqctl list_queues name type state messages

A healthy cluster_status should show every node running and an empty partitions list:

Cluster status of node rabbit@node1 ...
Basics

Cluster name: rabbit@node1.example.com

Disk Nodes

rabbit@node1
rabbit@node2
rabbit@node3

Running Nodes

rabbit@node1
rabbit@node2
rabbit@node3

Network Partitions

(none)

If you instead see entries under Network Partitions, the warnings are a symptom of a real split that still needs healing.
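If you want a script to make that check, the sketch below reads cluster_status text and looks at the first non-blank line under Network Partitions. It assumes the text layout shown above, where an empty section reads (none). The function name and its exit codes are hypothetical conventions. If your version supports it, machine-readable output via --formatter json would be more robust than parsing text.

```shell
# Hypothetical helper: inspect `rabbitmqctl cluster_status` text on stdin.
# Exit 0 = partitions listed, 1 = "(none)", 2 = section not found.
has_partitions() {
  awk '
    /^Network Partitions/ { in_section = 1; next }
    in_section && NF > 0 { found = ($1 != "(none)"); exit }
    END {
      if (!in_section) exit 2
      if (found) exit 0
      exit 1
    }
  '
}

# Example against the healthy layout shown above.
if printf 'Network Partitions\n\n(none)\n' | has_partitions; then
  echo "partition present: follow the partition runbook"
else
  echo "no partitions listed"
fi
```

In a check script, the live form would be rabbitmqctl cluster_status | has_partitions.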

Step-by-Step Resolution

  1. Confirm it correlates with a restart. Cross-reference the warning timestamps from your grep with the boot time from who -b and the journalctl -u rabbitmq-server history. If the warnings stop within a minute of a single restart, no action is needed — the cluster self-heals as peers refresh their references.

  2. Check for partitions. Run rabbitmqctl cluster_status and inspect the Network Partitions section. If it is empty, the warning was transient. If it lists nodes, follow your partition-recovery runbook. The configured cluster_partition_handling mode determines whether RabbitMQ recovers on its own (pause_minority, autoheal) or waits for you to restart the nodes on the losing side (ignore, the default).

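  3. Quantify the noise. Count matches over a bounded window rather than the whole log file, for example by piping journalctl -u rabbitmq-server --since "1 hour ago" --no-pager into grep -c "old incarnation". A handful of lines per restart event is normal. Hundreds, or new lines appearing continuously, indicate a flapping node.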

  4. Find the flap source if it is ongoing. Inspect journalctl -u rabbitmq-server for repeated start/stop cycles, and the RabbitMQ log for memory resource limit alarm or OOM-related shutdowns. On Kubernetes, check whether the liveness probe is killing the pod (probe timeout too aggressive vs. a slow rabbitmq-diagnostics check_running).

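  5. Verify queue members recovered. After things settle, rabbitmqctl list_queues name type state should show every queue in the running state. To check membership, list classic mirrors with rabbitmqctl list_queues name slave_pids synchronised_slave_pids and quorum queue members with rabbitmq-queues quorum_status <queue>. A rejoining classic mirror re-syncs on its own only if the policy sets ha-sync-mode to automatic. With the default manual mode, it receives only new messages until you run rabbitmqctl sync_queue <queue>.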

  6. Stop chasing it once stable. If cluster_status is clean, no partitions exist, and the warnings have ceased, the incident is closed. There is no command to “clear” old incarnations — they expire naturally.
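Steps 1 and 3 both come down to seeing when the warnings happened and how many there were. A per-minute tally makes a one-off burst easy to tell apart from a steady drip. This sketch assumes journalctl's default short output (Jun 24 14:02:17 host unit[pid]: message), which is not the layout of the RabbitMQ log file itself. The function name and the default threshold of 20 per minute are hypothetical.

```shell
# Hypothetical helper: tally "old incarnation" warnings per minute from
# journalctl short-format lines on stdin ("Jun 24 14:02:17 host unit[pid]: ...").
# Minutes with more than $1 warnings (default 20, an arbitrary threshold) get
# an ALERT marker. uniq -c works without sorting because journal output is
# already chronological, so lines from the same minute are adjacent.
per_minute() {
  grep 'old incarnation' |
    awk '{ print $1, $2, substr($3, 1, 5) }' |
    uniq -c |
    awk -v t="${1:-20}" '{
      flag = ($1 > t) ? " ALERT" : ""
      printf "%s %s %s %d%s\n", $2, $3, $4, $1, flag
    }'
}

printf '%s\n' \
  'Jun 24 14:02:17 node1 rabbitmq-server[2471]: Discarding message a to <0.0.0> in an old incarnation (3) of this node (4)' \
  'Jun 24 14:02:18 node1 rabbitmq-server[2471]: Discarding message b to <0.0.0> in an old incarnation (3) of this node (4)' \
  'Jun 24 14:05:01 node1 rabbitmq-server[2471]: Discarding message c to <0.0.0> in an old incarnation (4) of this node (5)' |
  per_minute 1
```

On a live host, the input would come from journalctl -u rabbitmq-server --since "1 hour ago" --no-pager | per_minute 20.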

Prevention and Best Practices

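  • Right-size memory and watch the high watermark. A common source of flapping is the broker exceeding its host or container memory limit and being OOM-killed. Reaching vm_memory_high_watermark raises a memory alarm that blocks publishers rather than stopping the node. Set it comfortably below the real limit, provision headroom, and monitor memory alarms.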
  • Tune liveness probes. In Kubernetes, give check_running / check_port_connectivity a generous timeoutSeconds and failureThreshold so a momentarily busy node is not restarted mid-recovery.
  • Prefer quorum queues over classic mirrored queues. Quorum queues use Raft and recover from member restarts more predictably, eliminating the gm-flavored variant of this warning.
  • Use pause_minority partition handling (for three or more nodes) so the cluster fails fast and recovers cleanly instead of producing prolonged stale-reference noise.
  • Restart nodes one at a time with full re-sync between restarts during rolling upgrades, so peers always have a clean current incarnation to reference.
  • Alert on the rate, not the presence. Treat occasional warnings as informational; alert only when the per-minute count crosses a threshold, which catches genuine flaps.

Related Errors

These messages often appear alongside the discard warning during the same cluster event:

  • Mnesia(...): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, ...} — the canonical Mnesia network partition signal. If you see this, the discard warnings are partition symptoms, not restart noise.
  • timeout_waiting_for_tables — emitted at boot when a rejoining node cannot reach the peer holding the authoritative schema. Common when nodes are restarted in the wrong order.
  • node down / rabbit_node_monitor “node rabbit@node2 down” — the monitor noticing a peer left, which is precisely the event that leaves stale incarnation references behind.

Frequently Asked Questions

Is this error safe to ignore? Usually, yes. A short burst of Discarding message ... in an old incarnation immediately after a node restart is expected and self-healing. Investigate only if the warnings are continuous or recur in bursts, which points to a flapping node or an unhealed partition.

Why does the incarnation number keep increasing? The incarnation is a boot generation marker (Erlang calls it the node’s creation) that changes every time the node starts. Several different incarnation values within a short window mean the node is restarting repeatedly, which is the real problem to chase.

Does this mean I lost messages? Not because of this warning. The discarded item is an internal Erlang message between cluster processes (a $gen_cast or $gen_call) addressed to a process that no longer exists. It is not a publisher’s AMQP message being removed from a queue. Peers rebuild their links to the processes of the current incarnation rather than retrying the discarded message.

How do I tell a benign restart from a flapping node? Correlate the warning timestamps with who -b and journalctl -u rabbitmq-server. One cluster of warnings tied to a single boot is benign. Multiple restart events, or warnings that never stop, mean the node is unstable — check memory alarms and liveness probes.
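That correlation can be sketched as a comparison of Unix timestamps: count how many warnings land within a short window after a known start time. The function name and the 60-second window are hypothetical. The start time is whatever you trust most, such as the host boot from who -b.

```shell
# Hypothetical helper: given a start time ($1, Unix seconds) and warning
# timestamps on stdin (Unix seconds, one per line), report how many fall
# within $2 seconds after the start (default 60) versus anywhere else.
correlate_with_boot() {
  awk -v boot="$1" -v w="${2:-60}" '
    NF == 0 { next }
    { d = $1 - boot; if (d >= 0 && d <= w) near++; else far++ }
    END { printf "near_boot=%d elsewhere=%d\n", near, far }
  '
}

printf '%s\n' 1010 1030 1500 | correlate_with_boot 1000
```

A non-zero elsewhere count means some warnings are not explained by that start. On GNU/Linux you could pass date -d "$(who -b | awk '{print $3, $4}')" +%s as the start time, assuming who -b prints the date and time as its third and fourth fields on your system.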

Can I force the cluster to clear the old incarnation references? There is no command for that, and you do not need one. Stale references expire as soon as peers detect the new node identity. Forcing additional restarts only increases the incarnation again and prolongs the noise.
