Debugging RabbitMQ Message Flow With Tracing and Firehose

The hardest RabbitMQ bug to argue about is the one where a publisher swears it sent the message and a consumer swears it never arrived. Both are usually telling the truth as they see it: the message was published to an exchange that routed it nowhere and was dropped silently, or it landed in a queue the consumer was never reading from. RabbitMQ’s firehose tracer, and the tracing plugin built on top of it, are how you stop guessing: they let you watch messages as they’re published and delivered, so you can see the exact point where the message stops existing. The catch is that tracing is expensive, and a careless trace on a busy broker becomes its own incident.

This is a debugging workflow with a real-time data feed, which is a great use of AI — the model is good at reading trace output and reasoning about where in the publish-route-deliver path a message died. But it’s also where AI will cheerfully tell you to trace everything on production, so the discipline is to scope the trace tightly and let the model interpret a small, targeted capture rather than firehose the whole broker.

Frame the question as “where does the message die”

Don’t ask AI to “debug RabbitMQ.” Describe the publish-to-consume gap and ask it to design a scoped trace that isolates the failure point.

A publisher reports it’s publishing order events to the “orders” topic exchange with routing key “order.created”, but the consumer bound to the “order-processing” queue says it never receives them. Help me set up a targeted RabbitMQ trace to find where the message is lost — is it failing to route at the exchange, landing in a different queue, or being delivered and dropped? Keep the trace scoped so it doesn’t overload the broker.

The reasoning you want walks the path: published-but-not-routed shows up as a publish event with no corresponding deliver event (a routing or binding problem); routed-to-wrong-queue shows a deliver event for an unexpected queue (a binding-key mismatch); delivered-then-lost shows a deliver event for the right queue, which moves the search to the consumer (a nack, a crash, or a dropped connection). The firehose only emits publish and deliver events, so acks and nacks never appear in the trace itself. The model should tell you which trace events distinguish these.
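To make that mapping concrete, here is a minimal Python sketch of the three-way split. It assumes you have already grouped the firehose routing keys that belong to one message (the firehose does not correlate them for you, so in practice you would match on something like a message ID you set yourself). The function name and the returned labels are mine, not anything RabbitMQ defines.

```python
from typing import Iterable


def classify_trace(trace_keys: Iterable[str], exchange: str, expected_queue: str) -> str:
    """Map the firehose routing keys seen for ONE message to a likely failure point.

    trace_keys are routing keys on amq.rabbitmq.trace, e.g. "publish.orders"
    or "deliver.order-processing". Grouping them per message is assumed to
    have happened upstream.
    """
    keys = list(trace_keys)
    published = f"publish.{exchange}" in keys
    # Queue names that produced a deliver event for this message.
    delivered_from = [k[len("deliver."):] for k in keys if k.startswith("deliver.")]

    if not published:
        return "never-published: no publish event reached this exchange"
    if expected_queue in delivered_from:
        return "delivered: look at the consumer (nack, crash, disconnect)"
    if delivered_from:
        return "wrong-queue: delivered from " + ", ".join(delivered_from)
    return "no-delivery: check bindings, then check the queue has a consumer"


if __name__ == "__main__":
    print(classify_trace(["publish.orders"], "orders", "order-processing"))
```

The last branch deliberately names two causes, because a missing deliver event cannot by itself tell an unroutable message apart from one that is sitting in a queue nobody is consuming.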

Turn on tracing, scoped and brief

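Turn on the firehose for the vhost, then bind a capture queue narrowly to the exchange in question, not to every trace event in the vhost. The firehose is built into the broker; the rabbitmq_tracing plugin only adds the management UI’s file-based traces on top of it. The original message’s own routing key travels in the trace event’s routing_keys header, so you filter on it after capture rather than in the binding.

# Optional: only needed for the management UI's file-based traces
rabbitmq-plugins enable rabbitmq_tracing

# Turn on the firehose for the default vhost (use -p <vhost> for another one)
rabbitmqctl trace_on

# Scoped capture: only publishes to the orders exchange, held in a short-lived queue
rabbitmqadmin declare queue name=trace-capture durable=false auto_delete=true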
rabbitmqadmin declare binding source=amq.rabbitmq.trace \
  destination=trace-capture \
  routing_key='publish.orders'

The firehose publishes trace events to the amq.rabbitmq.trace topic exchange with routing keys like publish.<exchange> and deliver.<queue>. Binding only publish.orders (and the relevant deliver.* keys) keeps the capture to the messages you care about. Pull a small sample, then turn tracing off — leaving it on, especially via the management tracing UI which logs to disk, adds overhead to every message and fills disk on a busy broker.

rabbitmqctl trace_off
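The trace routing keys follow a simple shape, so it is easy to generate the bindings you need and to pull captured keys apart again. This is a small Python sketch of both directions; the helper names are hypothetical. The one detail worth encoding is that exchange and queue names may contain dots themselves, so only the first dot separates the event type from the name.

```python
from typing import List, Tuple


def trace_binding_keys(exchange: str, queues: List[str]) -> List[str]:
    """Binding keys on amq.rabbitmq.trace for one exchange and its candidate queues."""
    return [f"publish.{exchange}"] + [f"deliver.{queue}" for queue in queues]


def split_trace_key(trace_key: str) -> Tuple[str, str]:
    """Split a trace routing key into (event type, exchange or queue name).

    Only the first dot is a separator; the rest belongs to the name.
    """
    kind, _, name = trace_key.partition(".")
    if kind not in ("publish", "deliver"):
        raise ValueError(f"not a firehose routing key: {trace_key!r}")
    return kind, name


if __name__ == "__main__":
    for key in trace_binding_keys("orders", ["order-processing"]):
        print(key, "->", split_trace_key(key))
```

Each key from trace_binding_keys would become one rabbitmqadmin declare binding call against the trace-capture queue, which keeps the capture limited to the exchange and queues under suspicion.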

Let AI read the capture

Hand the model a handful of captured trace events and ask it to locate the break.

Here are the trace events for one order.created message: a publish event to the “orders” exchange with routing key “order.created”, and no matching deliver event to any queue. What does the absence of a deliver event tell me, and what should I check next?

A publish event with no deliver event most often points to a routing failure: the exchange accepted the message but no binding matched, so it was dropped (or would have gone to an alternate exchange if one were configured). Deliver events fire only when a message leaves a queue for a consumer, so also rule out the other reading: a message that was routed correctly and is sitting in a queue with no active consumer. The next check is the binding between the orders exchange and the order-processing queue: a topic-exchange binding key of order.* matches order.created, but order or orders.created does not. AI reads this pattern reliably once you give it the events.
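When I want to double-check a binding key by hand, a few lines of Python are enough. This is a sketch of the topic-exchange matching rules as AMQP 0-9-1 describes them (words separated by dots, * for exactly one word, # for zero or more words), not RabbitMQ’s own implementation, and topic_matches is a name I made up.

```python
from functools import lru_cache


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if a topic binding key matches a message routing key."""
    pattern = tuple(binding_key.split("."))
    words = tuple(routing_key.split("."))

    @lru_cache(maxsize=None)
    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            # Pattern exhausted: match only if every word was consumed too.
            return w == len(words)
        if pattern[p] == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(p + 1, w) or (w < len(words) and match(p, w + 1))
        if w == len(words):
            # Words exhausted but a literal or '*' still needs one.
            return False
        return pattern[p] in ("*", words[w]) and match(p + 1, w + 1)

    return match(0, 0)


if __name__ == "__main__":
    for key in ("order.*", "order", "orders.created", "order.#"):
        print(f"{key!r} matches 'order.created':", topic_matches(key, "order.created"))
```

Running the suspect binding key through this before touching the broker is a cheap way to confirm or reject the model’s reading of the trace.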

Where AI oversteps on tracing

The dangerous overreach is scope. Ask for a trace and the model will often suggest enabling the management tracing UI across the whole vhost, which writes every message to a log file and can swamp a production broker. I always constrain it: “Give me a trace scoped to one exchange and routing key, captured to a short-lived queue, that I turn off after a small sample.” A trace plan that doesn’t include turning it back off is incomplete.

The other gap is forgetting the unroutable-message story. AI will reason about delivery failures and skip the possibility that the message was never routable in the first place — no matching binding, no alternate exchange, silently dropped. When a publish event has no deliver, make sure the model considers binding mismatch and the absence of an alternate exchange or mandatory flag, not just consumer-side problems. That’s the most common cause of the “I published it and it vanished” complaint.

My tracing loop

I describe the publish-to-consume gap, have the AI design a tightly scoped trace bound to the specific exchange and queues involved, and capture a small sample to a short-lived queue with tracing turned off immediately after. Then I hand the captured events back to the model to locate where the message died: at the exchange, at the wrong queue, or after delivery. I verify its read against the actual bindings with rabbitmqctl list_bindings before changing anything. The AI interprets the trace; the scoped, brief capture is what keeps the debugging from becoming the next incident.
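For the verification step, I sometimes paste the bindings into a small script instead of eyeballing them. The sketch below assumes the output of rabbitmqctl list_bindings source_name destination_name routing_key, which I expect to be tab-separated with an optional banner and header row; check that against your version before trusting it. The Binding type and both function names are hypothetical.

```python
from typing import List, NamedTuple


class Binding(NamedTuple):
    source: str       # exchange name ("" for the default exchange)
    destination: str  # queue (or exchange) the binding points at
    routing_key: str  # the binding key


def parse_bindings(output: str) -> List[Binding]:
    """Parse three-column, tab-separated list_bindings output."""
    rows = []
    for line in output.splitlines():
        cols = line.split("\t")
        # Skip the banner line (no tabs) and the column header row.
        if len(cols) != 3 or cols[0] == "source_name":
            continue
        rows.append(Binding(*cols))
    return rows


def binding_keys(rows: List[Binding], source: str, destination: str) -> List[str]:
    """Binding keys that connect one exchange to one queue."""
    return [b.routing_key for b in rows
            if b.source == source and b.destination == destination]
```

An empty result for the orders exchange and the order-processing queue means there is no binding at all; a non-empty one gives you the exact keys to compare against order.created.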

A publish-with-no-deliver almost always points back to bindings, so this pairs with the exchanges and routing keys guide, and the consumer-side drops connect to the queue investigation workflow. The broader RabbitMQ category collects these debugging topics, and the tracing prompts I run this with live with my other prompts.
