RabbitMQ Error Guide: 'statistics database could not be contacted' Metrics Failure

Exact Error Message

The management UI banner and the HTTP API both report that metrics cannot be produced:

```
Statistics database could not be contacted. Message rates and queue lengths
may be temporarily unavailable.
```

The API returns a 503 or 500 with a body like:

```
HTTP/1.1 500 Internal Server Error
{"error":"Internal Server Error","reason":"{error,{timeout,...}}"}
```

The broker log shows the metrics collector falling behind or timing out:

```
2026-06-29 11:42:18.004 [warning] <0.987.0> Statistics database could not be contacted.
2026-06-29 11:42:48.219 [error] <0.991.0> Error generating metrics: {timeout,
  {gen_server,call,[rabbit_mgmt_db,{get_overview,...},30000]}}
2026-06-29 11:43:01.560 [warning] <0.987.0> Management DB: queue events backlog 184213,
  dropping older samples
```

What the Error Means

The management plugin keeps an in-memory statistics database (the rabbit_mgmt_db / metrics collector) that aggregates events emitted by every connection, channel, queue, and node — message rates, queue depths, delivery counts. The UI and /api/* endpoints query this aggregator. When the aggregator cannot keep up with the volume of events, queries to it time out, and the plugin reports “statistics database could not be contacted” or “error generating metrics.”

Crucially, this is a metrics-plane failure, not a data-plane failure. Messages keep flowing through queues normally; only the dashboard and the metrics API are degraded. The collector is a single process per node, so on a large topology with high event rates it becomes a bottleneck while AMQP itself stays healthy.

Common Causes

Too many objects emitting events. Tens of thousands of queues/connections/channels generate more samples than the collector can aggregate.
Fine-grained rates mode under load. The detailed rates mode (and, to a lesser degree, basic), combined with short sample intervals, multiplies the per-object work.
A burst of short-lived connections/channels. Connection churn floods the collector with create/delete events.
Undersized node. CPU-starved or memory-pressured nodes cannot drain the event backlog.
A long, expensive API query. Fetching /api/queues for the whole cluster with no pagination forces a huge aggregation in one call.
Stats collection interval misconfigured. A very low collect_statistics_interval increases overhead disproportionately.
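Object count and collection interval compound each other. The back-of-the-envelope sketch below makes that concrete under a simplifying assumption: every queue, connection and channel emits one stats event per collect_statistics_interval (milliseconds). Real per-event cost depends on version and rates mode, and the function name is hypothetical.

```shell
# Back-of-the-envelope model (an assumption, not RabbitMQ's exact behavior):
# each queue, connection and channel emits one stats event per interval.
# $1 = total object count, $2 = collect_statistics_interval in milliseconds.
stats_events_per_second() (
  echo $(( $1 * 1000 / $2 ))
)
```

With the counts from the diagnostic example in this guide (48213 queues, 9120 connections, 26540 channels, 83873 objects in total), stats_events_per_second 83873 5000 gives about 16774 events per second at the default 5000 ms interval; dropping the interval to 1000 ms raises that to 83873.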

How to Reproduce the Error

Create churn and a large topology, then hammer the metrics API:

```
# create many queues to inflate the event volume
# (same admin/admin credentials as the curl loop below)
for i in $(seq 1 20000); do
  rabbitmqadmin -u admin -p admin declare queue name=load-$i durable=false
done

# repeatedly request the full, unpaginated queue list
while true; do
  curl -s -u admin:admin http://localhost:15672/api/queues >/dev/null
done
```

On a modest node the collector backlog grows, /api/overview starts timing out, and the UI shows “statistics database could not be contacted.”
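Twenty thousand throwaway queues should not be left behind on a shared node. A minimal cleanup sketch, assuming the classic Python rabbitmqadmin and the admin/admin credentials used by the curl loop above; cleanup_load_queues is a hypothetical name, and passing --dry-run prints the commands instead of running them.

```shell
# Hypothetical helper: delete queues load-1 .. load-N created by the reproduction.
# Usage: cleanup_load_queues 20000            (deletes)
#        cleanup_load_queues 3 --dry-run      (prints the commands only)
cleanup_load_queues() (
  count=$1
  mode=${2:-}
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "$mode" = "--dry-run" ]; then
      echo "rabbitmqadmin -u admin -p admin delete queue name=load-$i"
    else
      rabbitmqadmin -u admin -p admin delete queue name="load-$i"
    fi
    i=$((i + 1))
  done
)
```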

Diagnostic Commands

```
# Is the metrics/stats collector overloaded? Check the management DB process
rabbitmq-diagnostics observer --interval 5   # watch rabbit_mgmt_db / metrics procs

# How large is the topology the collector must aggregate?
# -q suppresses the informational banner lines so wc -l counts only rows;
# list_queues covers the default vhost unless -p <vhost> is given
rabbitmqctl -q list_queues --no-table-headers name | wc -l
rabbitmqctl -q list_connections --no-table-headers name | wc -l
rabbitmqctl -q list_channels --no-table-headers name | wc -l

# Example output of the three counts above:
#   48213   queues
#   9120    connections
#   26540   channels

# Pull metrics/timeout errors from the log
sudo grep -iE 'statistics database|generating metrics|Management DB' \
  /var/log/rabbitmq/rabbit@$(hostname -s).log | tail -15

# Check the configured rates mode and collection interval
rabbitmq-diagnostics environment | grep -iE 'rates_mode|collect_statistics'

# Time a lightweight vs heavyweight API call to see where it stalls
time curl -s -u admin:admin http://localhost:15672/api/overview >/dev/null
time curl -s -u admin:admin 'http://localhost:15672/api/queues?page=1&page_size=100' >/dev/null
time curl -s -u admin:admin http://localhost:15672/api/queues >/dev/null   # full, unpaginated
```

If /api/overview and the paginated /api/queues call both return quickly but a full, unpaginated /api/queues call is slow or fails, the bottleneck is aggregation volume.
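The same comparison can be scripted. This sketch uses curl's -w '%{time_total}' to time each call and a hypothetical classify_stall helper; the 5-second threshold is an arbitrary assumption, and localhost:15672 with admin/admin matches the commands above.

```shell
# Time one management API path (e.g. /api/overview) and print seconds.
# Assumes localhost:15672 and admin/admin, as in the commands above;
# --max-time 60 keeps a stalled call from hanging the script.
api_time() (
  curl -s -o /dev/null --max-time 60 -w '%{time_total}\n' \
    -u admin:admin "http://localhost:15672$1"
)

# Hypothetical heuristic: given overview and full-queues timings in seconds,
# report which side is stalling. The 5 s default threshold is an assumption.
classify_stall() (
  awk -v o="$1" -v q="$2" -v t="${3:-5}" 'BEGIN {
    if (o >= t)      print "overview-slow: node or collector broadly overloaded"
    else if (q >= t) print "queues-slow: aggregation volume"
    else             print "ok"
  }'
)
```

For example: classify_stall "$(api_time /api/overview)" "$(api_time /api/queues)". A refused or failed request can also return quickly, so confirm the calls actually succeeded before reading "ok" as healthy.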

Step-by-Step Resolution

Confirm it is metrics-only. Verify AMQP is healthy (rabbitmq-diagnostics check_running, queues still draining). If publishing/consuming works, the problem is the stats collector, not the broker.
Reduce rates granularity. The default rates mode is basic; if yours has been raised to detailed, set it back in rabbitmq.conf:
```
management.rates_mode = basic
```
Use none if you only need static topology and not per-object rate charts; this dramatically cuts collector work.
Paginate every API query. Stop fetching the whole cluster at once; request ?page=1&page_size=100 and select only needed columns with ?columns=name,messages so the aggregator returns less.
Cut connection/channel churn. Make clients use long-lived connections and channels instead of opening one per operation; churn is a top driver of collector backlog.
Lengthen the collection interval. If collect_statistics_interval (milliseconds, default 5000) was set aggressively low, raise it, trading dashboard freshness for headroom.
Move metrics off the management DB entirely. For large clusters, scrape with rabbitmq_prometheus (port 15692), which reads the node's core metrics directly rather than through the management aggregation, and keep the UI for ad-hoc use.
Scale or rebalance the node if it is simply CPU/memory starved, so the collector can drain its backlog.
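The pagination advice can be wrapped in a loop that walks every page instead of issuing one unbounded request. A minimal sketch, assuming jq is installed, the localhost:15672 endpoint and admin/admin credentials used above, and that paginated responses carry their rows in items and the total number of pages in page_count; the helper names are hypothetical.

```shell
# Hypothetical helper: URL for one page of /api/queues (localhost:15672 assumed).
queues_page_url() (
  printf 'http://localhost:15672/api/queues?page=%s&page_size=%s\n' "$1" "$2"
)

# Walk every page and print "name<TAB>messages" per queue.
# Requires jq; admin/admin credentials are assumed as elsewhere in this guide.
fetch_all_queues() (
  size=${1:-100}
  page=1
  pages=1
  while [ "$page" -le "$pages" ]; do
    # -f makes curl fail on HTTP errors such as a 500/503 from the API
    body=$(curl -sf -u admin:admin "$(queues_page_url "$page" "$size")") || exit 1
    printf '%s\n' "$body" | jq -r '.items[]? | "\(.name)\t\(.messages)"'
    # page_count reports how many pages exist in total
    pages=$(printf '%s\n' "$body" | jq -r '.page_count // 0')
    case $pages in ''|*[!0-9]*) exit 1 ;; esac
    page=$((page + 1))
  done
)
```

Usage: fetch_all_queues 100 > queues.tsv. Each request stays bounded at page_size rows, so the collector never has to assemble the whole cluster in a single call.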

Verify recovery by re-running the timed /api/overview call and confirming the log backlog warnings stop.
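To confirm the warnings have actually stopped, filter the log to lines written after the fix. A minimal sketch, assuming the "YYYY-MM-DD HH:MM:SS.mmm [level] ..." layout from the log excerpt at the top of this guide, where timestamps sort correctly as plain strings; metrics_errors_since is a hypothetical name.

```shell
# Hypothetical helper: print metrics-failure lines logged at or after $1.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS.mmm", which sorts as text;
# continuation lines (no leading date) are skipped.
metrics_errors_since() (
  awk -v since="$1" '
    $1 ~ /^[0-9][0-9][0-9][0-9]-/ &&
    ($1 " " $2) >= since &&
    tolower($0) ~ /statistics database|generating metrics|management db/
  ' "$2"
)
```

Usage: metrics_errors_since "2026-06-29 12:00:00" /var/log/rabbitmq/rabbit@$(hostname -s).log. Empty output for a period after the change means no new metrics failures were logged.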

Prevention and Best Practices

Prefer rabbitmq_prometheus for ongoing monitoring; reserve the management DB for interactive debugging.
Always paginate and column-filter management API calls in automation — never pull /api/queues for the whole cluster unbounded.
Keep connection and channel counts down with pooling and long-lived clients; churn is the silent killer of the stats collector.
Right-size rates mode: basic or none on large topologies, detailed only on small clusters or short investigations.
Alert on the “statistics database could not be contacted” message and on collector backlog log lines so degradation is caught before the UI goes dark.
Size broker nodes with CPU headroom; the collector competes with AMQP for cores under load.
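The alerting advice can start as a log check that any Nagios-compatible scheduler can run. This is a sketch under assumptions: the matched phrases come from the log excerpt at the top of this guide, while the function name, the default window (last 2000 lines) and the thresholds (1 for warning, 10 for critical) are hypothetical. Exit codes follow the common plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL.

```shell
# Hypothetical alert check over the last N log lines.
# Exit codes follow the Nagios plugin convention: 0 OK, 1 WARNING, 2 CRITICAL.
# Window and thresholds are illustrative defaults, not recommendations.
check_stats_db_log() (
  log=$1
  window=${2:-2000}
  warn=${3:-1}
  crit=${4:-10}
  # grep -c prints 0 and exits 1 when nothing matches; || true keeps the 0
  count=$(tail -n "$window" "$log" |
    grep -ciE 'statistics database could not be contacted|generating metrics|queue events backlog' || true)
  if [ "$count" -ge "$crit" ]; then
    echo "CRITICAL: $count metrics-failure lines"; exit 2
  elif [ "$count" -ge "$warn" ]; then
    echo "WARNING: $count metrics-failure lines"; exit 1
  fi
  echo "OK: $count metrics-failure lines"
  exit 0
)
```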

Related Errors

operation timed out (RPC) — rabbitmqctl calls time out against a busy node; a related overload symptom on the control plane.
management listener failed to start — the UI never comes up at all, versus loading but failing on metrics.
memory/disk alarm (resource alarm set) — node-level pressure that also starves the collector.
HTTP access denied — a 401/403 auth failure, not a 500/503 metrics failure.

Frequently Asked Questions

Are my messages being lost when I see this error? No. This is a metrics-plane failure. Queues keep delivering; only the dashboard and metrics API are degraded.

What is the fastest mitigation? Lower management.rates_mode to basic or none and paginate API calls. Both cut collector load immediately.

Should I keep using the management API for monitoring at scale? No. Use rabbitmq_prometheus on port 15692 for production monitoring; it bypasses the aggregating stats database.
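Once rabbitmq_prometheus is enabled (rabbitmq-plugins enable rabbitmq_prometheus), the topology counts from the diagnostic section can be read from its /metrics endpoint instead of through rabbitmqctl list_* calls. A minimal sketch: prom_value is a hypothetical helper, and the gauge names rabbitmq_queues, rabbitmq_connections and rabbitmq_channels are assumptions based on recent plugin versions, so check them against your own /metrics output.

```shell
# Hypothetical helper: print the value of an unlabeled metric from
# Prometheus text-format input on stdin (first match only).
# HELP/TYPE comment lines start with "#", so they never match $1.
prom_value() (
  awk -v m="$1" '$1 == m { print $2; exit }'
)
```

Usage: curl -s http://localhost:15692/metrics | prom_value rabbitmq_queues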

Why does /api/overview work but /api/queues time out? overview is a small aggregate; an unpaginated queues call forces the collector to assemble every queue’s stats in one request.

Does connection churn really matter that much? Yes. Each create/delete emits events the collector must process. Long-lived connections and channels are one of the biggest reductions you can make.
