Kafka Error Guide: 'CoordinatorNotAvailableException' Group Coordinator Down

Exact Error Message

A consumer or admin operation that cannot reach its group coordinator fails like this:

org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:920)

The lead-up in the client log shows the consumer repeatedly trying to discover the coordinator:

[Consumer clientId=consumer-1, groupId=orders-service] Group coordinator broker-2:9092 (id: 2147483645) is unavailable or invalid due to cause: coordinator unavailable. Rediscovery will be attempted.
[Consumer clientId=consumer-1, groupId=orders-service] FindCoordinator request failed: COORDINATOR_NOT_AVAILABLE
[Consumer clientId=consumer-1, groupId=orders-service] Coordinator unavailable; discovering new coordinator

You may also see the closely related COORDINATOR_LOAD_IN_PROGRESS while a coordinator is loading offsets.
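The id: 2147483645 in the first log line is not a broker id. As far as the Java client goes, AbstractCoordinator registers the coordinator under a synthetic connection id of Integer.MAX_VALUE minus the broker's node id, which gives coordinator traffic its own connection. The sketch below assumes that convention and recovers the real broker id. The class and helper name are hypothetical:

```java
final class CoordinatorId {
    // Assumption: the Java consumer names the coordinator connection
    // Integer.MAX_VALUE - brokerId, so subtracting again yields the broker id.
    static int brokerIdFromCoordinatorId(int coordinatorConnectionId) {
        return Integer.MAX_VALUE - coordinatorConnectionId;
    }

    public static void main(String[] args) {
        // "id: 2147483645" taken from the client log line
        System.out.println("coordinator is broker " + brokerIdFromCoordinatorId(2147483645));
    }
}
```

For the log above this yields broker 2, which is consistent with the broker-2 hostname on the same line.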

What the Error Means

Every consumer group is managed by a group coordinator: the broker that currently leads the __consumer_offsets partition assigned to that group. The partition is chosen by hashing the group id modulo the number of offsets-topic partitions (offsets.topic.num.partitions, 50 by default). The coordinator handles join/sync (rebalancing), heartbeats, and offset commits. Before using a group, a client sends a FindCoordinator request to learn which broker is the coordinator, then talks to that broker directly.
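The group-to-partition mapping fits in a few lines of Java. This sketch mirrors the classic coordinator's partitionFor logic: hash the group id, make the hash non-negative, then reduce it modulo the partition count. It assumes offsets.topic.num.partitions is 50 and was never changed after the topic was created. The names OffsetsPartition and partitionFor are illustrative:

```java
final class OffsetsPartition {
    // Like Kafka's Utils.abs: Math.abs(Integer.MIN_VALUE) is still negative,
    // so that single value is mapped to 0 instead.
    static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    // __consumer_offsets partition for a group id; the leader of this
    // partition is the group's coordinator.
    static int partitionFor(String groupId, int numPartitions) {
        return abs(groupId.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        int p = partitionFor("orders-service", 50); // assumes the default 50 partitions
        System.out.println("orders-service -> __consumer_offsets-" + p);
    }
}
```

Whichever broker leads the printed partition is the group's coordinator. kafka-consumer-groups.sh --describe --state reports the same coordinator directly.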

CoordinatorNotAvailableException means the broker that should be the coordinator for this group is not currently able to serve that role. The most common reasons: the __consumer_offsets partition for the group has no available leader, the coordinator broker is restarting or has just taken over and is still loading offsets into memory (COORDINATOR_LOAD_IN_PROGRESS), or the offsets topic is under-replicated/offline. It is usually transient and resolves once the partition has a healthy leader and finishes loading — but a persistent occurrence points at a real availability problem with __consumer_offsets.

Common Causes

__consumer_offsets partition offline or leaderless: The partition that the group maps to has no leader, so no broker can act as coordinator.
Coordinator broker restarting: During a rolling restart or crash recovery, leadership of the offsets partition moves to another broker, which must load the group's offsets before serving. Until loading finishes, clients see COORDINATOR_NOT_AVAILABLE or COORDINATOR_LOAD_IN_PROGRESS.
Under-replicated offsets topic: __consumer_offsets has too few in-sync replicas (e.g., after broker loss). If the last in-sync replica of a partition also goes down, that partition has no leader (with unclean leader election disabled, the default), and its groups lose their coordinator.
Offsets topic misconfigured at first start: If offsets.topic.replication.factor is higher than the number of available brokers, the broker cannot create __consumer_offsets, so coordination never works.
Broker overload: A heavily loaded coordinator broker responds slowly to group requests. Clients time out, mark the coordinator unknown, and report intermittent unavailability.
Network partition: The client can reach bootstrap but not the specific coordinator broker.

How to Reproduce the Error

On a single-broker test cluster, set the offsets topic to require more replicas than exist, then start a consumer:

# server.properties on a 1-broker cluster
offsets.topic.replication.factor=3
offsets.topic.num.partitions=50

With only one broker, __consumer_offsets cannot be created with replication factor 3. The topic never becomes available, so every consumer's FindCoordinator returns COORDINATOR_NOT_AVAILABLE. To reproduce the transient variant, restart the broker that leads the group's offsets partition while a consumer is active. The consumer briefly reports COORDINATOR_NOT_AVAILABLE and COORDINATOR_LOAD_IN_PROGRESS until the new leader finishes loading offsets.
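The failure condition in this reproduction comes down to one comparison: the broker will not create __consumer_offsets until the number of live brokers reaches offsets.topic.replication.factor. The sketch below expresses that check. Its suggested value follows the production guidance later in this guide (3 replicas, never more than the broker count). The class and method names are hypothetical:

```java
final class OffsetsTopicRf {
    // True when the cluster can create __consumer_offsets with this factor.
    static boolean canCreate(int replicationFactor, int liveBrokers) {
        return replicationFactor >= 1 && replicationFactor <= liveBrokers;
    }

    // Suggested factor: 3 where possible, never more than the broker count.
    static int suggested(int liveBrokers) {
        return Math.max(1, Math.min(3, liveBrokers));
    }

    public static void main(String[] args) {
        int rf = 3, brokers = 1; // the single-broker reproduction
        System.out.println("canCreate=" + canCreate(rf, brokers)
                + ", suggested offsets.topic.replication.factor=" + suggested(brokers));
    }
}
```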

Diagnostic Commands

Look for coordinator-load and availability events on the brokers:

grep -nE "COORDINATOR_NOT_AVAILABLE|COORDINATOR_LOAD_IN_PROGRESS|Loading offsets|Finished loading" /var/log/kafka/server.log | tail -40

Inspect the health of the internal offsets topic; this is the key check. The --unavailable-partitions filter lists only the partitions that currently have no leader:

kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets --unavailable-partitions
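The sketch below scans the full per-partition listing for both problems at once: the output of kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets with no filter. It assumes each partition line carries tab-separated Partition, Leader, Replicas, and Isr fields. It also assumes a leaderless partition shows as Leader: -1 or Leader: none, depending on version. OffsetsTopicHealth is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OffsetsTopicHealth {
    // Assumed per-partition line shape (tab-separated "Key: value" fields), e.g.
    // "\tTopic: __consumer_offsets\tPartition: 7\tLeader: -1\tReplicas: 1,2,3\tIsr: 1"
    private static final Pattern FIELD =
            Pattern.compile("(Partition|Leader|Replicas|Isr): (\\S*)");

    static List<String> problems(List<String> describeLines) {
        List<String> out = new ArrayList<>();
        for (String line : describeLines) {
            String partition = null, leader = null, replicas = null, isr = null;
            Matcher m = FIELD.matcher(line);
            while (m.find()) {
                switch (m.group(1)) {
                    case "Partition": partition = m.group(2); break;
                    case "Leader":    leader = m.group(2);    break;
                    case "Replicas":  replicas = m.group(2);  break;
                    default:          isr = m.group(2);       break; // "Isr"
                }
            }
            if (partition == null || leader == null) continue; // topic header or blank line
            if (leader.equals("-1") || leader.equals("none")) {
                out.add("partition " + partition + ": no leader");
            } else if (replicas != null && isr != null && count(isr) < count(replicas)) {
                out.add("partition " + partition + ": ISR " + isr + " < replicas " + replicas);
            }
        }
        return out;
    }

    // Number of broker ids in a comma-separated list such as "1,2,3".
    private static int count(String csv) {
        return csv.isEmpty() ? 0 : csv.split(",").length;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) lines.add(line);
        problems(lines).forEach(System.out::println);
    }
}
```

Pipe the describe output into it, for example with java OffsetsTopicHealth.java on Java 11 or later. Leaderless partitions explain the error directly. Shrunken ISRs show which partitions are one failure away from it.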

List under-replicated partitions across the cluster (includes __consumer_offsets):

kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

Check the group’s state and which broker is its coordinator:

kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group orders-service --state

Confirm all expected brokers are alive and reachable:

kafka-broker-api-versions.sh --bootstrap-server localhost:9092 | grep "(id:"
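To compare the responding brokers against the set you expect, you can pull the id from each broker's header line. This sketch assumes header lines shaped like broker-2:9092 (id: 2 rack: null) -> ( and takes the expected ids as command-line arguments. MissingBrokers is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MissingBrokers {
    // Matches the "(id: N" part of each broker header line.
    private static final Pattern ID = Pattern.compile("\\(id: (-?\\d+)");

    // Expected broker ids that did not appear in the tool's output.
    static Set<Integer> missing(List<String> outputLines, Set<Integer> expectedIds) {
        Set<Integer> seen = new TreeSet<>();
        for (String line : outputLines) {
            Matcher m = ID.matcher(line);
            if (m.find()) seen.add(Integer.parseInt(m.group(1)));
        }
        Set<Integer> result = new TreeSet<>(expectedIds);
        result.removeAll(seen);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // e.g. kafka-broker-api-versions.sh ... | java MissingBrokers.java 1 2 3
        Set<Integer> expected = new TreeSet<>();
        for (String a : args) expected.add(Integer.parseInt(a));
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) lines.add(line);
        System.out.println("missing broker ids: " + missing(lines, expected));
    }
}
```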

In KRaft mode, confirm the metadata quorum is healthy (an unstable controller delays leadership for offsets partitions):

kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status

Step-by-Step Resolution

Check if it’s transient. During a restart or election the error self-heals within seconds once the coordinator finishes loading. If clients recover on retry, no action is needed beyond confirming the restart completed.
Inspect __consumer_offsets. If any partition shows Leader: -1 or a shrunken ISR, that is why no coordinator is available. Bring the responsible broker(s) back so the offsets partitions regain a leader.
Fix under-replication. If the offsets topic is under-replicated after broker loss, restore the missing brokers; leadership and ISR recovery makes the coordinator available again.
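Fix first-start misconfiguration. If offsets.topic.replication.factor exceeds your broker count, reduce it to a value the cluster can satisfy (or add brokers) so the topic can be created. The setting applies only when __consumer_offsets is first created; it does not change the replication factor of an existing topic.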
Relieve coordinator overload. If the coordinator broker is saturated, reduce its load or move partition leadership to other brokers so it can answer group requests promptly.
Verify network path. Ensure clients can reach the specific coordinator broker, not just bootstrap.
Confirm recovery with kafka-consumer-groups.sh --describe --state showing the group Stable and committing offsets again.

Prevention and Best Practices

Set offsets.topic.replication.factor to at least 3 in production (and never above your broker count) so the coordinator survives a broker loss.
Monitor under-replicated partitions, and alert specifically when __consumer_offsets is affected — it impacts every consumer group.
Pace rolling restarts so only one broker is down at a time and offsets partitions always retain a leader.
Keep the controller/metadata quorum healthy; slow leadership election delays coordinator availability after restarts.
Avoid overloading brokers that host many __consumer_offsets partitions; spread leadership evenly.
For transient blips, ensure clients use sane retry/backoff so brief unavailability during elections does not surface as application errors.
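On the consumer side, ordinary client settings control how patiently the client retries. The sketch below assumes a Java consumer. The keys are standard client configuration names, and retry.backoff.max.ms requires Kafka clients 3.7 or later. The broker address, group id, and values are illustrative, not tuned recommendations:

```java
import java.util.Properties;

final class ConsumerRetrySettings {
    static Properties build(String bootstrap, String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrap);
        p.setProperty("group.id", groupId);
        // Initial wait before retrying a failed request such as a coordinator lookup.
        p.setProperty("retry.backoff.ms", "200");
        // Upper bound for the exponential retry backoff (clients 3.7+).
        p.setProperty("retry.backoff.max.ms", "2000");
        // How long blocking calls such as commitSync() keep trying before timing out.
        p.setProperty("default.api.timeout.ms", "60000");
        return p;
    }

    public static void main(String[] args) {
        build("localhost:9092", "orders-service")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Pass the result, together with deserializers, to the KafkaConsumer constructor. The consumer then repeats coordinator discovery on its own while an election or offsets load is in progress.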

Related Errors

COORDINATOR_LOAD_IN_PROGRESS / CoordinatorLoadInProgressException: the coordinator is loading offsets; retry shortly.
NotCoordinatorException: the client contacted a broker that is no longer the coordinator (rediscovery needed).
session timeout expired / member left group: a related group-stability failure.
Failed to update metadata: a broader metadata-fetch failure that can accompany coordinator discovery problems.
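The three coordinator errors call for different client reactions, which can be summarized as a small lookup. The codes 14, 15, and 16 are the protocol error codes for COORDINATOR_LOAD_IN_PROGRESS, COORDINATOR_NOT_AVAILABLE, and NOT_COORDINATOR. The enum and method names are hypothetical:

```java
final class CoordinatorErrors {
    enum Action { RETRY_SAME_COORDINATOR, REDISCOVER_COORDINATOR, NOT_A_COORDINATOR_ERROR }

    static Action actionFor(int errorCode) {
        switch (errorCode) {
            case 14: // COORDINATOR_LOAD_IN_PROGRESS: right broker, still loading offsets
                return Action.RETRY_SAME_COORDINATOR;
            case 15: // COORDINATOR_NOT_AVAILABLE: back off, then FindCoordinator again
            case 16: // NOT_COORDINATOR: the contacted broker no longer holds the role
                return Action.REDISCOVER_COORDINATOR;
            default:
                return Action.NOT_A_COORDINATOR_ERROR;
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {14, 15, 16}) {
            System.out.println(code + " -> " + actionFor(code));
        }
    }
}
```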

Frequently Asked Questions

Is CoordinatorNotAvailableException always a serious problem? Often not. During elections, restarts, and offsets loading it is expected and transient; well-behaved clients retry and recover within seconds. It becomes serious when it persists, which signals a genuinely unavailable or under-replicated __consumer_offsets topic.

Why is __consumer_offsets so important here? That internal topic stores committed offsets and group metadata, and its partitions determine which broker is each group’s coordinator. If a group’s offsets partition has no leader, no broker can coordinate that group, producing this error.

What’s the difference from COORDINATOR_LOAD_IN_PROGRESS? COORDINATOR_LOAD_IN_PROGRESS means the correct coordinator is known but still reading offsets into memory — retry shortly. COORDINATOR_NOT_AVAILABLE means no coordinator can currently be designated, usually because the offsets partition lacks a healthy leader.

I set offsets.topic.replication.factor=3 on a 1-broker cluster and consumers can’t start — why? The offsets topic can’t reach replication factor 3 with one broker, so its partitions never become healthy and no coordinator is available. Set the factor to a value your broker count can satisfy (and raise it later as you add brokers).

Should my application crash on this error? No. Treat it as retryable. Configure reasonable retry/backoff so consumers ride out transient coordinator unavailability during normal cluster events instead of failing the application.
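The consumer already retries coordinator discovery internally. Application-level handling matters for calls that eventually hand a retriable error back to you, for example once a client's own retries or timeouts are exhausted. CoordinatorNotAvailableException extends Kafka's RetriableException, so a backoff loop keyed on that type is a reasonable shape. The sketch below uses a stand-in RetriableError class so it compiles without kafka-clients; in real code you would catch org.apache.kafka.common.errors.RetriableException instead. Names and delays are illustrative:

```java
import java.util.function.Supplier;

final class RetryWithBackoff {
    // Stand-in for org.apache.kafka.common.errors.RetriableException.
    static class RetriableError extends RuntimeException {
        RetriableError(String msg) { super(msg); }
    }

    // Runs op, retrying retriable failures with exponential backoff;
    // rethrows once maxAttempts is reached so persistent outages still surface.
    static <T> T retry(Supplier<T> op, int maxAttempts, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RetriableError e) {
                if (attempt >= maxAttempts) throw e;
                sleep(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) throw new RetriableError("COORDINATOR_NOT_AVAILABLE");
            return "offsets committed";
        }, 5, 100, 1000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Capping the attempts keeps the distinction this guide draws: brief election blips are absorbed, while a persistent __consumer_offsets outage still reaches your logs and alerts.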
