Kafka Error Guide: 'UnknownMemberIdException' Coordinator Is Not Aware of This Member

Exact Error Message

org.apache.kafka.common.errors.UnknownMemberIdException: The coordinator is not aware of this member.
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatResponseHandler.handle(AbstractCoordinator.java:1486)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatResponseHandler.handle(AbstractCoordinator.java:1455)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1264)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230)
	at com.example.ingest.IngestConsumer.run(IngestConsumer.java:67)

In the logs it usually appears next to a LeaveGroup INFO line and an offset-commit WARN, written as the member drops out of the group:

INFO  o.a.k.c.c.internals.AbstractCoordinator - [Consumer clientId=ingest-7, groupId=ingest-pipeline]
  Member ingest-7-9f2c1a3b sending LeaveGroup request to coordinator due to consumer poll timeout has expired.
WARN  o.a.k.c.c.internals.ConsumerCoordinator - [Consumer clientId=ingest-7, groupId=ingest-pipeline]
  Offset commit failed: The coordinator is not aware of this member; it is likely the member was removed
  from the group after the session timeout elapsed.

What the Error Means

When a consumer joins a group, the coordinator assigns it a member id (e.g., ingest-7-9f2c1a3b). The member must keep that id alive by sending heartbeats within session.timeout.ms and by calling poll() within max.poll.interval.ms. UnknownMemberIdException is thrown when the member sends a request (heartbeat, commit, or sync) using a member id the coordinator has already discarded — because it decided the member was gone and evicted it from the group.

The coordinator removes a member id when heartbeats stop arriving within the session timeout, when the poll interval is exceeded, or when the member explicitly sent a LeaveGroup. After eviction the id is forgotten; any request carrying it gets UnknownMemberIdException. The recovery is for the client to reset its member id, rejoin the group from scratch (it receives a brand-new id), and resume. The Kafka consumer client does this automatically on the next poll(), so the exception is recoverable — the real work is stopping whatever caused the eviction.
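The lifecycle above (join, heartbeat, eviction, rejection, rejoin with a new id) can be sketched with a toy in-memory model. This is only an illustration and not the broker's implementation: the class ToyCoordinator, its method names, and the counter-based id format are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of group membership. Hypothetical; NOT the Kafka broker code. */
final class ToyCoordinator {
    enum Result { OK, UNKNOWN_MEMBER_ID }

    private final long sessionTimeoutMs;
    // member id -> timestamp of the last heartbeat the coordinator accepted
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
    private int nextSuffix = 0;

    ToyCoordinator(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Every join hands out a fresh member id (the id format here is made up). */
    String join(String clientId, long nowMs) {
        String memberId = clientId + "-" + (nextSuffix++);
        lastHeartbeatMs.put(memberId, nowMs);
        return memberId;
    }

    /** Forgets every member whose last heartbeat is older than the session timeout. */
    void expire(long nowMs) {
        lastHeartbeatMs.values().removeIf(last -> nowMs - last > sessionTimeoutMs);
    }

    /** An explicit LeaveGroup also makes the id unknown. */
    void leave(String memberId) {
        lastHeartbeatMs.remove(memberId);
    }

    /** A heartbeat carrying an evicted or never-issued id is rejected. */
    Result heartbeat(String memberId, long nowMs) {
        expire(nowMs);
        if (!lastHeartbeatMs.containsKey(memberId)) {
            return Result.UNKNOWN_MEMBER_ID;
        }
        lastHeartbeatMs.put(memberId, nowMs);
        return Result.OK;
    }
}
```

With a 6000 ms session timeout, a member that heartbeats at t=2000 and then goes silent until t=9000 is rejected as unknown; joining again yields a different id, which mirrors what the real client does on its next poll().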

Common Causes

Session timeout exceeded — heartbeats stopped (network blip, paused thread, broker unreachable) for longer than session.timeout.ms, so the coordinator evicted the member.
max.poll.interval.ms exceeded — slow processing meant poll() was not called in time; the client proactively leaves the group and the id becomes unknown.
Heartbeat thread starvation — long GC pauses or a blocked JVM prevented the background heartbeat thread from running.
Coordinator change/broker restart — the group coordinator moved (broker failover) and the member’s stale id is not recognized by the new coordinator until it rejoins.
session.timeout.ms too low relative to network jitter, so brief delays trip eviction.
Network partition between consumer and broker spanning the session timeout.

How to Reproduce the Error

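Set a short poll interval and block the polling thread past it. A blocking call on the polling thread does not stop the background heartbeat thread, so a short session.timeout.ms alone will not cause eviction here; the poll-interval path is the reliable one:

props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6000);
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 2000);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 5000);

consumer.poll(Duration.ofMillis(500));
// Block the polling thread for longer than max.poll.interval.ms (5s).
// The heartbeat thread notices the missed poll deadline and sends LeaveGroup.
heavyBlockingCall(10_000);
consumer.poll(Duration.ofMillis(500)); // old member id is gone; this poll() has to rejoin

To exercise the session-timeout path instead, freeze the whole JVM (for example with kill -STOP) or cut the network for longer than session.timeout.ms. Once the coordinator has removed the member id, the next request that still carries it is rejected as an unknown member.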

Diagnostic Commands

All read-only. Confirm the member was evicted and find why heartbeats lapsed.

# Group state and member list — an evicted member disappears here
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group ingest-pipeline --describe --members --verbose

# Group state — Empty/PreparingRebalance right after the error confirms eviction
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group ingest-pipeline --state

# Lag — jumps when a member is evicted and its partitions are reassigned
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group ingest-pipeline --describe

# Effective timeouts driving eviction
grep -iE "session.timeout.ms|heartbeat.interval.ms|max.poll.interval.ms" /var/log/ingest-consumer/app.log

# Find the LeaveGroup / poll-timeout / coordinator-change events
journalctl -u ingest-consumer --since "1 hour ago" | grep -iE "LeaveGroup|poll timeout|coordinator|not aware of this member"
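The grep patterns above can also be applied programmatically, for example when tallying events across many hosts. The sketch below is a minimal classifier that matches only the phrases used in the commands and log excerpts in this guide; the class name EvictionLogClassifier and the Signal categories are assumptions made for illustration, and real log wording may differ between client versions.

```java
import java.util.Locale;

/** Hypothetical helper that buckets consumer log lines by eviction signal. */
final class EvictionLogClassifier {
    enum Signal { POLL_INTERVAL_EXCEEDED, UNKNOWN_MEMBER, LEAVE_GROUP, COORDINATOR_EVENT, UNRELATED }

    /** Checks the most specific phrase first, since several phrases share one line. */
    static Signal classify(String logLine) {
        String line = logLine.toLowerCase(Locale.ROOT);
        if (line.contains("poll timeout")) {
            return Signal.POLL_INTERVAL_EXCEEDED;   // max.poll.interval.ms path
        }
        if (line.contains("not aware of this member")) {
            return Signal.UNKNOWN_MEMBER;           // request used a forgotten member id
        }
        if (line.contains("leavegroup")) {
            return Signal.LEAVE_GROUP;              // member left for some other reason
        }
        if (line.contains("coordinator")) {
            return Signal.COORDINATOR_EVENT;        // e.g. coordinator change; inspect by hand
        }
        return Signal.UNRELATED;
    }
}
```

Counting POLL_INTERVAL_EXCEEDED against UNKNOWN_MEMBER lines over the last hour gives a quick hint about which timeout to investigate first.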

Step-by-Step Resolution

Confirm eviction, not a one-off. Check --members and the journalctl LeaveGroup line. A single occurrence during a broker failover is benign; repeated occurrences mean a persistent heartbeat problem.

Let the client rejoin. The consumer resets its member id and rejoins on the next poll(). Catch the exception so the loop continues rather than dying:

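try {
    records = consumer.poll(Duration.ofSeconds(1));
} catch (UnknownMemberIdException e) {
    // Member id was evicted; the next poll() rejoins with a fresh id.
    // Use an empty batch for this iteration so the loop keeps running.
    records = ConsumerRecords.empty();
}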

Fix the heartbeat lapse. If the trigger is the session timeout, ensure heartbeat.interval.ms is about one-third of session.timeout.ms, and confirm the JVM is not being frozen by GC or blocking calls on the consumer thread.
Fix the poll-interval lapse. If the trigger is max.poll.interval.ms, reduce max.poll.records or move slow processing off the poll thread (worker pool + pause/resume) so poll() is called on time.
Tune timeouts for your network. If brief jitter causes eviction, raise session.timeout.ms within broker-allowed bounds (group.min.session.timeout.ms/group.max.session.timeout.ms) to tolerate it.
Stabilize with static membership. Set group.instance.id so a member that restarts quickly keeps the same identity; as long as it comes back within session.timeout.ms, the group does not need a full rejoin and rebalance.
Verify. Re-run --state for Stable and confirm the member reappears in --members with shrinking lag.
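The timeout rules in the steps above can be checked before deployment. The sketch below is a hypothetical helper (TimeoutSanityCheck is not part of Kafka) that reads four consumer settings from a Properties object and reports violations of the one-third heartbeat guideline, the broker session bounds, and a rough per-batch processing budget. The broker bounds and the per-record time estimate are passed in as arguments because they come from your own cluster and measurements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-deployment check of consumer timeout settings. */
final class TimeoutSanityCheck {

    private static long required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing setting: " + key);
        }
        return Long.parseLong(value.trim());
    }

    /**
     * @param brokerMinSessionMs   value of group.min.session.timeout.ms on the broker
     * @param brokerMaxSessionMs   value of group.max.session.timeout.ms on the broker
     * @param estimatedMsPerRecord measured worst-case processing time per record
     * @return human-readable warnings; an empty list means no rule was violated
     */
    static List<String> check(Properties consumer, long brokerMinSessionMs,
                              long brokerMaxSessionMs, long estimatedMsPerRecord) {
        long session = required(consumer, "session.timeout.ms");
        long heartbeat = required(consumer, "heartbeat.interval.ms");
        long maxPollInterval = required(consumer, "max.poll.interval.ms");
        long maxPollRecords = required(consumer, "max.poll.records");

        List<String> warnings = new ArrayList<>();
        if (heartbeat * 3 > session) {
            warnings.add("heartbeat.interval.ms exceeds one-third of session.timeout.ms");
        }
        if (session < brokerMinSessionMs || session > brokerMaxSessionMs) {
            warnings.add("session.timeout.ms is outside the broker-allowed bounds");
        }
        // A full batch must finish before the next poll() deadline.
        if (maxPollRecords * estimatedMsPerRecord >= maxPollInterval) {
            warnings.add("a full batch may take longer than max.poll.interval.ms");
        }
        return warnings;
    }
}
```

For example, session.timeout.ms=6000 with heartbeat.interval.ms=2000 passes the ratio check, while raising the heartbeat interval to 5000 produces a warning. All four settings must be present as string values; the sketch deliberately does not assume client defaults, since those vary between versions.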

Prevention and Best Practices

Keep heartbeat.interval.ms near one-third of session.timeout.ms so transient delays do not evict the member.
Keep processing under max.poll.interval.ms; offload heavy work and use pause/resume to keep polling.
Use static membership (group.instance.id) for fleets that restart frequently.
Tune the JVM to avoid long stop-the-world pauses that starve the heartbeat thread.
Monitor consumer heartbeat/last-poll metrics and alert before they approach the timeouts.
Make processing idempotent so the redelivery after a rejoin is harmless.
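One way to make redelivery harmless is to key each side effect on the record's topic, partition, and offset and skip keys that were already applied. The sketch below illustrates the idea with an in-memory set; IdempotentHandler is a hypothetical class, and a real deployment would keep the seen-keys state in a durable store (ideally updated atomically with the side effect), which this example does not attempt.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical deduplicating wrapper; in-memory only, for illustration. */
final class IdempotentHandler {
    private final Set<String> processed = new HashSet<>();
    private final Consumer<String> sideEffect;

    IdempotentHandler(Consumer<String> sideEffect) {
        this.sideEffect = sideEffect;
    }

    /**
     * Applies the side effect at most once per (topic, partition, offset).
     *
     * @return true if the record was applied, false if it was a redelivery
     */
    boolean handle(String topic, int partition, long offset, String value) {
        String key = topic + "-" + partition + "@" + offset;
        if (!processed.add(key)) {
            return false; // already applied before the rebalance; skip
        }
        sideEffect.accept(value);
        return true;
    }
}
```

After an eviction, the partitions' uncommitted records are delivered again to whichever member now owns them; with a check like this the second delivery becomes a no-op instead of a duplicate write.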

Related Errors

CommitFailedException: surfaced when a commit follows the eviction; the human-facing symptom of the same root problem.
IllegalGenerationException: the member is a generation stale rather than fully unknown; a milder form of the same staleness.
RebalanceInProgressException: the group is mid-rebalance; recoverable by re-polling.
FencedInstanceIdException: for static members, a duplicate group.instance.id fences this one instead of evicting on timeout.

Frequently Asked Questions

Is UnknownMemberIdException recoverable? Yes. The client resets its member id and rejoins on the next poll(). The exception is the symptom of an eviction that already happened; your job is to prevent the eviction from recurring and to ensure processing is idempotent.

What is the difference from IllegalGenerationException? UnknownMemberIdException means the coordinator has entirely forgotten your member id (typically after a session timeout or LeaveGroup). IllegalGenerationException means it still knows you but you are a generation behind. The unknown-member case implies a longer absence.
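The distinction can be pictured as two checks performed in order: first whether the member id is known at all, then whether the generation matches. The sketch below is a simplified model of that ordering, not the broker's code; MemberValidation and its Outcome enum are hypothetical names.

```java
import java.util.Set;

/** Simplified, hypothetical model of how a request is validated. */
final class MemberValidation {
    enum Outcome { NONE, UNKNOWN_MEMBER_ID, ILLEGAL_GENERATION }

    static Outcome validate(Set<String> knownMembers, int currentGeneration,
                            String memberId, int requestGeneration) {
        if (!knownMembers.contains(memberId)) {
            // The coordinator has forgotten this id entirely.
            return Outcome.UNKNOWN_MEMBER_ID;
        }
        if (requestGeneration != currentGeneration) {
            // The id is still known, but the group has moved to another generation.
            return Outcome.ILLEGAL_GENERATION;
        }
        return Outcome.NONE;
    }
}
```

A member that merely missed a rebalance fails the second check; a member that was evicted fails the first and never reaches the generation comparison.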

Why did this happen right after a broker restart? The group coordinator may have moved to another broker during failover. The member’s old id is not recognized by the new coordinator until it rejoins, producing a transient UnknownMemberIdException.

Does raising session.timeout.ms fix it? It helps when brief network jitter is the cause, but it also delays detection of genuinely dead consumers. Fix heartbeat starvation and poll-interval violations first; raise the timeout only to absorb real jitter.
