Kafka Error Guide: 'UnknownMemberIdException: The Coordinator Is Not Aware of This Member'
Fix Kafka UnknownMemberIdException: why the group coordinator evicts a consumer member id after session timeouts, and how to keep heartbeats alive to rejoin cleanly.
- #kafka
- #troubleshooting
- #errors
- #consumer
Exact Error Message
org.apache.kafka.common.errors.UnknownMemberIdException: The coordinator is not aware of this member.
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatResponseHandler.handle(AbstractCoordinator.java:1486)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatResponseHandler.handle(AbstractCoordinator.java:1455)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1264)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230)
at com.example.ingest.IngestConsumer.run(IngestConsumer.java:67)
It commonly appears alongside log lines like these, as the consumer leaves the group or the coordinator drops it:
INFO o.a.k.c.c.internals.AbstractCoordinator - [Consumer clientId=ingest-7, groupId=ingest-pipeline]
Member ingest-7-9f2c1a3b sending LeaveGroup request to coordinator due to consumer poll timeout has expired.
WARN o.a.k.c.c.internals.ConsumerCoordinator - [Consumer clientId=ingest-7, groupId=ingest-pipeline]
Offset commit failed: The coordinator is not aware of this member; it is likely the member was removed
from the group after the session timeout elapsed.
What the Error Means
When a consumer joins a group, the coordinator assigns it a member id (e.g., ingest-7-9f2c1a3b). The member must keep that id alive by sending heartbeats within session.timeout.ms and by calling poll() within max.poll.interval.ms. UnknownMemberIdException is thrown when the member sends a request (heartbeat, commit, or sync) using a member id the coordinator has already discarded — because it decided the member was gone and evicted it from the group.
The coordinator removes a member id when no heartbeat arrives within the session timeout, when the poll interval is exceeded (the client then sends a LeaveGroup itself), or when the member explicitly sends a LeaveGroup. After eviction the id is forgotten, and any request carrying it gets UnknownMemberIdException. To recover, the client resets its member id, rejoins the group from scratch (receiving a brand-new id), and resumes. The Kafka consumer client does this automatically on the next poll(), so the exception is recoverable; the real work is stopping whatever caused the eviction.
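To make the lifecycle concrete, here is a toy model of the bookkeeping described above, assuming a single coordinator and an explicit clock in milliseconds. `ToyCoordinator` and `UnknownMemberException` are hypothetical names for illustration, not Kafka classes, and the real coordinator expires members with timers rather than lazily at request time as this sketch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model of member-id bookkeeping; NOT Kafka's GroupCoordinator implementation.
class ToyCoordinator {
    // Hypothetical stand-in for org.apache.kafka.common.errors.UnknownMemberIdException.
    static class UnknownMemberException extends RuntimeException {
        UnknownMemberException(String memberId) {
            super("The coordinator is not aware of this member: " + memberId);
        }
    }

    private final long sessionTimeoutMs;
    // member id -> time of the last heartbeat the coordinator received
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    ToyCoordinator(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Every join, including a rejoin, hands out a brand-new member id.
    String join(String clientId, long nowMs) {
        String memberId = clientId + "-" + UUID.randomUUID().toString().substring(0, 8);
        lastHeartbeat.put(memberId, nowMs);
        return memberId;
    }

    // Evict every member whose last heartbeat is older than the session timeout.
    void expireMembers(long nowMs) {
        lastHeartbeat.values().removeIf(last -> nowMs - last > sessionTimeoutMs);
    }

    // A heartbeat carrying a discarded id is rejected, mirroring UNKNOWN_MEMBER_ID.
    void heartbeat(String memberId, long nowMs) {
        expireMembers(nowMs); // toy: expiry is checked lazily at request time
        if (!lastHeartbeat.containsKey(memberId)) {
            throw new UnknownMemberException(memberId);
        }
        lastHeartbeat.put(memberId, nowMs);
    }

    // An explicit LeaveGroup drops the id immediately.
    void leaveGroup(String memberId) {
        lastHeartbeat.remove(memberId);
    }

    boolean isMember(String memberId) {
        return lastHeartbeat.containsKey(memberId);
    }
}
```

With a 6000 ms session timeout, a member that heartbeats at 4000 ms and then goes silent is rejected at 11000 ms, and a subsequent join() returns a different id, which is the "rejoin from scratch" path.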
Common Causes
- Session timeout exceeded: heartbeats stopped (network blip, frozen process, unreachable broker) for longer than session.timeout.ms, so the coordinator evicted the member.
- max.poll.interval.ms exceeded: slow processing meant poll() was not called in time; the client proactively leaves the group and the id becomes unknown.
- Heartbeat thread starvation: long GC pauses or a frozen JVM prevented the background heartbeat thread from running.
- Coordinator change or broker restart: the group coordinator moved (broker failover), and the member’s stale id may not be recognized by the new coordinator until the member rejoins.
- session.timeout.ms too low relative to network jitter, so brief delays trip eviction.
- Network partition between consumer and broker lasting longer than the session timeout.
How to Reproduce the Error
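Set short timeouts, then stall the consumer for longer than both of them:
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 6000);     // broker default minimum (group.min.session.timeout.ms)
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 2000);  // one-third of the session timeout
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 6000);   // short poll interval so a stall trips it
consumer.poll(Duration.ofMillis(500));
// Heartbeats are sent from a background thread (Kafka 0.10.1+), so blocking only this
// thread trips max.poll.interval.ms and the client sends LeaveGroup; freezing the whole
// JVM for > 6s (e.g., a long stop-the-world GC) trips the session timeout instead.
heavyBlockingCall(10_000); // placeholder for any call that blocks this thread for 10s
consumer.poll(Duration.ofMillis(500)); // old member id is gone; requests still using it fail with UnknownMemberIdException
After either timer expires, the coordinator no longer recognizes the old member id, so the next request that carries it fails and the client has to rejoin with a new id.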
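Which timer a stall trips depends on whether only the poll thread is blocked or the whole JVM is frozen. The decision below is a simplified sketch, assuming a client whose heartbeats come from a background thread; `EvictionPath` and its `classify` method are hypothetical names, and the model ignores network effects and rebalance timing.

```java
// Hypothetical, simplified model of which timeout a pause trips, assuming
// heartbeats are sent from a background thread independent of poll().
class EvictionPath {
    enum Outcome { STAYS_IN_GROUP, POLL_INTERVAL_EXCEEDED, SESSION_TIMEOUT_EXCEEDED }

    static Outcome classify(long pauseMs, boolean wholeJvmFrozen,
                            long sessionTimeoutMs, long maxPollIntervalMs) {
        // A whole-JVM freeze (e.g., stop-the-world GC) also stops the heartbeat thread,
        // so the coordinator sees silence for the full pause.
        if (wholeJvmFrozen && pauseMs > sessionTimeoutMs) {
            return Outcome.SESSION_TIMEOUT_EXCEEDED;
        }
        // Otherwise heartbeats keep flowing, and only the gap between poll() calls matters.
        if (pauseMs > maxPollIntervalMs) {
            return Outcome.POLL_INTERVAL_EXCEEDED;
        }
        return Outcome.STAYS_IN_GROUP;
    }
}
```

Under the default max.poll.interval.ms of 300000 ms, classify(10_000, false, 6000, 300_000) returns STAYS_IN_GROUP: a 10-second block on the poll thread alone does not evict the member, even with a 6000 ms session timeout.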
Diagnostic Commands
All read-only. Confirm the member was evicted and find why heartbeats lapsed.
# Group state and member list — an evicted member disappears here
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group ingest-pipeline --describe --members --verbose
# Group state — Empty/PreparingRebalance right after the error confirms eviction
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group ingest-pipeline --describe --state
# Lag — jumps when a member is evicted and its partitions are reassigned
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group ingest-pipeline --describe
# Effective timeouts driving eviction
grep -iE "session.timeout.ms|heartbeat.interval.ms|max.poll.interval.ms" /var/log/ingest-consumer/app.log
# Find the LeaveGroup / poll-timeout / coordinator-change events
journalctl -u ingest-consumer --since "1 hour ago" | grep -iE "LeaveGroup|poll timeout|coordinator|not aware of this member"
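When there are many matching lines, the same grep patterns can drive a first-pass classifier. This is a minimal sketch, assuming one log event per line; `EvictionLogTriage` is a hypothetical helper, and its labels are interpretations rather than Kafka output. Logger names such as AbstractCoordinator also contain "coordinator", so that pattern is checked last as a catch-all.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical triage helper built from the grep patterns used above.
class EvictionLogTriage {
    // Insertion order is preserved, so the first matching rule wins.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    static {
        RULES.put(Pattern.compile("poll timeout", Pattern.CASE_INSENSITIVE),
                "max.poll.interval.ms exceeded: processing between poll() calls is too slow");
        RULES.put(Pattern.compile("not aware of this member", Pattern.CASE_INSENSITIVE),
                "member id already evicted: look for the earlier timeout or LeaveGroup");
        RULES.put(Pattern.compile("LeaveGroup", Pattern.CASE_INSENSITIVE),
                "member left the group (shutdown or client-side timeout)");
        // Broad catch-all: logger names like AbstractCoordinator also match.
        RULES.put(Pattern.compile("coordinator", Pattern.CASE_INSENSITIVE),
                "coordinator event: check for broker failover or coordinator change");
    }

    static String classify(String logLine) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(logLine).find()) {
                return rule.getValue();
            }
        }
        return "no eviction signal";
    }
}
```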
Step-by-Step Resolution
- Confirm eviction, not a one-off. Check the --members output and the journalctl LeaveGroup line. A single occurrence during a broker failover is benign; repeated occurrences point to a persistent heartbeat or poll-interval problem.
- Let the client rejoin. The consumer resets its member id and rejoins on the next poll(). Catch the exception so the loop continues rather than dying:
  try {
      records = consumer.poll(Duration.ofSeconds(1));
  } catch (UnknownMemberIdException e) {
      // member id was evicted; the next poll() rejoins with a fresh id
  }
- Fix the heartbeat lapse. If the trigger is the session timeout, set heartbeat.interval.ms to no more than one-third of session.timeout.ms, and confirm the JVM is not being frozen by long GC pauses. Blocking calls on the consumer thread do not stop the background heartbeat thread; they show up as poll-interval lapses instead.
- Fix the poll-interval lapse. If the trigger is max.poll.interval.ms, reduce max.poll.records or move slow processing off the poll thread (worker pool plus pause/resume) so poll() is called on time.
- Tune timeouts for your network. If brief jitter causes eviction, raise session.timeout.ms within the broker-allowed bounds (group.min.session.timeout.ms / group.max.session.timeout.ms) to tolerate it.
- Stabilize with static membership. Set group.instance.id so a quick restart reuses the same identity and, if the consumer returns within the session timeout, does not trigger a rebalance.
- Verify. Re-run --describe --state and look for Stable, then confirm the member reappears in --members with shrinking lag.
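The heartbeat, poll-interval, and session-bounds rules in the steps above reduce to a few inequalities that can be checked before a deploy. A minimal sketch: `ConsumerTimeoutCheck` is hypothetical, the broker bounds are passed in rather than assumed, and `worstCaseMsPerRecord` is your own measured estimate of per-record processing time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-deploy check for the consumer timeout relationships.
class ConsumerTimeoutCheck {
    static List<String> check(long sessionTimeoutMs, long heartbeatIntervalMs,
                              long maxPollIntervalMs, int maxPollRecords,
                              long worstCaseMsPerRecord,
                              long brokerMinSessionMs, long brokerMaxSessionMs) {
        List<String> problems = new ArrayList<>();
        // Kafka's guidance: heartbeat interval no higher than 1/3 of the session timeout.
        if (heartbeatIntervalMs * 3 > sessionTimeoutMs) {
            problems.add("heartbeat.interval.ms should be <= session.timeout.ms / 3");
        }
        // The broker rejects a join whose session timeout falls outside its bounds.
        if (sessionTimeoutMs < brokerMinSessionMs || sessionTimeoutMs > brokerMaxSessionMs) {
            problems.add("session.timeout.ms is outside group.min/max.session.timeout.ms");
        }
        // A full batch must be processed before the next poll() is due.
        long worstCaseBatchMs = (long) maxPollRecords * worstCaseMsPerRecord;
        if (worstCaseBatchMs >= maxPollIntervalMs) {
            problems.add("max.poll.records * per-record time (" + worstCaseBatchMs
                    + " ms) reaches max.poll.interval.ms");
        }
        return problems;
    }
}
```

For example, check(10_000, 5000, 300_000, 500, 1000, 6000, 1_800_000) reports two problems: the heartbeat interval is half the session timeout, and 500 records at 1 s each would take 500 s against a 300 s poll interval.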
Prevention and Best Practices
- Keep heartbeat.interval.ms at or below one-third of session.timeout.ms so transient delays do not evict the member.
- Keep processing under max.poll.interval.ms; offload heavy work and use pause/resume to keep polling.
- Use static membership (group.instance.id) for fleets that restart frequently.
- Tune the JVM to avoid long stop-the-world pauses that starve the heartbeat thread.
- Monitor consumer heartbeat/last-poll metrics and alert before they approach the timeouts.
- Make processing idempotent so the redelivery after a rejoin is harmless.
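One way to make redelivery harmless is to remember the highest offset processed per partition and skip anything at or below it. This is a sketch under stated assumptions: `ProcessedOffsets` is a hypothetical class, records arrive in offset order within a partition (as Kafka delivers them), and the map lives only in memory, so it does not survive a restart or a partition moving to another consumer; durable deduplication needs that state stored together with the output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory dedup guard keyed by topic and partition.
class ProcessedOffsets {
    // "topic-partition" -> highest offset successfully processed
    private final Map<String, Long> highest = new HashMap<>();

    // Runs the work and returns true for a new record; returns false for a redelivery.
    boolean processOnce(String topic, int partition, long offset, Runnable work) {
        String key = topic + "-" + partition;
        Long done = highest.get(key);
        if (done != null && offset <= done) {
            return false; // already handled before the eviction; skip it
        }
        work.run();
        highest.put(key, offset); // record progress only after the work succeeds
        return true;
    }
}
```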
Related Errors
- CommitFailedException: surfaced when a commit follows the eviction; the user-facing symptom of the same root problem.
- IllegalGenerationException: the member is one generation behind rather than fully unknown; a milder form of the same staleness.
- RebalanceInProgressException: the group is mid-rebalance; recoverable by re-polling.
- FencedInstanceIdException: for static members, another consumer joined with the same group.instance.id and fenced this one; it is not a timeout eviction.
Frequently Asked Questions
Is UnknownMemberIdException recoverable?
Yes. The client resets its member id and rejoins on the next poll(). The exception is the symptom of an eviction that already happened; your job is to prevent the eviction from recurring and to ensure processing is idempotent.
What is the difference from IllegalGenerationException?
UnknownMemberIdException means the coordinator has entirely forgotten your member id (typically after a session timeout or LeaveGroup). IllegalGenerationException means it still knows you but you are a generation behind. The unknown-member case implies a longer absence.
Why did this happen right after a broker restart?
The group coordinator may have moved to another broker during failover. The member’s old id may not be recognized by the new coordinator until the member rejoins, producing a transient UnknownMemberIdException.
Does raising session.timeout.ms fix it?
It helps when brief network jitter is the cause, but it also delays detection of genuinely dead consumers. Fix heartbeat starvation and poll-interval violations first; raise the timeout only to absorb real jitter.