Kafka Error Guide: 'FencedInstanceIdException' Static Member Has Been Fenced
Fix Kafka FencedInstanceIdException: why duplicate group.instance.id values fence a static consumer member, and how to keep static membership ids unique.
- #kafka
- #troubleshooting
- #errors
- #consumer
Exact Error Message
```text
org.apache.kafka.common.errors.FencedInstanceIdException: The broker rejected this static consumer since another consumer with the same group.instance.id has registered with a different member.id.
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatResponseHandler.handle(AbstractCoordinator.java:1492)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatResponseHandler.handle(AbstractCoordinator.java:1455)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1264)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230)
    at com.example.ledger.LedgerConsumer.run(LedgerConsumer.java:74)
```
The fenced instance typically logs a fatal error and stops, because this is not auto-recoverable like a normal rebalance:
```text
ERROR o.a.k.c.c.internals.AbstractCoordinator - [Consumer clientId=ledger-1, groupId=ledger-stream] Received fatal exception: group.instance.id gets fenced
ERROR c.e.ledger.LedgerConsumer - Consumer for instance ledger-node-2 fenced; another process is using the same group.instance.id. Shutting down to avoid split-brain.
```
What the Error Means
FencedInstanceIdException is specific to static group membership. When you set group.instance.id, the consumer registers a stable identity with the coordinator so that quick restarts do not trigger a rebalance: an instance that comes back within session.timeout.ms reclaims its old assignment. The coordinator enforces that each group.instance.id maps to exactly one live member.id at a time.

The exception is thrown when a member sends a request (a heartbeat, JoinGroup, SyncGroup, or offset commit) using a group.instance.id that the coordinator has since associated with a different member.id. In other words, another process registered the same static id.
The coordinator “fences” the older member: it rejects its requests so that only one consumer owns that static identity and its partitions. This prevents a split-brain where two processes both believe they own the same partitions. Unlike UnknownMemberIdException or IllegalGenerationException, fencing is treated as fatal and non-retryable for the fenced instance — rejoining with the same duplicate id would just be fenced again. The correct resolution is to ensure the group.instance.id is genuinely unique per running process and shut down the duplicate.
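To make the fencing rule concrete, here is a toy, in-memory model of the bookkeeping described above. It is not Kafka's coordinator code. `ToyCoordinator`, `join` and `heartbeat` are hypothetical names. The model captures only one rule: a new registration of a static id replaces the previous member.id, after which requests carrying the old member.id are rejected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model (hypothetical, NOT Kafka's implementation) of the rule that each
// group.instance.id maps to exactly one live member.id at a time.
final class ToyCoordinator {
    enum Result { OK, FENCED_INSTANCE_ID, UNKNOWN_MEMBER_ID }

    // group.instance.id -> member.id currently owning that static identity
    private final Map<String, String> staticMembers = new HashMap<>();

    // A join with a static id gets a fresh member.id and replaces whatever
    // member.id previously held that group.instance.id.
    String join(String groupInstanceId) {
        String memberId = groupInstanceId + "-" + UUID.randomUUID();
        staticMembers.put(groupInstanceId, memberId);
        return memberId;
    }

    // A heartbeat is accepted only from the member.id currently registered for
    // the static id; an older member.id for the same static id is fenced.
    Result heartbeat(String groupInstanceId, String memberId) {
        String owner = staticMembers.get(groupInstanceId);
        if (owner == null) {
            return Result.UNKNOWN_MEMBER_ID; // static id never registered in this model
        }
        return owner.equals(memberId) ? Result.OK : Result.FENCED_INSTANCE_ID;
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        String processA = coordinator.join("ledger-node-2"); // Process A registers first
        System.out.println("A: " + coordinator.heartbeat("ledger-node-2", processA)); // A: OK
        String processB = coordinator.join("ledger-node-2"); // Process B reuses the same id
        System.out.println("A: " + coordinator.heartbeat("ledger-node-2", processA)); // A: FENCED_INSTANCE_ID
        System.out.println("B: " + coordinator.heartbeat("ledger-node-2", processB)); // B: OK
    }
}
```

Running `main` replays the two-process scenario. Process A is accepted until Process B registers the same static id, and from then on only B's member.id is accepted.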
Common Causes
- Two processes configured with the same group.instance.id: a copy-paste config, a templating bug, or a hardcoded id deployed to multiple replicas.
- A new pod/instance starting before the old one fully exited: during a non-graceful restart, the old process is still heartbeating when the replacement registers the same id, which fences the old one (or vice versa).
- Kubernetes StatefulSet vs Deployment mismatch: using a Deployment (random pod names) but deriving group.instance.id from something non-unique, so two replicas collide.
- Scaling beyond the number of distinct static ids: more replicas than unique group.instance.id values.
- A zombie/orphaned process holding the static id while a new one starts.
- Reusing the same id across blue/green or canary deployments running concurrently.
How to Reproduce the Error
Start two consumers in the same group with an identical group.instance.id:
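```java
// Run this same program twice (Process A and Process B) with an identical static id
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "ledger-stream");
props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, "ledger-node-2"); // identical in both!
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("ledger"));
    while (true) {
        consumer.poll(Duration.ofSeconds(1)); // the earlier member gets FencedInstanceIdException
    }
}
```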
When the second process registers ledger-node-2 with a new member.id, the coordinator fences the first, which throws FencedInstanceIdException on its next poll/heartbeat.
Diagnostic Commands
All read-only. The goal is to find duplicate static ids and which hosts run them.
```bash
# Members with their group.instance.id, host and client id. The coordinator keeps only one
# member per static id, so look for an id on an unexpected host, or for the host behind an
# id changing between repeated runs
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group ledger-stream --describe --members --verbose

# Group state: fencing churn may show up as rebalancing
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group ledger-stream --state

# Lag: confirm partitions are still being consumed by the surviving member
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group ledger-stream --describe

# Find the configured static id on this host
grep -iE "group.instance.id|group.id" /var/log/ledger-consumer/app.log

# Correlate fencing with concurrent starts of the same id across hosts
journalctl -u ledger-consumer --since "1 hour ago" | grep -iE "fenced|group.instance.id|Starting|Shutting down"
```
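The last command is about spotting overlap in time. Suppose you have already turned those log lines into start and stop times per host. That parsing step depends on your log format and is left out here. A small helper like the hypothetical `OverlapFinder` below can then flag runs that held the same group.instance.id at the same time. This is a sketch that assumes Java 16+ for the record type. `Run` and its fields are illustrative names, not part of Kafka.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags runs that used the same group.instance.id at overlapping times.
final class OverlapFinder {
    // end == null means the process was still running when the logs were captured
    record Run(String host, String instanceId, Instant start, Instant end) {}

    static boolean overlaps(Run a, Run b) {
        Instant aEnd = a.end() == null ? Instant.MAX : a.end();
        Instant bEnd = b.end() == null ? Instant.MAX : b.end();
        // Half-open intervals [start, end): one run stopping exactly when the next starts is fine
        return a.start().isBefore(bEnd) && b.start().isBefore(aEnd);
    }

    static List<String> findConflicts(List<Run> runs) {
        List<String> conflicts = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            for (int j = i + 1; j < runs.size(); j++) {
                Run a = runs.get(i), b = runs.get(j);
                if (a.instanceId().equals(b.instanceId()) && overlaps(a, b)) {
                    conflicts.add(a.instanceId() + ": " + a.host() + " overlaps " + b.host());
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
            new Run("host-a", "ledger-node-2", Instant.parse("2024-05-01T10:00:00Z"), null),
            new Run("host-b", "ledger-node-2", Instant.parse("2024-05-01T10:05:00Z"), null),
            new Run("host-c", "ledger-node-3", Instant.parse("2024-05-01T10:00:00Z"),
                    Instant.parse("2024-05-01T10:04:00Z")));
        // Prints: ledger-node-2: host-a overlaps host-b
        findConflicts(runs).forEach(System.out::println);
    }
}
```

A clean sequential restart, where the old run ends at or before the new one starts, produces no conflict. That is the ordering the restart guidance below aims for.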
Step-by-Step Resolution
- Find the duplicate. Run --members --verbose and compare each group.instance.id with the HOST and CLIENT-ID it maps to. The coordinator keeps only one member per static id, so the collision usually shows up as an id attached to an unexpected host, or as the host behind an id changing between runs, rather than as two rows with the same id. The CLIENT-ID/HOST columns and the fencing logs reveal which processes collide.
- Make the static id unique per process. Derive group.instance.id from a per-instance identity that is guaranteed unique and stable, such as the StatefulSet pod name (which ends in the pod ordinal):
```java
// Kubernetes StatefulSet: pod names are <name>-0, <name>-1, ...
String instanceId = System.getenv("HOSTNAME"); // e.g. ledger-stream-0
if (instanceId == null || instanceId.isBlank()) {
    throw new IllegalStateException("HOSTNAME not set; cannot derive a unique group.instance.id");
}
props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, instanceId);
```
Do not use a Deployment with a hardcoded id; use a StatefulSet (or otherwise inject a unique, stable id).
- Do not blindly retry the fenced instance. Fencing is fatal by design. Catch it, log loudly, and shut the process down so it does not fight for the same id:
```java
} catch (FencedInstanceIdException e) {
    log.error("Static id fenced: duplicate group.instance.id detected; exiting", e);
    consumer.close();
    System.exit(1); // let the orchestrator restart with the correct unique id
}
```
- Fix restart ordering. Shut down gracefully by calling consumer.close(), and make sure orchestration does not start a replacement with the same id before the old process has exited. A static member does not send a LeaveGroup request on close, so its identity stays registered until session.timeout.ms expires or a new process claims it. What close() does guarantee is that the old process stops heartbeating and committing. With static membership, prefer in-place restarts within the session timeout.
- Right-size replicas vs ids. Confirm the number of replicas matches the number of distinct static ids; never run more processes than there are unique group.instance.id values.
- Verify. After the redeploy, --members --verbose should show every expected group.instance.id on the host you expect, the group state should be Stable, and lag should be shrinking.
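As a guard for the "make the static id unique" step, a process can refuse to start unless its id looks like a StatefulSet pod name. This sketch assumes the pod's HOSTNAME has the form `<statefulset-name>-<ordinal>`. `StaticIdResolver` and its regex are hypothetical. The check is a heuristic rather than a guarantee: a Deployment pod whose random suffix happens to be all digits would still pass.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical startup guard: only accept HOSTNAME values shaped like StatefulSet pod names.
// Deployment pods end in a random suffix (e.g. "ledger-stream-7d9f8c6b5-x2k4p") and are rejected.
final class StaticIdResolver {
    // <name>-<ordinal>, where the ordinal is a non-negative integer without leading zeros
    private static final Pattern STATEFULSET_POD =
        Pattern.compile("^[a-z0-9]([-a-z0-9]*[a-z0-9])?-(0|[1-9][0-9]*)$");

    static String resolve(Map<String, String> env) {
        String hostname = env.get("HOSTNAME");
        if (hostname == null || hostname.isBlank()) {
            throw new IllegalStateException("HOSTNAME is not set; refusing to start without a unique group.instance.id");
        }
        if (!STATEFULSET_POD.matcher(hostname).matches()) {
            throw new IllegalStateException("HOSTNAME '" + hostname
                + "' does not look like a StatefulSet pod name; refusing to use it as group.instance.id");
        }
        return hostname;
    }

    public static void main(String[] args) {
        // In a real pod you would pass System.getenv() and feed the result to
        // ConsumerConfig.GROUP_INSTANCE_ID_CONFIG; a fixed map keeps this runnable anywhere.
        System.out.println(resolve(Map.of("HOSTNAME", "ledger-stream-0"))); // ledger-stream-0
    }
}
```

Failing fast at startup is deliberate. A process that cannot prove it has a unique id never joins the group, so it cannot fence a healthy peer.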
Prevention and Best Practices
- Derive group.instance.id from a unique, stable per-process source (StatefulSet pod ordinal, host id), never a hardcoded constant shared across replicas.
- Use StatefulSets (not Deployments) for static-membership consumers so identities are stable and unique.
- Treat FencedInstanceIdException as fatal: log and exit, and let the orchestrator restart cleanly.
- Ensure graceful shutdown with consumer.close() so a returning instance reclaims its id without overlap.
- Keep session.timeout.ms long enough to cover normal restart time so static members reclaim assignments without a rebalance.
- Audit configs and templates in CI to guarantee group.instance.id uniqueness across the fleet.
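The CI audit in the last item can be as small as a duplicate check over rendered configs. This sketch assumes your pipeline can produce a map of replica name to the group.instance.id it would use. How you render that map from your templates is outside the sketch, and `InstanceIdAudit` is a hypothetical name. The same check also catches more replicas than unique ids, because any surplus replica must reuse an id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical CI audit: input is replica/host -> rendered group.instance.id;
// output lists every id claimed by more than one replica.
final class InstanceIdAudit {
    static Map<String, List<String>> duplicates(Map<String, String> idByHost) {
        // Invert host -> id into id -> hosts; TreeMap keeps ids sorted for stable output
        Map<String, List<String>> hostsById = new TreeMap<>();
        idByHost.forEach((host, id) -> hostsById.computeIfAbsent(id, k -> new ArrayList<>()).add(host));
        // Keep only ids used by two or more hosts
        hostsById.values().removeIf(hosts -> hosts.size() < 2);
        // Sort host lists so the report does not depend on map iteration order
        hostsById.values().forEach(hosts -> Collections.sort(hosts));
        return hostsById;
    }

    public static void main(String[] args) {
        Map<String, String> planned = Map.of(
            "pod-0", "ledger-node-1",
            "pod-1", "ledger-node-2",
            "pod-2", "ledger-node-2"); // copy-paste bug: pod-2 reuses pod-1's id
        Map<String, List<String>> dups = duplicates(planned);
        if (!dups.isEmpty()) {
            System.err.println("Duplicate group.instance.id values: " + dups);
            System.exit(1); // non-zero exit fails the CI job
        }
        System.out.println("All group.instance.id values are unique");
    }
}
```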
Related Errors
- UnknownMemberIdException: for dynamic members evicted on session timeout; the non-static analog, but recoverable by rejoining.
- IllegalGenerationException: a commit from a stale generation; recoverable, unlike fencing.
- RebalanceInProgressException: the group is mid-rebalance; recoverable by polling again.
- CommitFailedException: a commit rejected after the member fell out of the active group.
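The split between fatal and recoverable errors above can be written down as an explicit policy table. This is a sketch under stated assumptions. It keys on exception simple names so it compiles without kafka-clients; in a real consumer you would catch `FencedInstanceIdException` directly, as the catch block in the resolution steps does. `GroupErrorPolicy` and its `Action` values are hypothetical application-level names, not a Kafka API.

```java
import java.util.Map;

// Hypothetical policy table mapping the group errors listed above to an application action.
final class GroupErrorPolicy {
    enum Action { EXIT, KEEP_POLLING }

    private static final Map<String, Action> POLICY = Map.of(
        "FencedInstanceIdException", Action.EXIT,             // duplicate static id: a retry is fenced again
        "UnknownMemberIdException", Action.KEEP_POLLING,      // recoverable by rejoining
        "IllegalGenerationException", Action.KEEP_POLLING,    // recoverable by rejoining
        "RebalanceInProgressException", Action.KEEP_POLLING,  // recoverable by polling again
        "CommitFailedException", Action.KEEP_POLLING);        // that commit is lost; polling again rejoins

    static Action actionFor(String exceptionSimpleName) {
        // Unlisted errors default to EXIT: a conservative choice made for this sketch
        return POLICY.getOrDefault(exceptionSimpleName, Action.EXIT);
    }

    static Action actionFor(Throwable error) {
        return actionFor(error.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"FencedInstanceIdException", "RebalanceInProgressException"}) {
            System.out.println(name + " -> " + actionFor(name)); // EXIT, then KEEP_POLLING
        }
    }
}
```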
Frequently Asked Questions
Why is FencedInstanceIdException fatal when other group errors are not?
Because it signals a configuration conflict, two processes claiming the same static identity, that retrying cannot resolve. The coordinator fences one to prevent split-brain; the right action is to make ids unique and exit the duplicate.
What is group.instance.id for?
It enables static membership: a consumer keeps a stable identity across restarts so brief restarts (within session.timeout.ms) do not trigger a rebalance. The trade-off is that the id must be unique per live process.
How do I make ids unique in Kubernetes?
Use a StatefulSet and derive the id from the pod ordinal (e.g., $HOSTNAME like consumer-0, consumer-1). Avoid Deployments with a shared hardcoded id.
Can I just catch and retry it?
No. Retrying with the same duplicate id gets fenced again. Fix the duplicate id, then let the orchestrator restart the process. Only after the id is unique will the member rejoin successfully.