DevOps AI ToolKit
AI for Kafka · By James Joyner IV · 9 min read

Kafka Error Guide: 'RebalanceInProgressException' Consumer Group Is Rebalancing

Fix Kafka RebalanceInProgressException: why offset commits fail mid-rebalance, how cooperative rebalancing changes it, and how to retry the poll cycle safely.

  • #kafka
  • #troubleshooting
  • #errors
  • #consumer

Exact Error Message

org.apache.kafka.common.errors.RebalanceInProgressException: Offset commit cannot be completed since the consumer group is executing a rebalance. The group is rebalancing because a member joined or left the group; you should rejoin the group by calling poll() again.
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1198)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1093)
	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1517)
	at com.example.metrics.MetricsConsumer.run(MetricsConsumer.java:83)

You may also see it logged from the heartbeat thread as the group transitions:

INFO  o.a.k.c.c.internals.ConsumerCoordinator - [Consumer clientId=metrics-4, groupId=metrics-agg]
  Attempt to heartbeat failed since group is rebalancing
WARN  o.a.k.c.c.internals.ConsumerCoordinator - [Consumer clientId=metrics-4, groupId=metrics-agg]
  Offset commit failed on partition metrics-3 at offset 552190: The group is rebalancing, so a rejoin is needed.

What the Error Means

RebalanceInProgressException means you tried to commit offsets (or the coordinator tried to process your request) while the consumer group was actively rebalancing — moving from PreparingRebalance to CompletingRebalance. During a rebalance, partition ownership is in flux: a member that owned partition 3 a moment ago may not own it once the new assignment lands. Allowing a commit mid-flight could write offsets for partitions that are about to belong to someone else, so the broker rejects the commit and tells you to rejoin by calling poll() again.

Crucially, unlike CommitFailedException (where you were evicted from the group), RebalanceInProgressException is recoverable: you are still a member, the group is simply mid-transition. The correct response is to re-enter the poll loop. The next poll() completes the rebalance, your ConsumerRebalanceListener runs, and you receive a fresh assignment. With cooperative (incremental) rebalancing, you may even keep most of your partitions and only the moved ones change hands.
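
The recovery path described above can be sketched as a poll loop that absorbs the exception and carries on. This is a minimal sketch, not the Kafka client itself: `GroupClient` and the local `RebalanceInProgress` exception are hypothetical stand-ins for `KafkaConsumer` and `org.apache.kafka.common.errors.RebalanceInProgressException`, introduced so the control flow can run without a broker.

```java
import java.util.List;

/** Hypothetical stand-in for Kafka's RebalanceInProgressException. */
class RebalanceInProgress extends RuntimeException {
    RebalanceInProgress(String message) { super(message); }
}

/** Hypothetical, minimal slice of the consumer API used by the loop. */
interface GroupClient {
    List<String> poll();   // in the real client, poll() also completes a pending rebalance
    void commitSync();     // may throw RebalanceInProgress while the group is mid-rebalance
}

class RecoverablePollLoop {
    /**
     * Runs a fixed number of poll cycles and returns how many commits succeeded.
     * A commit rejected mid-rebalance is skipped rather than retried in place:
     * the next poll() rejoins the group, and a later cycle commits again.
     */
    static int run(GroupClient client, int cycles) {
        int successfulCommits = 0;
        for (int i = 0; i < cycles; i++) {
            List<String> batch = client.poll();
            batch.forEach(record -> { /* process the record; must tolerate replay */ });
            try {
                client.commitSync();
                successfulCommits++;
            } catch (RebalanceInProgress e) {
                // Still a group member: fall through to the next poll() instead of exiting.
            }
        }
        return successfulCommits;
    }
}
```

In a real consumer the loop would run until shutdown rather than for a fixed number of cycles; the bounded version here only makes the behaviour easy to exercise with a fake client.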

Common Causes

  • Committing during an active rebalance — a member joined or left (deploy, scale-up/down, crash), and your commitSync()/commitAsync() landed in the rebalance window.
  • Frequent membership churn — rolling restarts or autoscaling that constantly add/remove members keep the group rebalancing.
  • Long processing pushing a member past max.poll.interval.ms, which evicts it and triggers a rebalance that other members’ commits then collide with.
  • Eager vs cooperative protocol mismatch — a mix of range/roundrobin (eager) and cooperative-sticky assignors across rolling members can prolong rebalances.
  • Committing inside onPartitionsRevoked at exactly the wrong moment under the eager protocol.
  • Static membership flapping when group.instance.id instances take longer than session.timeout.ms to restart, so the coordinator expires the member and rebalances anyway.
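
The max.poll.interval.ms cause in the list above comes down to simple arithmetic: a member is evicted when processing one poll()'s worth of records takes longer than the interval. A rough sketch of that check follows. The helper names are hypothetical, and the per-record latency is an assumed figure you would measure for your own workload; 500 and 300000 are the Java client defaults for max.poll.records and max.poll.interval.ms.

```java
class PollBudget {
    /** Worst-case time to process one full batch, in milliseconds. */
    static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return (long) maxPollRecords * perRecordMillis;
    }

    /** True when a full batch cannot be processed before max.poll.interval.ms expires. */
    static boolean risksEviction(int maxPollRecords, long perRecordMillis, long maxPollIntervalMs) {
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) > maxPollIntervalMs;
    }

    /** Largest max.poll.records that fits inside the interval scaled by a safety factor (0 < factor <= 1). */
    static int safeMaxPollRecords(long perRecordMillis, long maxPollIntervalMs, double safetyFactor) {
        return (int) Math.floor(maxPollIntervalMs * safetyFactor / perRecordMillis);
    }
}
```

With the defaults, 500 records at an assumed 700 ms each need 350,000 ms, which exceeds the 300,000 ms interval; at 500 ms each the batch needs 250,000 ms and fits. Keeping half the interval as headroom at 700 ms per record gives a max.poll.records of 214.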

How to Reproduce the Error

Start a consumer committing synchronously in a tight loop, then add a second member to force a rebalance:

// Member A: tight commit loop
while (true) {
    ConsumerRecords<String, String> r = consumer.poll(Duration.ofMillis(200));
    process(r);
    consumer.commitSync(); // can throw once member B joins and the rebalance starts
}

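Start a second consumer with the same group.id against the same topic. The join triggers a rebalance, and one of member A’s next commitSync() calls can throw RebalanceInProgressException. The failure is timing-dependent, so it may take a few joins to land a commit inside the rebalance window.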

Diagnostic Commands

All read-only. The goal is to confirm the group is rebalancing and find what is causing churn.

# Current group state: PreparingRebalance or CompletingRebalance confirms the cause
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group metrics-agg --state

# Members and their assignor: mixed assignors prolong rebalances
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group metrics-agg --describe --members --verbose

# Lag during and after the rebalance
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group metrics-agg --describe

# How often is the group rebalancing? Count log lines that mention a rebalance or rejoin
journalctl -u metrics-consumer --since "1 hour ago" | grep -ciE "rebalanc|rejoin|Revoke previously assigned"

# Confirm the configured partition assignor (the client logs its ConsumerConfig values at startup)
grep -iE "partition.assignment.strategy" /var/log/metrics-consumer/app.log
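
If the application logs are not in journald, the same count can be taken in code. The sketch below mirrors the grep pattern above as a case-insensitive Java regex. The class and method names are hypothetical, and it counts matching lines (like grep -c), not distinct rebalance events.

```java
import java.util.List;
import java.util.regex.Pattern;

class RebalanceLogCounter {
    // Same alternation as the grep -ciE expression in the diagnostic commands.
    private static final Pattern REBALANCE_LINE =
        Pattern.compile("rebalanc|rejoin|Revoke previously assigned", Pattern.CASE_INSENSITIVE);

    /** Counts lines that mention a rebalance or rejoin, one count per matching line. */
    static long countRebalanceLines(List<String> lines) {
        return lines.stream()
                    .filter(line -> REBALANCE_LINE.matcher(line).find())
                    .count();
    }
}
```

For a log file, pass it the result of `java.nio.file.Files.readAllLines(path)`. Comparing the count across time windows that do and do not contain a deploy is one way to tell deploy-driven churn from eviction-driven churn.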

Step-by-Step Resolution

  1. Recognize it as recoverable. This is not eviction. Do not exit the thread or treat the batch as committed — just let the loop call poll() again to complete the rebalance.

  2. Do not retry commitSync in a tight loop. Catch the exception and continue the poll loop rather than immediately re-issuing the same commit, which will keep failing until the rebalance finishes:

    try {
        consumer.commitSync();
    } catch (RebalanceInProgressException e) {
        // expected; the next poll() rejoins and we re-commit after reassignment
    }

  3. Commit at the right time. Commit after each batch has been processed: use commitAsync() in steady state, and call commitSync() in onPartitionsRevoked and on shutdown so that the final commits line up with assignment boundaries.

  4. Switch to cooperative rebalancing. Set the assignor to CooperativeStickyAssignor so rebalances become incremental — most partitions stay put and stop-the-world pauses shrink, reducing the window where commits collide.

    props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
              "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

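    Roll this out carefully: a group cannot mix eager-only and cooperative-only members. The usual path is two rolling restarts: first list CooperativeStickyAssignor alongside the existing assignor on every member, then remove the eager assignor in a second pass.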

  5. Reduce churn. Stagger rolling restarts, use static membership (group.instance.id) with a sensible session.timeout.ms so brief restarts do not trigger rebalances, and fix any max.poll.interval.ms violations causing evictions.

  6. Verify. Confirm --state returns to Stable and the rejoin count in logs drops.
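
Step 3 can be sketched as a small helper that tracks the next offset to commit per partition, commits asynchronously after each batch, and commits synchronously when partitions are revoked or the consumer shuts down. This is an illustration under assumptions: `OffsetSink` is a hypothetical stand-in for the consumer's `commitAsync`/`commitSync` calls, partitions are plain integers rather than `TopicPartition` objects, and in a real consumer the `onPartitionsRevoked` method would be invoked from a `ConsumerRebalanceListener`.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the two commit calls on the consumer. */
interface OffsetSink {
    void commitAsync(Map<Integer, Long> offsets);
    void commitSync(Map<Integer, Long> offsets);
}

class BoundaryCommitter {
    private final OffsetSink sink;
    // partition -> next offset to read (last processed offset + 1, the value Kafka expects in a commit)
    private final Map<Integer, Long> pending = new HashMap<>();

    BoundaryCommitter(OffsetSink sink) { this.sink = sink; }

    /** Call once a record has been fully processed. */
    void markProcessed(int partition, long offset) {
        pending.merge(partition, offset + 1, Long::max);
    }

    /** Steady state: non-blocking commit after each batch. */
    void afterBatch() {
        if (!pending.isEmpty()) sink.commitAsync(new HashMap<>(pending));
    }

    /** Rebalance boundary: blocking commit for the partitions being taken away, then stop tracking them. */
    void onPartitionsRevoked(Collection<Integer> revoked) {
        Map<Integer, Long> toCommit = new HashMap<>();
        for (int partition : revoked) {
            Long next = pending.remove(partition);
            if (next != null) toCommit.put(partition, next);
        }
        if (!toCommit.isEmpty()) sink.commitSync(toCommit);
    }

    /** Shutdown: blocking commit of everything still tracked. */
    void onShutdown() {
        if (!pending.isEmpty()) sink.commitSync(new HashMap<>(pending));
        pending.clear();
    }
}
```

Committing only the revoked partitions in `onPartitionsRevoked` matters under cooperative rebalancing, where the member keeps the rest of its assignment and continues to track those offsets.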

Prevention and Best Practices

  • Treat RebalanceInProgressException as a normal, retryable signal — never as a fatal error.
  • Adopt CooperativeStickyAssignor to minimize stop-the-world rebalances and the commit window they create.
  • Use static membership for stable fleets so quick restarts do not churn the group.
  • Stagger deploys and avoid restarting many members simultaneously.
  • Keep processing under max.poll.interval.ms so evictions do not pile rebalances onto otherwise healthy members.
  • Commit on assignment boundaries (in the rebalance listener and on shutdown), not in a blind tight loop.

Related Errors

  • CommitFailedException: you were actually evicted from the group, not merely mid-rebalance; not recoverable by retry alone.
  • IllegalGenerationException: the commit carried a stale generation after the rebalance completed.
  • UnknownMemberIdException: the coordinator no longer knows your member id, often the next step if churn continues.
  • FencedInstanceIdException: a static-membership conflict that can itself trigger rebalances.
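
The prevention practices listed earlier (cooperative assignor, static membership, bounded processing time) can be collected into one consumer configuration. The sketch below uses plain string keys, which are the documented consumer config names, so it runs without the Kafka client on the classpath. The class name and the values (a 45-second session timeout, 200 records per poll, manual commits) are illustrative assumptions, not recommendations from this guide.

```java
import java.util.Properties;

class LowChurnConsumerConfig {
    /**
     * Builds consumer properties aimed at fewer and cheaper rebalances.
     * instanceId must be stable across restarts and unique per member.
     */
    static Properties build(String groupId, String instanceId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Incremental rebalancing: only moved partitions change hands.
        props.setProperty("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        // Static membership: a restart shorter than session.timeout.ms does not trigger a rebalance.
        props.setProperty("group.instance.id", instanceId);
        props.setProperty("session.timeout.ms", "45000");       // assumed value
        // Keep one batch comfortably inside max.poll.interval.ms.
        props.setProperty("max.poll.records", "200");           // assumed value
        props.setProperty("max.poll.interval.ms", "300000");    // client default, stated explicitly
        // Commits are issued by the application, on assignment boundaries.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }
}
```

Before handing the result to a `KafkaConsumer`, add `bootstrap.servers` and the key and value deserializers for your topic.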

Frequently Asked Questions

Is RebalanceInProgressException safe to ignore? Effectively yes — catch it and continue the poll loop. The group is rebalancing and you remain a member; the next poll() rejoins and you can commit again after reassignment. Ensure processing is idempotent in case of replay.
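
One way to make processing tolerate replay, sketched under the assumption that records within a partition are handled in offset order: remember the highest offset already applied per partition and skip anything at or below it. The class name is hypothetical and the state here lives in memory only, so it does not survive a restart or a partition moving to another member; a real deployment would store this high-water mark atomically with the side effects it guards.

```java
import java.util.HashMap;
import java.util.Map;

class ReplayGuard {
    // partition -> highest offset whose effects have already been applied
    private final Map<Integer, Long> highestApplied = new HashMap<>();

    /** Returns true if the record is new and should be processed, false if it is a replay. */
    boolean shouldProcess(int partition, long offset) {
        Long last = highestApplied.get(partition);
        return last == null || offset > last;
    }

    /** Records that the effects of this offset have been applied. */
    void markApplied(int partition, long offset) {
        highestApplied.merge(partition, offset, Long::max);
    }
}
```

A consumer would call `shouldProcess` before handling each record and `markApplied` after the side effect succeeds, so a batch redelivered after an uncommitted rebalance is skipped rather than applied twice.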

How is it different from CommitFailedException? RebalanceInProgressException means “the group is mid-rebalance, rejoin.” CommitFailedException means “you were kicked out.” The first is recoverable by polling again; the second requires rejoining from scratch and the batch is uncommitted.

Does cooperative rebalancing eliminate it? It reduces both the frequency and the blast radius of rebalances, so you will see the exception far less, but it does not make rebalances impossible. Your code should still handle it.

Why does my group rebalance so often? Common drivers are rolling restarts, autoscaling, max.poll.interval.ms violations, and static-membership flapping. Use the journalctl rejoin count and group --state to correlate rebalances with deploys or evictions.
