Kafka Error Guide: 'Error in log cleaner thread' Retention Cleanup Failed

Exact Error Message

When the background cleaner that handles retention deletion and compaction hits an unrecoverable error, the thread dies and logs to server.log (and often a dedicated log-cleaner.log):

[2026-06-29 04:17:22,406] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: 1048576 messages in segment compacted-state-5/00000000000000000000.log but offset map can fit only 393216. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
        at scala.Predef$.require(Predef.scala:337)
        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:1024)
        at kafka.log.Cleaner.clean(LogCleaner.scala:580)
[2026-06-29 04:17:22,410] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)
[2026-06-29 04:17:22,411] ERROR [kafka-log-cleaner-thread-0]: Failed to clean up log for compacted-state-5 in dir /var/lib/kafka due to IOException (kafka.log.LogCleaner)

Signatures include Error in log cleaner thread, Failed to clean up log, the dedupe-buffer requirement failure (offset map can fit only ...), and the thread reporting Stopped.

What the Error Means

Kafka has two cleanup mechanisms, both driven by background threads. Time/size retention deletes whole segments older than retention.ms or beyond retention.bytes. Log compaction (for cleanup.policy=compact topics) rewrites segments to keep only the latest value per key, using an in-memory offset map (the “dedupe buffer”) to track keys.

When the LogCleaner thread throws an uncaught exception, it dies and stops cleaning the partitions it owned. Crucially, a dead cleaner does not crash the broker: it keeps serving traffic while disk usage on compacted topics grows without bound, because old segments are never compacted or deleted. Time/size retention on delete-only topics runs on a separate scheduled task and is not affected. The classic cause is a dedupe buffer too small to hold all keys in the segments being cleaned, which produces the offset map can fit only N requirement failure. Other causes are IO errors during the rewrite or corrupt segments in a compacted partition.

The danger is silent: cleanup quietly stops and the disk fills until a KafkaStorageException eventually takes the directory offline.
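
To make the N in offset map can fit only N concrete, the capacity of one cleaner thread's offset map can be estimated with integer arithmetic. The sketch below assumes 24 bytes per entry (a 16-byte key hash plus an 8-byte offset) and a load factor of 0.9, the default for log.cleaner.io.buffer.load.factor. The function name offset_map_capacity is hypothetical, and the result is an approximation that may differ slightly from the broker's own rounding.

```shell
# Rough estimate of how many entries one cleaner thread's offset map can hold.
# Assumptions: 24 bytes per entry and a load factor of 0.9
# (the default log.cleaner.io.buffer.load.factor).
offset_map_capacity() {
  buffer_bytes="$1"   # log.cleaner.dedupe.buffer.size
  threads="$2"        # log.cleaner.threads
  # (buffer / threads) / 24 bytes per entry * 0.9, as a single integer expression
  echo $(( buffer_bytes * 9 / (threads * 24 * 10) ))
}

offset_map_capacity 10485760 1     # 10 MB, one thread    -> 393216
offset_map_capacity 134217728 2    # 128 MB, two threads  -> 2516582
```

The first result lines up with the 393216 in the sample log above, which is what a 10 MB buffer on a single thread works out to under these assumptions.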

Common Causes

Dedupe buffer too small. log.cleaner.dedupe.buffer.size divided across log.cleaner.threads cannot hold the number of unique keys in the segments to compact, triggering the offset map can fit only requirement failure.
Corrupt segment in a compacted topic. The cleaner reads a damaged record while rewriting and throws, killing the thread.
IO error during compaction. A disk error while writing the cleaned segment surfaces as Failed to clean up log ... due to IOException.
Retention deletion blocked. A file lock, permission issue, or open file handle prevents deleting an expired segment.
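Cleaner thread previously crashed and was never restarted. One uncaught exception leaves the thread dead until the broker restarts, so disk creeps up unnoticed. Newer broker versions may instead mark the failing partition as uncleanable and keep the thread running, which stalls cleanup for that partition just as quietly.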

How to Reproduce the Error

On a disposable test broker you can trigger the dedupe-buffer failure deterministically. Set an artificially tiny buffer and feed a compacted topic more unique keys than it can hold:

# Test broker only: start with a deliberately tiny dedupe buffer in server.properties
# log.cleaner.dedupe.buffer.size=10485760
# then produce many unique keys to a cleanup.policy=compact topic

When the cleaner runs on a segment with more keys than the offset map can fit, it logs offset map can fit only ..., the thread reports Stopped, and compaction halts. Watching du -sh on that topic shows disk usage climbing because old segments are no longer compacted. Restore a sane buffer size and restart to recover the test broker.
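
The "produce many unique keys" step can be sketched as a small generator piped into the console producer. The function name gen_unique_keys, the topic name cleaner-repro, and the localhost:9092 address are placeholders, not values from this guide. The Kafka commands are left commented out so the snippet is safe to run anywhere.

```shell
# Emit N records as "key:value" lines, each with a distinct key, so a compacted
# topic accumulates N unique keys.
gen_unique_keys() {
  awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) printf "key-%d:value-%d\n", i, i }'
}

gen_unique_keys 3   # prints key-1:value-1, key-2:value-2, key-3:value-3

# On the disposable test broker only (topic name and address are placeholders):
# kafka-topics.sh --bootstrap-server localhost:9092 --create \
#   --topic cleaner-repro --partitions 1 --replication-factor 1 \
#   --config cleanup.policy=compact
# gen_unique_keys 1000000 | kafka-console-producer.sh \
#   --bootstrap-server localhost:9092 --topic cleaner-repro \
#   --property parse.key=true --property key.separator=:
```

The cleaner only considers segments that have rolled, so the active segment may need to roll before anything shows up in the cleaner log.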

Diagnostic Commands

All commands below are read-only. The first question is whether the cleaner thread is alive.

# Cleaner errors, dead threads, and dedupe-buffer requirement failures
grep -E "log cleaner|LogCleaner|Failed to clean|offset map can fit|Stopped|dedupe" \
  /var/log/kafka/server.log /var/log/kafka/log-cleaner.log 2>/dev/null | tail -40

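# Is the cleaner thread still running in the JVM? (uses the first matching broker PID;
# the fallback message also prints if jstack itself fails, e.g. no PID was found)
jstack "$(pgrep -f 'kafka\.Kafka' | head -n 1)" | grep -i "kafka-log-cleaner-thread" || echo "NO LIVE CLEANER THREADS"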

# Disk growth on compacted/retained topics — the visible symptom
df -h /var/lib/kafka
du -sh /var/lib/kafka/*/ 2>/dev/null | sort -rh | head
ls -la /var/lib/kafka/compacted-state-5/ | grep -E "\.log$|\.deleted$"

# Inspect a compacted segment offline to spot corruption the cleaner choked on
kafka-dump-log.sh --files /var/lib/kafka/compacted-state-5/00000000000000000000.log \
  --deep-iteration | tail -20

# Static cleaner config for buffer/threads sizing (dynamic overrides are not in this file)
grep -E "log.cleaner|retention|dedupe|cleanup.policy" /etc/kafka/server.properties
journalctl -u kafka --since "6 hours ago" | grep -iE "cleaner|retention"

If jstack finds no cleaner threads, the cleaner died and cleanup is stalled. If du shows a compacted topic far larger than its key cardinality warrants, compaction is not running.
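
For a cron job or monitoring probe, the jstack check can be wrapped so that it returns a non-zero exit code when no cleaner thread is found. This is a minimal sketch: the function names are hypothetical, and it assumes cleaner threads appear in the dump under the kafka-log-cleaner-thread name shown in the log above.

```shell
# Count cleaner threads in a thread dump read from stdin.
count_cleaner_threads() {
  # grep -c prints 0 and exits non-zero on no match; keep the exit status clean
  grep -c 'kafka-log-cleaner-thread' || true
}

# Print a one-line status and return 1 when no cleaner thread is present.
cleaner_status() {
  count="$(count_cleaner_threads)"
  if [ "$count" -gt 0 ]; then
    echo "OK: $count cleaner thread(s) alive"
  else
    echo "ALERT: no live cleaner threads"
    return 1
  fi
}

# Usage on a broker host (the PID lookup is an assumption about how Kafka was started):
# jstack "$(pgrep -f 'kafka\.Kafka' | head -n 1)" | cleaner_status
```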

Step-by-Step Resolution

Confirm the cleaner is dead with jstack and the Stopped line in the log, then read the exception to find the partition and cause.
For the dedupe-buffer failure, increase log.cleaner.dedupe.buffer.size (e.g. from 128 MB toward 512 MB or 1 GB) and/or reduce log.cleaner.threads so each thread gets a larger share of the buffer. The offset map per thread must hold the keys in one cleanable segment group.
For a corrupt segment in a compacted topic, identify it with kafka-dump-log.sh --deep-iteration. If the topic has replicas, remove this broker’s copy and re-replicate; otherwise move the bad segment aside so the cleaner can proceed past it.
For an IO error, treat it as a disk fault (check SMART and kernel logs) and fix the storage before expecting cleanup to resume.
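Restart the broker to revive the cleaner. A dead LogCleaner thread does not respawn on its own, and a broker restart is the dependable way to bring it back. On brokers that support dynamic configuration, altering a cleaner setting such as log.cleaner.threads with kafka-configs.sh may also recreate the cleaner threads, but confirm that behavior on your version before relying on it. After restart the cleaner resumes compaction/deletion and disk usage falls.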
Watch reclamation. Monitor du -sh on the affected topics and df -h to confirm space is being recovered, and check cleaner metrics such as kafka.log:type=LogCleanerManager,name=max-dirty-percent for backlog.
If the disk already filled, you may also be hitting a KafkaStorageException; clear space first, then restart so both the directory and the cleaner recover.
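
When choosing a new value for log.cleaner.dedupe.buffer.size, it can help to work backwards from the entry count in the error message. The sketch below makes the same assumptions as the broker defaults described earlier in this guide's sizing discussion: 24 bytes per offset-map entry and a load factor of 0.9. The function name required_dedupe_buffer is hypothetical, and the per-thread figure is rounded up to a whole MiB to leave a little headroom. Treat the result as a floor, not a recommendation.

```shell
# Approximate minimum log.cleaner.dedupe.buffer.size (bytes) so that each
# cleaner thread's offset map can hold `entries` entries.
# Assumptions: 24 bytes per entry, load factor 0.9.
required_dedupe_buffer() {
  entries="$1"   # e.g. the message count reported in the requirement failure
  threads="$2"   # log.cleaner.threads
  # entries * 24 / 0.9, rounded up to the next byte
  per_thread_bytes=$(( (entries * 240 + 8) / 9 ))
  # round the per-thread share up to a whole MiB for headroom
  per_thread_mib=$(( (per_thread_bytes + 1048575) / 1048576 ))
  echo $(( per_thread_mib * 1048576 * threads ))
}

required_dedupe_buffer 1048576 1   # -> 28311552  (27 MiB for one thread)
required_dedupe_buffer 1048576 4   # -> 113246208 (27 MiB for each of four threads)
```

For the sample log, where the segment holds 1048576 messages, a single thread needs roughly 27 MiB, so the 10 MB test buffer was never going to fit.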

Prevention and Best Practices

Monitor the cleaner explicitly: alert on kafka.log:type=LogCleanerManager,name=time-since-last-run-ms and on the uncleanable-partitions-count and uncleanable-bytes metrics so a dead cleaner is caught before the disk fills.
Size log.cleaner.dedupe.buffer.size for your highest-cardinality compacted topic; high-key-count state topics need a generous buffer per thread.
Alert on disk headroom (80%/90%) so silently stalled cleanup surfaces as a capacity warning, not an outage.
Set sensible segment.ms/segment.bytes on compacted topics so the active segment rolls and becomes eligible for compaction promptly.
Keep replication factor 3 so a corrupt compacted segment can be replaced by re-replication rather than data loss.
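
The 80%/90% disk headroom alert can be sketched as a small shell check suitable for cron. The function names are hypothetical, and the parsing assumes POSIX-format df output where the fifth column is the usage percentage.

```shell
# Map a disk usage percentage to an alert level using the 80%/90% thresholds.
disk_alert_level() {
  used_pct="$1"
  if [ "$used_pct" -ge 90 ]; then
    echo "CRITICAL"
  elif [ "$used_pct" -ge 80 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Read the usage percentage for a path from POSIX-format df output.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

disk_alert_level 72   # -> OK
disk_alert_level 85   # -> WARNING

# On a broker host, point it at the Kafka log directory:
# disk_alert_level "$(disk_used_pct /var/lib/kafka)"
```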

Related Errors

KafkaStorageException: Stopping serving logs in dir. This is what eventually happens when stalled cleanup fills the disk.
Found a corrupted segment / CorruptRecordException. Corruption in a compacted topic that can kill the cleaner thread.
Error while flushing log. An fsync failure that can also surface while the cleaner rewrites a segment.

Frequently Asked Questions

Why didn’t the broker crash when the cleaner died? The LogCleaner runs in its own thread. Its death does not stop the broker; it silently halts cleanup of compacted topics, so disk usage grows until something else fails.

What does offset map can fit only N mean? The dedupe buffer (offset map) cannot hold all the unique keys in the segments being compacted. Increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads.

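How do I restart just the cleaner thread? There is no dedicated command for it. Restarting the broker reliably respawns a dead LogCleaner; on brokers with dynamic configuration, altering a cleaner setting such as log.cleaner.threads may also recreate the threads, but confirm that on your version. Fix the root cause first, then restart.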

Will I lose data when compaction resumes? No. Compaction only removes superseded values for a key (and tombstones past delete.retention.ms). Resuming it reclaims space without losing the latest value per key.
