Kafka Error Guide: 'Error in log cleaner thread' Retention Cleanup Failed
Fix Kafka log cleaner and retention cleanup failures: diagnose a dead LogCleaner thread, dedupe buffer memory limits, compaction errors, and growing disk usage.
Exact Error Message
When the log cleaner, the background component that compacts topics and applies retention to topics that combine compact with delete, hits an unrecoverable error, the thread dies and logs to server.log (and often a dedicated log-cleaner.log):
[2026-06-29 04:17:22,406] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner)
java.lang.IllegalArgumentException: requirement failed: 1048576 messages in segment compacted-state-5/00000000000000000000.log but offset map can fit only 393216. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
at scala.Predef$.require(Predef.scala:337)
at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:1024)
at kafka.log.Cleaner.clean(LogCleaner.scala:580)
[2026-06-29 04:17:22,410] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)
[2026-06-29 04:17:22,411] ERROR [kafka-log-cleaner-thread-0]: Failed to clean up log for compacted-state-5 in dir /var/lib/kafka due to IOException (kafka.log.LogCleaner)
Signatures include Error in log cleaner thread, Failed to clean up log, the dedupe-buffer offset map can fit only ... requirement failure, and the thread reporting Stopped.
What the Error Means
Kafka has two cleanup mechanisms, both driven by background threads. Time/size retention deletes whole segments older than retention.ms or beyond retention.bytes; for delete-only topics it runs from a scheduled retention check rather than from the cleaner threads. Log compaction (for cleanup.policy=compact topics) is the LogCleaner's job: it rewrites segments to keep only the latest value per key, using an in-memory offset map (the “dedupe buffer”) to track keys.
When the LogCleaner thread throws an uncaught exception, it dies and stops cleaning the partitions it owned. Crucially, a dead cleaner does not crash the broker: the broker keeps serving traffic while disk usage on compacted topics keeps growing, because their old segments are never compacted or deleted. The classic cause is the dedupe buffer being too small to hold all keys in the segments being cleaned, producing the offset map can fit only N requirement failure. Other causes are IO errors during the rewrite or corrupt segments in a compacted partition.
The danger is silent: cleanup quietly stops and the disk fills until a KafkaStorageException eventually takes the directory offline.
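To make the compaction mechanism concrete, here is a minimal Python sketch of the idea, not Kafka's implementation. The first pass builds an offset map from each key to its latest offset, and the second pass keeps only the records at those offsets. The names Record, build_offset_map, and compact are hypothetical, and the sketch assumes records arrive in offset order. The real cleaner stores hashed keys in a fixed-size buffer, which is why it can run out of room.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    key: str
    value: Optional[str]  # None stands in for a tombstone (delete marker)


def build_offset_map(records: list[Record]) -> dict[str, int]:
    """Pass 1: remember the highest offset seen for each key."""
    offset_map: dict[str, int] = {}
    for rec in records:
        offset_map[rec.key] = rec.offset  # later records overwrite earlier ones
    return offset_map


def compact(records: list[Record]) -> list[Record]:
    """Pass 2: keep a record only if it is the latest one for its key."""
    offset_map = build_offset_map(records)
    return [rec for rec in records if offset_map[rec.key] == rec.offset]


log = [
    Record(0, "user-1", "a"),
    Record(1, "user-2", "b"),
    Record(2, "user-1", "c"),
    Record(3, "user-3", "d"),
    Record(4, "user-2", None),
]
compacted = compact(log)
print([(r.offset, r.key, r.value) for r in compacted])
```

The sketch keeps the tombstone for user-2; in Kafka a tombstone is only dropped once it is older than delete.retention.ms.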
Common Causes
- Dedupe buffer too small. log.cleaner.dedupe.buffer.size divided across log.cleaner.threads cannot hold the number of unique keys in the segments to compact, triggering the offset map can fit only requirement failure.
- Corrupt segment in a compacted topic. The cleaner reads a damaged record while rewriting and throws, killing the thread.
- IO error during compaction. A disk error while writing the cleaned segment surfaces as Failed to clean up log ... due to IOException.
- Retention deletion blocked. A file lock, permission issue, or open file handle prevents deleting an expired segment.
- Cleaner thread previously crashed and was never restarted. One uncaught exception leaves the thread dead until the broker restarts, so disk creeps up unnoticed.
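The dedupe-buffer cause comes down to simple arithmetic, sketched below under stated assumptions: each offset-map entry takes 24 bytes (a 16-byte key hash plus an 8-byte offset, as described in Kafka's compaction documentation), the buffer is split evenly across cleaner threads, and the map is filled only up to the load factor (log.cleaner.io.buffer.load.factor, default 0.9). The function name offset_map_capacity is hypothetical, and the result is an approximation rather than the broker's exact figure.

```python
BYTES_PER_ENTRY = 24   # assumed: 16-byte key hash + 8-byte offset per entry
LOAD_FACTOR = 0.9      # assumed default of log.cleaner.io.buffer.load.factor


def offset_map_capacity(dedupe_buffer_bytes: int, cleaner_threads: int = 1,
                        load_factor: float = LOAD_FACTOR) -> int:
    """Approximate number of keys one cleaner thread's offset map can hold."""
    per_thread_bytes = dedupe_buffer_bytes // cleaner_threads
    slots = per_thread_bytes // BYTES_PER_ENTRY
    return int(slots * load_factor)


# 10 MB test buffer, the 128 MB default, and the default split over 4 threads
print(offset_map_capacity(10 * 1024 * 1024))
print(offset_map_capacity(128 * 1024 * 1024))
print(offset_map_capacity(128 * 1024 * 1024, cleaner_threads=4))
```

With a 10 MB buffer and one thread this comes out at roughly 393 thousand keys, in line with the figure in the sample error, and adding threads shrinks each thread's share proportionally.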
How to Reproduce the Error
On a disposable test broker you can trigger the dedupe-buffer failure deterministically. Set an artificially tiny buffer and feed a compacted topic more unique keys than it can hold:
# Test broker only: start with a deliberately tiny dedupe buffer in server.properties
# log.cleaner.dedupe.buffer.size=10485760
# then produce many unique keys to a cleanup.policy=compact topic
When the cleaner runs on a segment with more keys than the offset map can fit, it logs offset map can fit only ..., the thread reports Stopped, and compaction halts. Watching du -sh on that topic's partition directories shows disk usage climbing because old segments are no longer compacted. Restore a sane buffer size and restart to recover the test broker.
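For the "many unique keys" part of the reproduction, one option is to generate keyed input lines and feed them to a producer. The Python sketch below only generates the data; the function names, the key/value naming scheme, and the output path are all hypothetical choices for illustration.

```python
def unique_keyed_lines(count: int, separator: str = ":"):
    """Yield `count` lines of the form key-<n><separator>value-<n>, each key distinct."""
    for n in range(count):
        yield f"key-{n:09d}{separator}value-{n}"


def write_keyed_file(path: str, count: int, separator: str = ":") -> None:
    """Write the generated lines to `path`, one record per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in unique_keyed_lines(count, separator):
            fh.write(line + "\n")


# Small demo: show the first few generated records
for line in unique_keyed_lines(3):
    print(line)
```

A file written this way can be piped into kafka-console-producer.sh with the parse.key=true and key.separator=: properties so that the part before the colon becomes the record key. Choose a count comfortably above the offset-map capacity of the test buffer.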
Diagnostic Commands
All commands below are read-only. The first question is whether the cleaner thread is alive.
# Cleaner errors, dead threads, and dedupe-buffer requirement failures
grep -E "log cleaner|LogCleaner|Failed to clean|offset map can fit|Stopped|dedupe" \
/var/log/kafka/server.log /var/log/kafka/log-cleaner.log 2>/dev/null | tail -40
# Is the cleaner thread still running in the JVM?
jstack "$(pgrep -f 'kafka.Kafka')" | grep -i "kafka-log-cleaner-thread" || echo "NO LIVE CLEANER THREADS"
# Disk growth on compacted/retained topics — the visible symptom
df -h /var/lib/kafka
du -sh /var/lib/kafka/*/ 2>/dev/null | sort -rh | head
ls -la /var/lib/kafka/compacted-state-5/ | grep -E "\.log$|\.deleted$"
# Inspect a compacted segment offline to spot corruption the cleaner choked on
kafka-dump-log.sh --files /var/lib/kafka/compacted-state-5/00000000000000000000.log \
--deep-iteration | tail -20
# Effective cleaner config for buffer/threads sizing
grep -E "log.cleaner|retention|dedupe|cleanup.policy" /etc/kafka/server.properties
journalctl -u kafka --since "6 hours ago" | grep -iE "cleaner|retention"
If jstack finds no cleaner threads, the cleaner died and cleanup is stalled. If du shows a compacted topic far larger than its key cardinality warrants, compaction is not running.
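The grep above can also be expressed as a small classifier that groups log lines by signature, which is convenient when triaging several brokers. This is a sketch: the regular expressions are derived only from the sample log lines in this guide, so other Kafka versions may word these messages differently, and the names SIGNATURES and classify_cleaner_lines are hypothetical.

```python
import re

# Patterns based on the sample log lines in this guide (an assumption, not an
# exhaustive list of cleaner messages).
SIGNATURES = {
    "dedupe_buffer_too_small": re.compile(r"offset map can fit only (\d+)"),
    "cleanup_failed": re.compile(r"Failed to clean up log for (\S+)"),
    "thread_stopped": re.compile(r"\[kafka-log-cleaner-thread-\d+\]: Stopped"),
    "cleaner_error": re.compile(r"ERROR \[kafka-log-cleaner-thread-\d+\]: Error due to"),
}


def classify_cleaner_lines(lines):
    """Return {signature_name: [matching lines]} for the known cleaner signatures."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


sample = [
    "[2026-06-29 04:17:22,406] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner)",
    "java.lang.IllegalArgumentException: requirement failed: 1048576 messages in segment "
    "compacted-state-5/00000000000000000000.log but offset map can fit only 393216.",
    "[2026-06-29 04:17:22,410] INFO [kafka-log-cleaner-thread-0]: Stopped (kafka.log.LogCleaner)",
    "[2026-06-29 04:17:22,411] ERROR [kafka-log-cleaner-thread-0]: Failed to clean up log for "
    "compacted-state-5 in dir /var/lib/kafka due to IOException (kafka.log.LogCleaner)",
]
hits = classify_cleaner_lines(sample)
for name, matched in hits.items():
    print(f"{name}: {len(matched)}")
```

To scan a real file, pass an open file handle (for example on /var/log/kafka/log-cleaner.log) instead of the sample list.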
Step-by-Step Resolution
- Confirm the cleaner is dead with jstack and the Stopped line in the log, then read the exception to find the partition and cause.
- For the dedupe-buffer failure, increase log.cleaner.dedupe.buffer.size (e.g. from 128 MB toward 512 MB or 1 GB) and/or reduce log.cleaner.threads so each thread gets a larger share of the buffer. The offset map per thread must hold the keys in one cleanable segment group.
- For a corrupt segment in a compacted topic, identify it with kafka-dump-log.sh --deep-iteration. If the topic has replicas, remove this broker’s copy and re-replicate; otherwise move the bad segment aside so the cleaner can proceed past it.
- For an IO error, treat it as a disk fault (check SMART and kernel logs) and fix the storage before expecting cleanup to resume.
- Restart the broker to revive the cleaner. A dead LogCleaner thread is only restarted on broker restart; there is no online restart of the thread. After restart it resumes compaction/deletion and disk usage falls.
- Watch reclamation. Monitor du -sh on the affected topics and df -h to confirm space is being recovered, and check the kafka.log:type=LogCleanerManager,name=max-dirty-percent style metrics for backlog.
- If the disk already filled, you may also be hitting a KafkaStorageException; clear space first, then restart so both the directory and the cleaner recover.
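When choosing a new buffer size, it can help to work backwards from the message count in the error. The sketch below assumes 24 bytes per offset-map entry, a load factor of 0.9 (the default log.cleaner.io.buffer.load.factor), and an even split of the buffer across cleaner threads; required_dedupe_buffer_bytes is a hypothetical helper, and the result is a lower bound to which you would add headroom.

```python
import math

BYTES_PER_ENTRY = 24   # assumed: 16-byte key hash + 8-byte offset per entry
LOAD_FACTOR = 0.9      # assumed default of log.cleaner.io.buffer.load.factor


def required_dedupe_buffer_bytes(unique_keys: int, cleaner_threads: int = 1,
                                 load_factor: float = LOAD_FACTOR) -> int:
    """Smallest total buffer (bytes) whose per-thread share fits `unique_keys` keys."""
    slots_needed = math.ceil(unique_keys / load_factor)
    per_thread_bytes = slots_needed * BYTES_PER_ENTRY
    return per_thread_bytes * cleaner_threads


# 1048576 is the message count reported in the sample error above
needed = required_dedupe_buffer_bytes(1_048_576)
print(needed, f"(~{needed / 1024 / 1024:.1f} MiB)")
print(required_dedupe_buffer_bytes(1_048_576, cleaner_threads=2))
```

Because the buffer is shared, doubling log.cleaner.threads doubles the total needed for the same per-thread capacity, which is why reducing threads is offered as an alternative to growing the buffer.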
Prevention and Best Practices
- Monitor the cleaner explicitly: alert on kafka.log:type=LogCleanerManager,name=time-since-last-run-ms and on uncleanable-partition/bytes metrics so a dead cleaner is caught before the disk fills.
- Size log.cleaner.dedupe.buffer.size for your highest-cardinality compacted topic; high-key-count state topics need a generous buffer per thread.
- Alert on disk headroom (80%/90%) so silently stalled cleanup surfaces as a capacity warning, not an outage.
- Set sensible segment.ms/segment.bytes on compacted topics so the active segment rolls and becomes eligible for compaction promptly.
- Keep replication factor 3 so a corrupt compacted segment can be replaced by re-replication rather than data loss.
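The disk-headroom alert from the list above can be sketched as a small threshold check. The 80% and 90% levels come from this guide; the function names are hypothetical, and in production this logic would normally live in your monitoring system rather than a script.

```python
import shutil


def headroom_level(used_fraction: float, warn: float = 0.80, crit: float = 0.90) -> str:
    """Map a used-space fraction (0.0 to 1.0) to an alert level."""
    if used_fraction >= crit:
        return "critical"
    if used_fraction >= warn:
        return "warning"
    return "ok"


def check_path(path: str) -> str:
    """Classify the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return headroom_level(usage.used / usage.total)


# Demo on the current directory; point this at the Kafka log dir on a broker
print(check_path("."))
```

On a broker, the path would be the log directory, /var/lib/kafka in this guide's examples.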
Related Errors
- KafkaStorageException: Stopping serving logs in dir. What eventually happens when stalled cleanup fills the disk.
- Found a corrupted segment / CorruptRecordException. Corruption in a compacted topic that can kill the cleaner thread.
- Error while flushing log. An fsync failure that can also surface while the cleaner rewrites a segment.
Frequently Asked Questions
Why didn’t the broker crash when the cleaner died?
The LogCleaner runs in its own thread. Its death does not stop the broker; it silently halts compaction (and retention on topics that combine compact with delete), so disk usage grows until something else fails.
What does offset map can fit only N mean?
The dedupe buffer (offset map) cannot hold all the unique keys in the segments being compacted. Increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads.
How do I restart just the cleaner thread?
There is no online restart. A dead LogCleaner is only respawned when you restart the broker. Fix the root cause first, then restart.
Will I lose data when compaction resumes?
No. Compaction only removes superseded values for a key (and tombstones past delete.retention.ms). Resuming it reclaims space without losing the latest value per key.