Kafka Error Guide: 'Corrupted index found' Offset Index Corrupted on Startup
Fix Kafka 'Corrupted index found' and 'Found invalid offset index' errors: understand index rebuilds on restart, time index corruption, and slow recovery startups.
- #kafka
- #troubleshooting
- #errors
- #storage
Exact Error Message
When a broker loads its logs and finds an offset or time index that does not match its data segment, it logs a corrupted-index warning and rebuilds the index. From server.log:
[2026-06-29 02:08:17,344] WARN [Log partition=clicks-7, dir=/var/lib/kafka] Found a corrupted index file corresponding to log file /var/lib/kafka/clicks-7/00000000000002097152.log due to Corrupt index found, index file (/var/lib/kafka/clicks-7/00000000000002097152.index) has non-zero size but the last offset is 2097151 which is no greater than the base offset 2097152. (kafka.log.LogSegment)
[2026-06-29 02:08:17,351] INFO [Log partition=clicks-7] Recovering unflushed segment 2097152 (kafka.log.UnifiedLog)
[2026-06-29 02:08:18,002] WARN [Log partition=clicks-7] Found invalid offset index for /var/lib/kafka/clicks-7/00000000000002097152.index, rebuilding index (kafka.log.OffsetIndex)
[2026-06-29 02:08:18,540] WARN [Log partition=clicks-7] Time index for 00000000000002097152.timeindex is corrupt, rebuilding it (kafka.log.TimeIndex)
[2026-06-29 02:08:21,118] INFO [Log partition=clicks-7] Loading producer state ... and rebuilt 1 index(es) (kafka.log.UnifiedLog)
The key phrases are Found a corrupted index file, Corrupt index found, Found invalid offset index, and Time index ... is corrupt, rebuilding it.
What the Error Means
Every Kafka data segment (.log) has two sidecar index files: an offset index (.index) mapping logical offsets to physical byte positions, and a time index (.timeindex) mapping timestamps to offsets. These indexes are sparse, memory-mapped files that are appended to as the segment grows. They are derived data: they can always be regenerated from the .log file.
When the broker loads a segment, it sanity-checks the index: the last indexed offset must be greater than the base offset, entries must be monotonically increasing, and the file size must be a multiple of the entry size. If any check fails, Kafka logs the corruption, discards the bad index, and rebuilds it by scanning the data segment. The data itself is intact; only the lookup structure was damaged.
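The size check described above can be approximated from the shell. This is a minimal sketch, not Kafka's own code: index_size_ok is a hypothetical helper name, the 8-byte entry size is the one this guide cites for the offset index, and the 12-byte time index entry (8-byte timestamp plus 4-byte relative offset) is an assumption taken from Kafka's on-disk index format rather than from this guide. It looks only at file length, so it cannot detect the last offset <= base offset case.

```shell
#!/usr/bin/env bash
# Sketch: report whether an index file's size is a whole number of entries.
# Assumed entry sizes: 8 bytes for .index, 12 bytes for .timeindex.
# index_size_ok is a hypothetical helper, not part of the Kafka tooling.

index_size_ok() {
  local file="$1" entry_size size
  case "$file" in
    *.timeindex) entry_size=12 ;;
    *.index)     entry_size=8 ;;
    *) echo "unsupported file type: $file" >&2; return 2 ;;
  esac
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
  # Arithmetic expansion strips the padding some wc implementations emit.
  size=$(( $(wc -c < "$file") ))
  if [ $((size % entry_size)) -eq 0 ]; then
    echo "ok $file size=$size entry_size=$entry_size"
  else
    echo "misaligned $file size=$size entry_size=$entry_size"
    return 1
  fi
}

# Example (path taken from this guide's log excerpt):
# index_size_ok /var/lib/kafka/clicks-7/00000000000002097152.index
```

Run it against segments of a stopped broker or against rolled (non-active) segments; the active segment's index on a running broker is pre-allocated, so its length says little about its contents.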
The cost is time, not data loss. Rebuilding indexes for large segments scans the full .log, so a broker recovering many partitions after an unclean shutdown can take a long time to start.
Common Causes
- Unclean shutdown. A crash or kill -9 leaves the active segment’s index files not fully flushed, or pre-allocated but not yet written, so they fail the size/offset checks on restart.
- Pre-allocated index not trimmed. Kafka pre-allocates index files to log.index.size.max.bytes; an unclean stop can leave a non-zero-size index whose tail is zeros, tripping the last offset <= base offset check.
- Time index corruption. The .timeindex was partially written; Kafka rebuilds it independently of the offset index.
- Page-cache loss on power failure. Index pages in the OS cache were lost before fsync.
- Storage bit-rot. A bad block flips bytes inside an index file, breaking monotonicity.
How to Reproduce the Error
On a disposable test broker, an unclean shutdown during active writes reliably leaves an index that must be rebuilt:
# Test broker only. Hard-kill while the active segment index is being appended.
kill -9 "$(pgrep -f 'kafka.Kafka')"
On the next start the broker logs Found a corrupted index file ... and rebuilds the index for the active segment. You can also stage it deterministically on a stopped test broker by truncating an offset index (.index) file to a length that is not a multiple of its 8-byte entry size; startup then reports the index as corrupt and rebuilds it. Do not modify index files on production data.
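The deterministic variant can be sketched as a small helper. This is an illustration under stated assumptions, not a supported procedure: misalign_offset_index is a hypothetical name, the backup directory is made up, and the truncate utility from GNU coreutils is assumed to be installed. It keeps an untouched copy outside the partition directory and then shrinks the offset index by one byte, so its size is no longer a multiple of 8.

```shell
#!/usr/bin/env bash
# TEST BROKERS ONLY, and only while the broker is stopped.
# Sketch: make an offset index fail the size check by removing one byte.
# misalign_offset_index is a hypothetical helper; requires GNU truncate.

misalign_offset_index() {
  local index="$1" backup_dir="$2" size
  case "$index" in
    *.index) ;;
    *) echo "expected an offset index (.index) file: $index" >&2; return 2 ;;
  esac
  [ -f "$index" ] || { echo "no such file: $index" >&2; return 2; }
  size=$(( $(wc -c < "$index") ))
  if [ "$size" -eq 0 ] || [ $((size % 8)) -ne 0 ]; then
    echo "size $size is empty or already misaligned; nothing to do" >&2
    return 1
  fi
  # Keep an untouched copy outside the partition directory first.
  mkdir -p "$backup_dir" || return 2
  cp -p -- "$index" "$backup_dir/" || return 2
  # Shrink by one byte: the size is no longer a multiple of the entry size.
  truncate -s -1 "$index" || return 2
  echo "$index: $size -> $((size - 1)) bytes"
}

# Example (backup path is hypothetical):
# misalign_offset_index /var/lib/kafka/clicks-7/00000000000002097152.index /var/tmp/index-backup
```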
Diagnostic Commands
All commands below are read-only. Confirm which segments rebuilt and how long recovery is taking.
# Pull index corruption and rebuild lines from the broker log
grep -iE "corrupted index|corrupt index found|invalid offset index|Time index .* corrupt|rebuilding index|Recovering unflushed" \
/var/log/kafka/server.log | tail -40
# Inspect a segment and its index offline; --index-sanity-check validates the index
kafka-dump-log.sh --files /var/lib/kafka/clicks-7/00000000000002097152.index \
--index-sanity-check
kafka-dump-log.sh --files /var/lib/kafka/clicks-7/00000000000002097152.log \
--print-data-log | head -10
# List index/timeindex files and their sizes for a partition
ls -la /var/lib/kafka/clicks-7/ | grep -E "\.index$|\.timeindex$"
# Was the shutdown unclean, and how long is recovery running?
journalctl -u kafka --since "1 hour ago" | grep -iE "SIGKILL|terminated|starting|started|Loading logs|Recovering"
# Rule out underlying disk problems corrupting the index files
sudo smartctl --health /dev/nvme1n1
journalctl -k --since "2 hours ago" | grep -iE "I/O error|EXT4-fs error"
kafka-dump-log.sh --index-sanity-check runs the same index sanity check the broker applies at startup, without printing the index contents or modifying anything, so it confirms whether a rebuild was warranted.
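To see which partitions were affected, the warning lines can be grouped by partition. The following is a rough sketch that assumes the message shapes shown in the log excerpt earlier in this guide (a [Log partition=NAME...] prefix and the three warning phrases); the exact wording varies between Kafka versions, and rebuilds_per_partition is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count index-corruption warnings per partition in a broker log.
# Assumes the "[Log partition=NAME" prefix and the warning phrases quoted in
# this guide; adjust the patterns to match your Kafka version's messages.

rebuilds_per_partition() {
  local log_file="$1"
  grep -E 'Found a corrupted index file|Found invalid offset index|Time index .* is corrupt' "$log_file" \
    | sed -n 's/.*\[Log partition=\([^],]*\).*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Example (log path as used in the diagnostic commands above):
# rebuilds_per_partition /var/log/kafka/server.log | head -20
```

Each output line is a count followed by a partition name, with the most affected partitions first.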
Step-by-Step Resolution
- Recognize it as self-healing. Found invalid offset index ... rebuilding index is a warning, not a fatal error. Kafka regenerates the index from the data segment automatically; no action is required for the data.
- Let startup finish. Do not restart the broker mid-recovery. Interrupting it forces the rebuilds to start over and lengthens total downtime.
- If startup is slow, check how many partitions are recovering. Each large segment scan adds time. You can raise num.recovery.threads.per.data.dir so more partitions recover in parallel on the next restart.
- If a rebuild fails (rather than just rebuilds), the underlying .log data is likely corrupt too; fall back to the corrupted-segment workflow and re-replicate the partition from a healthy leader.
- Check the disk if index corruption recurs without crashes; bit-rot from a failing device will keep re-corrupting indexes until the drive is replaced.
- Verify after startup. Confirm with kafka-dump-log.sh --index-sanity-check on the affected segments and ensure kafka-topics.sh --describe --under-replicated-partitions reports no partitions.
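Before raising num.recovery.threads.per.data.dir, it helps to know what the broker is currently configured with. A minimal sketch, assuming a standard Java properties file; recovery_threads is a hypothetical helper and the /etc/kafka/server.properties path in the example is an assumption about your installation. It prints unset when the key is absent, in which case the broker's built-in default applies (the default differs between Kafka versions).

```shell
#!/usr/bin/env bash
# Sketch: print the configured num.recovery.threads.per.data.dir, or "unset".
# recovery_threads is a hypothetical helper; the properties path is up to you.

recovery_threads() {
  local props="$1" value
  [ -f "$props" ] || { echo "no such file: $props" >&2; return 2; }
  # Commented-out lines do not match; the last assignment in the file wins.
  value=$(sed -n 's/^[[:space:]]*num\.recovery\.threads\.per\.data\.dir[[:space:]]*=[[:space:]]*//p' "$props" | tail -n 1)
  if [ -n "$value" ]; then
    echo "$value"
  else
    echo "unset"
  fi
}

# Example (path is an assumption):
# recovery_threads /etc/kafka/server.properties
```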
Because index files are derived data, the safe manual recovery for a stuck index is to remove only the .index/.timeindex sidecars (never the .log) on a stopped broker and let Kafka regenerate them on start — but in practice automatic rebuild handles this without manual intervention.
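If that manual step is ever needed, moving the sidecars aside is gentler than deleting them. The sketch below is an illustration only: quarantine_indexes is a hypothetical helper, the backup path in the example is made up, and it assumes the broker is already stopped. It refuses to run unless the matching .log exists, and it touches only the .index and .timeindex of the one segment you name.

```shell
#!/usr/bin/env bash
# Sketch only: move one segment's index sidecars out of the partition directory.
# Run on a STOPPED broker. The .log file is never touched.
# quarantine_indexes is a hypothetical helper, not a Kafka tool.

quarantine_indexes() {
  local partition_dir="$1" segment="$2" backup_dir="$3" suffix moved=0
  if [ ! -f "$partition_dir/$segment.log" ]; then
    echo "no data segment $partition_dir/$segment.log; refusing to continue" >&2
    return 2
  fi
  mkdir -p "$backup_dir" || return 2
  for suffix in index timeindex; do
    if [ -f "$partition_dir/$segment.$suffix" ]; then
      mv -- "$partition_dir/$segment.$suffix" "$backup_dir/" && moved=$((moved + 1))
    fi
  done
  echo "moved $moved sidecar file(s) for segment $segment to $backup_dir"
}

# Example (partition and segment from this guide's log excerpt; backup path is made up):
# quarantine_indexes /var/lib/kafka/clicks-7 00000000000002097152 /var/tmp/kafka-index-backup/clicks-7
```

Keeping the moved files until the broker has started cleanly leaves a way back if the rebuild does not behave as expected.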
Prevention and Best Practices
- Shut down brokers gracefully (SIGTERM, not SIGKILL) so index files flush and skip rebuild on the next start.
- Tune num.recovery.threads.per.data.dir to parallelize index rebuilds and shorten startup after an unclean stop, especially on brokers with many partitions.
- Keep replication factor 3 so a broker doing a long recovery does not threaten availability; leaders stay on other brokers.
- Monitor broker startup duration; a sudden jump usually means an unclean shutdown forced mass index rebuilds, which is a signal to fix whatever caused the crash.
- Run SMART monitoring on data disks so recurring index corruption from bit-rot is caught and the drive replaced.
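The graceful-shutdown advice in the list above can be sketched as a stop helper that sends SIGTERM and waits, and that deliberately never escalates to SIGKILL. This is an illustration, not a replacement for your service manager: graceful_stop is a hypothetical name, and the 300-second default timeout is an arbitrary assumption that should be sized to how long your broker normally takes to shut down.

```shell
#!/usr/bin/env bash
# Sketch: stop a process with SIGTERM and wait for it to exit.
# Never sends SIGKILL; on timeout it reports and leaves the decision to you.
# graceful_stop is a hypothetical helper; the 300s default is an assumption.

graceful_stop() {
  local pid="$1" timeout="${2:-300}" waited=0
  if ! kill -TERM "$pid" 2>/dev/null; then
    echo "cannot signal pid $pid (not running, or no permission)" >&2
    return 2
  fi
  # kill -0 only tests whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "pid $pid still running after ${timeout}s; not escalating to SIGKILL" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "pid $pid exited after about ${waited}s"
}

# Example (process pattern as used earlier in this guide):
# graceful_stop "$(pgrep -f 'kafka.Kafka' | head -n 1)" 600
```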
Related Errors
- Found a corrupted segment / CorruptRecordException is data-segment corruption, which is real damage, unlike index corruption that simply rebuilds.
- Could not recover log appears when recovery (including index rebuild) cannot complete and blocks startup.
- KafkaStorageException: Stopping serving logs in dir is a directory-level I/O failure that can occur if the disk corrupting indexes is also failing.
Frequently Asked Questions
Is a corrupted index data loss?
No. Index files are derived from the data segment. Kafka rebuilds them by scanning the .log; the messages are intact.
Why is my broker so slow to start after a crash?
It is rebuilding indexes for every unflushed segment, scanning each data file. More partitions and larger segments mean longer recovery. Increasing num.recovery.threads.per.data.dir lets more partitions recover in parallel on the next restart.
Should I delete the index files manually?
Only the .index/.timeindex sidecars, only on a stopped broker, and only if automatic rebuild is stuck. Never delete the .log. Usually the automatic rebuild handles it.
What is the difference between the offset index and the time index?
The offset index maps offsets to byte positions for fast reads; the time index maps timestamps to offsets for time-based lookups. Each is validated and rebuilt independently.