AI for Kafka · By James Joyner IV · 9 min read

Kafka Error Guide: 'Metadata log corruption detected' (Corrupted __cluster_metadata Segment)

Fix KRaft 'Metadata log corruption detected': diagnose a corrupted __cluster_metadata segment from CRC mismatch, partial writes, or disk faults, and recover safely.

  • #kafka
  • #troubleshooting
  • #errors
  • #kraft

Exact Error Message

When a controller or broker reads the __cluster_metadata log and a record batch fails its integrity check, KRaft refuses to load it and reports corruption, often aborting startup:

[2026-06-29 06:55:02,193] ERROR [RaftManager id=2] Metadata log corruption detected in segment /var/lib/kafka/__cluster_metadata-0/00000000000009850000.log at position 4456221 (org.apache.kafka.raft.KafkaRaftClient)
org.apache.kafka.common.errors.CorruptRecordException: Record batch CRC mismatch: stored=2847119045, computed=1190442781 at offset 9,861,204
    at org.apache.kafka.common.record.DefaultRecordBatch.ensureValid(DefaultRecordBatch.java:178)
[2026-06-29 06:55:02,205] ERROR [SharedServer id=2] Fatal error during controller startup; aborting (kafka.server.SharedServer)

Sometimes it surfaces as a truncation/invalid-batch error on recovery:

[2026-06-29 06:55:02,210] WARN [Log partition=__cluster_metadata-0] Found invalid record batch; corrupted index or segment beyond offset 9,861,204 (kafka.log.UnifiedLog)

What the Error Means

Every record batch in the __cluster_metadata log carries a CRC. On read, KRaft recomputes the CRC and compares it to the stored value. “Metadata log corruption detected” means a batch’s stored CRC does not match the computed one (or the batch header is malformed), so the on-disk bytes are not what was written. Because metadata is the source of truth for the whole cluster, KRaft will not silently skip a bad batch — it fails loudly, frequently aborting startup of that node.
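The store-then-recompute check described above can be sketched with standard tools. This is an illustration only: `cksum` implements the POSIX CRC, whereas Kafka record batches carry a CRC-32C in the batch header, so the numbers will not match anything Kafka reports. The function names `crc_of` and `crc_matches` are hypothetical.

```shell
# Illustration only: cksum uses the POSIX CRC, not the CRC-32C that Kafka stores.

# crc_of FILE -> prints the checksum of FILE's bytes.
crc_of() {
  cksum < "$1" | awk '{ print $1 }'
}

# crc_matches FILE STORED -> exit 0 if the recomputed checksum equals STORED, else 1.
crc_matches() {
  [ "$(crc_of "$1")" = "$2" ]
}
```

Record `stored=$(crc_of some-file)` at write time; if any byte of the file changes afterwards, `crc_matches some-file "$stored"` fails, which is the same mismatch KRaft reports for a damaged batch.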

This is normally a single-node, on-disk problem: the corrupted segment lives on one node’s disk. Because the Raft log is replicated across the quorum, a healthy majority still holds good copies. The goal is therefore to recover the affected node from the quorum, not to “repair” bytes in place, which is unsafe.

Common Causes

  • Unclean shutdown / power loss mid-write, leaving a partially written batch at the tail of the active segment.
  • Disk or filesystem faults — bad sectors, controller/firmware bugs, or silent bit-rot flipping bytes in a segment.
  • Filesystem without proper fsync semantics or a misbehaving virtualized/network volume that acknowledged writes it did not persist.
  • Out-of-space events that truncated a batch mid-flush (often paired with snapshot failures).
  • Manual tampering — someone edited, copied, or partially restored files in __cluster_metadata-0.
  • Memory errors (non-ECC RAM) corrupting data before it was written.
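Of the causes above, an out-of-space event is the quickest to rule in or out. A minimal sketch, assuming a POSIX `df` and that the metadata directory is the one used elsewhere in this guide; `used_pct` and `space_ok` are hypothetical helper names, and the threshold is whatever your own alerting uses.

```shell
# used_pct DIR -> prints the integer percentage of space used on DIR's filesystem.
used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# space_ok DIR MAX_PCT -> exit 0 if usage is at or below MAX_PCT, else non-zero.
space_ok() {
  [ "$(used_pct "$1")" -le "$2" ]
}
```

For example, `space_ok /var/lib/kafka 90 || echo "metadata volume is above 90% used"`. A volume that is full now does not prove it was full when the batch was written, so treat this as a hint and correlate it with the timestamps in the logs.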

How to Reproduce the Error

Deliberately damage a byte in a metadata segment on a lab node, then restart it:

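# Lab only, on a NON-quorum-critical node: corrupt 8 bytes mid-segment
sudo systemctl stop kafka
# Use the newest segment; older ones already covered by a snapshot may never be re-read
SEG=$(ls /var/lib/kafka/__cluster_metadata-0/*.log | tail -1)
# Stay inside the file: seeking past EOF would extend the segment instead of damaging it
OFF=$(( $(sudo stat -c %s "$SEG") / 2 ))
sudo dd if=/dev/urandom of="$SEG" bs=1 count=8 seek="$OFF" conv=notrunc

sudo systemctl start kafka
sudo journalctl -u kafka -f | grep --line-buffered -iE 'corrupt|CRC'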

On restart the node detects the CRC mismatch and aborts with “Metadata log corruption detected.” (Never do this on a production node or on enough nodes to threaten quorum.)
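If you only want to see how the decoder reports a bad batch, a gentler variant is to damage a scratch copy and leave the live directory alone. The sketch below assumes GNU or BSD coreutils; `corrupt_copy` and the scratch directory are hypothetical. It keeps the segment's numeric file name because, as far as I know, the dump tool reads the base offset from that name.

```shell
# corrupt_copy SRC SCRATCH_DIR
# Copies SRC into SCRATCH_DIR under the same file name, overwrites 8 bytes at the
# midpoint of the copy, and prints the copy's path. SRC itself is never modified:
# if SCRATCH_DIR is SRC's own directory, cp refuses and the function fails.
corrupt_copy() (
  src=$1
  dest="$2/$(basename "$src")"
  mkdir -p "$2" && cp "$src" "$dest" || exit 1
  size=$(( $(wc -c < "$dest") ))
  [ "$size" -ge 16 ] || { echo "file too small: $src" >&2; exit 1; }
  dd if=/dev/urandom of="$dest" bs=1 count=8 seek=$(( size / 2 - 4 )) conv=notrunc 2>/dev/null || exit 1
  echo "$dest"
)
```

Pass the printed path to `kafka-dump-log.sh --cluster-metadata-decoder --files` to see the decoder's output for a damaged batch without restarting anything.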

Diagnostic Commands

All read-only.

# Decode and validate the suspect segment; the decoder reports bad batches
kafka-dump-log.sh --cluster-metadata-decoder \
  --files /var/lib/kafka/__cluster_metadata-0/00000000000009850000.log \
  | tail -60

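# Pinpoint the corruption line and offset in the logs
# (with the stock log4j layout, Raft client messages usually land in server.log)
grep -iE 'corruption detected|CRC mismatch|invalid record batch|CorruptRecord' \
  /var/log/kafka/server.log /var/log/kafka/controller.log | tail -40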

# Is the rest of the quorum healthy and holding good copies?
kafka-metadata-quorum.sh --bootstrap-controller c1:9093 describe --status
kafka-metadata-quorum.sh --bootstrap-controller c1:9093 describe --replication

# Disk / filesystem health that may have caused it
dmesg | grep -iE 'i/o error|ext4-fs error|xfs.*corrupt|bad sector' | tail -20

# List segments and their sizes/timestamps to spot a truncated tail
ls -la /var/lib/kafka/__cluster_metadata-0/

describe --status against a healthy controller confirms the quorum still has a leader and good data — that is what you will recover the bad node from.
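When the log gives you only an offset, you can map it to a segment file yourself: Kafka names each segment after its base offset, zero-padded to 20 digits, so the segment holding an offset is the one with the largest base offset not exceeding it. A minimal sketch, with `segment_for_offset` as a hypothetical helper name:

```shell
# segment_for_offset DIR OFFSET
# Prints the .log file in DIR whose base offset is the largest one <= OFFSET.
# Exits non-zero if no segment qualifies. awk compares the values as doubles,
# which is exact for offsets below 2^53.
segment_for_offset() {
  for f in "$1"/*.log; do
    [ -e "$f" ] && basename "$f" .log
  done | awk -v target="$2" '
    BEGIN { best = -1 }
    /^[0-9]+$/ && ($0 + 0) <= (target + 0) && ($0 + 0) > best { best = $0 + 0; name = $0 }
    END { if (name != "") print name ".log"; else exit 1 }
  '
}
```

For the example error above, `segment_for_offset /var/lib/kafka/__cluster_metadata-0 9861204` would select `00000000000009850000.log` if that segment exists on the node.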

Step-by-Step Resolution

  1. Confirm the quorum is otherwise healthy. If a majority of controllers are up with a leader and good metadata, the corrupted node can be recovered from them without data loss. If multiple nodes are corrupt and quorum is lost, escalate to your DR runbook before touching disks.
  2. Identify exactly which node and segment are corrupt using the dump-log decoder and the controller log offset.
  3. Check the underlying disk (dmesg, SMART). If the hardware is failing, replace it before recovering — re-replicating onto a bad disk just recreates corruption.
  4. Recover the node from the quorum. Following your tested runbook, stop the affected node and let KRaft re-replicate the metadata log and latest snapshot from the healthy leader (typically by clearing only that node’s __cluster_metadata-0 directory, leaving meta.properties in the parent log directory untouched, so the node bootstraps fresh from a snapshot). This is the supported path; do not edit bytes in the segment.
  5. Restart the node and verify that it rejoins, that describe --replication shows its Lag shrinking to zero, and that no further CRC errors appear.
  6. Address the root cause (UPS/clean shutdown, fsync-safe storage, ECC RAM) so it does not recur.
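For step 4, one cautious way to "clear" the directory is to move it aside instead of deleting it, so the damaged files stay available for a post-mortem. This is a sketch under assumptions, not a replacement for your runbook: `quarantine_metadata_dir` and the holding directory are hypothetical, the node must already be stopped, and the holding directory should sit outside Kafka's log directories, since a stray directory inside them may be rejected at startup.

```shell
# quarantine_metadata_dir META_DIR HOLD_DIR
# Moves META_DIR into HOLD_DIR under a timestamped name and prints the new path.
# HOLD_DIR is a hypothetical location OUTSIDE Kafka's log directories.
quarantine_metadata_dir() (
  meta_dir=$1
  hold_dir=$2
  [ -d "$meta_dir" ] || { echo "not a directory: $meta_dir" >&2; exit 1; }
  mkdir -p "$hold_dir" || exit 1
  dest="$hold_dir/$(basename "$meta_dir").corrupt-$(date +%Y%m%dT%H%M%S)"
  mv "$meta_dir" "$dest" && echo "$dest"
)
```

Run it as a user allowed to move the directory, for example `quarantine_metadata_dir /var/lib/kafka/__cluster_metadata-0 /var/tmp/kafka-quarantine` after `sudo systemctl stop kafka`. If the holding directory is on another filesystem, `mv` copies the data first, so check free space beforehand.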

Prevention and Best Practices

  • Use storage with honest fsync semantics and battery-backed/UPS-protected power so in-flight writes are not lost on power events.
  • Prefer ECC memory on controller hosts to avoid in-memory corruption before writes.
  • Ensure clean shutdowns; configure systemd to allow Kafka enough stop time to flush.
  • Monitor disks with SMART and act on the first I/O errors in dmesg.
  • Run a quorum with an odd number of controllers (three or more) so any single corrupted node is recoverable from the majority.
  • Never hand-edit, copy, or partially restore files inside __cluster_metadata-0; treat it as opaque, quorum-managed state.

Related Errors

  • Snapshot generation failed: disk/space faults that often accompany or precede corruption.
  • Unable to fetch metadata log: how a recovering node catches back up from the quorum.
  • Failed to append metadata record: write-path failures that can leave a truncated, corrupt tail.
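Returning to the clean-shutdown advice under Prevention and Best Practices: with systemd, the stop grace period is set by `TimeoutStopSec`, and a drop-in file is the usual way to raise it without editing the packaged unit. The sketch below assumes the unit is called `kafka.service`, as in the commands earlier in this guide; `write_stop_timeout` and the file name `stop-timeout.conf` are hypothetical, and the right number of seconds depends on how long your nodes take to flush.

```shell
# write_stop_timeout DROPIN_DIR SECONDS
# Writes DROPIN_DIR/stop-timeout.conf containing a [Service] TimeoutStopSec override.
write_stop_timeout() (
  dir=$1
  secs=$2
  case "$secs" in
    ''|*[!0-9]*) echo "seconds must be a whole number" >&2; exit 1 ;;
  esac
  mkdir -p "$dir" || exit 1
  printf '[Service]\nTimeoutStopSec=%s\n' "$secs" > "$dir/stop-timeout.conf"
)
```

As root, `write_stop_timeout /etc/systemd/system/kafka.service.d 180` followed by `systemctl daemon-reload` would apply an illustrative 180-second limit; `systemctl show kafka -p TimeoutStopUSec` lets you confirm the value systemd actually uses.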

Frequently Asked Questions

Can I just delete the corrupt segment file?
Not safely on its own: gaps or hand-truncation can leave inconsistent metadata. The supported recovery is to re-bootstrap that node’s metadata from the quorum snapshot, not to surgically remove a file.

Will the cluster lose metadata?
No, as long as a healthy majority holds good copies. The corrupt node is recovered from them. Losing quorum to corruption on multiple nodes is the serious case that requires DR.

Why does Kafka abort instead of skipping the bad batch?
Metadata is the cluster’s source of truth; silently skipping a batch could apply an inconsistent view. Failing loudly forces a correct recovery.

Is corruption always a disk problem?
Usually it traces back to disk, power, or memory, but out-of-space truncation and manual file tampering are common non-hardware causes.

How do I know the recovered node is truly clean?
After re-bootstrapping, describe --replication shows zero lag, the dump-log decoder reports no invalid batches, and no CRC errors recur in the log.
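The zero-lag check in the last answer can be scripted. A minimal sketch, assuming the `describe --replication` output is a whitespace-separated table with a header row that contains a `Lag` column; the column is found by name because its position can differ between Kafka versions. `max_lag` is a hypothetical helper name.

```shell
# max_lag
# Reads `describe --replication` output on stdin and prints the largest Lag value.
# Exits non-zero if no header row containing a "Lag" column is found.
max_lag() {
  awk '
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "Lag") col = i; next }
    $col ~ /^[0-9]+$/ && ($col + 0) > max { max = $col + 0 }
    END { if (col == 0) exit 1; print max + 0 }
  '
}
```

For example, `kafka-metadata-quorum.sh --bootstrap-controller c1:9093 describe --replication | max_lag` prints `0` once every listed replica has caught up. Observers appear in that table too, so a briefly non-zero value on a busy cluster is not by itself a sign of trouble.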
