Kafka Error Guide: 'Metadata loader failed' Broker Cannot

Exact Error Message

In KRaft, every broker runs a metadata loader that consumes the __cluster_metadata log and applies each record to its local in-memory image. When applying a record throws, the loader fails and the broker stops publishing metadata, often shutting down:

[2026-06-29 08:24:13,902] ERROR [BrokerMetadataLoader id=21] Metadata loader failed while applying batch ending at offset 9,915,402 (org.apache.kafka.image.loader.MetadataLoader)
java.lang.RuntimeException: Unable to apply RegisterBrokerRecord: unsupported metadata.version 3.9-IV0 (broker supports up to 3.7-IV4)
    at org.apache.kafka.image.loader.MetadataLoader.maybePublishMetadata(MetadataLoader.java:374)
[2026-06-29 08:24:13,915] ERROR [BrokerServer id=21] Encountered fatal fault in metadata loader; shutting down (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)

A corrupt/unexpected-record variant:

java.lang.IllegalStateException: Metadata loader failed: cannot apply TopicRecord for topic id Xk2... that already exists at offset 9,915,388 (org.apache.kafka.image.MetadataDelta)

What the Error Means

The metadata loader is the broker-side component that turns the replicated __cluster_metadata log into the broker’s working view of the cluster (topics, partitions, leaders, configs, broker registrations). “Metadata loader failed” means the broker fetched the log fine but could not apply a record to build a consistent image — because the record uses a feature/metadata.version the broker does not understand, the record is inconsistent with the broker’s current image, or a local fault (memory, bug, corrupt local replica) interrupted application.

This is distinct from “Unable to fetch metadata log,” which is a fetch/transport problem; here the bytes arrived but applying them failed. KRaft treats a failed metadata apply as a fatal fault by default, because a broker running on a half-applied or inconsistent metadata image is dangerous — so the broker typically shuts down rather than serve wrong metadata.

Common Causes

metadata.version / feature skew — the controller quorum upgraded the metadata version (or enabled a feature) beyond what this broker’s Kafka version supports, so it cannot apply newer records. Common during a botched or out-of-order upgrade.
Broker binary older than the cluster — a node left behind on an old release after a rolling upgrade.
Corrupt or inconsistent local metadata replica producing impossible deltas (e.g., creating a topic ID that already exists).
A genuine Kafka bug in record application for a specific record type/version.
Local resource faults (OOM, disk error) interrupting the loader mid-apply.
Manual tampering with the broker’s __cluster_metadata-0 directory leaving an inconsistent starting image.

How to Reproduce the Error

Introduce a version skew: upgrade the cluster’s metadata.version beyond what one broker’s binary supports, then start that broker:

# Check the cluster's current metadata.version / features (read-only)
kafka-features.sh --bootstrap-server c1:9092 describe

# Lab only: with the controllers on a newer metadata.version than this broker's
# binary supports, start the older broker and watch the loader fail
sudo systemctl start kafka
sudo journalctl -u kafka -f | grep -iE 'metadata loader|metadata.version|fatal fault'

The lagging broker fetches the metadata log, hits a RegisterBrokerRecord/feature record with an unsupported metadata.version, logs “Metadata loader failed,” and shuts down via the fault handler.

Diagnostic Commands

All read-only.

# What metadata.version / features does the cluster expect?
kafka-features.sh --bootstrap-server c1:9092 describe

# Confirm this broker's installed Kafka version vs the cluster's expectation
kafka-broker-api-versions.sh --bootstrap-server localhost:9092 2>&1 | head -5
grep -i 'version' /opt/kafka/libs/kafka-clients-*.jar 2>/dev/null || ls /opt/kafka/libs/ | grep -i kafka_

# The exact record/offset the loader choked on
grep -iE 'metadata loader failed|unable to apply|unsupported metadata.version|fatal fault' \
  /var/log/kafka/server.log | tail -50

# Decode the metadata around the failing offset to see the record type (read-only)
kafka-dump-log.sh --cluster-metadata-decoder \
  --files /var/lib/kafka/__cluster_metadata-0/*.log | grep -iE 'FeatureLevel|RegisterBroker|Topic' | tail -40

# Inspect the broker's local metadata snapshot/image (read-only browse)
kafka-metadata-shell.sh --snapshot \
  /var/lib/kafka/__cluster_metadata-0/$(ls /var/lib/kafka/__cluster_metadata-0/ | grep checkpoint | tail -1)

# Local faults that could interrupt apply
journalctl -u kafka --since "30 min ago" | grep -iE 'oom|out of memory|i/o error'

If kafka-features.sh describe shows a metadata.version higher than the failing broker’s binary supports, the cause is version skew. If versions match, decode the failing offset to see which record type is inconsistent.

Step-by-Step Resolution

Read the loader error to identify the failing record and reason. “unsupported metadata.version” points to version skew; “already exists”/inconsistent-delta points to a corrupt local image; OOM/I/O points to a local fault.
For version skew, upgrade the broker binary to a release that supports the cluster’s current metadata.version, then restart. Never finalize a metadata.version upgrade until every broker runs a binary that supports it — this is the canonical cause and fix.
Do not downgrade metadata.version to fix this; metadata version upgrades are generally one-way. Bring the broker up to the cluster’s level instead.
For a corrupt/inconsistent local image, recover that broker’s metadata from the quorum per your runbook (clear only its __cluster_metadata-0 so it re-bootstraps from a snapshot) — same approach as metadata-corruption recovery. Verify the disk is healthy first.
For local resource faults, fix OOM (heap sizing) or disk errors, then restart so the loader can apply cleanly.
Restart and confirm the broker applies metadata to current offset (no loader errors), registers with the controller, and begins serving partitions; kafka-metadata-quorum.sh describe --replication shows it caught up.

Prevention and Best Practices

Upgrade in the correct order: roll all broker/controller binaries to the new version first, confirm stability, and only then finalize metadata.version/feature upgrades — never the reverse.
Keep all nodes on matching Kafka versions; do not leave a straggler broker behind after a rolling upgrade.
Treat metadata.version upgrades as one-way; plan them deliberately and document the minimum binary version each requires.
Never hand-edit a broker’s __cluster_metadata-0; recover inconsistent images from the quorum.
Size broker heap to avoid OOM during metadata apply, and monitor disk health.
Alert on server.log lines matching “Metadata loader failed” — they usually mean a broker is about to (or did) shut down.

Unable to fetch metadata log — the fetch-side counterpart; here fetch works but apply fails.
Metadata log corruption detected — a corrupt local image can surface as a loader apply failure.
Metadata quorum unavailable — if the controller side is down, brokers have nothing to load. See the Kafka guides.

Frequently Asked Questions

Why does the broker shut down instead of skipping the bad record? Running on a partially applied metadata image risks serving wrong leadership/config. KRaft treats apply failures as fatal so the broker fails safe rather than corrupt client behavior.

Is this always a version problem? No, but version/feature skew is the most common cause. Inconsistent local images and local resource faults (OOM, disk) also trigger it.

Can I downgrade metadata.version to recover? Generally no — metadata version upgrades are one-way. The fix is to upgrade the lagging broker’s binary, not downgrade the cluster.

Do other brokers keep working while one loader fails? Yes. The failure is local to the affected broker. Its partitions fail over to other replicas while it is down; fix and restart it to restore capacity.

How do I avoid this during upgrades? Finalize metadata.version only after every node runs a binary that supports it. Confirm with kafka-features.sh describe and per-broker version checks before finalizing.

Kafka Error Guide: 'Metadata loader failed' Broker Cannot Apply Controller Updates

Exact Error Message

What the Error Means

Common Causes

How to Reproduce the Error

Diagnostic Commands

Step-by-Step Resolution

Prevention and Best Practices

Frequently Asked Questions

Download the Free 500-Prompt DevOps AI Toolkit

Exact Error Message

What the Error Means

Common Causes

How to Reproduce the Error

Diagnostic Commands

Step-by-Step Resolution

Prevention and Best Practices

Related Errors

Frequently Asked Questions

Download the Free 500-Prompt DevOps AI Toolkit