Kafka Error Guide: 'Metadata loader failed' Broker Cannot Apply Controller Updates
Fix KRaft 'Metadata loader failed': diagnose why a broker cannot apply __cluster_metadata updates from the controller due to bad records, version skew, or local faults.
- #kafka
- #troubleshooting
- #errors
- #kraft
Exact Error Message
In KRaft, every broker runs a metadata loader that consumes the __cluster_metadata log and applies each record to its local in-memory image. When applying a record throws, the loader fails and the broker stops publishing metadata, often shutting down:
[2026-06-29 08:24:13,902] ERROR [BrokerMetadataLoader id=21] Metadata loader failed while applying batch ending at offset 9,915,402 (org.apache.kafka.image.loader.MetadataLoader)
java.lang.RuntimeException: Unable to apply RegisterBrokerRecord: unsupported metadata.version 3.9-IV0 (broker supports up to 3.7-IV4)
at org.apache.kafka.image.loader.MetadataLoader.maybePublishMetadata(MetadataLoader.java:374)
[2026-06-29 08:24:13,915] ERROR [BrokerServer id=21] Encountered fatal fault in metadata loader; shutting down (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
A corrupt/unexpected-record variant:
java.lang.IllegalStateException: Metadata loader failed: cannot apply TopicRecord for topic id Xk2... that already exists at offset 9,915,388 (org.apache.kafka.image.MetadataDelta)
What the Error Means
The metadata loader is the broker-side component that turns the replicated __cluster_metadata log into the broker’s working view of the cluster (topics, partitions, leaders, configs, broker registrations). “Metadata loader failed” means the broker fetched the log fine but could not apply a record to build a consistent image — because the record uses a feature/metadata.version the broker does not understand, the record is inconsistent with the broker’s current image, or a local fault (memory, bug, corrupt local replica) interrupted application.
This is distinct from “Unable to fetch metadata log,” which is a fetch/transport problem; here the bytes arrived but applying them failed. KRaft treats a failed metadata apply as a fatal fault by default, because a broker running on a half-applied or inconsistent metadata image is dangerous — so the broker typically shuts down rather than serve wrong metadata.
Common Causes
metadata.version/ feature skew — the controller quorum upgraded the metadata version (or enabled a feature) beyond what this broker’s Kafka version supports, so it cannot apply newer records. Common during a botched or out-of-order upgrade.- Broker binary older than the cluster — a node left behind on an old release after a rolling upgrade.
- Corrupt or inconsistent local metadata replica producing impossible deltas (e.g., creating a topic ID that already exists).
- A genuine Kafka bug in record application for a specific record type/version.
- Local resource faults (OOM, disk error) interrupting the loader mid-apply.
- Manual tampering with the broker’s
__cluster_metadata-0directory leaving an inconsistent starting image.
How to Reproduce the Error
Introduce a version skew: upgrade the cluster’s metadata.version beyond what one broker’s binary supports, then start that broker:
# Check the cluster's current metadata.version / features (read-only)
kafka-features.sh --bootstrap-server c1:9092 describe
# Lab only: with the controllers on a newer metadata.version than this broker's
# binary supports, start the older broker and watch the loader fail
sudo systemctl start kafka
sudo journalctl -u kafka -f | grep -iE 'metadata loader|metadata.version|fatal fault'
The lagging broker fetches the metadata log, hits a RegisterBrokerRecord/feature record with an unsupported metadata.version, logs “Metadata loader failed,” and shuts down via the fault handler.
Diagnostic Commands
All read-only.
# What metadata.version / features does the cluster expect?
kafka-features.sh --bootstrap-server c1:9092 describe
# Confirm this broker's installed Kafka version vs the cluster's expectation
kafka-broker-api-versions.sh --bootstrap-server localhost:9092 2>&1 | head -5
grep -i 'version' /opt/kafka/libs/kafka-clients-*.jar 2>/dev/null || ls /opt/kafka/libs/ | grep -i kafka_
# The exact record/offset the loader choked on
grep -iE 'metadata loader failed|unable to apply|unsupported metadata.version|fatal fault' \
/var/log/kafka/server.log | tail -50
# Decode the metadata around the failing offset to see the record type (read-only)
kafka-dump-log.sh --cluster-metadata-decoder \
--files /var/lib/kafka/__cluster_metadata-0/*.log | grep -iE 'FeatureLevel|RegisterBroker|Topic' | tail -40
# Inspect the broker's local metadata snapshot/image (read-only browse)
kafka-metadata-shell.sh --snapshot \
/var/lib/kafka/__cluster_metadata-0/$(ls /var/lib/kafka/__cluster_metadata-0/ | grep checkpoint | tail -1)
# Local faults that could interrupt apply
journalctl -u kafka --since "30 min ago" | grep -iE 'oom|out of memory|i/o error'
If kafka-features.sh describe shows a metadata.version higher than the failing broker’s binary supports, the cause is version skew. If versions match, decode the failing offset to see which record type is inconsistent.
Step-by-Step Resolution
- Read the loader error to identify the failing record and reason. “unsupported metadata.version” points to version skew; “already exists”/inconsistent-delta points to a corrupt local image; OOM/I/O points to a local fault.
- For version skew, upgrade the broker binary to a release that supports the cluster’s current
metadata.version, then restart. Never finalize ametadata.versionupgrade until every broker runs a binary that supports it — this is the canonical cause and fix. - Do not downgrade
metadata.versionto fix this; metadata version upgrades are generally one-way. Bring the broker up to the cluster’s level instead. - For a corrupt/inconsistent local image, recover that broker’s metadata from the quorum per your runbook (clear only its
__cluster_metadata-0so it re-bootstraps from a snapshot) — same approach as metadata-corruption recovery. Verify the disk is healthy first. - For local resource faults, fix OOM (heap sizing) or disk errors, then restart so the loader can apply cleanly.
- Restart and confirm the broker applies metadata to current offset (no loader errors), registers with the controller, and begins serving partitions;
kafka-metadata-quorum.sh describe --replicationshows it caught up.
Prevention and Best Practices
- Upgrade in the correct order: roll all broker/controller binaries to the new version first, confirm stability, and only then finalize
metadata.version/feature upgrades — never the reverse. - Keep all nodes on matching Kafka versions; do not leave a straggler broker behind after a rolling upgrade.
- Treat
metadata.versionupgrades as one-way; plan them deliberately and document the minimum binary version each requires. - Never hand-edit a broker’s
__cluster_metadata-0; recover inconsistent images from the quorum. - Size broker heap to avoid OOM during metadata apply, and monitor disk health.
- Alert on
server.loglines matching “Metadata loader failed” — they usually mean a broker is about to (or did) shut down.
Related Errors
- Unable to fetch metadata log — the fetch-side counterpart; here fetch works but apply fails.
- Metadata log corruption detected — a corrupt local image can surface as a loader apply failure.
- Metadata quorum unavailable — if the controller side is down, brokers have nothing to load. See the Kafka guides.
Frequently Asked Questions
Why does the broker shut down instead of skipping the bad record? Running on a partially applied metadata image risks serving wrong leadership/config. KRaft treats apply failures as fatal so the broker fails safe rather than corrupt client behavior.
Is this always a version problem? No, but version/feature skew is the most common cause. Inconsistent local images and local resource faults (OOM, disk) also trigger it.
Can I downgrade metadata.version to recover? Generally no — metadata version upgrades are one-way. The fix is to upgrade the lagging broker’s binary, not downgrade the cluster.
Do other brokers keep working while one loader fails? Yes. The failure is local to the affected broker. Its partitions fail over to other replicas while it is down; fix and restart it to restore capacity.
How do I avoid this during upgrades? Finalize metadata.version only after every node runs a binary that supports it. Confirm with kafka-features.sh describe and per-broker version checks before finalizing.
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.