AI for Kafka
Operate and debug Apache Kafka with AI — brokers and controllers, partitions and ISR, producers and consumers, KRaft and ZooKeeper, rebalances, retention, and throughput tuning.
Prompts
- Advanced
Kafka Cluster Sizing & Capacity Planning Prompt
Size a Kafka cluster end to end — broker count, partition counts, retention, disk, memory, and network — for a target throughput, with headroom for spikes and broker failure.
- Claude
- ChatGPT
- Intermediate
Kafka Consumer Lag Investigation Prompt
Investigate and reduce growing consumer lag by isolating the root cause — slow processing, partition skew, GC pauses, or broker-side bottlenecks — then prescribe targeted fixes.
- Claude
- ChatGPT
- Advanced
Kafka Consumer Rebalance Storm Triage Prompt
Diagnose frequent or looping consumer-group rebalances by working through session, heartbeat, and poll timeouts, static membership, and the rebalance protocol in use.
- Claude
- ChatGPT
- Advanced
Kafka Exactly-Once Semantics Design Prompt
Design exactly-once processing across a consume-process-produce pipeline using the idempotent producer and transactions, with honest guidance on where EOS holds and where it does not.
- Claude
- ChatGPT
- Advanced
Kafka Partition Reassignment & Broker Decommission Plan Prompt
Plan a safe partition reassignment or broker decommission using throttled data movement, staged batches, and verification, minimizing impact on live traffic.
- Claude
- ChatGPT
- Intermediate
Kafka Producer Throughput & Latency Tuning Prompt
Tune Kafka producer batching, compression, acks, linger, and idempotence to hit a throughput or latency target while keeping the durability guarantees you actually need.
- Claude
- ChatGPT
- Advanced
Kafka TLS, SASL & ACL Security Hardening Prompt
Harden a Kafka cluster end to end — TLS encryption, SASL authentication, and least-privilege ACLs with per-principal scoping — and produce a phased rollout that avoids locking out clients.
- Claude
- ChatGPT
- Intermediate
Kafka Topic Design & Partitioning Strategy Prompt
Design a Kafka topic from first principles — partition count, keying, replication factor, min.insync.replicas, and retention vs. compaction — to match ordering, scale, and durability needs.
- Claude
- ChatGPT
- Advanced
Kafka ZooKeeper to KRaft Migration Plan Prompt
Plan a staged migration from ZooKeeper-based Kafka to KRaft mode with version prerequisites, controller quorum design, rollback checkpoints, and validation at each phase.
- Claude
- ChatGPT
Guides
- 10 min read
AI-Assisted Kafka Troubleshooting Explained
How AI-assisted Kafka troubleshooting works — diagnosing broker faults, consumer lag, rebalance storms, and ISR shrink faster, with the governance to run it safely.
- 11 min read
Debugging Kafka Consumer Lag with AI
Measure Kafka consumer lag correctly, find the real root cause with AI-assisted analysis, and apply durable fixes — from poison messages to under-provisioned groups.
- 11 min read
Designing Kafka Topics: Partitions and Replication
How to design Kafka topics that scale — choosing partition counts, partition keys, replication factor, min.insync.replicas, retention, and log compaction correctly.
- 9 min read
Kafka Error Guide: 'AuthorizationException' Not Authorized to Access
Fix Kafka AuthorizationException: diagnose missing ACLs, wrong principal mapping, allow.everyone.if.no.acl.found, super.users, and authorizer misconfiguration.
- 9 min read
Kafka Error Guide: 'java.io.IOException: Broken pipe' Write to Closed Socket
Fix Kafka 'Broken pipe' — diagnose writes to closed sockets, idle-timeout disconnects, oversized requests, and broker-side connection drops during sends.
- 9 min read
Kafka Error Guide: 'Broker may not be available' Connection Failure
Fix Kafka 'Connection to node 1 could not be established. Broker may not be available': diagnose down brokers, wrong bootstrap servers, listeners, and firewalls.
- 9 min read
Kafka Error Guide: 'BrokerEndPointNotAvailableException' Missing Listener
Fix Kafka BrokerEndPointNotAvailableException: a listener or security protocol has no advertised endpoint. Fix listeners, advertised.listeners, and listener maps.
- 9 min read
Kafka Error Guide: 'CertificateExpiredException' Certificate Has Expired
Fix Kafka CertificateExpiredException: diagnose expired broker or client certs, expired CA roots, clock skew, and short-lived certificate rotation failures.
- 9 min read
Kafka Error Guide: 'ClusterAuthorizationException' Cluster Authorization Failed
Fix Kafka ClusterAuthorizationException: diagnose missing CLUSTER ACLs, idempotent producer IdempotentWrite, transactional IDs, and admin operations on the cluster.
- 9 min read
Kafka Error Guide: 'CommitFailedException' Offset Commit Cannot Be Completed
Fix Kafka CommitFailedException when the consumer falls out of an active group: diagnose slow processing, max.poll.interval.ms, and rebalance-driven commit rejection.
- 9 min read
Kafka Error Guide: 'java.io.IOException: Connection reset by peer' Broker Reset
Fix Kafka 'Connection reset by peer' — diagnose broker restarts, load balancer and firewall idle resets, and plaintext-to-SSL listener mismatches.
- 9 min read
Kafka Error Guide: 'java.net.SocketTimeoutException: Connection timed out' TCP Connect
Fix Kafka 'Connection timed out' at TCP connect — diagnose firewall DROP rules, security groups, and routing black holes, distinct from request.timeout.ms.
- 9 min read
Kafka Error Guide: 'Connection to node -1 could not be established' Bootstrap Failure
Fix Kafka 'Connection to node -1 could not be established. Broker may not be available' — diagnose dead brokers, wrong bootstrap.servers, and listener binds.
- 9 min read
Kafka Error Guide: 'Controller epoch is older than the current controller epoch' Stale Epoch
Fix Kafka 'controller epoch is older than the current controller epoch': understand epoch fencing, split brain after a network partition, and how to confirm the live controller.
- 9 min read
Kafka Error Guide: 'Controller heartbeat timeout' Broker Fenced in KRaft
Fix Kafka KRaft 'controller heartbeat timeout / broker fenced': tune broker.heartbeat.interval.ms and broker.session.timeout.ms, and diagnose missed broker heartbeats.
- 9 min read
Kafka Error Guide: 'This is not the correct controller for this cluster' Controller Moved
Fix Kafka 'not the correct controller / controller moved to another broker': understand normal failover, stale controllers, and how to confirm the real active controller.
- 9 min read
Kafka Error Guide: 'Controller mutation rate quota exceeded' Throttled Topic Ops
Fix Kafka controller mutation rate quota errors: understand CONTROLLER_MUTATION quotas, throttled topic create/delete/partition ops, and how to size the limit safely.
- 9 min read
Kafka Error Guide: 'Controller not available' No Active Controller
Fix Kafka 'Controller not available / controller connection failed': diagnose quorum loss, no elected controller, ZooKeeper outages, and KRaft voter majority failures.
- 9 min read
Kafka Error Guide: 'CoordinatorNotAvailableException' Group Coordinator Down
Fix Kafka CoordinatorNotAvailableException: resolve __consumer_offsets unavailability, coordinator load-in-progress, offline partitions, and under-replicated offsets topic.
- 9 min read
Kafka Error Guide: 'Found a corrupted segment' Corrupt Log Segment on Load
Fix Kafka corrupted log segment errors: diagnose unclean shutdowns, truncated segments, and 'Unexpected EOF while reading log' so a broker can finish startup recovery.
- 9 min read
Kafka Error Guide: 'CorruptRecordException' Message Failed Its CRC Checksum
Fix Kafka CorruptRecordException: diagnose CRC32C checksum mismatches from network corruption, bad disks, truncated segments after unclean shutdown, and consumer fetch detection.
- 9 min read
Kafka Error Guide: 'Error while creating log directories' Log Dir Failure
Fix Kafka 'Error while creating log directories': resolve missing log.dirs paths, wrong ownership, Permission denied, full disks, and stale .lock files that mark a dir offline.
- 9 min read
Kafka Error Guide: 'Error while electing or becoming controller on broker 1' Election Failure
Fix Kafka 'Error while electing or becoming controller on broker 1': diagnose ZooKeeper session loss, quorum problems, znode conflicts, and stuck controller election.
- 9 min read
Kafka Error Guide: 'Error while fetching metadata' LEADER_NOT_AVAILABLE
Fix Kafka's 'Error while fetching metadata ... LEADER_NOT_AVAILABLE' and UNKNOWN_TOPIC_OR_PARTITION client warnings: causes, diagnostics, and resolution.
- 9 min read
Kafka Error Guide: 'Failed to append metadata record' Raft Append Failure
Fix KRaft 'Failed to append metadata record' to __cluster_metadata: diagnose lost leadership, no quorum, disk-full, and timeout failures on the Raft write path.
- 9 min read
Kafka Error Guide: 'Failed to append records to topic-0 in dir /var/lib/kafka/data' Offline Log Dir
Fix Kafka's KafkaStorageException when a broker fails to append to its local log and marks the data directory offline due to disk, IO, or permission faults.
- 9 min read
Kafka Error Guide: 'Error while flushing log' fsync Failure on Broker
Fix Kafka 'Error while flushing log for topic-0' fsync failures: diagnose disk stalls, IO errors, and storage latency that mark a log directory offline via KafkaStorageException.
- 9 min read
Kafka Error Guide: 'Could not recover log' Recovery Failure After Crash
Fix Kafka 'Could not recover log' errors: diagnose crash recovery failures, 'Unable to allocate log segment', disk-full recovery, and brokers stuck on startup.
- 9 min read
Kafka Error Guide: 'Failed to update metadata after 60000 ms' Client Timeout
Fix Kafka 'TimeoutException: Failed to update metadata after 60000 ms': resolve bad bootstrap.servers, broken advertised.listeners, ACL denials, and unreachable brokers.
- 9 min read
Kafka Error Guide: 'Fatal error during KafkaServer startup' Broker Won't Start
Fix Kafka 'Fatal error during KafkaServer startup. Prepare to shutdown': resolve bad config, port-in-use, log.dir failures, and meta.properties cluster.id/broker.id mismatches.
- 9 min read
Kafka Error Guide: 'FencedInstanceIdException' Static Member Has Been Fenced
Fix Kafka FencedInstanceIdException: why duplicate group.instance.id values fence a static consumer member, and how to keep static membership ids unique.
- 9 min read
Kafka Error Guide: 'Fetch request timed out' Consumer & Replica Fetch Timeout
Fix Kafka 'Fetch request timed out' / request.timeout.ms exceeded on fetch: resolve slow brokers, overlarge fetch sizes, network latency, and replica fetcher stalls.
- 9 min read
Kafka Error Guide: 'GroupAuthorizationException' Not Authorized to Access Group
Fix Kafka GroupAuthorizationException: diagnose missing group Read ACLs, wrong group.id, consumer principal mapping, and prefixed vs literal group patterns.
- 9 min read
Kafka Error Guide: 'Truncating partition topic-0 to local high watermark 10042' Replica Divergence
Understand Kafka follower log truncation and high watermark mismatch after a leader change, when it is safe, and when unclean leader election causes data loss.
- 9 min read
Kafka Error Guide: 'IllegalGenerationException' Generation Is Not the Current Generation
Fix Kafka IllegalGenerationException: why a stale group generation rejects commits and heartbeats after a rebalance, and how to rejoin with the current generation.
- 9 min read
Kafka Error Guide: 'InvalidConfigurationException: Unknown topic config name' Fix
Fix Kafka InvalidConfigurationException: unknown topic config keys like retentions.ms, bad cleanup.policy values, out-of-range numbers, and broker config typos.
- 9 min read
Kafka Error Guide: 'InvalidPartitionsException' Cannot Decrease Partition Count
Fix Kafka InvalidPartitionsException when altering a topic: why partitions can only increase, invalid counts, IaC drift, and the ordering caveat for keyed messages.
- 9 min read
Kafka Error Guide: 'InvalidRecordException: One or more records have been rejected' Records Rejected by Broker Validation
Fix Kafka InvalidRecordException when the broker rejects records: null keys on compacted topics, out-of-range timestamps, bad magic bytes, and transactional misuse.
- 9 min read
Kafka Error Guide: 'InvalidReplicationFactorException' Larger Than Available Brokers
Fix Kafka InvalidReplicationFactorException: replication factor larger than available brokers, brokers down, single-node dev RF=3 defaults, and min.insync.replicas confusion.
- 9 min read
Kafka Error Guide: 'InvalidTopicException' Topic Name Is Invalid
Fix Kafka InvalidTopicException: illegal characters, names over 249 chars, '.'/'_' metric collisions, reserved '__' prefixes, and empty or '.' topic names.
- 9 min read
Kafka Error Guide: 'Shrinking ISR from 1,2,3 to 1,2' Replica Lag Flapping
Why Kafka logs 'Shrinking ISR' and 'Expanding ISR' for a partition, how replica.lag.time.max.ms drives it, and how to stabilize a flapping follower.
- 9 min read
Kafka Error Guide: '[KafkaServer id=1] shutting down' Graceful vs Crash
Read Kafka '[KafkaServer id=1] shutting down' and 'started' lifecycle lines: tell a graceful controlled shutdown from an abnormal crash and trace the real trigger.
- 9 min read
Kafka Error Guide: 'Raft leader election failed' No Quorum Leader Elected
Fix KRaft 'Raft leader election failed': diagnose missing quorum leader, bad controller.quorum.voters, network partitions, and clock/epoch issues between controllers.
- 9 min read
Kafka Error Guide: 'Leader election failed' Offline Partitions and No Leader
Why Kafka controller and preferred-leader elections fail, how the unclean.leader.election.enable setting can leave partitions leaderless, and read-only commands to diagnose it.
- 9 min read
Kafka Error Guide: 'Leader epoch mismatch' Fencing in the KRaft Metadata Log
Fix KRaft 'Leader epoch mismatch': understand epoch fencing after a controller election, diagnose stale leaders and divergent followers, and recover the quorum cleanly.
- 9 min read
Kafka Error Guide: 'LeaderNotAvailableException' Leader Election in Progress
Fix Kafka LeaderNotAvailableException: understand transient leader election on new topics, stale metadata, offline partitions, and when to retry vs investigate.
- 9 min read
Kafka Error Guide: 'Stopping serving logs in dir /var/lib/kafka' Log Directory Failure
Fix Kafka KafkaStorageException log directory failures: diagnose disk errors, full volumes, bad permissions, and offline JBOD log dirs marked dead by the broker.
- 9 min read
Kafka Error Guide: 'Error in log cleaner thread' Retention Cleanup Failed
Fix Kafka log cleaner and retention cleanup failures: diagnose a dead LogCleaner thread, dedupe buffer memory limits, compaction errors, and growing disk usage.
- 9 min read
Kafka Error Guide: 'Metadata loader failed' Broker Cannot Apply Controller Updates
Fix KRaft 'Metadata loader failed': diagnose why a broker cannot apply __cluster_metadata updates from the controller due to bad records, version skew, or local faults.
- 9 min read
Kafka Error Guide: 'Metadata log corruption detected' Corrupted __cluster_metadata Segment
Fix KRaft 'Metadata log corruption detected': diagnose a corrupted __cluster_metadata segment from CRC mismatch, partial writes, or disk faults, and recover safely.
- 9 min read
Kafka Error Guide: 'Metadata quorum unavailable' Controller Majority Down
Fix KRaft 'Metadata quorum unavailable' / 'Quorum controller unavailable': diagnose a lost controller majority, wrong bootstrap controllers, and stalled metadata.
- 9 min read
Kafka Error Guide: 'NetworkException: The server disconnected before a response was received' Server Disconnected Before Response
Diagnose Kafka NetworkException from broker disconnects: rolling restarts, GC pauses, idle-connection close, and proxy/advertised.listeners misconfig. Retry safely.
- 9 min read
Kafka Error Guide: 'No subject alternative names matching IP address found' Hostname Verification
Fix Kafka 'No subject alternative names matching IP address found': diagnose hostname verification failures, missing SAN entries, IP vs DNS mismatches, and endpoint identification settings.
- 9 min read
Kafka Error Guide: 'NoBrokersAvailable' Client Cannot Reach Cluster
Fix kafka-python NoBrokersAvailable: diagnose wrong bootstrap_servers, DNS failures, firewall blocks, security protocol mismatches, and down brokers.
- 9 min read
Kafka Error Guide: 'Node 1 disconnected' Established Broker Connection Dropped
Fix Kafka 'Node 1 disconnected' and 'Connection to node 1 disconnected' — diagnose idle timeouts, broker restarts, and version or protocol mismatches.
- 9 min read
Kafka Error Guide: 'NotEnoughReplicasException: Messages are rejected' Fewer In-Sync Replicas Than Required
Resolve Kafka NotEnoughReplicasException and NotEnoughReplicasAfterAppendException: ISR dropped below min.insync.replicas under acks=all. Diagnose ISR and fix durability.
- 9 min read
Kafka Error Guide: 'NotLeaderOrFollowerException' Stale Leader Metadata
Fix Kafka NotLeaderOrFollowerException (formerly NotLeaderForPartitionException): stale client metadata after a leader move, reassignments, and broker restarts.
- 9 min read
Kafka Error Guide: 'Corrupted index found' Offset Index Corrupted on Startup
Fix Kafka 'Corrupted index found' and 'Found invalid offset index' errors: understand index rebuilds on restart, time index corruption, and slow recovery startups.
- 9 min read
Kafka Error Guide: 'OffsetOutOfRangeException' Fetch Position Out of Range
Fix Kafka OffsetOutOfRangeException: diagnose offsets behind the log start from retention, auto.offset.reset behavior, and lagging consumers reading deleted data.
- 9 min read
Kafka Error Guide: 'OutOfOrderSequenceException' Out of Order Sequence Number
Fix Kafka OutOfOrderSequenceException: diagnose idempotent producer sequence gaps from dropped batches, message loss via unclean leader election, and PID resets; why it is non-recoverable and how to prevent it.
- 9 min read
Kafka Error Guide: 'NOT_LEADER_OR_FOLLOWER' Stale Partition Metadata on Clients
Fix Kafka clients hitting NOT_LEADER_OR_FOLLOWER after a leader moves. Understand metadata refresh, retries, advertised.listeners, and why it self-heals.
- 9 min read
Kafka Error Guide: 'Partition marked offline' OfflinePartitionsCount > 0
Diagnose Kafka offline partitions when OfflinePartitionsCount is above zero and a partition has no leader. Restore replicas and recover offline log dirs.
- 9 min read
Kafka Error Guide: 'Partition reassignment for topic-0 failed' Stuck and Failed Reassignments
Why kafka-reassign-partitions.sh reports a reassignment as still in progress or failed, how to diagnose throttles, dead brokers, and disk, and how to recover.
- 9 min read
Kafka Error Guide: 'PKIX path building failed' Unable to Find Valid Certification Path
Fix Kafka 'PKIX path building failed': diagnose missing CA in the truststore, incomplete chains, wrong truststore, bad_certificate alerts, and self-signed broker certs.
- 9 min read
Kafka Error Guide: 'TimeoutException: Expiring N record(s)' Producer Send Timeout
Fix Kafka producer 'TimeoutException: Expiring 5 record(s) ... ms has passed since batch creation': tune delivery.timeout.ms, request.timeout.ms, linger.ms, batch.size and buffer.memory.
- 9 min read
Kafka Error Guide: 'RebalanceInProgressException' Consumer Group Is Rebalancing
Fix Kafka RebalanceInProgressException: why offset commits fail mid-rebalance, how cooperative rebalancing changes it, and how to retry the poll cycle safely.
- 9 min read
Kafka Error Guide: 'RecordTooLargeException' Message Exceeds max.request.size
Fix Kafka 'RecordTooLargeException: The message is N bytes ... larger than max.request.size': align max.request.size, message.max.bytes, max.message.bytes, fetch limits and compression.
- 9 min read
Kafka Error Guide: 'Shrinking ISR' Replica Lagging and Under-Replicated Partitions
Fix Kafka followers that lag and drop out of ISR causing under-replicated partitions: slow disk, NIC saturation, fetchers, and leftover replication throttles.
- 9 min read
Kafka Error Guide: 'Error for partition topic-0 at offset 12345' ReplicaFetcherThread Failure
Decode ReplicaFetcherThread errors when a Kafka follower can't fetch from the leader: NOT_LEADER, OFFSET_OUT_OF_RANGE, fetch size, and TLS causes.
- 9 min read
Kafka Error Guide: 'ReplicaNotAvailableException' Replica Reassignment Notice
Understand Kafka ReplicaNotAvailableException: usually transient and informational during reassignment, when to ignore it, and when a replica is truly offline.
- 9 min read
Kafka Error Guide: 'SaslAuthenticationException: Authentication failed' Invalid Credentials
Fix Kafka SaslAuthenticationException: diagnose bad SCRAM/PLAIN passwords, wrong JAAS config, missing mechanism, Kerberos keytab issues, and broker SASL setup.
- 9 min read
Kafka Error Guide: 'SerializationException: Error serializing Avro message' Error Serializing Message
Fix Kafka 'SerializationException: Error serializing Avro message': wrong key/value.serializer, type mismatches, and Schema Registry subject-not-found or incompatible-schema failures.
- 9 min read
Kafka Error Guide: 'left group due to expired session timeout' Consumer Drop
Fix Kafka consumers leaving the group on expired session timeout: tune session.timeout.ms and max.poll.interval.ms, cut GC pauses, and fix network and heartbeat stalls.
- 9 min read
Kafka Error Guide: 'Snapshot generation failed' Metadata Snapshot Write Error
Fix KRaft 'Snapshot generation failed': diagnose disk-full, permissions, and I/O errors when the controller writes a __cluster_metadata snapshot to checkpoint state.
- 9 min read
Kafka Error Guide: 'Error while accepting connection' SocketServer Processor Failure
Fix Kafka SocketServer errors — 'Error while accepting connection' and 'Processor got uncaught exception' from file-descriptor limits and listener bind failures.
- 9 min read
Kafka Error Guide: 'SslAuthenticationException: SSL channel closed' Failed to Send SSL Close Message
Fix Kafka 'SSL channel closed': diagnose plaintext clients hitting SSL listeners, wrong security.protocol, abrupt connection drops, and proxy/LB TLS termination.
- 9 min read
Kafka Error Guide: 'SSLHandshakeException: Received fatal alert: handshake_failure'
Fix Kafka SSLHandshakeException handshake_failure: diagnose TLS version mismatch, cipher suite gaps, one-way vs mTLS, missing client cert, and protocol disablement.
- 9 min read
Kafka Error Guide: 'Topic deletion is disabled' delete.topic.enable Fix
Fix Kafka 'Topic deletion is disabled' when deleting a topic: enable delete.topic.enable on brokers, restart safely, and retry the delete cluster-wide.
- 9 min read
Kafka Error Guide: 'TopicAuthorizationException' Not Authorized to Access Topics
Fix Kafka TopicAuthorizationException: diagnose missing topic Read/Write/Describe ACLs, principal mismatch, prefixed patterns, and metadata describe denials.
- 9 min read
Kafka Error Guide: 'TopicExistsException' Topic Already Exists
Fix Kafka TopicExistsException for 'orders': duplicate creates, create races between CI jobs, topics stuck in pending deletion, and auto-create collisions.
- 9 min read
Kafka Error Guide: 'Unable to fetch metadata log' Follower Far Behind
Fix KRaft 'Unable to fetch metadata log' / 'Unable to catch up to metadata log': diagnose a follower controller or broker lagging the __cluster_metadata leader.
- 9 min read
Kafka Error Guide: 'UnknownMemberIdException' Coordinator Is Not Aware of This Member
Fix Kafka UnknownMemberIdException: why the group coordinator evicts a consumer member id after session timeouts, and how to keep heartbeats alive to rejoin cleanly.
- 9 min read
Kafka Error Guide: 'UnknownProducerIdException' Producer ID Not Found by Broker
Fix Kafka UnknownProducerIdException: diagnose idempotent producer ID state evicted by short retention.ms, producer.id.expiration.ms expiry, and old-broker restarts; tune retention and adopt KIP-360.
- 9 min read
Kafka Error Guide: 'UnknownTopicOrPartitionException' Topic Not Found
Fix Kafka UnknownTopicOrPartitionException 'server does not host this topic-partition': missing topics, auto-create disabled, stale metadata, and typos.
- 9 min read
Kafka Error Guide: 'WakeupException' Thrown During Consumer Poll
Understand Kafka WakeupException: why consumer.wakeup() interrupts poll(), how to handle it for clean shutdown, and how to tell intended wakeups from real failures.
- 9 min read
Kafka Error Guide: 'ConnectionLoss for /brokers/ids' ZooKeeper Connection Loss
Fix Kafka ZooKeeper ConnectionLoss for /brokers/ids: diagnose a downed ensemble, lost quorum, port 2181 firewall blocks, bad zookeeper.connect, and GC pauses.
- 9 min read
Kafka Error Guide: 'NodeExistsException for /brokers/ids/1' Broker Registration Conflict
Fix Kafka KeeperException NodeExistsException for /brokers/ids: resolve duplicate broker.id, stale ephemeral nodes, session-timeout races, and cloned VM images.
- 9 min read
Kafka Error Guide: 'NoNodeException for /brokers/ids/1' Missing ZooKeeper Znode
Fix Kafka KeeperException NoNodeException for /brokers/ids: diagnose wrong chroot in zookeeper.connect, wrong ensemble, fresh-cluster znodes, and tool mismatches.
- 9 min read
Kafka Error Guide: 'Session expired for /controller' ZooKeeper Session Expiry
Fix Kafka ZooKeeper SessionExpiredException for /controller: diagnose long GC pauses, low session timeouts, lost ephemeral nodes, controller re-election, and clock skew.
- 10 min read
Kafka Exactly-Once Semantics Explained
A clear guide to Kafka exactly-once semantics — idempotent producers, transactions, and the read-process-write pattern that prevents duplicates without losing data.
- 10 min read
Kafka Partition Rebalancing Strategies
A practical guide to Kafka partition rebalancing — partition reassignment, throttles, Cruise Control, and cooperative rebalancing to move data without breaking your cluster.
- 12 min read
Migrating Kafka from ZooKeeper to KRaft
A practical guide to migrating Kafka from ZooKeeper to KRaft — why it matters, prerequisites, the controller-based migration steps, validation, and rollback.
- 10 min read
Monitoring Kafka with Prometheus and AI
How to monitor Apache Kafka with the JMX exporter and Prometheus — the metrics that matter, alert rules that catch real problems, and AI-assisted triage that cuts MTTR.
- 10 min read
Securing Kafka with TLS, SASL, and ACLs
A practical guide to securing Apache Kafka with TLS encryption, SASL authentication, and ACL authorization — keystores, JAAS, listener config, and access control done right.
- 11 min read
Tuning Kafka Producer Throughput and Latency
A practical guide to tuning Kafka producers — batching, linger, compression, acks, and idempotence — to balance throughput, latency, and durability without data loss.