Change Data Capture From Postgres With Logical Replication

Change data capture is how you let the rest of your architecture react to what happens in your primary database without anyone writing dual-write code that drifts out of sync. Instead of the application publishing an event after it commits — and occasionally crashing between the commit and the publish — you read the database’s own write-ahead log and turn every committed change into a stream. Postgres has had logical replication built in since version 10, and it is the foundation that tools like Debezium sit on top of. It is also a feature with a few sharp edges that can take down your primary if you ignore them, the replication slot being the sharpest. AI is a capable assistant for drafting the setup and, more usefully, for building the monitoring checklist that keeps you out of trouble. You still own the production decisions.

Here is how I set up CDC from Postgres and, more importantly, how I keep it from biting me.

The Moving Parts

Logical replication decodes the WAL from physical change records into logical row changes — insert, update, delete — for the tables you choose. Three concepts do the work.

A publication is the set of tables you want to stream. A replication slot is a server-side bookmark that tracks how far a consumer has read, and guarantees Postgres retains the WAL until that consumer catches up. A decoding plugin turns the binary WAL into a format the consumer understands — pgoutput is built in and used by native subscriptions and Debezium, while wal2json emits human-readable JSON that is handy for debugging and lightweight consumers.

The retention guarantee is the whole game. The slot is what makes CDC reliable, and it is also what makes it dangerous: if a consumer stops reading, the slot pins the WAL forever and your disk fills.

Prerequisites and a Native Subscription

First, the server has to be configured for logical decoding. This requires a restart, so plan it.

# postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

On the source, create a publication and a role that can replicate:

CREATE PUBLICATION orders_pub FOR TABLE orders, order_items;

CREATE ROLE cdc_reader WITH REPLICATION LOGIN PASSWORD 'redacted';
GRANT SELECT ON orders, order_items TO cdc_reader;

If your consumer is another Postgres database, a native subscription is the simplest path and it creates the slot for you:

CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary dbname=app user=cdc_reader password=redacted'
  PUBLICATION orders_pub;

For streaming into the broader data platform, you usually do not point Postgres at Postgres — you point a CDC connector at the slot.

Streaming to Kafka With Debezium

Debezium is the most common way to get Postgres changes into Kafka. It connects as a replication client, uses the pgoutput plugin by default, and emits one Kafka message per row change with before and after images. A minimal connector config looks like this:

{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "primary",
    "database.dbname": "app",
    "database.user": "cdc_reader",
    "plugin.name": "pgoutput",
    "publication.name": "orders_pub",
    "slot.name": "orders_debezium",
    "topic.prefix": "app",
    "table.include.list": "public.orders,public.order_items"
  }
}

The key fields are slot.name and publication.name. Debezium will create the slot if it does not exist, and from that moment the slot starts retaining WAL on your primary. That is the line you have to monitor on the other side.

Prompt: “I’m setting up Debezium against a Postgres primary that handles production writes. Give me a monitoring checklist focused specifically on the risks the replication slot introduces.”

Output (abridged): “1. Alert on pg_replication_slots.active = false — an inactive slot still retains WAL but nothing is draining it. 2. Alert on retained WAL size via pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) crossing a threshold well below free disk. 3. Track replication lag in bytes, not just seconds, since a stalled connector shows growing bytes before time-based lag is obvious. 4. Watch max_slot_wal_keep_size if set, and decide whether you’d rather drop a stuck slot or fill the disk. 5. Monitor connector restarts — flapping consumers leave the slot inactive between attempts.”

That checklist is exactly right, and the first two items are the ones that take down databases.

The Slot Bloat Danger

This deserves its own section because it is the failure that surprises people. A replication slot promises Postgres will keep every WAL segment back to the consumer’s last acknowledged position. If the consumer dies, gets paused for a deploy, or simply can’t keep up, the slot keeps accumulating WAL on the primary’s disk. There is no automatic expiry by default. I have seen a forgotten test slot quietly eat hundreds of gigabytes over a weekend until the primary refused writes.

Two defenses matter. First, monitor retained WAL and alert early. Second, decide your policy with max_slot_wal_keep_size, which caps how much WAL a slot may pin before Postgres invalidates the slot to protect itself.

-- postgresql.conf or ALTER SYSTEM
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();

Setting this means a stuck consumer loses its place — you trade CDC continuity for a database that stays up. For most production systems that is the correct trade, because a paused connector you can re-snapshot, but a full disk is an outage. Never leave a slot uncapped on a primary you care about.

Monitoring Queries You Should Have Ready

Keep these in a runbook. They answer “is my CDC healthy” in three queries.

-- Slot status and the LSN it's holding the WAL back to
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- WAL bytes retained by each slot (the disk risk)
SELECT slot_name,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;

-- Live sender lag for active replication connections
SELECT application_name, state,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
       ) AS replay_lag
FROM pg_stat_replication;

If active is false and retained_wal is climbing, you have a dead consumer pinning your WAL — act before the disk does. AI is good at turning these into alert rules for your monitoring stack; describe your tooling and ask it to template the thresholds, then sanity-check the numbers against your actual disk headroom.

Verifying the Stream

Before you wire downstream consumers to the topics, confirm the decoded output looks right. The wal2json plugin is excellent for this because the output is readable:

pg_recvlogical -d app --slot test_slot --create-slot -P wal2json
pg_recvlogical -d app --slot test_slot --start -f -
# In another session, run an UPDATE and watch the JSON appear

Watch one insert, one update, and one delete flow through, and confirm the before/after images contain the columns you expect. Drop the test slot the moment you are done — an orphaned debugging slot is the single most common cause of the bloat problem above.

SELECT pg_drop_replication_slot('test_slot');

CDC from Postgres is reliable and well-trodden, but it shifts a real operational burden onto you: a slot is a contract to retain WAL, and contracts you forget about cost money. Let AI draft the connector config and the monitoring checklist — it does both well — and keep ownership of the slot policy yourself. For the structured end-to-end setup, the Postgres logical replication CDC setup prompt is the one I reach for, and the broader Postgres category covers the replication and monitoring topics that surround it.

Change Data Capture From Postgres With Logical Replication and AI