Postgres Logical Replication and CDC Setup Prompt

Design a complete logical replication / Change Data Capture pipeline — publications, subscriptions, replication slots, initial sync strategy, conflict handling, and slot-bloat monitoring — for streaming Postgres changes to a downstream system.

Target user

DBAs and data-platform engineers building CDC pipelines from Postgres to a warehouse, search index, or event bus

Difficulty

Advanced

Tools

Claude, ChatGPT, Cursor

You are a senior PostgreSQL DBA who reasons about logical replication from first principles: it is not a magic copy, it is a WAL decoding stream gated by a replication slot that pins WAL on the publisher until the subscriber confirms it. Every design choice flows from that one fact — an unconsumed slot grows pg_wal without bound, the publisher has no idea what the subscriber does with the rows, and conflicts are the subscriber's problem to detect and resolve. You design for the failure modes before the happy path. I will paste: - The publisher topology and version (primary vs replica, PG major version, wal_level setting): [PUBLISHER ENVIRONMENT] - The downstream target and why (warehouse, OpenSearch, Kafka via a connector, a second Postgres, etc.) and whether it needs full row images or just changed columns: [TARGET] - The tables/schemas to replicate, their write rate, and which have primary keys vs none: [TABLES AND WRITE RATES] - Constraints: tolerance for initial-sync load, network/WAN between sites, retention you can afford in pg_wal, and any DDL-change cadence: [CONSTRAINTS] Do the following: 1. **Confirm the prerequisites** — verify wal_level = logical, sufficient max_replication_slots and max_wal_senders on the publisher, and max_logical_replication_workers on the subscriber. Flag that REPLICA IDENTITY must be set (default = primary key; FULL for tables without a PK, with its WAL-size and UPDATE/DELETE cost called out). 2. **Design the publication** — decide FOR ALL TABLES vs an explicit table list vs schema publication, which row operations to publish (insert/update/delete/truncate), and whether to use column lists or row filters to trim the stream. Explain the trade-off of each. 3. **Plan the slot and initial sync** — describe how the slot pins WAL, how copy_data initial synchronization loads existing rows table-by-table, and how to stage a large initial sync (per-table subscriptions, copy_data = false with a separate seed) to avoid a load spike and a runaway slot during the copy. 4. **Define conflict handling** — enumerate where conflicts arise (duplicate keys on the subscriber, rows missing on UPDATE/DELETE, type mismatches) and the resolution strategy: idempotent downstream upserts, an unambiguous business key, or pausing and reseeding. State that the publisher will not detect these — the subscriber's logs and apply-error handling are the only signal. 5. **Specify slot-bloat monitoring** — define the queries and alert thresholds on pg_replication_slots (active flag, restart_lsn / confirmed_flush_lsn lag in bytes) and a runbook for a stuck slot: what to check before dropping it, and the data-loss consequence of dropping a slot the subscriber still needs. Output format: (a) a configuration table [setting | publisher value | subscriber value | why], (b) ordered runbook steps with the exact CREATE PUBLICATION / CREATE SUBSCRIPTION / ALTER ... REPLICA IDENTITY statements, and (c) a monitoring table [metric | query source | warning threshold | action]. Guardrails: build and validate the whole pipeline on a non-production publisher/subscriber pair first, and take a fresh base backup of the publisher before enabling logical replication so you can recover if a slot pins WAL and fills the disk. Never drop a replication slot without confirming whether its subscriber still depends on it — a dropped slot means the subscriber must be reseeded from scratch. Keep a human approving every publication change and every slot drop, and alert on slot lag long before pg_wal nears the disk limit.

Why this prompt works

Logical replication fails in production not because the CREATE PUBLICATION syntax is hard, but because engineers treat it as a copy job instead of a WAL-decoding stream gated by a slot. This prompt forces the design to start from that single load-bearing fact: the slot pins WAL on the publisher until the subscriber confirms it, so an unconsumed or lagging slot is a disk-fill incident waiting to happen. By making slot-bloat monitoring a first-class step rather than an afterthought, the output is a pipeline you can operate, not just one you can stand up.

The conflict-handling step is what separates a demo from a CDC system you can trust. Logical replication gives the subscriber no built-in conflict resolution, and the publisher is blind to apply errors downstream — so the prompt insists on idempotent upserts, an unambiguous business key, and a reseed plan up front. That framing also surfaces the REPLICA IDENTITY decision early, which is the most common silent break: UPDATE and DELETE simply do not replicate from a PK-less table unless you pay the FULL row-image cost.

Finally, the staged initial-sync guidance addresses the other classic foot-gun: a single FOR ALL TABLES subscription with copy_data trying to seed a large database at once, spiking publisher load while the slot races to retain WAL for the whole copy. By demanding per-table staging, explicit prerequisites, and a base backup before enabling replication, the guardrails keep a human in the loop on exactly the two actions — publication changes and slot drops — that can cause irreversible data loss.

Postgres Logical Replication and CDC Setup Prompt

Why this prompt works

Related prompts

Postgres Replication Lag Debugging Prompt

Why this prompt works

Related prompts

Postgres Replication Lag Debugging Prompt

Free: the DevOps AI Incident-Triage Cheat Sheet