Transactional Outbox Reliable Event Publishing Design Prompt
Design a transactional outbox so a service reliably publishes events to a broker only when its database commit succeeds — eliminating dual-write inconsistency where a record saves but the triggering event is lost, or an event fires for a transaction that rolled back.
- Target user: Platform engineers building reliable event-driven automation
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior distributed-systems engineer who has chased ghost events that fired for orders the database never actually committed.

I will provide:

- The service, its database, and the state change that should emit an event
- The target broker (Kafka, SQS/SNS, RabbitMQ, Pub/Sub) and its delivery semantics
- Current publishing approach and any observed lost or phantom events
- Throughput, latency tolerance, and ordering requirements

Your job:

1. **Outbox schema**: design the outbox table (id, aggregate, payload, status, timestamps) and show how the event row is written in the SAME transaction as the business change.
2. **Relay mechanism**: choose and justify a publisher: polling relay vs. change-data-capture (e.g. Debezium), with the trade-offs for latency, ordering, and operational cost.
3. **At-least-once and consumer dedupe**: accept that the relay delivers at-least-once, and define the consumer-side idempotency key so duplicate deliveries are harmless.
4. **Ordering**: specify how per-aggregate ordering is preserved through the outbox and partition/queue key, and where global ordering is impossible.
5. **Cleanup and backpressure**: define retention/archival of published rows and what happens when the broker is down and the outbox grows.
6. **Failure handling**: describe relay crash recovery, poison rows, and a DLQ/quarantine path for payloads that repeatedly fail to publish.
7. **Observability**: list metrics for outbox lag, unpublished backlog, and relay health so a silently stalled relay is caught fast.

Output as: an outbox table DDL sketch, a transaction-write code outline, a relay design with trade-off table, and a consumer idempotency spec.

Note that the outbox guarantees delivery but not exactly-once processing; require consumer idempotency, and document a manual replay/reconciliation procedure for any event suspected lost during a relay outage.
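For orientation, here is a minimal sketch of the kind of answer the prompt is asking for, covering the same-transaction write, a polling relay, and consumer dedupe. It is an illustration under stated assumptions, not a reference implementation: it uses Python's built-in `sqlite3` in place of a production database, a plain callable in place of a real broker client, and an in-memory set in place of a durable processed-events store. The `orders` table, the extra outbox columns (`seq`, `event_id`, `event_type`), and the names `place_order`, `relay_once`, and `IdempotentConsumer` are all hypothetical.

```python
import json
import sqlite3
import uuid


def init_db(conn):
    """Create a hypothetical business table and the outbox table."""
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
        CREATE TABLE outbox (
            seq          INTEGER PRIMARY KEY AUTOINCREMENT, -- relay publishes in this order
            event_id     TEXT NOT NULL UNIQUE,              -- consumer idempotency key
            aggregate_id TEXT NOT NULL,                     -- partition/queue key
            event_type   TEXT NOT NULL,
            payload      TEXT NOT NULL,
            status       TEXT NOT NULL DEFAULT 'pending',
            created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            published_at TEXT
        );
    """)


def place_order(conn, order_id, total_cents, fail=False):
    """Write the business row and the event row in ONE transaction."""
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced", payload),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")


def relay_once(conn, publish, batch_size=100):
    """Polling relay: publish pending rows in seq order, then mark them.

    A crash between publish() and the UPDATE re-sends the row on the next
    poll, which is why delivery is at-least-once.
    """
    rows = conn.execute(
        "SELECT seq, event_id, aggregate_id, event_type, payload FROM outbox "
        "WHERE status = 'pending' ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, event_id, aggregate_id, event_type, payload in rows:
        message = {"event_id": event_id, "type": event_type,
                   "data": json.loads(payload)}
        try:
            publish(aggregate_id, message)  # stand-in for a broker client call
        except Exception:
            break  # stop at the first failure so later rows cannot overtake it
        with conn:
            conn.execute(
                "UPDATE outbox SET status = 'published', "
                "published_at = CURRENT_TIMESTAMP WHERE seq = ?",
                (seq,),
            )
        sent += 1
    return sent


class IdempotentConsumer:
    """Skips any event_id it has already processed."""

    def __init__(self):
        self.seen = set()   # assumption: a durable table in a real system
        self.handled = []

    def handle(self, key, message):
        if message["event_id"] in self.seen:
            return False    # duplicate delivery, harmless
        self.seen.add(message["event_id"])
        self.handled.append(message)
        return True
```

Calling `place_order(conn, "o-2", 500, fail=True)` leaves neither an `orders` row nor an `outbox` row behind, which is the property that rules out phantom events. Stopping the relay at the first publish failure is one simple way to keep ordering; it also means a poison row blocks the queue, so a real design still needs the retry limit and quarantine path the prompt asks for.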