Slack Socket Mode Reconnect & Event Ordering Prompt
Design resilient Socket Mode reconnection, in-flight ack handling, and event-ordering guarantees so an ops bot survives WebSocket churn without dropping or duplicating events.
- Target user: Platform engineers running Socket Mode bots at scale
- Difficulty: Advanced
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior platform engineer who has run Socket Mode bots through Slack's periodic WebSocket recycling, regional failovers, and load spikes without losing events.
I will provide:
- Runtime ([Bolt JS / Bolt Python / custom WebSocket client])
- Event volume and the event types I subscribe to
- Whether I run a single connection or several ([an app-level token allows up to 10 concurrent connections])
- My current symptoms ([duplicate handling / missed events / disconnect storms])
Your job:
1. **Reconnect lifecycle**: map the Socket Mode handshake (`apps.connections.open`, the `hello` frame, and Slack's `disconnect` frames with reasons `warning` and `refresh_requested`) and the grace window before the old socket actually closes. Show how to open the new socket BEFORE draining the old one so no events are lost.
2. **Ack discipline**: Socket Mode requires acknowledging each envelope by its `envelope_id` within 3 seconds, or Slack retries it. Show how to ack immediately, then process asynchronously, and how this interacts with retries.
3. **Duplicate handling**: because late or lost acks trigger retries and sockets overlap during reconnects, events can arrive more than once. Provide an idempotency strategy keyed on `event_id` (falling back to `envelope_id` for payloads that lack one) with a short-TTL dedup store.
4. **Ordering reality**: explain that Socket Mode does NOT guarantee strict ordering across reconnects; design handlers to be order-tolerant or to sequence on `event_ts`.
5. **Multi-connection fan-out**: explain when to run multiple connections for throughput and how that complicates dedup (events and their retries are spread across sockets, so dedup state cannot live per socket).
6. **Backpressure**: decide what to do when downstream processing falls behind the socket: queue, shed, or signal.
Output as: (a) reconnect lifecycle diagram, (b) overlap-window pseudocode, (c) ack-then-process snippet, (d) idempotency/dedup design, (e) ordering-tolerance guidance, (f) failure runbook (disconnect storm, ack timeout, queue overflow).
Safety: never assume exactly-once delivery; design for at-least-once with idempotent handlers.
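Item 6 is easy to answer vaguely, so as a reference point for judging the model's output, here is a minimal sketch of the "queue, shed, or signal" choice, assuming the socket reader has already acked each envelope before handing it off. `BoundedIntake`, its capacity, and the 80% high-watermark are hypothetical illustrations, not part of any Slack SDK. The sketch makes one trade-off explicit: shedding happens after the ack, so Slack will not redeliver a shed envelope, which is only acceptable for event types you can afford to lose or when the shed path spills to durable storage.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, Optional


@dataclass
class BoundedIntake:
    """Hypothetical buffer between an acking socket reader and background workers."""

    capacity: int
    high_watermark: float = 0.8  # fill fraction at which callers should slow down or scale out
    shed_count: int = 0
    _items: Deque[Dict[str, Any]] = field(default_factory=deque)

    def offer(self, envelope: Dict[str, Any]) -> str:
        """Return "accepted", "pressure", or "shed" so the caller can alert or scale."""
        if len(self._items) >= self.capacity:
            # Full: drop this envelope rather than block the socket reader.
            # It was already acked, so Slack will not redeliver it.
            self.shed_count += 1
            return "shed"
        self._items.append(envelope)
        if len(self._items) >= self.capacity * self.high_watermark:
            return "pressure"  # queued, but nearing capacity: signal upstream
        return "accepted"

    def take(self) -> Optional[Dict[str, Any]]:
        """Pop the oldest queued envelope, or None if the buffer is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


intake = BoundedIntake(capacity=3)
for envelope_id in ["e1", "e2", "e3", "e4"]:
    print(envelope_id, intake.offer({"envelope_id": envelope_id}))
# prints: e1 accepted / e2 accepted / e3 pressure / e4 shed
```

The "pressure" return is the "signal" option: it lets the caller page someone or add workers before the buffer is full, instead of discovering the problem through shed counts.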
Why this prompt works
Most Socket Mode tutorials stop at “it connects and you get events,” which is exactly where the production problems begin. Slack recycles WebSocket connections on its own schedule, sending a `disconnect` frame only shortly before the socket goes away, and a naive client that closes the old socket before the new one is live will silently drop whatever arrived in the gap. This prompt makes the overlap window the centerpiece: the model has to reason about opening the replacement connection first, which is the difference between a bot that survives reconnects and one that loses events every few hours.
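A minimal sketch of the overlap window, assuming an asyncio client. `open_socket` is a hypothetical coroutine that calls `apps.connections.open`, dials the returned `wss://` URL, and returns an object with `wait_for_hello()` and `close()`; neither `OverlapReconnector` nor those method names come from Bolt. The ordering of the steps is the point: the replacement must receive its `hello` before the old socket is closed, and the old socket's reader keeps acking in-flight envelopes during the drain period.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Protocol


class SocketLike(Protocol):
    """Hypothetical minimal view of a connected Socket Mode WebSocket."""

    async def wait_for_hello(self) -> None: ...

    async def close(self) -> None: ...


class OverlapReconnector:
    """Swaps sockets without a gap: open the replacement first, close the old one last."""

    def __init__(self, open_socket: Callable[[], Awaitable[SocketLike]], drain_seconds: float = 2.0) -> None:
        # open_socket (hypothetical) should call apps.connections.open, dial the
        # returned wss:// URL, and return the connected socket.
        self._open_socket = open_socket
        self._drain_seconds = drain_seconds
        self._swap_lock = asyncio.Lock()
        self.active: Optional[SocketLike] = None
        self.last_reason: Optional[str] = None

    async def start(self) -> None:
        sock = await self._open_socket()
        await sock.wait_for_hello()
        self.active = sock

    async def on_disconnect_frame(self, reason: str) -> None:
        """Call when the active socket receives {"type": "disconnect", "reason": ...}."""
        self.last_reason = reason  # e.g. "warning" or "refresh_requested", kept for logging
        async with self._swap_lock:  # serialize swaps if frames arrive back to back
            old = self.active
            new = await self._open_socket()  # 1. open the replacement first
            await new.wait_for_hello()       # 2. trust it only after Slack's hello frame
            self.active = new                # 3. it is now the primary connection
        if old is not None:
            # 4. The old socket's reader keeps receiving and acking in-flight envelopes
            #    during the drain period; only then is it closed.
            await asyncio.sleep(self._drain_seconds)
            await old.close()
```

Retries with backoff when `open_socket` fails, and a cap on how many swaps may run per minute (the disconnect-storm case), are left out to keep the sketch short.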
The other thing this prompt forces is the split between acking and processing. Socket Mode’s 3-second ack requirement means any handler that does real work inline will eventually blow the deadline and trigger retries, which then arrive as duplicates. By demanding an ack-then-process pattern plus an explicit idempotency store, the output confronts the at-least-once reality of the transport instead of pretending delivery is exactly-once. That framing heads off a whole class of “why did this deploy fire twice” incidents.
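A minimal sketch of ack-then-process with a short-TTL dedup store, assuming an asyncio client where `send` is the socket's send coroutine and `raw` is one received text frame. The ack body `{"envelope_id": ...}` follows Slack's Socket Mode acknowledgment format; `TTLDedup`, `dedup_key`, `handle_frame`, and `worker` are hypothetical names. The dedup key prefers the Events API `event_id`, which is expected to stay the same across retries, and falls back to `envelope_id` for payloads such as slash commands that carry no `event_id`.

```python
import asyncio
import json
import time
from typing import Any, Awaitable, Callable, Dict


class TTLDedup:
    """Hypothetical in-process "seen recently" store with a fixed TTL."""

    def __init__(self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}  # key -> expiry; insertion order == expiry order

    def first_time(self, key: str) -> bool:
        now = self._clock()
        # Evict expired keys from the oldest end so memory stays bounded.
        while self._seen:
            oldest, expires = next(iter(self._seen.items()))
            if expires > now:
                break
            del self._seen[oldest]
        if key in self._seen:
            return False
        self._seen[key] = now + self._ttl
        return True


def dedup_key(envelope: Dict[str, Any]) -> str:
    # Events API payloads carry event_id, expected to be stable across retries;
    # slash commands and interactivity payloads lack it, so fall back to envelope_id.
    payload = envelope.get("payload") or {}
    return payload.get("event_id") or envelope["envelope_id"]


async def handle_frame(raw: str, send: Callable[[str], Awaitable[None]],
                       queue: "asyncio.Queue[Dict[str, Any]]", dedup: TTLDedup) -> None:
    envelope = json.loads(raw)
    if "envelope_id" not in envelope:
        return  # hello and disconnect frames are control messages with nothing to ack
    # 1. Ack before doing any work so the 3-second deadline is never at risk.
    await send(json.dumps({"envelope_id": envelope["envelope_id"]}))
    # 2. Drop duplicates (Slack retries, or the same event on an overlapping socket).
    if not dedup.first_time(dedup_key(envelope)):
        return
    # 3. Hand off; the socket reader never waits on downstream processing.
    queue.put_nowait(envelope)


async def worker(queue: "asyncio.Queue[Dict[str, Any]]",
                 process: Callable[[Dict[str, Any]], Awaitable[None]]) -> None:
    while True:
        envelope = await queue.get()
        try:
            await process(envelope)
        except Exception as exc:  # keep the worker alive; real code would log and alert
            print(f"processing failed for {envelope['envelope_id']}: {exc!r}")
        finally:
            queue.task_done()
```

Duplicates are still acked before being dropped; otherwise Slack would keep retrying them. Because the store lives in process memory, it covers every socket in one process; several consumer processes would need a shared store with the same TTL semantics.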
Because the prompt also asks about ordering and multi-connection fan-out, it pushes the AI past the single-socket happy path into territory where adding connections for throughput also makes dedup harder, since a retry can land on a different socket than the original delivery. The result is a design a human can review against their actual event volume, with a failure runbook that turns vague WebSocket flakiness into named, handleable conditions.
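For handlers that overwrite state (a channel's current topic, the latest status of an incident), one order-tolerant pattern is "newest `event_ts` wins". A minimal sketch, assuming a single process holds the state for every socket it runs; `LatestWins` and the channel ID below are hypothetical, not a Slack or Bolt API. Handlers that append (posting a message, paging someone) cannot be fixed this way and rely on idempotency instead.

```python
from decimal import Decimal
from typing import Any, Dict, Optional, Tuple


class LatestWins:
    """Hypothetical per-key state that tolerates out-of-order delivery.

    Each key (e.g. a channel ID) keeps only the value carried by the newest
    event_ts seen so far; older events that arrive late are ignored.
    """

    def __init__(self) -> None:
        self._state: Dict[str, Tuple[Decimal, Any]] = {}

    def apply(self, key: str, event_ts: str, value: Any) -> bool:
        # event_ts is a string such as "1700000000.000200"; Decimal compares it
        # exactly, whereas float comparison is only approximate at this magnitude.
        ts = Decimal(event_ts)
        current = self._state.get(key)
        if current is not None and ts <= current[0]:
            return False  # stale or duplicate: a newer (or the same) event already won
        self._state[key] = (ts, value)
        return True

    def get(self, key: str) -> Optional[Any]:
        entry = self._state.get(key)
        return entry[1] if entry is not None else None


topics = LatestWins()
topics.apply("C123", "1700000000.000200", "deploy freeze")
topics.apply("C123", "1700000000.000100", "all clear")  # older event delivered late: ignored
print(topics.get("C123"))  # deploy freeze
```

Because the comparison depends only on `event_ts`, it gives the same result no matter which socket delivered each event or whether a retry arrived after a newer event.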