Slack Events API Retry & Idempotency Prompt
Build a reliable Slack Events API consumer — URL verification, fast ack, dedup of Slack's retry storms, idempotent processing, and a queue-backed worker pattern.
- Target user: Engineers building event-driven Slack integrations at scale
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a distributed-systems engineer who has run a high-traffic Slack Events API consumer and learned the hard way that Slack retries aggressively and will disable your subscription if you're slow.

I will provide:
- The events we subscribe to (message.channels, reaction_added, app_mention, etc.)
- Our infra (web tier, queue, workers, datastore)
- Symptoms (duplicate processing, "your app is being rate limited", disabled subscriptions)

Your job:
1. **URL verification**: handle the `url_verification` challenge correctly whenever Slack sends it (when the Request URL is first set or later changed), and explain why it must be a synchronous echo of the `challenge` value.
2. **The 3-second ack**: return 200 immediately after signature verification; NEVER do real work inline. Explain Slack's retry policy: it resends on non-2xx or slow responses, setting `X-Slack-Retry-Num` and `X-Slack-Retry-Reason`, up to 3 times with backoff.
3. **Dedup / idempotency**: events can arrive multiple times. Design a dedup key (`event_id`, or `client_msg_id` for messages) backed by a TTL store (e.g. an atomic Redis `SET key value NX EX 3600`, i.e. a 1h TTL). Show the check-and-skip path and how to honor `X-Slack-Retry-Num`.
4. **Queue handoff**: push the verified event to a durable queue; workers process it with their own retries and a dead-letter queue. Explain at-least-once semantics and why handlers must be idempotent regardless.
5. **Bot-loop prevention**: drop events where `bot_id` is your own app's or `subtype` is `bot_message`, to avoid infinite reaction/message loops.
6. **Ordering**: Slack does not guarantee delivery order. Show how to tolerate out-of-order edits and deletes (`message_changed`, `message_deleted`) using `event_ts`.
7. **Backpressure & rate limits**: respect Slack's Web API rate-limit tiers and `Retry-After` on outbound calls; shed or buffer inbound events during spikes rather than returning 500 (which triggers more retries).
8. **Observability**: metrics for ack latency, dedup hit rate, retry-num distribution, and worker lag; alert before Slack auto-disables the subscription.
Output as:
(a) the HTTP handler (verify → ack → enqueue),
(b) the dedup store interface and TTL choice,
(c) the worker with an idempotent handler contract,
(d) the bot-loop and out-of-order guards,
(e) a runbook for "subscription disabled".

Bias toward: idempotency everywhere, ack-before-work always, and treating every event as possibly duplicated and possibly out of order.
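A minimal, framework-agnostic sketch of the verify → ack → enqueue path, written as plain functions so it can be exercised without a server. It follows Slack's request-signing scheme (HMAC-SHA256 over `v0:{timestamp}:{body}` with the signing secret, compared against `X-Slack-Signature`, with stale timestamps rejected). The function names, the plain-dict headers, and the `enqueue` callable are hypothetical: a real web framework would supply case-insensitive headers and the raw request body, and `enqueue` stands in for a fast push to your durable queue.

```python
import hashlib
import hmac
import json
import time
from typing import Callable, Mapping, Optional, Tuple

# Slack's signing docs suggest rejecting requests older than five minutes.
MAX_REQUEST_AGE_SECONDS = 60 * 5


def verify_slack_signature(
    headers: Mapping[str, str],
    raw_body: bytes,
    signing_secret: str,
    now: Optional[float] = None,
) -> bool:
    """Check X-Slack-Signature against HMAC-SHA256 of 'v0:{timestamp}:{body}'."""
    timestamp = headers.get("X-Slack-Request-Timestamp")
    signature = headers.get("X-Slack-Signature")
    if not timestamp or not signature:
        return False
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - ts) > MAX_REQUEST_AGE_SECONDS:
        return False  # too old (or too far in the future): possible replay
    base = b"v0:" + timestamp.encode() + b":" + raw_body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking the signature via timing.
    return hmac.compare_digest(expected.encode(), signature.encode())


def handle_events_request(
    headers: Mapping[str, str],
    raw_body: bytes,
    signing_secret: str,
    enqueue: Callable[[dict], None],
    now: Optional[float] = None,
) -> Tuple[int, str]:
    """Verify, then ack: return (status, body) without doing real work inline."""
    if not verify_slack_signature(headers, raw_body, signing_secret, now):
        return 401, ""
    payload = json.loads(raw_body)
    if payload.get("type") == "url_verification":
        # Synchronous echo: Slack expects the challenge value in this same response.
        return 200, payload["challenge"]
    if payload.get("type") == "event_callback":
        enqueue(payload)  # hand off to the durable queue; workers do the rest
    return 200, ""
```

Everything slow sits behind `enqueue`, so the 200 goes out well inside the 3-second window. This is also a natural place to record `X-Slack-Retry-Num` and `X-Slack-Retry-Reason` as metrics while still returning a fast 200.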
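A sketch of the dedup store interface and the check-and-skip path, assuming Python 3.8+ for `typing.Protocol`. `InMemoryDedupStore`, `dedup_key`, and `should_process` are hypothetical names; the in-memory store only exists so the logic is testable in one process. Across several web or worker processes, the same `claim` maps onto a single atomic Redis command, `SET key 1 NX EX 3600` (in redis-py, `r.set(key, 1, nx=True, ex=3600)`, which returns `None` when the key already exists). The 1-hour TTL follows the prompt's suggestion and is deliberately much longer than Slack's retry backoff.

```python
import threading
import time
from typing import Callable, Dict, Optional, Protocol

# 1h, as the prompt suggests: far longer than Slack's retry backoff.
DEDUP_TTL_SECONDS = 60 * 60


class DedupStore(Protocol):
    def claim(self, key: str, ttl_seconds: int) -> bool:
        """Return True if the caller is the first to claim `key` within the TTL."""
        ...


class InMemoryDedupStore:
    """Hypothetical single-process stand-in for Redis `SET key 1 NX EX ttl`."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}
        self._lock = threading.Lock()

    def claim(self, key: str, ttl_seconds: int) -> bool:
        now = self._clock()
        with self._lock:
            expires_at = self._expires_at.get(key)
            if expires_at is not None and expires_at > now:
                return False  # duplicate delivery inside the TTL window
            # Expired keys are overwritten here but never swept; fine for a sketch.
            self._expires_at[key] = now + ttl_seconds
            return True


def dedup_key(payload: dict) -> Optional[str]:
    """Prefer the envelope's event_id; fall back to the message's client_msg_id."""
    event_id = payload.get("event_id")
    if event_id:
        return f"slack:event:{event_id}"
    client_msg_id = payload.get("event", {}).get("client_msg_id")
    if client_msg_id:
        return f"slack:msg:{client_msg_id}"
    return None


def should_process(payload: dict, store: DedupStore, ttl_seconds: int = DEDUP_TTL_SECONDS) -> bool:
    """Check-and-skip: False means a retry or duplicate that was already claimed."""
    key = dedup_key(payload)
    if key is None:
        return True  # no stable key; correctness rests on the idempotent handler
    return store.claim(key, ttl_seconds)
```

Claiming before processing means a worker crash right after the claim can drop that event until the TTL expires. If that trade-off is unacceptable, record the key only after the handler succeeds and lean on handler idempotency to absorb the resulting duplicates.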
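A sketch of the worker side, assuming the queue consumer passes each payload to `process_with_retries`. `RetryableError`, `handler`, and `dead_letter` are hypothetical: the handler raises `RetryableError` for transient failures (optionally carrying the `Retry-After` seconds from a Slack Web API 429 response), any other exception is treated as permanent, and exhausted or permanent failures go to the dead-letter sink. Because delivery is at-least-once, `handler` must be safe to run more than once for the same event.

```python
import time
from typing import Any, Callable, Optional


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, 5xx, HTTP 429)."""

    def __init__(self, message: str = "", retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        # Seconds taken from a 429 response's Retry-After header, if present.
        self.retry_after = retry_after


def process_with_retries(
    payload: Any,
    handler: Callable[[Any], None],
    dead_letter: Callable[[Any], None],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run an idempotent handler with bounded retries; dead-letter on give-up."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(payload)
            return True
        except RetryableError as exc:
            if attempt == max_attempts:
                break  # out of attempts: do not sleep, just dead-letter
            # Honor Retry-After when Slack sent one; otherwise back off exponentially.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
        except Exception:
            break  # permanent failure: retrying will not help
    dead_letter(payload)
    return False
```

Keeping these retries in the worker, not the HTTP handler, is what lets the handler always ack quickly; a slow or rate-limited downstream then shows up as worker lag instead of as Slack retries.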
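A sketch of the bot-loop and out-of-order guards, meant to run in the worker before any side effect. `is_own_bot_loop`, `message_key`, and `LastWriteWinsGuard` are hypothetical; `own_bot_id` is assumed to be your app's bot ID, obtained however your app already knows it. The guard identifies a message by `(channel, ts)` of the original message, reading `message.ts` for `message_changed` and `deleted_ts` for `message_deleted`, and compares `event_ts` values as `Decimal` because Slack timestamps are strings like `"1355517523.000005"`. State is in memory here; in production the last-applied `event_ts` per message would live in the datastore behind a conditional update.

```python
from decimal import Decimal
from typing import Dict, Optional, Set, Tuple

MessageKey = Tuple[str, str]  # (channel, ts of the original message)


def is_own_bot_loop(event: dict, own_bot_id: str) -> bool:
    """Drop our own bot's events and bot_message subtypes to avoid reply loops."""
    return event.get("bot_id") == own_bot_id or event.get("subtype") == "bot_message"


def message_key(event: dict) -> Optional[MessageKey]:
    """Identify the underlying message an event refers to (assumed field layout)."""
    subtype = event.get("subtype")
    if subtype == "message_changed":
        ts = event.get("message", {}).get("ts")
    elif subtype == "message_deleted":
        ts = event.get("deleted_ts")
    else:
        ts = event.get("ts")
    channel = event.get("channel")
    if channel is None or ts is None:
        return None
    return (channel, ts)


class LastWriteWinsGuard:
    """Hypothetical in-memory guard; production state belongs in the datastore."""

    def __init__(self) -> None:
        self._last_applied: Dict[MessageKey, Decimal] = {}
        self._deleted: Set[MessageKey] = set()

    def should_apply(self, event: dict) -> bool:
        key = message_key(event)
        raw_ts = event.get("event_ts") or event.get("ts")
        if key is None or raw_ts is None:
            return True  # nothing to order against; rely on idempotent handlers
        if key in self._deleted and event.get("subtype") != "message_deleted":
            return False  # tombstoned: a late create or edit must not resurrect it
        event_ts = Decimal(raw_ts)
        last = self._last_applied.get(key)
        if last is not None and event_ts <= last:
            return False  # stale or duplicate: a newer version was already applied
        self._last_applied[key] = event_ts
        if event.get("subtype") == "message_deleted":
            self._deleted.add(key)
        return True
```

If an edit arrives before the original message, the original's older `event_ts` is rejected and the edited text stands; once a delete is applied, the tombstone keeps any straggling create or edit from bringing the message back.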