You are a senior platform engineer who has scaled Bolt apps through interaction bursts, such as a SEV1 where 40 engineers click "Acknowledge" within seconds, without the app falling over.

I will provide:
- Runtime ([Bolt JS single process / Bolt Python / clustered / serverless])
- The expensive work each listener does (DB writes, downstream API calls, LLM calls)
- Observed failure mode ([rate-limit 429s / event-loop starvation / OOM / dropped acks])
- Whether I can add an external queue (Redis, SQS)

Your job:
1. **Decouple ack from work**: confirm acks happen immediately, then enqueue the work; concurrency control applies to the post-ack path, not the ack.
2. **Concurrency model**: design a bounded worker pool / semaphore so N units of expensive work run at once; explain how to size N against Slack's per-method rate limits and your downstream limits.
3. **Queue choice**: in-process queue (simple, lost on restart) vs external durable queue (survives restart, enables horizontal scale). Tie the choice to whether dropped work is acceptable.
4. **Rate-limit coordination**: share a token-bucket / limiter ACROSS workers so concurrent jobs don't collectively exceed Slack's Tier limits; show backoff on 429 with `Retry-After`.
5. **Burst handling**: coalesce duplicate work (10 people ack the same incident → one state update), and shed or defer low-priority work under pressure.
6. **Backpressure & observability**: queue-depth metrics, oldest-job age, and what to alert on.

Output as: (a) ack-then-enqueue flow, (b) bounded-worker design with sizing math, (c) in-process vs durable queue decision, (d) shared rate limiter + 429 backoff, (e) burst coalescing, (f) queue metrics and alerts.

Safety: bound concurrency explicitly; an unbounded post-ack path will exhaust connections and trip Slack rate limits during exactly the incidents you most need the bot.

Why this prompt works

Bolt apps behave fine until an incident, which is precisely when forty people interact with your bot in the same ten seconds. An unbounded post-ack work path then exhausts connections, starves the event loop, and trips Slack’s rate limits, taking down the bot at the moment it matters most. This prompt centers concurrency control on the post-ack path, keeping the fast ack sacred while putting a bound on the expensive work behind it. That separation is the architectural move that lets a bot survive its own success.
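To make the ack-then-enqueue split concrete, here is a minimal in-process sketch using only `asyncio`. The names `BoundedWorkPool`, `on_acknowledge`, and `max_useful_workers` are hypothetical, and the handler shape (an awaitable `ack` plus a `body` payload) is an assumption modeled on an async Bolt listener; in a real app the pool would be a module-level object rather than a parameter. The sizing helper is a rule of thumb based on Little's law, not a Slack-documented formula.

```python
import asyncio
import math


def max_useful_workers(calls_per_minute, calls_per_job, mean_job_seconds):
    """Rule of thumb (Little's law): concurrency = throughput x latency."""
    jobs_per_second = calls_per_minute / 60 / calls_per_job
    return max(1, math.floor(jobs_per_second * mean_job_seconds))


class BoundedWorkPool:
    """In-process queue drained by a fixed number of workers.

    Create it inside a running event loop. Work is lost on restart,
    so this only fits cases where dropped work is acceptable.
    """

    def __init__(self, handler, max_workers=4, capacity=1000):
        self._handler = handler
        self._max_workers = max_workers
        self._queue = asyncio.Queue(maxsize=capacity)
        self._tasks = []
        self.running = 0
        self.peak_running = 0

    def start(self):
        self._tasks = [
            asyncio.create_task(self._worker())
            for _ in range(self._max_workers)
        ]

    def try_enqueue(self, job) -> bool:
        """Never blocks the listener; False means the queue is full."""
        try:
            self._queue.put_nowait(job)
            return True
        except asyncio.QueueFull:
            return False

    async def _worker(self):
        while True:
            job = await self._queue.get()
            self.running += 1
            self.peak_running = max(self.peak_running, self.running)
            try:
                await self._handler(job)
            except Exception:
                pass  # a real app would log and decide whether to retry
            finally:
                self.running -= 1
                self._queue.task_done()

    async def drain(self):
        await self._queue.join()

    async def stop(self):
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)


async def on_acknowledge(ack, body, pool):
    await ack()  # ack first, before any expensive work
    if not pool.try_enqueue(body):
        # Shed point: the queue is full, so defer or drop this job.
        pass
```

The ack never waits on the pool, and at most `max_workers` handlers run at once regardless of how many clicks arrive. A full queue is the natural place to shed or defer low-priority work.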

The shared-rate-limiter requirement is the subtle, load-bearing detail. Engineers add a per-worker limiter, feel safe, and then watch a burst of concurrent jobs collectively blow Slack’s per-method Tier limit because each worker was metering only itself. By demanding a limiter shared across the whole pool plus Retry-After-aware backoff, the prompt forces a design that meters aggregate throughput, which is what Slack’s limits are actually enforced against.
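A shared limiter of this kind can be sketched as a single token bucket that every worker awaits before calling Slack. Everything here is illustrative: `SharedTokenBucket`, `RateLimited`, and `call_with_limit` are hypothetical names, and `RateLimited` stands in for however your client surfaces a 429 and its `Retry-After` value. The sketch covers one process only; a clustered deployment would need the bucket in shared storage such as Redis.

```python
import asyncio
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 carrying Retry-After seconds."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class SharedTokenBucket:
    """One bucket for the whole pool, so throughput is metered in aggregate.

    Create it inside a running event loop.
    """

    def __init__(self, rate_per_sec, burst,
                 clock=time.monotonic, sleep=asyncio.sleep):
        self._rate = rate_per_sec
        self._capacity = burst
        self._tokens = float(burst)
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._paused_until = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self):
        # Waiters hold the lock while sleeping, which serializes them.
        async with self._lock:
            while True:
                now = self._clock()
                if now < self._paused_until:
                    await self._sleep(self._paused_until - now)
                    continue
                elapsed = now - self._last
                self._tokens = min(self._capacity,
                                   self._tokens + elapsed * self._rate)
                self._last = now
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                await self._sleep((1 - self._tokens) / self._rate)

    def pause_for(self, seconds):
        """After a 429, make every worker wait out Retry-After."""
        self._paused_until = max(self._paused_until,
                                 self._clock() + seconds)


async def call_with_limit(bucket, api_call, max_attempts=5):
    """Run a zero-argument coroutine function under the shared bucket."""
    for _ in range(max_attempts):
        await bucket.acquire()
        try:
            return await api_call()
        except RateLimited as exc:
            bucket.pause_for(exc.retry_after)
    raise RuntimeError("gave up after repeated 429s")
```

Because the pause lives on the bucket rather than on the worker that hit the 429, one rate-limited call slows the entire pool instead of leaving the other workers to keep hammering the same method.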

Burst coalescing is what turns this from a generic queue design into a Slack-ops design. During an incident, ten acknowledgments for one alert should collapse into a single state update, not ten redundant API calls competing for the same rate budget. Asking the model to coalesce duplicate work and shed low-priority load under pressure produces a system a human can size against real Tier limits and real downstream capacity — with queue-depth and oldest-job metrics so the backpressure is observable rather than discovered via outage.
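Coalescing and the two backpressure metrics can share one small structure. The sketch below assumes work is keyed by an incident identifier and that merging means collecting the acknowledging users into one pending job; `CoalescingQueue` and its field names are hypothetical, and a durable queue would need an equivalent dedup key on its own side.

```python
import time
from collections import OrderedDict


class CoalescingQueue:
    """Pending work keyed by incident; duplicate acks merge into one job."""

    def __init__(self, clock=time.monotonic):
        self._pending = OrderedDict()  # incident_id -> job dict
        self._clock = clock

    def add(self, incident_id, user_id) -> bool:
        """Return True if a new job was created, False if merged."""
        job = self._pending.get(incident_id)
        if job is not None:
            job["users"].add(user_id)
            return False
        self._pending[incident_id] = {
            "incident_id": incident_id,
            "users": {user_id},
            "enqueued_at": self._clock(),
        }
        return True

    def pop_oldest(self):
        """Hand the oldest pending job to a worker, or None if empty."""
        if not self._pending:
            return None
        _, job = self._pending.popitem(last=False)
        return job

    def depth(self) -> int:
        """Queue-depth metric: number of distinct pending jobs."""
        return len(self._pending)

    def oldest_age(self) -> float:
        """Oldest-job-age metric in seconds; 0.0 when the queue is empty."""
        if not self._pending:
            return 0.0
        oldest = next(iter(self._pending.values()))
        return self._clock() - oldest["enqueued_at"]
```

Ten acknowledgments for one incident leave `depth()` at 1, so the worker performs a single state update that can mention all ten users. Of the two metrics, `oldest_age()` is usually the better alert signal, because a deep queue that drains quickly is healthy while a shallow one with a stale head is not.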
