Teams Bot Throttling & Retry Resilience Prompt
Design a Teams bot that survives Bot Framework and Microsoft Graph 429 throttling by honoring Retry-After, queueing proactive sends, batching Graph calls, and applying jittered backoff, so notification bursts during an incident storm are not silently dropped.
- Target user: Backend engineers running high-volume Teams bots in production
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior backend engineer who has run Teams bots that fan out thousands of proactive messages during incidents without getting throttled into silence.

I will provide:
- The bot's send patterns (per-conversation, per-tenant, broadcast)
- Peak burst volume during an incident storm
- Current SDK (Bot Framework JS/.NET) and hosting (Functions, App Service, AKS)
- Observed throttling symptoms and current retry logic (if any)

Your job:
1. **Map the throttling surfaces**: distinguish Bot Connector service limits (per-bot, per-conversation, per-thread message rate) from Microsoft Graph limits (per-app, per-tenant, per-resource). Both signal throttling with HTTP 429; when a `Retry-After` header is present, you MUST honor it rather than guessing.
2. **Backoff policy**: specify exponential backoff with full jitter, a delay cap, and a hard attempt ceiling. Prefer the server's `Retry-After` over your computed delay whenever it is present. Show the formula and a worked example for a 5-attempt sequence.
3. **Proactive send architecture**: put outbound messages on a queue (Service Bus / Storage Queue) drained by a bounded worker pool that respects a token-bucket rate limiter per conversation and per tenant. Explain why direct fan-out from a webhook handler invites throttling under burst load.
4. **Graph batching**: use the `$batch` endpoint (at most 20 requests per batch) to collapse roster lookups and avoid per-user calls. Each sub-response carries its own status, so retry only the sub-requests that returned 429.
5. **Idempotency & dedup**: attach an idempotency key to each queued message so retries after ambiguous failures don't double-post; dedup by (conversationId, alertId).
6. **Circuit breaking & shedding**: when sustained 429s exceed a threshold, open a breaker, collapse N alerts into one digest card, and surface a degraded-mode banner instead of dropping messages silently.
7. **Observability**: emit metrics for 429 rate, Retry-After distribution, queue depth, and p99 send latency; alert when the breaker opens.

Output as:
(a) a text diagram of the queue + worker + rate-limiter design,
(b) backoff pseudocode honoring Retry-After,
(c) a `$batch` example with partial-failure handling,
(d) the metrics/alerts list,
(e) a load-test plan that simulates an incident storm.

Bias toward: queues over direct sends, honoring server hints over clever math, degrading loudly over failing silently.
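
As a reference for checking the model's answer to part (b), the backoff policy from step 2 can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not a definitive implementation: `BackoffOptions`, `parseRetryAfterMs`, and `nextDelayMs` are hypothetical names, not part of the Bot Framework or Graph SDKs, and the base, cap, and attempt values are illustrative only. The `Retry-After` value is parsed either as delta-seconds or as an HTTP-date, the two forms HTTP allows for that header.

```typescript
// Illustrative sketch: all names and defaults here are assumptions, not SDK APIs.
interface BackoffOptions {
  baseMs: number;      // ceiling for the first retry delay
  capMs: number;       // upper bound on any computed delay
  maxAttempts: number; // hard attempt ceiling; no retry at or beyond this
}

// Convert a Retry-After header (delta-seconds or HTTP-date) to milliseconds.
// Returns undefined when the header is missing or unparseable.
function parseRetryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now()
): number | undefined {
  if (value == null || value.trim() === "") return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

// Decide how long to wait before retry number `attempt` (0-based).
// The server hint wins when present; otherwise use full jitter:
// a uniform random delay in [0, min(cap, base * 2^attempt)).
// Returns undefined once the attempt ceiling is reached (stop retrying).
function nextDelayMs(
  attempt: number,
  opts: BackoffOptions,
  retryAfter?: string | null,
  random: () => number = Math.random // injectable for deterministic tests
): number | undefined {
  if (attempt >= opts.maxAttempts) return undefined;
  const hinted = parseRetryAfterMs(retryAfter);
  if (hinted !== undefined) return hinted;
  const ceiling = Math.min(opts.capMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

With `baseMs: 1000`, `capMs: 30000`, and `maxAttempts: 5`, the jitter ceilings for attempts 0 through 4 are 1000, 2000, 4000, 8000, and 16000 ms, so each actual delay is a random value below its ceiling. A `Retry-After: 7` header overrides the computed delay with 7000 ms, and attempt 5 returns `undefined`, at which point the message would go back to the queue or a dead-letter path rather than being retried inline.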