Teams Bot Streaming AI Responses Prompt
Implement streamed, typing-indicator-style AI responses in a Teams bot so that long Copilot/LLM answers appear progressively instead of after one long, frozen wait, using the streaming message activity pattern.
- Target user
- Developers building AI assistant bots in Teams
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior bot developer who has shipped Teams AI bots that stream answers token-by-token so users never stare at a dead "bot is typing" for 20 seconds.

I will provide:
- My LLM backend and how it streams (SSE, chunked tokens)
- The bot framework/SDK and hosting I use
- Typical answer length and latency
- Whether answers include citations or Adaptive Cards

Your job:
1. **The streaming activity model**: explain Teams' streaming message support: sending interim `typing` activities carrying partial text with a `streamType` of `streaming` (and incrementing `streamSequence` via channelData), then a final `streamType: final` message. Note the cadence limits so you don't get throttled.
2. **Wiring to my LLM stream**: show how to consume my backend's token stream and batch tokens into updates roughly every ~1.5s or N tokens (not per-token), to balance smoothness against rate limits.
3. **The final message**: how to send the terminal activity that replaces the interim stream with the complete, formatted answer, optionally upgrading plain text into an Adaptive Card with citations and action buttons.
4. **Citations + safety**: render source references on the final card, and a disclaimer/feedback affordance (thumbs up/down) wired to logging.
5. **Cancellation + errors**: handle the user sending a new message mid-stream (cancel the in-flight generation), backend timeouts, and partial-failure cleanup so a half-stream doesn't get stranded.
6. **Throttling resilience**: backoff when Teams returns 429 on interim updates, and degrade gracefully to a single final message if streaming updates are being rejected.
7. **Where NOT to stream**: channels/contexts where streaming is unsupported, with a clean non-streaming fallback.
Output: (a) the interim + final activity shapes with the channelData stream fields, (b) a batching loop that adapts token chunks to update cadence, (c) the final Adaptive Card with citations, (d) cancellation handling pseudocode, (e) a fallback + throttling plan.

Bias toward: smooth perceived latency, respecting update rate limits, always landing a clean final message.
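
As a rough yardstick for judging the model's answer to outputs (a) and (b), the following minimal Python sketch shows what the activity shapes and the batching loop might look like. It is not taken from any SDK: `interim_activity`, `final_activity`, `stream_answer`, and the `send` callback are hypothetical names, and the placement of the stream fields (in `channelData`, mirrored in a `streaminfo` entity), the cumulative interim text, and the `streamId` returned by the first send are assumptions to verify against the current Teams documentation.

```python
import time
from typing import Callable, Iterable, Optional


def interim_activity(text: str, sequence: int, stream_id: Optional[str]) -> dict:
    """Build an interim `typing` activity carrying the answer text so far."""
    info = {"streamType": "streaming", "streamSequence": sequence}
    if stream_id is not None:
        info["streamId"] = stream_id  # assumed: issued by Teams on the first update
    # Assumption: fields live in channelData and are mirrored in a streaminfo entity.
    return {"type": "typing", "text": text, "channelData": info,
            "entities": [{"type": "streaminfo", **info}]}


def final_activity(text: str, stream_id: Optional[str]) -> dict:
    """Build the terminal `message` activity that closes the stream."""
    info = {"streamType": "final"}
    if stream_id is not None:
        info["streamId"] = stream_id
    return {"type": "message", "text": text, "channelData": info,
            "entities": [{"type": "streaminfo", **info}]}


def stream_answer(
    tokens: Iterable[str],
    send: Callable[[dict], Optional[str]],  # hypothetical: posts an activity, returns a stream id
    min_interval: float = 1.5,              # seconds between interim updates
    max_tokens: int = 40,                   # flush early if this many tokens are pending
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Batch tokens into interim updates, then always send one final message."""
    text, pending, sequence, stream_id = "", 0, 1, None
    last_flush = clock()
    for token in tokens:
        text += token
        pending += 1
        now = clock()
        if pending >= max_tokens or now - last_flush >= min_interval:
            returned_id = send(interim_activity(text, sequence, stream_id))
            stream_id = stream_id or returned_id  # keep the first id we were given
            sequence += 1
            pending, last_flush = 0, now
    send(final_activity(text, stream_id))
    return text
```

Passing `clock` and `send` as parameters keeps the loop testable without a live bot; in a real handler `send` would wrap whatever the chosen SDK uses to post activities, and the 429 backoff and cancellation logic from items 5 and 6 would sit around that call.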