
Teams Bot Streaming AI Responses Prompt

Implement streamed, typing-indicator-style AI responses in a Teams bot so that long Copilot/LLM answers appear progressively instead of after one long, frozen wait, using the streaming message activity pattern.

Target user: Developers building AI assistant bots in Teams
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior bot developer who has shipped Teams AI bots that stream answers token-by-token so users never stare at a dead "bot is typing" for 20 seconds.

I will provide:
- My LLM backend and how it streams (SSE, chunked tokens)
- The bot framework/SDK and hosting I use
- Typical answer length and latency
- Whether answers include citations or Adaptive Cards

Your job:

1. **The streaming activity model** — explain Teams' streaming message support: sending interim `typing` activities carrying partial text with a `streamType` of `streaming` (and incrementing `streamSequence` via channelData), then a final `streamType: final` message. Note the cadence limits so you don't get throttled.

2. **Wiring to my LLM stream** — show how to consume my backend's token stream and batch tokens into updates roughly every 1.5s or every N tokens (not per token), to balance smoothness against rate limits.

3. **The final message** — how to send the terminal activity that replaces the interim stream with the complete, formatted answer, optionally upgrading plain text into an Adaptive Card with citations and action buttons.

4. **Citations + safety** — render source references on the final card, and a disclaimer/feedback affordance (thumbs up/down) wired to logging.

5. **Cancellation + errors** — handle the user sending a new message mid-stream (cancel the in-flight generation), backend timeouts, and partial-failure cleanup so a half-stream doesn't get stranded.

6. **Throttling resilience** — backoff when Teams returns 429 on interim updates, and degrade gracefully to a single final message if streaming updates are being rejected.

7. **Where NOT to stream** — channels/contexts where streaming is unsupported, with a clean non-streaming fallback.

Output: (a) the interim + final activity shapes with the channelData stream fields, (b) a batching loop that adapts token chunks to update cadence, (c) the final Adaptive Card with citations, (d) cancellation handling pseudocode, (e) a fallback + throttling plan.

Bias toward: smooth perceived latency, respecting update rate limits, always landing a clean final message.
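
To give a sense of what outputs (a) and (b) might look like, here is a minimal Python sketch that uses plain dictionaries rather than any bot SDK. The `streamType` and `streamSequence` fields in `channelData` follow the prompt's description. The `streamId` field, the helper names (`interim_activity`, `final_activity`, `batch_tokens`, `to_activities`), the 20-token batch size, and the choice to send cumulative text in every update are all assumptions for illustration, so check them against the current Teams documentation before relying on them.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


def interim_activity(text: str, sequence: int, stream_id: Optional[str] = None) -> Dict:
    """Interim update: a `typing` activity carrying the partial text."""
    channel_data = {"streamType": "streaming", "streamSequence": sequence}
    if stream_id is not None:
        # Assumption: later updates echo an id obtained from the first update.
        channel_data["streamId"] = stream_id
    return {"type": "typing", "text": text, "channelData": channel_data}


def final_activity(text: str, stream_id: Optional[str] = None) -> Dict:
    """Terminal update: a normal `message` activity that ends the stream."""
    channel_data = {"streamType": "final"}
    if stream_id is not None:
        channel_data["streamId"] = stream_id
    return {"type": "message", "text": text, "channelData": channel_data}


def batch_tokens(
    tokens: Iterable[str],
    max_tokens: int = 20,
    max_interval_s: float = 1.5,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[str, bool]]:
    """Yield (cumulative_text, is_final) every `max_tokens` tokens or
    `max_interval_s` seconds, whichever comes first, then one final item."""
    buffer = []
    text_so_far = ""
    last_flush = clock()
    for token in tokens:
        buffer.append(token)
        due = clock() - last_flush >= max_interval_s
        if len(buffer) >= max_tokens or due:
            text_so_far += "".join(buffer)
            buffer.clear()
            last_flush = clock()
            yield text_so_far, False
    # Flush whatever is left as the complete answer.
    text_so_far += "".join(buffer)
    yield text_so_far, True


def to_activities(tokens: Iterable[str], **batch_options) -> Iterator[Dict]:
    """Wrap each batch in the matching activity shape, numbering from 1."""
    batches = batch_tokens(tokens, **batch_options)
    for sequence, (text, is_final) in enumerate(batches, start=1):
        yield final_activity(text) if is_final else interim_activity(text, sequence)
```

This synchronous version only checks the clock when a token arrives; a real bot would more likely consume an async stream and flush on a timer as well. The injectable `clock` parameter exists only to keep the batching logic testable without real delays.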

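For outputs (d) and (e), a response could resemble the following sketch of a send loop that backs off on throttling, degrades to a single final message, and stops early on cancellation. Everything here is hypothetical scaffolding: `Throttled` stands in for however your SDK surfaces an HTTP 429, `send(text, is_final)` stands in for the call that posts an activity, and `is_cancelled` stands in for a check such as "has the user sent a newer message". The retry count and delays are placeholder values, not documented Teams limits.

```python
import time
from typing import Callable, Iterable, Tuple


class Throttled(Exception):
    """Hypothetical error raised by `send` when the service answers HTTP 429."""


def send_with_backoff(send, text, is_final, max_retries, base_delay_s, sleep) -> bool:
    """Try to send once plus `max_retries` retries with exponential backoff.
    Returns True on success, False if every attempt was throttled."""
    for attempt in range(max_retries + 1):
        try:
            send(text, is_final)
            return True
        except Throttled:
            if attempt < max_retries:
                sleep(base_delay_s * 2 ** attempt)
    return False


def stream_with_fallback(
    updates: Iterable[Tuple[str, bool]],
    send: Callable[[str, bool], None],
    is_cancelled: Callable[[], bool] = lambda: False,
    max_retries: int = 2,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Consume (cumulative_text, is_final) updates and always land a final message.
    Returns "streamed", "degraded", or "cancelled"."""
    streaming_ok = True
    latest = ""
    outcome = "streamed"
    for text, is_final in updates:
        latest = text
        if is_cancelled():
            # Stop generating; the final send below closes the stream cleanly.
            outcome = "cancelled"
            break
        if is_final or not streaming_ok:
            # Either the final send below handles it, or interim updates
            # are being rejected and we only keep accumulating text.
            continue
        streaming_ok = send_with_backoff(send, text, False, max_retries, base_delay_s, sleep)
    if not streaming_ok and outcome == "streamed":
        outcome = "degraded"
    if not send_with_backoff(send, latest, True, max_retries, base_delay_s, sleep):
        raise Throttled("final message rejected after retries")
    return outcome
```

On cancellation this sketch sends whatever text has arrived as the final message so the half-finished stream is not left open; whether to do that or to replace it with a short "cancelled" notice is a product decision. The same "degraded" path can serve contexts where streaming is unsupported: skip the interim sends entirely and post only the final message.
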