Stream AI Responses in Teams Bots with Typing and Updates
A bot that stalls for ten seconds feels broken. Use typing indicators and message updates to stream LLM responses into Teams so the conversation feels alive.
- #microsoft-teams
- #bot-framework
- #ai
- #streaming
- #chatops
The first AI-powered ops bot I shipped to Teams worked, but it felt dead. Someone would ask “why is checkout latency up?”, the bot would go silent for eight to twelve seconds while the model thought and the runbook RAG ran, and then a wall of text appeared. In that silence, half the users assumed it had crashed and asked again, firing a second request. The bot was fine — it just gave no feedback. Streaming feedback is what makes an AI bot feel responsive even when the underlying model is slow.
This guide covers the techniques for keeping a Teams conversation feeling alive during a slow LLM call: typing indicators, incremental message updates to stream tokens, and the throttling constraints that keep you from getting rate-limited. I drafted the streaming loop with an AI assistant, and the model gladly wrote a version that called the update API on every single token — which is a fast track to getting throttled out of the channel. Treat AI like a fast junior engineer: it gets the shape right and the rate-limiting wrong, so you review the throttling before this touches a real tenant. And the bot’s own credentials and the model’s API key never go into a prompt.
The cheapest win: typing indicators
Before you stream anything, send a typing activity. It costs almost nothing and immediately tells the user “I heard you, I’m working.” In the Bot Framework SDK:
async function showTyping(context) {
  await context.sendActivity({ type: "typing" });
}
A typing indicator lasts only a few seconds in the Teams client, so for a long operation you send one periodically — every few seconds — until the real reply is ready. That alone eliminated most of the duplicate-question problem in my bot. Users will wait far longer if they can see the bot is doing something.
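async function withTyping(context, work) {
  // Re-send the indicator every 3s. Swallow send errors so a failed typing
  // activity never surfaces as an unhandled promise rejection.
  const timer = setInterval(() => {
    context.sendActivity({ type: "typing" }).catch(() => {});
  }, 3000);
  try {
    return await work();
  } finally {
    clearInterval(timer); // always stop, even on error
  }
}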
The finally matters — if the work throws and you never clear the interval, the bot types forever like an over-caffeinated intern.
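One way to convince yourself the cleanup works is to run the helper against a fake context and failing work. The sketch below is a hypothetical variant: `withTypingEvery`, `makeFakeContext`, and `demoTypingStopsOnError` are names made up for illustration, the interval is a parameter only so the demo runs quickly, and the fake context is a stand-in that records activities, not the SDK's TurnContext.

```javascript
// Hypothetical variant of withTyping with a configurable interval (assumption:
// added only so the demo finishes in milliseconds instead of seconds).
async function withTypingEvery(context, work, intervalMs = 3000) {
  const timer = setInterval(() => {
    // A lost typing indicator should not fail the reply, so ignore send errors.
    context.sendActivity({ type: "typing" }).catch(() => {});
  }, intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer); // runs on success and on error
  }
}

// Stand-in for a turn context: it only records what was sent.
function makeFakeContext() {
  const sent = [];
  return {
    sent,
    sendActivity: async (activity) => {
      sent.push(activity);
      return { id: String(sent.length) };
    },
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs work that throws, then checks whether typing activities keep arriving.
async function demoTypingStopsOnError() {
  const context = makeFakeContext();
  let failed = false;
  try {
    await withTypingEvery(context, async () => {
      await sleep(60);
      throw new Error("model call failed");
    }, 10);
  } catch {
    failed = true;
  }
  const duringWork = context.sent.length;
  await sleep(40); // a leaked interval would keep sending during this pause
  return { failed, duringWork, afterWork: context.sent.length - duringWork };
}
```

If the `finally` block is removed, `afterWork` becomes non-zero: the interval outlives the failed call and the bot keeps typing.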
Streaming tokens by updating a message
Typing tells the user something is happening; streaming shows them what. The pattern is: send an initial placeholder message, capture its activity ID, then call updateActivity to replace its text as tokens arrive. Teams supports updating a previously sent message in place.
async function streamReply(context, tokenStream) {
  // 1. Send a placeholder and remember its id.
  const sent = await context.sendActivity({ type: "message", text: "…" });
  const activityId = sent.id;
  let buffer = "";
  let lastUpdate = 0;
  for await (const token of tokenStream) {
    buffer += token;
    const now = Date.now();
    // 2. Throttle updates: at most ~1 per 1.5s, not per token.
    if (now - lastUpdate > 1500) {
      await context.updateActivity({ id: activityId, type: "message", text: buffer });
      lastUpdate = now;
    }
  }
  // 3. Final flush so the last tokens always land.
  await context.updateActivity({ id: activityId, type: "message", text: buffer });
}
The throttle in step 2 is the whole ballgame. Updating on every token means dozens of API calls per second, and Teams will rate-limit you, at which point your “smooth streaming” stutters and stalls, which is worse than not streaming at all. Batch into roughly one update every one to two seconds. Because lastUpdate starts at 0, the first token is pushed immediately and the throttle applies from then on. The final flush after the loop guarantees the last partial batch lands even if it arrived inside the throttle window.
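To put numbers on the difference, here is a small simulation of the same throttling rule. It is a sketch under stated assumptions: `countUpdates` is a hypothetical helper, and the token rate (400 tokens arriving 25 ms apart, about ten seconds of generation) is invented for illustration, not measured from a real model. Unlike the loop above, it only counts the final flush when unsent text is actually pending.

```javascript
// Counts how many update calls a stream would trigger for a given throttle.
// tokenTimesMs: arrival time of each token in milliseconds (assumed input).
function countUpdates(tokenTimesMs, minIntervalMs) {
  let lastUpdate = -Infinity; // so the first token always triggers an update
  let updates = 0;
  let pending = false; // true while the buffer holds text not yet sent
  for (const t of tokenTimesMs) {
    pending = true;
    if (t - lastUpdate > minIntervalMs) {
      updates += 1;
      lastUpdate = t;
      pending = false;
    }
  }
  if (pending) updates += 1; // final flush for the last partial batch
  return updates;
}

// Illustrative stream: 400 tokens, one every 25 ms.
const tokenTimes = Array.from({ length: 400 }, (_, i) => i * 25);

console.log(countUpdates(tokenTimes, 0));    // 400: one call per token
console.log(countUpdates(tokenTimes, 1500)); // 8: seven throttled updates plus the final flush
```

Same answer on screen, fifty times fewer API calls for this made-up stream.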
Pro Tip: Teams has been rolling out native AI-streaming support via streaming-specific activities and channelData flags, which the Teams AI Library wraps for you. If you are on a recent SDK, prefer the library’s built-in streaming over hand-rolled updateActivity calls — it handles the throttling and the “informative loading” states for you. Confirm support on your target clients before relying on it.
Respect the rate limits
Proactive and update activities count against per-bot and per-conversation rate limits. The streaming loop above is gentle because it batches, but if your bot is streaming to many conversations at once, the aggregate matters. Build in backoff: if an update returns a 429, read the Retry-After header and wait before the next update rather than hammering. This is the same discipline as any Teams bot that sends a lot of proactive messages — streaming just makes it easy to forget because the volume is per-conversation rather than fan-out.
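async function safeUpdate(context, activity) {
  try {
    await context.updateActivity(activity);
  } catch (e) {
    if (e.statusCode === 429) {
      // Where the Retry-After value lives on the error depends on the SDK
      // version; both lookups are best-effort, with a 2s fallback.
      const header = e.response?.headers?.get?.("retry-after");
      const seconds = Number(header ?? e.retryAfter) || 2;
      await new Promise((r) => setTimeout(r, seconds * 1000));
      await context.updateActivity(activity); // one retry after backoff
    } else {
      throw e;
    }
  }
}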
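The Retry-After header itself can carry either a number of seconds or an HTTP date, so it is worth parsing defensively before sleeping on it. A minimal sketch, assuming you have already pulled the raw header string off the error: `retryAfterMs` is a hypothetical helper, the 2-second fallback mirrors the wrapper above, and the 30-second cap is my own assumption so that one bad header cannot stall a stream indefinitely.

```javascript
// Converts a Retry-After header value into a wait in milliseconds.
// fallbackMs and maxMs are illustrative defaults, not Teams-documented values.
function retryAfterMs(headerValue, nowMs = Date.now(), fallbackMs = 2000, maxMs = 30000) {
  if (headerValue == null) return fallbackMs;
  const text = String(headerValue).trim();
  if (text === "") return fallbackMs;

  // Form 1: delay in whole seconds, e.g. "3".
  if (/^\d+$/.test(text)) {
    return Math.min(Number(text) * 1000, maxMs);
  }

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:05 GMT".
  const dateMs = Date.parse(text);
  if (Number.isNaN(dateMs)) return fallbackMs;
  return Math.min(Math.max(dateMs - nowMs, 0), maxMs);
}

console.log(retryAfterMs("3"));       // 3000
console.log(retryAfterMs(undefined)); // 2000
```

Passing `nowMs` explicitly keeps the date branch easy to test without faking the clock.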
Closing with a card, not just text
Streaming raw text is great for the answer, but ops answers often want structure: a runbook link, an “open incident” button, related metrics. So I stream the prose into a text message for responsiveness, then send a final Adaptive Card with the actionable bits once the answer is complete. The user gets immediate feedback during generation and a clean, interactive result at the end. I route the answer quality and the “should this open an incident” call through the same logic my incident-response dashboard uses, with a human in the loop for anything that would actually page someone.
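A closing card along those lines might be built like this. Treat it as a sketch: `buildSummaryCard`, its field names, the example URL, and the payload shape are all made up for illustration, and card version 1.4 is an assumption you should check against the clients you target.

```javascript
// Builds a Bot Framework attachment that wraps an Adaptive Card.
// All parameter names and the payload shape are hypothetical.
function buildSummaryCard({ title, summary, runbookUrl, incidentPayload }) {
  return {
    contentType: "application/vnd.microsoft.card.adaptive",
    content: {
      $schema: "http://adaptivecards.io/schemas/adaptive-card.json",
      type: "AdaptiveCard",
      version: "1.4", // assumption: confirm the version your clients render
      body: [
        { type: "TextBlock", text: title, weight: "Bolder", size: "Medium", wrap: true },
        { type: "TextBlock", text: summary, wrap: true },
      ],
      actions: [
        { type: "Action.OpenUrl", title: "Open runbook", url: runbookUrl },
        // Submit only hands data back to the bot; the bot (and a human)
        // still decides whether an incident is actually opened.
        { type: "Action.Submit", title: "Open incident", data: incidentPayload },
      ],
    },
  };
}

const card = buildSummaryCard({
  title: "Checkout latency",
  summary: "Short recap of the streamed answer goes here.",
  runbookUrl: "https://example.com/runbooks/checkout-latency", // placeholder URL
  incidentPayload: { action: "openIncident", service: "checkout" },
});

// After the final flush of the streamed text:
// await context.sendActivity({ type: "message", attachments: [card] });
```

Sending the card as a separate message keeps the streamed text untouched, so a card rendering problem never hides the answer itself.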
Where AI helps and the lines it must not cross
The streaming loop, the typing-interval helper, and the backoff wrapper are all boilerplate I let Claude or Cursor draft, then review specifically for the throttling — the part the model gets wrong by default. The starter prompts in my prompt packs include a Teams streaming skeleton, and I tune the batch interval in the prompt workspace. What stays out of every prompt: the bot’s app credentials and the LLM API key. The model helps me write the code that calls those; it never sees the secrets themselves, and a human reviews the rate-limit handling before it ships to the tenant.
Conclusion
A slow AI bot feels broken only if it is silent. Typing indicators tell users you are working, message updates stream the answer as it generates, and disciplined throttling — roughly one update per one to two seconds with 429 backoff — keeps Teams from rate-limiting you. Finish with an Adaptive Card for the actionable bits. Let AI scaffold the loop, review the throttling yourself, and keep credentials out of prompts. More Teams bot patterns are in the Microsoft Teams category.