Proactive Messaging From Teams Bots Without Getting Rate Limited

Most Teams bots are reactive: you talk, they answer. The genuinely useful ones are proactive — they message you first. “Your deploy finished.” “You’re on-call in 30 minutes.” “This PR has been waiting 4 hours.” But proactive messaging is also where I’ve watched more bots fall over than anywhere else, because the moment you fan a message out to 200 engineers, you meet the Teams throttling limits face-first. Here’s how to do it without getting 429’d into oblivion.

The thing you must capture: the conversation reference

A bot can’t message a user out of nowhere. It can only continue a conversation it already knows about. So the first time the bot sees a user — when they install it, or send any message — you grab and persist the conversation reference:

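const { TurnContext } = require("botbuilder");

// Call this from both the install handler and the message handler.
async function onInstallOrMessage(context) {
  const ref = TurnContext.getConversationReference(context.activity);
  // ref contains serviceUrl, conversation (id, tenantId), user, bot, channelId
  const aadId = context.activity.from?.aadObjectId;
  if (!aadId) return; // nothing to key on, e.g. an activity from outside Teams
  await refStore.put(aadId, ref); // refStore: your own persistent key-value store
}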

Store it keyed by the AAD object ID. That reference is your permission slip to message that person later. No reference, no proactive message — there is no API to cold-DM a user.
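
The snippet above leans on a refStore that is never defined. Here is a minimal sketch of what that interface could look like, assuming an in-memory Map purely for illustration. The class name and the updatedAt field are hypothetical; a real bot would put the same put/get pair in front of a database so references survive a restart.

```javascript
// Hypothetical in-memory stand-in for `refStore`. Swap the Map for a real
// database table in production so references survive a process restart.
class InMemoryRefStore {
  constructor() {
    this.refs = new Map(); // aadObjectId -> { ref, updatedAt }
  }

  // Overwrites any earlier reference, so the most recent one always wins.
  async put(aadObjectId, ref) {
    this.refs.set(aadObjectId, { ref, updatedAt: Date.now() });
  }

  // Returns the stored reference, or undefined if the user never met the bot.
  async get(aadObjectId) {
    const entry = this.refs.get(aadObjectId);
    return entry ? entry.ref : undefined;
  }
}

const refStore = new InMemoryRefStore();
```

Because put simply overwrites, calling it on every incoming activity keeps the stored reference current without any extra bookkeeping.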

Sending the proactive message

Later, from a webhook or a cron job, you load the reference and call continueConversationAsync:

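const { CardFactory } = require("botbuilder");

// `adapter` is the adapter instance your bot already uses to serve incoming
// activities (the appId-first signature below is the CloudAdapter one).
async function notify(aadId, card) {
  const ref = await refStore.get(aadId);
  if (!ref) return; // never installed the bot

  await adapter.continueConversationAsync(
    process.env.MICROSOFT_APP_ID,
    ref,
    async (context) => {
      await context.sendActivity({ attachments: [CardFactory.adaptiveCard(card)] });
    }
  );
}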

The serviceUrl inside the reference is region-specific and can change — always use the one from the latest stored reference, and refresh it whenever the user interacts again. Hardcoding a service URL is a slow-burn bug that breaks for users in another datacenter.

Now the part that actually matters: throttling

Teams enforces per-bot and per-thread rate limits, and they’re tighter than people expect. As a working figure, budget about one message per second per conversation, plus a per-app ceiling across all conversations. Blow past either and you get HTTP 429 with a Retry-After header. Three rules keep you alive:

1. Respect Retry-After. Don’t guess a backoff — the response tells you exactly how long to wait. Wrap every send:

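const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: the thrown error exposes the HTTP status as `statusCode` and the
// Retry-After value, in seconds, as `retryAfter`. Depending on your SDK version
// you may have to read the header from the error's response object instead.
async function sendWithBackoff(fn, attempt = 0) {
  try {
    return await fn();
  } catch (err) {
    if (err.statusCode === 429 && attempt < 5) {
      // Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s, ...).
      const hinted = Number(err.retryAfter);
      const seconds = hinted > 0 ? hinted : 2 ** attempt;
      await sleep(seconds * 1000);
      return sendWithBackoff(fn, attempt + 1);
    }
    throw err; // not a throttle, or out of retries
  }
}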

2. Cap concurrency. Fanning out to 500 users with Promise.all is how you self-DDoS. Use a small concurrency pool — 5 to 10 in flight — and let it drain:

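// p-limit 4+ is ESM-only: pin p-limit@3 for require(), or use `import pLimit from "p-limit"`.
const pLimit = require("p-limit");
const limit = pLimit(8); // at most 8 sends in flight at once

async function fanOut(recipients, card) {
  await Promise.all(
    recipients.map((u) => limit(() => sendWithBackoff(() => notify(u.aadId, card))))
  );
}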

3. Queue, don’t loop. For anything beyond a handful of recipients, push the sends onto a real queue (a durable job queue, not an in-memory array). If your process restarts mid-fan-out, an in-memory loop loses everyone who hadn’t been notified yet. A queue with idempotency keys lets you resume exactly where you stopped.
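
Rule 3 is the only one without code, so here is a rough sketch of the resume-after-restart idea. It assumes a ledger keyed by idempotency key; the Map below only stands in for a durable table or queue, and buildJobs, drain, and the key format are hypothetical names, not part of any library.

```javascript
// Hypothetical ledger. In production this is a durable table or queue keyed
// by idempotency key, not a Map that dies with the process.
const ledger = new Map(); // key -> "done"

// One job per (event, recipient); the key is what makes a re-run safe.
function buildJobs(eventId, recipients) {
  return recipients.map((aadId) => ({ key: `${eventId}:${aadId}`, aadId }));
}

// Works through the jobs in order and records each success before moving on,
// so a restarted process can call this again and skip what already went out.
async function drain(jobs, send) {
  let sent = 0;
  for (const job of jobs) {
    if (ledger.get(job.key) === "done") continue; // already delivered
    await send(job.aadId); // throws on failure, leaving the job unmarked
    ledger.set(job.key, "done");
    sent += 1;
  }
  return sent;
}
```

The loop is sequential to keep the sketch short; in practice the send callback would be the backoff-wrapped notify from rule 1, run through the concurrency pool from rule 2. A crash between a successful send and the ledger write can still produce one duplicate on resume, which is one more reason to deduplicate at send time.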

Batch where you can: send to a channel, not 50 DMs

The cheapest proactive message is the one you don’t fan out. If 50 engineers need to know a deploy failed, post one adaptive card to the team channel and @mention the relevant on-call, instead of sending 50 individual DMs. You spend one rate-limit token instead of fifty, and the message is visible to everyone who needs context. Reserve DMs for genuinely personal nudges (“you are on-call in 30 minutes”).
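
As an illustration, a single channel card that @mentions the on-call could be built roughly like this. The msteams.entities mention block follows the Teams adaptive card mention format as I understand it; the function name, input fields, and wording are hypothetical, so check the card against current Teams documentation before relying on it.

```javascript
// Builds one Adaptive Card for a channel post that @mentions the on-call.
// `service`, `onCall.id`, and `onCall.name` are illustrative inputs.
function buildDeployFailedCard({ service, onCall }) {
  const mentionText = `<at>${onCall.name}</at>`;
  return {
    $schema: "http://adaptivecards.io/schemas/adaptive-card.json",
    type: "AdaptiveCard",
    version: "1.2",
    body: [
      { type: "TextBlock", weight: "Bolder", text: `Deploy failed: ${service}` },
      { type: "TextBlock", wrap: true, text: `${mentionText} you are on-call for this one.` },
    ],
    // Teams matches the <at>...</at> text in the body against these entities.
    msteams: {
      entities: [
        {
          type: "mention",
          text: mentionText,
          mentioned: { id: onCall.id, name: onCall.name },
        },
      ],
    },
  };
}
```

You would send this card once, to a conversation reference captured from the team channel, instead of looping over individual user references.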

Deduplicate, or you’ll cry wolf

Proactive systems hooked to alerting will absolutely double-fire. An alert flaps, your webhook receives the same event twice, and the same engineer gets pinged twice for one thing. Put an idempotency key on every notification, a hash of (eventId, recipient), and check a short-TTL cache before sending. A bot that pings you three times for one deploy gets muted within a week, and a muted bot is a dead bot.
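
A minimal sketch of that check, assuming a single process and an in-memory cache. The SeenCache class and its claim method are hypothetical names; with more than one worker you would need a shared cache with the same claim-once behavior.

```javascript
const crypto = require("node:crypto");

// Stable key for one (event, recipient) pair.
function idempotencyKey(eventId, recipient) {
  return crypto.createHash("sha256").update(`${eventId}\n${recipient}`).digest("hex");
}

// Hypothetical in-process TTL cache. `now` is injectable so tests can
// control the clock; it defaults to Date.now.
class SeenCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.expiry = new Map(); // key -> expiry timestamp in ms
  }

  // Returns true the first time a key is claimed within the TTL, false after.
  claim(key) {
    const t = this.now();
    const until = this.expiry.get(key);
    if (until !== undefined && until > t) return false; // duplicate inside TTL
    this.expiry.set(key, t + this.ttlMs);
    return true;
  }
}
```

Usage would look something like `if (seen.claim(idempotencyKey(event.id, aadId))) await notify(aadId, card);`. Expired entries are overwritten but never swept here, so a long-running process would want periodic pruning.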

Quiet hours and consent

Just because you can message someone at 3am doesn’t mean you should, unless they’re on-call. Two cheap manners features that massively improve trust:

Honor on-call state. Non-urgent nudges (PR waiting, build slow) should respect the recipient’s working hours; route urgent ones (you’re paged) immediately regardless.
Let people opt out per category. Store per-user preferences (“deploy notifications: on, PR nudges: off”). A bot that respects “stop telling me about builds” earns the right to tell you about incidents.
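
Both checks fit in one small gate in front of the send. This is a sketch under assumptions: every field name (urgent, category, optOuts, workStartHour, workEndHour) is illustrative, and localHour is presumed to be already converted to the recipient's time zone.

```javascript
// Decides whether a notification should go out right now.
// All field names are illustrative; adapt them to your own preference store.
function shouldSendNow(notification, user, localHour) {
  // Category opt-outs are checked first. Whether urgent pages can be opted
  // out of at all is a policy decision this sketch leaves to the caller.
  if (user.optOuts.includes(notification.category)) return false;
  // Urgent pages ignore working hours.
  if (notification.urgent) return true;
  // Non-urgent nudges wait for the recipient's working hours.
  return localHour >= user.workStartHour && localHour < user.workEndHour;
}
```

A notification that fails the working-hours test should be deferred to the start of the next working window, not dropped, so the gate is best placed where the job can be re-queued.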

Where this fits

Proactive messaging is what turns a bot from a command line into a teammate — but the engineering is in the delivery layer, not the message. Capture conversation references on every interaction, send through a queue with backoff and capped concurrency, deduplicate ruthlessly, and prefer one channel post over fifty DMs. For ready-made adaptive card payloads and backoff snippets, see the prompt library, and find more bot patterns in the Microsoft Teams category.

Rate limits and serviceUrl behavior change over time; verify current Teams throttling guidance before scaling a fan-out.
