Handle Microsoft Graph Throttling and 429s in Teams Automation

The first time I watched a Teams provisioning job fall over in production, it wasn’t a bug in my code. It was a wall of HTTP 429 Too Many Requests responses that arrived the instant my loop got fast enough to be useful. I had written something that worked beautifully against a test tenant of twelve users, then pointed it at a real org with four thousand, and Microsoft Graph politely but firmly told me to slow down. If you automate anything against Teams, channels, or the users and groups behind them, you are going to meet the throttle. The only question is whether you meet it gracefully or whether your job dies at 2 a.m. and pages someone.

This post is the playbook I wish I’d had: how throttling actually works in Graph, how to read the signals it sends you, and how to write a retry layer that survives a 429 storm instead of amplifying it.

Why Graph throttles at all

Microsoft Graph is a shared front door to a multi-tenant backend. Exchange Online, SharePoint, the directory, and the Teams services all sit behind it, and each one enforces its own limits to protect every other tenant on the same infrastructure. Throttling is not a punishment; it is back-pressure. When you exceed a per-app, per-tenant, or per-resource budget inside a sliding time window, Graph stops doing work and starts returning 429 Too Many Requests.

The critical mental shift is that a 429 is a normal, expected response, not a failure. A well-behaved client treats it the way TCP treats congestion: a signal to slow down, not a reason to crash. If your automation does not have a deliberate strategy for it, it has an accidental one, and accidental ones are always worse.

Read the Retry-After header — it’s not a suggestion

When Graph throttles you, the response is rarely silent. The single most important thing in a 429 is the Retry-After header, an integer number of seconds you are expected to wait before trying again. A throttled response typically looks like this:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": {
    "code": "TooManyRequests",
    "message": "Application is over its MailboxConcurrency limit."
  }
}

That Retry-After: 12 is Graph telling you exactly how long to back off. Honoring it is the difference between recovering smoothly and digging yourself deeper. Requests you send during the wait are rejected too, and they still count against your budget, so hammering a throttled endpoint keeps you throttled for longer rather than getting you through sooner.

So rule one: if a Retry-After header is present, wait at least that long, full stop. Only fall back to your own backoff math when the header is missing.
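
As a minimal sketch of rule one, the function below turns a Retry-After value into a wait in milliseconds. The name parseRetryAfterMs and the injectable now parameter are illustrative assumptions, not part of any Graph SDK. Graph sends the header as a number of seconds, as described above; the HTTP-date branch is purely defensive, since the HTTP specification also allows that form.

```typescript
// Hypothetical helper: converts a Retry-After header value into milliseconds.
// Returns null when the header is missing or unparseable, so the caller can
// fall back to its own backoff math.
function parseRetryAfterMs(
  header: string | null,
  now: number = Date.now()
): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (value === "") return null;

  // Form 1: delay in seconds, e.g. "12".
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const when = Date.parse(value);
  if (Number.isNaN(when)) return null;

  // A date in the past means there is nothing left to wait for.
  return Math.max(0, when - now);
}
```

Returning null rather than 0 for a missing or malformed header keeps the two cases distinct: 0 means "retry now", null means "the server gave no usable hint, use your own backoff".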

The limits you actually need to know

Graph throttling is service-specific, and the numbers matter when you’re sizing a job. A few that bite Teams automation most often:

Per-app per-tenant ceilings. Many Graph resources enforce a request-count limit per application per tenant within a rolling window (commonly measured over a few minutes). Spread the same volume across more tenants and you have more headroom; concentrate it on one and you hit the wall sooner.
Exchange / mailbox limits. Anything that touches Outlook-backed data, including some Teams chat and calendar operations, runs into Exchange’s MailboxConcurrency limit, which caps concurrent requests per app per mailbox (Microsoft documents the cap as four). Fan out across many mailboxes and you’re fine; pound one mailbox with parallel calls and you’ll be throttled almost immediately.
Teams-specific service limits. The Teams services apply their own per-app and per-user limits on messaging, channel, and membership operations. Bulk channel creation and bulk membership changes are classic offenders.

The practical takeaway is that there is no single global number to design against. Your effective limit depends on which resource you touch and how concentrated your calls are on a single mailbox, user, or tenant.

Pro Tip: Don’t try to memorize exact thresholds — Microsoft tunes them and they vary by service. Design so that any single 429 is recoverable, and your code stays correct even when the published numbers change underneath you.

A retry helper that backs off with jitter

Here is the core pattern: honor Retry-After when it’s there, otherwise use exponential backoff with full jitter so a fleet of clients doesn’t retry in lockstep and create a thundering herd. This is a TypeScript helper I reach for, framework-agnostic and built on fetch:

interface RetryOptions {
  maxRetries?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

export async function graphFetch(
  url: string,
  init: RequestInit,
  opts: RetryOptions = {}
): Promise<Response> {
  const { maxRetries = 5, baseDelayMs = 1000, maxDelayMs = 60_000 } = opts;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, init);

    // Throttling (429) and transient server errors (503/504) are retryable.
    if (![429, 503, 504].includes(res.status)) {
      return res;
    }
    if (attempt === maxRetries) {
      return res; // give the caller the final throttled response to handle
    }

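    // 1) Always prefer the server's Retry-After if present and numeric.
    const retryAfter = res.headers.get("Retry-After");
    const retryAfterSec = retryAfter ? Number(retryAfter) : NaN;
    let delay: number;
    if (Number.isFinite(retryAfterSec) && retryAfterSec >= 0) {
      delay = retryAfterSec * 1000;
    } else {
      // 2) Otherwise (header missing or not a number of seconds) use
      //    exponential backoff with FULL jitter. Without this guard a
      //    non-numeric header would yield NaN and an immediate retry.
      const expo = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      delay = Math.random() * expo;
    }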

    await sleep(delay);
  }

  // Unreachable, but keeps the type checker happy.
  throw new Error("graphFetch: exhausted retries");
}

Three things make this safe in practice. First, it treats Retry-After as authoritative. Second, it uses full jitter (Math.random() * expo) rather than fixed exponential delays, so concurrent workers spread their retries across the window instead of synchronizing into a spike. Third, it caps the computed backoff at maxDelayMs so a runaway exponent doesn’t stall your pipeline indefinitely; a server-supplied Retry-After is deliberately left uncapped.

One caveat: the helper retries whatever you hand it, so only pass it idempotent operations blindly. For writes, make sure a retried POST won’t create duplicates (use an idempotency key or check-then-create).
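
To get a feel for how long the helper above can wait when no Retry-After is sent, here is a small worked calculation. The function name backoffCeilingMs is an illustrative assumption; it just repeats the expo expression from graphFetch with the same defaults.

```typescript
// Upper bound of the jittered delay for a given 0-based attempt, mirroring
// Math.min(maxDelayMs, baseDelayMs * 2 ** attempt) in graphFetch.
function backoffCeilingMs(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 60_000
): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// With maxRetries = 5, graphFetch sleeps after attempts 0 through 4 and
// returns the final response after attempt 5 without sleeping.
const ceilings = [0, 1, 2, 3, 4].map((attempt) => backoffCeilingMs(attempt));
const worstCaseMs = ceilings.reduce((sum, ms) => sum + ms, 0);

// ceilings    → [1000, 2000, 4000, 8000, 16000]
// worstCaseMs → 31000
```

Because full jitter picks uniformly between zero and each ceiling, the expected total wait is about half the worst case, roughly 15 seconds with these defaults. The 60-second cap only starts to matter from attempt 6 onward, so it is relevant if you raise maxRetries.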

This is exactly the kind of code where an AI assistant earns its keep. Tools like Claude or GitHub Copilot will draft a backoff helper in seconds — they’re fast junior engineers, and this is well-trodden ground. But treat the output the way you’d treat a junior’s first PR: review it before it ever touches a tenant. I’ve seen generated retry loops that retried non-idempotent writes, swallowed the final error, or quietly dropped the Retry-After header in favor of a hardcoded setTimeout. Fast drafting, mandatory human review. If you want a starting point, our prompt-packs include scaffolds for resilient API clients, and the prompts library has review checklists you can paste straight into a chat.

$batch is faster — and a trap if you don’t read inner responses

The $batch endpoint lets you bundle up to 20 requests into one round trip, which is a huge win for bulk Teams work like creating channels or syncing membership. But there’s a subtlety that catches almost everyone: the outer batch request can return 200 OK while individual sub-requests inside it return 429 independently.

{
  "responses": [
    { "id": "1", "status": 200, "body": { "id": "channel-a" } },
    {
      "id": "2",
      "status": 429,
      "headers": { "Retry-After": "8" },
      "body": { "error": { "code": "TooManyRequests" } }
    }
  ]
}

A naive client checks the HTTP status of the whole batch, sees 200, and assumes everything succeeded — silently losing the throttled operations. You must iterate the responses array, find any entry with status: 429, read its per-response Retry-After, wait, and resubmit only the failed sub-requests in a fresh batch. Don’t resend the ones that already succeeded; that just burns more of your budget. Batching reduces round trips, but it does not exempt any individual operation from throttling.

Pro Tip: When you build a batch, keep a map from sub-request id back to the original operation. When a response comes back 429, you can reconstruct exactly which items to retry without guessing — and your logs stay debuggable.
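
The inner-response check can be sketched roughly as follows. The types and the triageBatch name are illustrative assumptions that model only the fields shown in the sample response above. The sketch treats only 429 as retryable and waits for the longest Retry-After among the throttled items; both are design choices you may want to change.

```typescript
// One entry of a $batch response, limited to the fields used here.
interface BatchResponseItem {
  id: string;
  status: number;
  headers?: Record<string, string>;
  body?: unknown;
}

interface BatchTriage {
  succeeded: string[]; // ids with a 2xx status
  retryIds: string[];  // ids that came back 429
  failed: string[];    // ids with any other status; not retried here
  waitMs: number;      // longest Retry-After among the throttled items
}

// Hypothetical helper: sorts inner responses into buckets and works out how
// long to wait before resubmitting the throttled ones.
function triageBatch(
  responses: BatchResponseItem[],
  defaultWaitMs = 1000
): BatchTriage {
  const out: BatchTriage = { succeeded: [], retryIds: [], failed: [], waitMs: 0 };
  for (const r of responses) {
    if (r.status >= 200 && r.status < 300) {
      out.succeeded.push(r.id);
    } else if (r.status === 429) {
      out.retryIds.push(r.id);
      // Header names are case-insensitive, so do not assume the casing.
      const headers = r.headers ?? {};
      const key = Object.keys(headers).find((k) => k.toLowerCase() === "retry-after");
      const seconds = key === undefined ? NaN : Number(headers[key]);
      const ms = Number.isFinite(seconds) ? seconds * 1000 : defaultWaitMs;
      out.waitMs = Math.max(out.waitMs, ms);
    } else {
      out.failed.push(r.id);
    }
  }
  return out;
}

// Usage with the sample response shown earlier in this section.
const triage = triageBatch([
  { id: "1", status: 200, body: { id: "channel-a" } },
  {
    id: "2",
    status: 429,
    headers: { "Retry-After": "8" },
    body: { error: { code: "TooManyRequests" } },
  },
]);
// triage.succeeded → ["1"], triage.retryIds → ["2"], triage.waitMs → 8000
```

From there, the id-to-operation map from the tip above tells you which original requests belong in the next batch once waitMs has elapsed.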

Detect throttling before it hurts

Reactive retries keep you alive; proactive detection keeps you fast. A few signals worth instrumenting:

Track your own request rate per resource. If you know roughly where the wall is, throttle yourself with a client-side limiter so you glide just under it instead of bouncing off it. A token-bucket limiter in front of graphFetch is often more effective than any amount of retry tuning.
Watch for early 429s as a leading indicator. A rising rate of throttled responses means you’re at the edge. Feed that into your monitoring and alerting so a human sees the trend before the job fails outright.
Log Retry-After values over time. Climbing wait times mean the service is escalating; that’s your cue to reduce concurrency, not to retry harder.
Respect RateLimit headers when present. Some Graph services emit limit/remaining/reset hints. Reading them lets you slow down preemptively rather than waiting for the 429.
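
The token-bucket limiter mentioned in the first item can be sketched in a few lines. This is a minimal, single-process version under stated assumptions: the class name, the tryTake method, and the injectable clock are all illustrative, and the capacity and refill rate are numbers you would pick per resource rather than published Graph limits.

```typescript
// Minimal token bucket: holds up to `capacity` tokens and refills
// continuously at `refillPerSecond`.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = Date.now // injectable clock, handy for tests
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Adds the tokens earned since the last call, never exceeding capacity.
  private refill(): void {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefill = t;
  }

  // Takes one token if available. Returns 0 when the caller may send now,
  // otherwise the number of milliseconds to wait before asking again.
  tryTake(): number {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}
```

In front of graphFetch, a caller would loop on tryTake(), sleeping for the returned number of milliseconds until it gets 0, and only then send the request. Keeping one bucket per resource (per mailbox, per team) fits the point above that limits depend on where your calls concentrate.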

When you do hit a throttling incident, treat it like any other production event: capture the timeline, the affected operations, and the recovery. Our incident-response workflow is built for exactly these post-mortems, and a quick code-review pass on the retry layer before deploy catches the silent-failure bugs that batching loves to hide.

One more word on security

Because throttling code so often gets drafted with AI help, it’s worth saying plainly: never hand the model real tenant credentials, app secrets, or bearer tokens. Paste redacted samples, not live values. If your automation receives Graph change notifications, verify the clientState and validation tokens on every webhook, and confirm any Teams connector or incoming-webhook URL is treated as a secret. An assistant can write the verification logic, but it cannot be trusted to know what is safe to expose — that judgment stays with you.
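
As a rough illustration of the clientState check, the sketch below keeps only notifications whose clientState matches the secret you supplied when creating the subscription. The type models just two fields of a notification, and safeEqual and filterTrusted are hypothetical names. The hand-rolled comparison is there to keep the example dependency-free; in a Node service you would more likely reach for crypto.timingSafeEqual.

```typescript
// Subset of a Graph change notification used in this sketch.
interface ChangeNotification {
  subscriptionId: string;
  clientState?: string;
}

// Compares two strings without exiting early on the first mismatch.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Keeps only notifications carrying the expected clientState secret.
function filterTrusted(
  notifications: ChangeNotification[],
  expectedClientState: string
): ChangeNotification[] {
  return notifications.filter(
    (n) =>
      n.clientState !== undefined &&
      safeEqual(n.clientState, expectedClientState)
  );
}
```

Anything filtered out here should be dropped and logged rather than processed, and the expected value itself belongs in your secret store, not in source control or a chat prompt.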

Wrapping up

Graph throttling isn’t an edge case you’ll occasionally trip over; at any real scale it’s the steady state, and the clients that thrive are the ones that expect it. Honor Retry-After, back off with jitter, inspect every inner response in a batch, and instrument yourself so you see the wall before you hit it. Let your AI tools draft the retry helper fast — then review it like a human who’ll be the one getting paged. Do that, and a 429 storm becomes a non-event instead of an outage. For more in this vein, browse the rest of the Microsoft Teams category.
