
Teams Webhook Routing for Monitoring Alerts Prompt

Route Prometheus / Datadog / CloudWatch alerts into Microsoft Teams channels using Incoming Webhooks + a small translator service — severity routing, throttling, retries, dedup.

Target user: Platform engineers wiring monitoring tools to Teams
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior platform engineer who has built reliable alert pipelines from multiple monitoring tools into Microsoft Teams and kept them working despite Teams webhook throttling, brittle payload schemas, and failed deliveries that need retrying.

I will provide:
- Source tools (Alertmanager, Datadog, CloudWatch, Sentry, New Relic, GitHub Actions, custom)
- Existing Teams channels + their connector webhooks
- Severity → channel routing rules
- Volume estimates (alerts/minute peak)
- Reliability requirements

Your job:

1. **Architecture choice** — direct webhook from source to Teams vs translator service in between. Recommend the translator pattern for most cases. Show why:
   - Source payloads don't match Adaptive Card schema
   - Need consistent formatting across sources
   - Throttling, retries, and dedup belong in your service, not in Teams
   - Audit + replay only possible with a service in the middle

2. **Translator service** — small HTTP service (Node/Go/Python) with:
   - **Source-specific parsers** — one per source, normalize to canonical IncidentEvent
   - **Routing matrix** — canonical event → target channel webhook URL
   - **Card builder** — IncidentEvent → Adaptive Card JSON
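   - **Throttler** — Teams Incoming Webhooks are rate-limited to roughly 4 requests/sec per webhook, with tighter sustained limits over longer windows (verify the current numbers in Microsoft's documentation). Implement a token bucket per channel webhook.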
   - **Retrier** — 5xx retries with backoff; on 429, respect Retry-After
   - **Deduper** — by alert fingerprint + window
   - **Audit log** — every inbound event + outbound result
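
A minimal sketch of the throttler and deduper described above, assuming an in-memory, single-process service. The class names `TokenBucket` and `Deduper`, the injectable `clock` parameter, and the window length are illustrative assumptions, not part of any Teams API.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` sends per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Deduper:
    """Suppresses repeats of the same fingerprint within `window_s` seconds."""

    def __init__(self, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.seen: Dict[str, float] = {}   # fingerprint -> first-seen time

    def is_duplicate(self, fingerprint: str) -> bool:
        now = self.clock()
        # Drop expired entries so the map does not grow without bound.
        self.seen = {fp: t for fp, t in self.seen.items()
                     if now - t < self.window_s}
        if fingerprint in self.seen:
            return True
        self.seen[fingerprint] = now
        return False
```

In use, the service would keep one `TokenBucket(rate=4, capacity=4)` per channel webhook and queue any event for which `try_acquire()` returns `False`. A multi-instance deployment would need shared state (for example a cache or database) instead of these in-process dictionaries.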

3. **Per-source quirks**:
   - **Alertmanager** — groups alerts; payload has `commonLabels` + `alerts[]`; `status: firing|resolved`
   - **Datadog** — Markdown-flavored payload; webhook custom payload variables
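   - **CloudWatch** — SNS-wrapped JSON, double-encoded: the alarm body arrives as a JSON string inside the SNS `Message` field and must be parsed a second time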
   - **Sentry** — payload per issue, includes stack traces (sanitize before Teams!)
   - **GitHub Actions** — for workflow failures
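
Two of the quirks above can be sketched as parsers that normalize to a canonical event. The `IncidentEvent` fields shown here are an assumed shape for illustration; CloudWatch alarms carry no severity, so the `"unknown"` default and the choice to treat every non-`OK` state as firing are assumptions to adjust.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class IncidentEvent:
    # Hypothetical canonical shape; extend with links, runbooks, etc.
    source: str
    fingerprint: str
    title: str
    severity: str
    status: str  # "firing" or "resolved"


def parse_alertmanager(payload: dict) -> List[IncidentEvent]:
    """One event per entry in alerts[]; per-alert labels override commonLabels."""
    events = []
    for alert in payload.get("alerts", []):
        labels = {**payload.get("commonLabels", {}), **alert.get("labels", {})}
        events.append(IncidentEvent(
            source="alertmanager",
            fingerprint=alert.get("fingerprint", ""),
            title=labels.get("alertname", "unknown"),
            severity=labels.get("severity", "unknown"),
            status=alert.get("status", payload.get("status", "firing")),
        ))
    return events


def parse_cloudwatch_sns(envelope: dict) -> IncidentEvent:
    """The SNS envelope's Message field is itself a JSON string: decode twice."""
    alarm = json.loads(envelope["Message"])
    return IncidentEvent(
        source="cloudwatch",
        fingerprint=alarm.get("AlarmArn", alarm["AlarmName"]),
        title=alarm["AlarmName"],
        severity="unknown",  # assumption: map from alarm name or tags elsewhere
        status="resolved" if alarm["NewStateValue"] == "OK" else "firing",
    )
```

`parse_cloudwatch_sns` expects the already-decoded HTTP body, so the raw request goes through `json.loads` once before this function parses `Message` again.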

4. **Adaptive Card recommendations** — see severity styling section in the Adaptive Card design prompt. Reuse the same card template across sources for consistency.
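
As a rough illustration of a shared card template, the sketch below wraps an Adaptive Card in the message envelope that Teams webhooks accept. The severity-to-style mapping and the function name `build_teams_message` are assumptions for this example; the real styling should follow the Adaptive Card design prompt referenced above.

```python
import json

# Assumed mapping from severity to Adaptive Card container styles.
SEVERITY_STYLE = {"SEV1": "attention", "SEV2": "warning"}


def build_teams_message(title: str, severity: str, status: str,
                        source: str, summary: str) -> dict:
    """Build one card layout for every source so channels look consistent."""
    if status == "resolved":
        style = "good"
    else:
        style = SEVERITY_STYLE.get(severity, "default")
    card = {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.2",
        "body": [
            {
                "type": "Container",
                "style": style,
                "items": [{
                    "type": "TextBlock",
                    "text": f"[{severity}] {title}",
                    "weight": "Bolder",
                    "wrap": True,
                }],
            },
            {
                "type": "FactSet",
                "facts": [
                    {"title": "Status", "value": status},
                    {"title": "Source", "value": source},
                ],
            },
            {"type": "TextBlock", "text": summary, "wrap": True},
        ],
    }
    return {
        "type": "message",
        "attachments": [{
            "contentType": "application/vnd.microsoft.card.adaptive",
            "content": card,
        }],
    }


if __name__ == "__main__":
    msg = build_teams_message("High CPU", "SEV1", "firing",
                              "alertmanager", "CPU above 90% for 10m")
    print(json.dumps(msg, indent=2))
```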

5. **Routing matrix examples**:
   - SEV1 → `#incidents-active` + page
   - SEV2 → `#alerts-prod-<service>` + notify
   - Warnings → `#alerts-low-signal` (mute by default)
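
The example rules above could be encoded as a small lookup table. This is a sketch only: the `action` values, the `{service}` placeholder, and the decision to send unknown severities to the low-signal channel are assumptions, and a real matrix would also map each channel to its webhook URL from a secret store.

```python
from typing import Dict, Tuple

# Severity -> target channel template and follow-up action.
ROUTES: Dict[str, Dict[str, str]] = {
    "SEV1": {"channel": "#incidents-active", "action": "page"},
    "SEV2": {"channel": "#alerts-prod-{service}", "action": "notify"},
    "WARNING": {"channel": "#alerts-low-signal", "action": "none"},
}

# Assumption: anything unrecognized is treated as low signal.
DEFAULT_ROUTE = ROUTES["WARNING"]


def route(severity: str, service: str) -> Tuple[str, str]:
    """Return (channel, action) for an event's severity and service."""
    rule = ROUTES.get(severity.upper(), DEFAULT_ROUTE)
    return rule["channel"].format(service=service), rule["action"]
```

For example, `route("SEV2", "checkout")` resolves to the `#alerts-prod-checkout` channel with a `notify` action.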

6. **Failure modes & mitigations**:
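   - Teams webhook 4xx other than 429 → do not retry; log + alert ops (NOT to Teams!)
   - Teams webhook 429 → wait for the `Retry-After` interval, then retry
   - Teams webhook 5xx / timeout → retry with backoff up to N attempts, then alert ops
   - Teams webhook deprecated (Microsoft is retiring Office 365 connectors; the announced timeline has been revised more than once, so check the current retirement date) → migrate to Workflows / Power Automate-backed webhooks. Plan the migration NOW.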
   - Translator service down → fallback path: email-on-call
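
The failure handling above can be sketched as one decision function that the delivery loop calls after each attempt. The function name `next_action`, the action labels, and the default limits are hypothetical. `Retry-After` is only honored here when it is a number of seconds; an HTTP-date value falls back to the base delay in this sketch.

```python
from typing import Optional, Tuple


def next_action(status: Optional[int], retry_after: Optional[str],
                attempt: int, max_attempts: int = 5,
                base_delay: float = 1.0,
                max_delay: float = 60.0) -> Tuple[str, float]:
    """Return (action, delay_seconds). status=None means timeout or network error."""
    if status is not None and 200 <= status < 300:
        return ("done", 0.0)
    # Non-retryable responses: alert ops through a path that is not Teams.
    if status is not None and status < 500 and status != 429:
        return ("alert_ops", 0.0)
    if attempt >= max_attempts:
        return ("give_up", 0.0)
    if status == 429:
        try:
            delay = float(retry_after) if retry_after is not None else base_delay
        except ValueError:
            delay = base_delay  # e.g. an HTTP-date value, not parsed here
        return ("retry", min(max(delay, 0.0), max_delay))
    # 5xx or timeout: exponential backoff, capped at max_delay.
    return ("retry", min(base_delay * 2 ** (attempt - 1), max_delay))
```

A `give_up` result would typically trigger the same out-of-band ops alert as `alert_ops`, with the event kept in the audit log for replay.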

7. **Microsoft's connector deprecation** — Office 365 Incoming Webhooks are deprecated. Recommend migrating to Power Automate "When a Teams webhook request is received" or a Teams bot with Graph API for long-term reliability.

8. **Observability** — metrics: alerts received per source, cards posted per channel, throttling events, retry counts, dedup hits, end-to-end latency p50/p95.
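
A minimal in-memory sketch of the counters and latency percentiles listed above, assuming a single process. The class name `PipelineMetrics` and the metric names are illustrative; a production service would more likely export these through its existing metrics library.

```python
import math
from collections import Counter
from typing import List, Optional


class PipelineMetrics:
    def __init__(self) -> None:
        self.counters: Counter = Counter()
        self.latencies_ms: List[float] = []

    def incr(self, name: str, label: str) -> None:
        """Count an event, e.g. incr("alerts_received", "datadog")."""
        self.counters[(name, label)] += 1

    def observe_latency(self, ms: float) -> None:
        """Record end-to-end latency from inbound event to posted card."""
        self.latencies_ms.append(ms)

    def percentile(self, p: int) -> Optional[float]:
        """Nearest-rank percentile (p in 1..100); None if nothing recorded."""
        if not self.latencies_ms:
            return None
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]
```

Calling `percentile(50)` and `percentile(95)` yields the p50/p95 figures for the dashboard, while counters keyed by `(name, label)` cover per-source and per-channel breakdowns.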

Output as: (a) translator service architecture, (b) per-source parser outline, (c) routing matrix YAML, (d) Adaptive Card template, (e) Teams connector deprecation migration plan, (f) observability dashboard.

Bias toward: a translator service in your control, future-proofing for the Office 365 connector deprecation, observability of the pipeline itself.
