Teams Webhook Routing for Monitoring Alerts Prompt
Route Prometheus, Datadog, and CloudWatch alerts into Microsoft Teams channels using Incoming Webhooks and a small translator service that handles severity routing, throttling, retries, and deduplication.
- Target user: Platform engineers wiring monitoring tools to Teams
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has built reliable alert pipelines from multiple monitoring tools into Microsoft Teams, surviving Teams webhook throttling, brittle payload schemas, and retries.

I will provide:
- Source tools (Alertmanager, Datadog, CloudWatch, Sentry, New Relic, custom)
- Existing Teams channels + their connector webhooks
- Severity → channel routing rules
- Volume estimates (alerts/minute peak)
- Reliability requirements

Your job:

1. **Architecture choice**: direct webhook from source to Teams vs. a translator service in between. Recommend the translator pattern for most cases. Show why:
   - Source payloads don't match the Adaptive Card schema
   - Need consistent formatting across sources
   - Throttling, retries, and dedup belong in your service, not in Teams
   - Audit + replay are only possible with a service in the middle
2. **Translator service**: a small HTTP service (Node/Go/Python) with:
   - **Source-specific parsers**: one per source, normalizing to a canonical IncidentEvent
   - **Routing matrix**: canonical event → target channel webhook URL
   - **Card builder**: IncidentEvent → Adaptive Card JSON
   - **Throttler**: the Teams Incoming Webhook limit is ~4 calls/sec per webhook. Implement a token bucket per channel.
   - **Retrier**: retry 5xx with backoff; on 429, respect Retry-After
   - **Deduper**: by alert fingerprint + window
   - **Audit log**: every inbound event + outbound result
3. **Per-source quirks**:
   - **Alertmanager**: groups alerts; payload has `commonLabels` + `alerts[]`; `status: firing|resolved`
   - **Datadog**: Markdown-flavored payload; webhook custom payload variables
   - **CloudWatch**: SNS-wrapped JSON, double-encoded
   - **Sentry**: payload per issue, includes stack traces (sanitize before Teams!)
   - **GitHub Actions**: for workflow failures
4. **Adaptive Card recommendations**: see the severity styling section in the Adaptive Card design prompt. Reuse the same card template across sources for consistency.
5. **Routing matrix examples**:
   - SEV1 → `#incidents-active` + page
   - SEV2 → `#alerts-prod-<service>` + notify
   - Warnings → `#alerts-low-signal` (mute by default)
6. **Failure modes & mitigations**:
   - Teams webhook 4xx (other than 429) → log + alert ops (NOT to Teams!)
   - Teams webhook 5xx / timeout → retry up to N
   - Teams webhook deprecated (Microsoft is phasing out Office 365 connectors; the retirement dates have been revised since the original announcement, so confirm the current timeline) → migrate to Workflows / Power Automate-backed webhooks. Plan the migration NOW.
   - Translator service down → fallback path: email the on-call
7. **Microsoft's connector deprecation**: Office 365 Incoming Webhooks are deprecated. Recommend migrating to Power Automate "When a Teams webhook request is received" or a Teams bot with Graph API for long-term reliability.
8. **Observability**: metrics for alerts received per source, cards posted per channel, throttling events, retry counts, dedup hits, and end-to-end latency p50/p95.

Output as: (a) translator service architecture, (b) per-source parser outline, (c) routing matrix YAML, (d) Adaptive Card template, (e) Teams connector deprecation migration plan, (f) observability dashboard.

Bias toward: a translator service in your control, future-proofing for the Office 365 connector deprecation, and observability of the pipeline itself.
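To give a sense of what the prompt's "token bucket per channel" throttler might look like, here is a minimal Python sketch. The class names `TokenBucket` and `ChannelThrottler`, the injectable `clock` parameter, and the default of 4 tokens per second are illustrative assumptions taken from the ~4 calls/sec figure in the prompt, not part of any Teams SDK.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full so an initial burst is allowed
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ChannelThrottler:
    """Keeps one bucket per webhook URL (hypothetical helper, not a Teams API)."""

    def __init__(self, rate: float = 4.0, capacity: float = 4.0,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, webhook_url: str) -> bool:
        bucket = self._buckets.get(webhook_url)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[webhook_url] = bucket
        return bucket.try_acquire()
```

When `allow()` returns `False`, a real service would typically queue the card and try again shortly rather than drop it. Because each webhook URL gets its own bucket, a noisy channel does not starve a quiet one.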
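The retry rules in the prompt (back off on 5xx, respect Retry-After on 429, do not retry other 4xx) can be sketched as a single decision function. The function name `retry_delay`, the `base` and `cap` defaults, and the choice to handle only the integer-seconds form of `Retry-After` are assumptions made for illustration. The header lookup also assumes the mapping uses the exact key `Retry-After`.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None for no retry.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    backoff = min(cap, base * 2 ** attempt)   # exponential backoff, capped
    if status == 429:
        retry_after = headers.get("Retry-After", "").strip()
        if retry_after.isdigit():
            return float(retry_after)          # the server told us how long to wait
        return backoff                         # header missing or in HTTP-date form
    if 500 <= status < 600:
        return backoff
    return None                                # success or a non-retryable 4xx


if __name__ == "__main__":
    print(retry_delay(429, {"Retry-After": "7"}, attempt=0))
    print(retry_delay(503, {}, attempt=3))
    print(retry_delay(400, {}, attempt=0))
```

A production retrier would usually add jitter to the backoff and cap the number of attempts. Both are left out here to keep the decision logic visible.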
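As a rough illustration of the source-specific parsers, the sketch below normalizes an Alertmanager webhook payload and an SNS-wrapped CloudWatch alarm into one canonical shape. The `IncidentEvent` fields, the `"unknown"` fallbacks, and the state-to-status mapping are assumptions. The prompt names the canonical type but does not define its fields.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class IncidentEvent:
    """Hypothetical canonical event; field names are illustrative."""
    source: str
    title: str
    severity: str
    status: str        # "firing", "resolved", or "unknown"
    fingerprint: str


def parse_alertmanager(payload: Dict[str, Any]) -> List[IncidentEvent]:
    """One event per entry in alerts[], with commonLabels as label defaults."""
    common = payload.get("commonLabels", {})
    events = []
    for alert in payload.get("alerts", []):
        labels = {**common, **alert.get("labels", {})}
        events.append(IncidentEvent(
            source="alertmanager",
            title=labels.get("alertname", "unknown"),
            severity=labels.get("severity", "unknown"),
            status=alert.get("status", payload.get("status", "unknown")),
            fingerprint=alert.get("fingerprint", ""),
        ))
    return events


def parse_cloudwatch_sns(body: str) -> IncidentEvent:
    """Decode twice: the SNS envelope, then the alarm JSON stored as a string."""
    envelope = json.loads(body)
    alarm = json.loads(envelope["Message"])
    status = {"ALARM": "firing", "OK": "resolved"}.get(
        alarm.get("NewStateValue", ""), "unknown")
    return IncidentEvent(
        source="cloudwatch",
        title=alarm.get("AlarmName", "unknown"),
        severity="unknown",   # assumption: severity is assigned later by routing rules
        status=status,
        fingerprint=alarm.get("AlarmArn", alarm.get("AlarmName", "")),
    )
```

Keeping each parser as a pure function from payload to `IncidentEvent` makes it straightforward to test against recorded payloads from each source.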
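Deduplication "by alert fingerprint + window" could be approached along these lines. This is a minimal in-memory sketch. The `Deduper` name, the 300-second default, and the decision to key on fingerprint plus status (so a `resolved` event is not swallowed by an earlier `firing` one) are assumptions. A multi-instance deployment would need shared storage instead of a local dict.

```python
import time
from typing import Callable, Dict


class Deduper:
    """Suppresses repeats of the same (fingerprint, status) within a time window.

    The window is measured from the first time a key is seen, not the latest.
    """

    def __init__(self, window_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._first_seen: Dict[str, float] = {}

    def is_duplicate(self, fingerprint: str, status: str) -> bool:
        now = self.clock()
        # Drop entries whose window has elapsed so memory stays bounded.
        expired = [k for k, seen in self._first_seen.items()
                   if now - seen >= self.window]
        for k in expired:
            del self._first_seen[k]

        key = f"{fingerprint}:{status}"
        if key in self._first_seen:
            return True
        self._first_seen[key] = now
        return False
```

Each call scans every stored key, which is fine at modest alert volumes. A heap or a TTL cache would be a reasonable swap if the volume estimates are high.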