Teams Bot Framework ChatOps Design Prompt
Design a Teams ChatOps bot on Azure Bot Service for running kubectl / az / terraform / deploy commands with Azure AD identity, RBAC, audit logging, and approval workflows.
- Target user: Platform engineers building ChatOps for Teams-first orgs
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has shipped ChatOps bots on the Microsoft Teams Bot Framework + Azure Bot Service to production for 100+ engineers, with strong audit, RBAC, and approval flows.

I will provide:
- Target backends (AKS / GKE / EKS clusters, Azure subscriptions, Terraform Cloud, ArgoCD)
- Identity provider (Azure AD / Entra)
- Team / role structure
- Compliance environment (SOC2, ISO 27001, FedRAMP, etc.)
- Risk tolerance for self-service mutations

Your job:

1. **Bot architecture**:
   - **Azure Bot Service** registration + Teams channel binding
   - **App Service Plan** vs **Container Apps** vs **Functions**: recommend Container Apps for production (warm starts, scaling, networking)
   - **Bot Framework SDK** (Node.js, Python, C#): recommend a choice based on the team
   - **State backend**: Cosmos DB for conversation state, Redis for command queue
   - **Secrets**: Key Vault + managed identity
2. **Identity model**:
   - Teams user → AAD object id (Bot Framework supplies this directly)
   - AAD object id → role membership (read from AAD groups via Graph)
   - Role → command surface (RBAC matrix)
   - Per-resource ACL for fine-grained gates
3. **Command surface**:
   - **Slash commands**: `/kubectl <args>`, `/az <args>`, `/tf plan|apply <stack>`, `/deploy <service> <env>`, `/silence <alert> <duration>`
   - **Adaptive Card forms** for parameter input (avoid free-text parsing)
   - **Message extensions** for quick lookups (e.g. `@bot pod my-pod` from any message)
   - **Tabs** for dashboards (pod lists, deploy history)
4. **Safety controls**:
   - Dry-run by default, explicit `--apply` for mutations
   - Adaptive Card confirmation for destructive commands
   - Protected resource ACL (block on prod namespaces / prod subscriptions)
   - Per-user rate limits (token bucket)
   - Command timeouts; cancellable long-running ops
5. **Approval workflow**:
   - Post an Adaptive Card with Approve/Reject buttons to a configured channel
   - Require N approvers from a specified AAD group
   - Approver identity validated via `Action.Execute` + AAD token verification
   - Timeout (auto-reject after N minutes)
6. **Audit trail**: every invocation is written to:
   - Immutable log (Azure Storage with immutability policy, or Log Analytics workspace with retention lock)
   - Optionally: ServiceNow / Jira ticket for change records
   - Teams thread reply with outcome + log link

   Audit record shape: user (AAD oid + UPN), command, args, resolved RBAC decision, dry-run vs apply, target resource, output (truncated), exit code, duration, approval chain.
7. **Threat model**:
   - Teams tenant compromise → AAD conditional access; require MFA for sensitive commands
   - Token theft from bot → managed identity, no static creds
   - Command injection from user input → typed Adaptive Card forms, never shell-expand user strings
   - Output leaks of secrets → output redaction layer; never return raw kubectl secret output
8. **Compliance overlay**: log immutability, retention windows, eDiscovery readiness, data residency for AAD tenants in regulated regions.

Output as:
(a) architecture diagram description,
(b) RBAC matrix YAML,
(c) command catalog with safety tiers,
(d) audit log schema,
(e) Adaptive Card JSON for one approval workflow,
(f) deploy + ongoing maintenance plan,
(g) Office 365 connector deprecation note (use Bot Framework, not webhook).

Bias toward: typed input via Adaptive Cards (not free-text parsing), AAD-native identity, immutable audit.
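
As an illustration of what the prompt is asking for (not part of the prompt text itself), the identity model and safety controls can be combined into a single authorization gate. The Python sketch below is a minimal example under stated assumptions: the group ids, role names, `RBAC_MATRIX`, `PROTECTED_ENVS`, and the `authorize` function are all hypothetical, and a real bot would load the matrix from config and resolve group membership through Microsoft Graph rather than a hard-coded dictionary.

```python
from dataclasses import dataclass

# Hypothetical mapping from AAD group ids to roles (normally resolved via Graph).
GROUP_TO_ROLE = {
    "group-viewers": "viewer",
    "group-deployers": "deployer",
    "group-admins": "admin",
}

# Hypothetical RBAC matrix: role -> commands that role may invoke.
RBAC_MATRIX = {
    "viewer": {"kubectl"},
    "deployer": {"kubectl", "tf", "deploy"},
    "admin": {"kubectl", "tf", "deploy", "az", "silence"},
}

# Environments where an apply is never executed directly (assumed names).
PROTECTED_ENVS = {"prod"}


@dataclass(frozen=True)
class Decision:
    allowed: bool  # True if the command may run now in the given mode
    mode: str      # "denied", "dry-run", "apply", or "needs-approval"
    reason: str    # human-readable explanation, suitable for the audit record


def authorize(group_ids, command, env, apply=False):
    """Resolve groups -> roles -> commands, then apply the safety tiers."""
    roles = {GROUP_TO_ROLE[g] for g in group_ids if g in GROUP_TO_ROLE}
    permitted = set().union(*(RBAC_MATRIX[r] for r in roles))

    if command not in permitted:
        return Decision(False, "denied", f"no role grants '{command}'")
    if not apply:
        # Dry-run is the default; mutations require an explicit apply flag.
        return Decision(True, "dry-run", "dry-run by default")
    if env in PROTECTED_ENVS:
        # Protected targets are routed to the approval workflow instead.
        return Decision(False, "needs-approval", f"'{env}' is protected")
    return Decision(True, "apply", "explicit apply on unprotected target")


if __name__ == "__main__":
    print(authorize(["group-deployers"], "deploy", "prod", apply=True))
```

The returned `Decision` carries the mode and reason, so the same object can be written into the "resolved RBAC decision" field of the audit record.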