Slack ChatOps Bot Design Prompt
Design a ChatOps bot that runs kubectl, terraform, aws/gcloud, and deploy commands from Slack with RBAC, audit logging, and production-safe defaults.
- Target user: Platform engineers building self-service infra tooling for SRE / dev teams
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has built ChatOps bots used by 100+ engineers to run infrastructure commands from Slack with strong audit and safety guarantees.

I will provide:
- Target backends (Kubernetes clusters, AWS/GCP accounts, Terraform state)
- Existing identity provider (Okta, Azure AD, Google Workspace)
- Team / role structure
- Risk tolerance (read-only only, mutations with approval, full self-service)
- Existing audit/compliance requirements (SOC2, PCI, HIPAA)

Your job:

1. **Slack interaction surface**: when to use slash commands vs Block Kit modals vs message shortcuts vs Events API. Pros/cons for ChatOps specifically.
2. **Identity & RBAC mapping**:
   - Slack user ID → enterprise identity (SSO claim)
   - Identity → role (viewer / operator / owner)
   - Role → allowed command surface
   - Per-resource ACL (e.g. only prod-deploys team can `/deploy production *`)
3. **Command catalog**, at minimum:
   - `/kubectl <args>`: read-only by default; `apply`/`delete` require approval
   - `/tf plan <stack>`: runs plan, posts diff
   - `/tf apply <stack>`: requires PR-merged-by check + second approver
   - `/aws <profile> <command>`: wrapped, read-only by default
   - `/deploy <service> <env>`: pulls CI artifact, confirms before rollout
   - `/silence <alert> <duration>`: for incident response
4. **Safety controls**: dry-run by default; explicit `--apply` flag; confirm-via-modal for destructive ops; block on protected resources (prod namespaces, prod accounts); timeout on long-running commands; cancellable.
5. **Audit trail**: every invocation logged to: (a) immutable log (S3 + object lock or Loki), (b) ticket system (Jira/ServiceNow) for changes, (c) Slack thread persistence. Include: user, command, args, resolved identity, RBAC decision, output, exit code, duration.
6. **Approval workflow**: for risky commands, post a Block Kit message with Approve/Reject buttons; require N approvers from a specified group; timeout if not approved.
7. **Architecture**: bot backend (TypeScript/Go), event subscription vs Socket Mode, command queue (Redis), worker pool, secrets handling (Vault/KMS), runner isolation (per-cluster service accounts, short-lived tokens).
8. **Threat model**: Slack workspace compromise, replay attacks, command injection, secret exfil via output, lateral movement from runner. Mitigations for each.

Output as:
(a) architecture diagram description
(b) RBAC matrix YAML
(c) command catalog with safety tiers
(d) audit log schema (JSON)
(e) Block Kit JSON for one approval modal
(f) deploy checklist + ongoing maintenance tasks

Bias toward: explicit > implicit, read-only > write, approval-gated > self-service for prod. Output should be safe to give to a junior on day one.