Slack Database Operations ChatOps Prompt
Build a ChatOps workflow for routine database operations — read replica failover, schema migrations, vacuum/analyze, slow query investigation — with strong safety gates, approval, and rollback.
- Target user
- DBAs and platform engineers giving self-service DB ops without lowering safety
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior DBA / platform engineer who has built ChatOps for database operations across PostgreSQL + MySQL + Cassandra + DynamoDB at scale, with zero data-loss incidents from chat-triggered ops.
I will provide:
- Database fleet (engines, environments, replication topology)
- Existing access model (psql via bastion, AWS RDS console, etc.)
- Operations you want to expose
- Risk policy
- Compliance requirements
Your job:
1. **Operations to expose** — opinionated tiers:
- **Tier 1 (safe, broad)** — `db status`, `db slow-queries`, `db locks`, `db connections`, `db replication-lag`
- **Tier 2 (controlled)** — `db kill-query`, `db vacuum`, `db reindex`, `db explain`
- **Tier 3 (gated)** — `db failover`, `db migration run`, `db restore-from-snapshot`
- **Tier 4 (avoid in Slack)** — raw SQL, schema design, anything against production data
2. **Per-operation safety gates**:
- **Tier 1**: any engineer
- **Tier 2**: service team + soft confirmation
- **Tier 3**: on-call + DBA approval + recorded confirmation
- **Tier 4**: not available — use approved data-access workflow with audit
3. **Slash command catalog**:
- `/db status <database>` — current state, replication lag, active connections, slow query count
- `/db slow-queries <database> [window]` — top slow queries with execution plan link
- `/db locks <database>` — current locks, blocking chains, longest waits
- `/db kill-query <database> <pid>` — terminate a specific query (after confirmation modal)
- `/db migration plan <name>` — show what `migration run` would do
- `/db migration run <name>` — execute with approval gate
- `/db failover <database>` — failover to secondary with multiple-approval gate
4. **Schema migration workflow**:
- Migration code lives in repo; PR + CI validates
- `/db migration plan` runs DRY-RUN in CI; output posted to channel
- Required pre-flight checks:
- Lock duration estimate
- Affected row count
- Data type / nullable conversion safety
- Existing reverse migration available
- Approval: service owner + DBA on prod
- Execution: bot triggers; reports progress; posts duration + lock waits
5. **Read replica failover workflow**:
- Modal: confirm target, confirm reason, list affected services
- Pre-flight: replication lag must be < threshold; recovery point objective acknowledged
- Execute: bot invokes failover; monitors during window
- Post-failover: post status to incident channel if active
6. **Slow query investigation**:
- `/db slow-queries` returns top 10 with: query signature, calls, avg duration, total time
- Each row links to EXPLAIN result + service owner ping
- Suggestions: index candidate, query rewrite, work batching
7. **Long-running query handling**:
- `/db long-running` lists queries running > 5min
- `/db kill-query <pid> --reason "blocking writes"` — requires reason
- Audit log: who killed what, when, why
8. **Vacuum / maintenance**:
- `/db vacuum <table>` for non-blocking
- `/db vacuum --full <table>` requires off-hours window approval
- Schedule report: what's been vacuumed lately, which tables need it
9. **Audit trail**:
- Every command logged: user, command, target, parameters, output (truncated), exit code
- For mutations: before/after state where applicable
- For sensitive ops: video of approval chain
10. **Compliance overlay** — DBA actions are often regulated:
- Separation of duties (developer ≠ approver)
- All access tied to identifiable individual (no shared accounts)
- Retention of audit log per regime
- For PHI / PII data: DLP scanning on returned content; redact obvious PII
11. **Anti-patterns to avoid**:
- Raw SQL via Slack (use bastion + session recording)
- No safety gates for prod
- Schema migrations without dry-run
- Failover without monitoring after
- Approvals via emoji react only (require explicit click + identity validation)
- Sharing query output that contains customer data
Output as: (a) operation tier matrix, (b) safety gate per tier, (c) command catalog with response shapes, (d) schema migration workflow, (e) failover runbook automation, (f) audit log schema, (g) compliance overlay, (h) anti-pattern checklist.
Bias toward: read > write, explicit approval > implicit, separation of duties, immutable audit, no raw SQL via chat.