
Slack Capacity & Quota Threshold Alerts Prompt

Detect and notify on capacity threats in Slack — disk, memory, cloud quotas, license seats, RDS storage, K8s pod limits — with growth projections and provisioning lead-time-aware alerting.

Target user: SRE / platform leads preventing capacity-induced outages with lead-time alerts
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who has prevented many capacity outages by surfacing growth projections to Slack with enough lead time to actually provision.

I will provide:
- Resource types in scope (cloud quotas, disk, memory, license seats, DB storage, K8s)
- Monitoring tools (Prometheus / Datadog / cloud-native)
- Provisioning lead times (cloud quotas can be hours-to-weeks; physical hardware is months)
- Pain points (capacity-induced outages, last-minute quota requests, no forecasting)

Your job:

1. **What's worth monitoring for capacity**:
   - **Cloud quotas** — vCPUs per region, EBS gp3 storage, Lambda concurrent executions, etc.
   - **Disk** — partition utilization, inode count, snapshot count
   - **Memory** — host RAM, container limits, JVM heap
   - **DB storage** — RDS / Cosmos / Bigtable allocated vs used
   - **K8s** — node CPU/memory allocatable, pod count vs limit, PV capacity
   - **License seats** — Datadog hosts, Snyk projects, GitHub seats
   - **Network** — bandwidth ceilings, connection table size, NAT gateway limits

2. **Multi-window alerts** — different lead times for different resources:
   - **Provisioning lead time = hours**: alert at 80% utilization
   - **Provisioning lead time = days**: alert at 70% utilization
   - **Provisioning lead time = weeks**: alert at 50% utilization, and project growth forward
   - **Provisioning lead time = months** (hardware): alert at 30%
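
The tiering above can be sketched as a small lookup. This is a minimal illustration, not a feature of any monitoring tool: the tier boundaries (under 1 day, under 7 days, under 30 days) and the names `TIERS`, `alert_threshold`, and `should_alert` are assumptions; only the threshold percentages come from the list.

```python
from datetime import timedelta

# Boundaries between tiers are assumptions for illustration;
# the utilization thresholds are the ones listed above.
TIERS = [
    (timedelta(days=1), 0.80),   # lead time measured in hours
    (timedelta(days=7), 0.70),   # lead time measured in days
    (timedelta(days=30), 0.50),  # lead time measured in weeks
]
HARDWARE_THRESHOLD = 0.30        # lead time measured in months


def alert_threshold(lead_time: timedelta) -> float:
    """Return the utilization fraction at which to alert for this lead time."""
    for upper_bound, threshold in TIERS:
        if lead_time < upper_bound:
            return threshold
    return HARDWARE_THRESHOLD


def should_alert(utilization: float, lead_time: timedelta) -> bool:
    """True when current utilization has reached the tier's threshold."""
    return utilization >= alert_threshold(lead_time)
```

The longer the lead time, the lower the threshold, so a resource that takes weeks to expand alerts at 55% while one that takes hours does not.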

3. **Forecast** — alert on projected, not just current:
   - Linear regression on 30-day growth → days-until-X%
   - Alert when "days until 80%" < provisioning lead time + buffer
   - Example: "RDS storage will hit 80% in 14 days; provisioning takes 7d; you have 7d buffer"
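
A minimal sketch of the forecast rule above, assuming one utilization sample per day expressed as a fraction (0.0 to 1.0). It fits an ordinary least-squares line and compares the projected days remaining against lead time plus buffer. The function names and the default 80% target are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def fit_line(days: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope per day, intercept)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(target: float, days: Sequence[float],
               values: Sequence[float]) -> Optional[float]:
    """Days from the latest sample until the fitted line reaches `target`.

    Returns None when usage is flat or shrinking (no projected breach).
    """
    slope, intercept = fit_line(days, values)
    if slope <= 0:
        return None
    crossing_day = (target - intercept) / slope
    return max(0.0, crossing_day - days[-1])


def forecast_alert(days: Sequence[float], values: Sequence[float],
                   lead_time_days: float, buffer_days: float,
                   target: float = 0.80) -> bool:
    """Alert when the projected breach falls inside lead time + buffer."""
    remaining = days_until(target, days, values)
    return remaining is not None and remaining < lead_time_days + buffer_days
```

A straight-line fit only suits roughly linear growth; step changes or seasonal load would need a different model.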

4. **Slack message anatomy**:
   - Resource name + scope (region, account, environment)
   - Current state + threshold breached + when
   - Trend (7d, 30d)
   - Projected breach date (if applicable)
   - Suggested action (provision X more, prune Y, request quota increase)
   - Owner ping + linked dashboard
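
One way to assemble that anatomy as Block Kit blocks is sketched below. The block types used (`header`, `section` with `fields`, `actions` with a URL `button`) are standard Block Kit; the shape of the `alert` dict and every key in it are assumptions made for illustration.

```python
import json


def capacity_alert_blocks(alert: dict) -> list:
    """Build Block Kit blocks for one capacity alert.

    `alert` is a hypothetical dict; its keys are not defined by Slack.
    """
    fields = [
        f"*Scope:*\n{alert['scope']}",
        f"*Current:*\n{alert['current_pct']}% "
        f"(threshold {alert['threshold_pct']}%, breached {alert['breached_at']})",
        f"*Trend:*\n7d {alert['trend_7d']}, 30d {alert['trend_30d']}",
        f"*Projected breach:*\n{alert.get('projected_breach', 'n/a')}",
    ]
    return [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"Capacity: {alert['resource']}"}},
        {"type": "section",
         "fields": [{"type": "mrkdwn", "text": text} for text in fields]},
        {"type": "section",
         "text": {"type": "mrkdwn",
                  "text": f"*Suggested action:* {alert['action']}\n"
                          f"*Owner:* <@{alert['owner_id']}>"}},  # user mention
        {"type": "actions",
         "elements": [{"type": "button",
                       "text": {"type": "plain_text", "text": "Open dashboard"},
                       "url": alert["dashboard_url"]}]},
    ]


def to_payload(alert: dict) -> str:
    """Serialize the blocks as the JSON body of a Slack message."""
    return json.dumps({"blocks": capacity_alert_blocks(alert)})
```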

5. **Quota request workflow**:
   - For cloud quotas: bot links to the cloud console quota request form
   - Pre-fills justification from the alert ("we currently have N at 80%; projected growth Y; please raise to Z")
   - Tracks the request; alerts when approved/denied
   - Re-validates that the new limit is in effect
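
The pre-filled justification could be generated along these lines. This is a sketch under stated assumptions: growth is treated as linear, and the 90-day horizon and the 50% post-increase utilization target are illustrative defaults, not cloud-provider requirements. Both function names are hypothetical.

```python
import math


def suggested_limit(current_usage: float, growth_per_day: float,
                    horizon_days: int, target_utilization: float = 0.5) -> int:
    """Limit that keeps projected usage at `target_utilization` after the horizon."""
    projected_usage = current_usage + growth_per_day * horizon_days
    return math.ceil(projected_usage / target_utilization)


def quota_justification(quota_name: str, current_limit: float,
                        current_usage: float, growth_per_day: float,
                        horizon_days: int = 90) -> str:
    """Draft the free-text justification for a quota increase request."""
    utilization_pct = round(100 * current_usage / current_limit)
    new_limit = suggested_limit(current_usage, growth_per_day, horizon_days)
    return (
        f"{quota_name}: currently using {current_usage} of {current_limit} "
        f"({utilization_pct}%). Projected growth is {growth_per_day} per day "
        f"over the next {horizon_days} days. Please raise the limit to {new_limit}."
    )
```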

6. **Routing**:
   - **Critical** (lead time threatened) → `#capacity-alerts` + DM on-call
   - **High** (growth trajectory concerning) → service team channel
   - **Info** (long-lead-time projections) → weekly digest
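
The routing matrix above might reduce to a function like this one. The cutoffs are an assumption about how to read the three tiers: "lead time threatened" is taken to mean the projected breach is no further away than the lead time, and "trajectory concerning" to mean it falls within lead time plus buffer. The `dm:on-call` and `weekly-digest` destination strings are placeholders.

```python
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    severity: str
    destinations: List[str]


def route_alert(days_until_breach: Optional[float], lead_time_days: float,
                buffer_days: float, team_channel: str) -> Route:
    """Pick severity and destinations from the projected days until breach."""
    if days_until_breach is not None:
        if days_until_breach <= lead_time_days:
            # Not enough time left to provision: escalate.
            return Route("critical", ["#capacity-alerts", "dm:on-call"])
        if days_until_breach <= lead_time_days + buffer_days:
            return Route("high", [team_channel])
    # No projected breach, or one far in the future.
    return Route("info", ["weekly-digest"])
```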

7. **Anomaly detection vs trend**:
   - Sudden spike (e.g. disk filled in 1h) → page; treat as abnormal growth, not a trend
   - Gradual growth (linear) → projection alert
   - Cyclic (peak hours) → don't alert on the peak; alert on the baseline trend
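
A rough sketch of how the three cases could be told apart, assuming hourly utilization samples as fractions. The spike check compares the latest sample with the same hour one day earlier, so a routine daily peak is not mistaken for a spike, and the trend is read from daily minimums rather than raw samples. The 0.10 jump threshold and all function names are illustrative assumptions.

```python
from typing import List, Sequence


def daily_baseline(hourly: Sequence[float], period: int = 24) -> List[float]:
    """One value per complete day (the daily minimum), ignoring peak hours."""
    return [min(hourly[i:i + period])
            for i in range(0, len(hourly) - period + 1, period)]


def is_spike(hourly: Sequence[float], jump: float = 0.10, period: int = 24) -> bool:
    """True when the latest sample exceeds the same hour one period ago by > jump."""
    if len(hourly) <= period:
        return False  # not enough history for a like-for-like comparison
    return hourly[-1] - hourly[-1 - period] > jump


def classify(hourly: Sequence[float]) -> str:
    """Return 'page', 'projection', or 'none' for a series of hourly samples."""
    if is_spike(hourly):
        return "page"
    baseline = daily_baseline(hourly)
    if len(baseline) >= 2 and baseline[-1] > baseline[0]:
        return "projection"  # baseline is rising: hand off to the forecast alert
    return "none"
```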

8. **Inventory + tagging**:
   - Every monitored resource has: service owner, environment, criticality
   - Ownerless resources trigger a "find owner" workflow before they have problems
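
The tagging requirement can be checked mechanically. The sketch below assumes each inventory entry is a dict with a `name` and a `tags` mapping, and that the three required tags are keyed `owner`, `environment`, and `criticality`; that schema is hypothetical.

```python
from typing import Iterable, List

# Tag keys are assumed names for the three attributes listed above.
REQUIRED_TAGS = ("owner", "environment", "criticality")


def missing_tags(resource: dict) -> List[str]:
    """Required tags that are absent or empty on this resource."""
    tags = resource.get("tags", {})
    return [tag for tag in REQUIRED_TAGS if not tags.get(tag)]


def ownerless(resources: Iterable[dict]) -> List[str]:
    """Names of resources that should enter the 'find owner' workflow."""
    return [r["name"] for r in resources if "owner" in missing_tags(r)]
```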

9. **Action prompts in the message**:
   - Disk → `du -sh /*` + auto-scale suggestion
   - Cloud vCPU quota → link to request form + suggested new limit
   - RDS storage → enable autoscaling if not already; clear old snapshots
   - K8s nodes → suggest cluster autoscaler config check
   - License seats → review inactive users for reclamation

10. **Anti-patterns to avoid**:
   - Alerting at 95% (leaves no provisioning lead time)
   - Paging on every capacity warning (cry wolf)
   - Relying on manual capacity reviews (drift is inevitable)
   - Ignoring autoscaling failures (a warning at 90% on an autoscaled resource means the autoscaler is failing)
   - Tracking only utilization, not growth rate

11. **Compliance overlay**:
   - For regulated systems: capacity planning is a control (SOC 2 Availability criterion A1.1)
   - Document quarterly capacity reviews
   - Retain capacity-event logs for audit

Output as: (a) resource type inventory, (b) multi-window threshold policy, (c) forecast model (simple regression spec), (d) Block Kit message JSON, (e) quota request workflow, (f) routing matrix, (g) inventory + tagging requirements, (h) action prompt library.

Bias toward: project-and-alert (not just react), provisioning lead time aware, owner attribution, autoscaling failures surfaced.
