
Slack Observability Dashboard Surface Prompt

Surface real-time observability data directly in Slack — live SLO status, recent deploys, current incidents, top-error queries — via slash commands and scheduled posts that bring the dashboard to where engineers already are.

Target user: SRE / platform leads reducing the gap between alerts and dashboards
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who has reduced incident MTTR by bringing observability data directly into Slack — instead of asking engineers to context-switch to Grafana / Datadog every time.

I will provide:
- Observability tools (Grafana / Datadog / New Relic / Prometheus / OpenTelemetry)
- Most-asked dashboard questions during incidents
- Existing Slack alert pipelines
- Pain points (engineers hunting for the right dashboard URL while the incident is hot)

Your job:

1. **What's worth surfacing in Slack** — be opinionated:
   - **Always**: current SLO state per critical service, current incident count, deploy clock (time since the last deploy)
   - **On request via slash command**: error rate for service X, top errors last 1h, deploy diff
   - **Scheduled**: daily SLO digest, weekly DORA metrics, monthly capacity trend
   - **Avoid**: full dashboards as screenshots (use links), every metric (signal-to-noise)

2. **Slash command catalog**:
   - `/status <service>` — current SLO state, recent deploys, current incidents
   - `/errors <service> [window]` — top 10 errors with counts + sample messages
   - `/deploy <service>` — last 5 deploys with success/failure + duration
   - `/slo <service>` — burn rate windows, budget remaining, recent reset history
   - `/dash <service>` — direct link to the right dashboard with time window pre-filtered
   - `/oncall <team>` — who's on call now + escalation chain
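
A minimal sketch of how the catalog above might be validated, assuming the handler receives the `command` and `text` fields of Slack's slash-command payload. `CATALOG`, `ParsedCommand`, and `parse_command` are hypothetical names chosen for illustration, not part of any Slack SDK.

```python
from dataclasses import dataclass

# Hypothetical catalog: command -> (required argument, optional argument or None)
CATALOG = {
    "/status": ("service", None),
    "/errors": ("service", "window"),
    "/deploy": ("service", None),
    "/slo": ("service", None),
    "/dash": ("service", None),
    "/oncall": ("team", None),
}


@dataclass
class ParsedCommand:
    name: str
    args: dict


def parse_command(command: str, text: str) -> ParsedCommand:
    """Validate a slash command against the catalog and name its arguments."""
    if command not in CATALOG:
        raise ValueError(f"unknown command: {command}")
    required, optional = CATALOG[command]
    parts = text.split()
    if not parts:
        raise ValueError(f"usage: {command} <{required}>")
    max_args = 2 if optional else 1
    if len(parts) > max_args:
        raise ValueError(f"too many arguments for {command}")
    args = {required: parts[0]}
    if optional and len(parts) == 2:
        args[optional] = parts[1]
    return ParsedCommand(command, args)
```

For example, `parse_command("/errors", "checkout 15m")` yields the arguments `{"service": "checkout", "window": "15m"}`, while an empty argument string raises a usage error that can be shown back to the user.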

3. **Output format** — each command returns a Block Kit message with:
   - Header: service + the question answered + as-of timestamp
   - Section block with `fields`: 3-5 key numbers (Block Kit has no `FactSet`; that element belongs to Adaptive Cards)
   - Sparkline or mini-trend (unicode chars work in Block Kit text)
   - Link to the full dashboard with time window
   - Refresh button (re-runs the command and edits the message)
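
One way the message layout above could be assembled, sketched as plain Python dictionaries that serialize to Block Kit JSON. The key numbers are rendered as a section block with `fields`. The function names, the `refresh_status` action ID, and the sample values are assumptions for illustration.

```python
import json

SPARK_CHARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map a numeric series onto unicode block characters."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on a flat series
    top = len(SPARK_CHARS) - 1
    return "".join(SPARK_CHARS[round((v - lo) / span * top)] for v in values)


def build_blocks(service, question, as_of, facts, trend, dashboard_url):
    """Return a list of Block Kit blocks for one command response."""
    return [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"{service}: {question}"}},
        {"type": "context",
         "elements": [{"type": "mrkdwn", "text": f"As of {as_of}"}]},
        # Key numbers as two-column fields
        {"type": "section",
         "fields": [{"type": "mrkdwn", "text": f"*{k}*\n{v}"}
                    for k, v in facts.items()]},
        {"type": "section",
         "text": {"type": "mrkdwn",
                  "text": f"Trend: {sparkline(trend)}  "
                          f"<{dashboard_url}|Open full dashboard>"}},
        # Hypothetical action_id; the app's interaction handler would re-run
        # the command and update the message.
        {"type": "actions",
         "elements": [{"type": "button",
                       "text": {"type": "plain_text", "text": "Refresh"},
                       "action_id": "refresh_status",
                       "value": service}]},
    ]


blocks = build_blocks(
    "checkout", "current SLO state", "2024-06-03 09:30 UTC",
    {"Availability": "99.92%", "Budget left": "41%", "Open incidents": "1"},
    [3, 4, 4, 9, 12, 7, 5],
    "https://grafana.example.com/d/checkout",  # placeholder URL
)
payload = json.dumps({"blocks": blocks}, ensure_ascii=False)
```

The resulting `payload` string is the JSON body that would be sent as the command response.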

4. **Scheduled posts**:
   - **Morning SLO digest** — each weekday at 9am team-local, posted to team channel: services with budget < 50%, fast-burn alerts last 24h
   - **Weekly DORA** — Mondays: deploy frequency, lead time, change failure rate, MTTR
   - **Incident heatmap** — Fridays: incidents by service this week vs last
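
The morning digest rule above can be sketched as two small functions: one that decides whether the digest is due in the team's local time, and one that selects services under the 50% budget threshold. The function names and the budget dictionary shape are assumptions; in practice the `team_tz` argument would likely be a `zoneinfo.ZoneInfo` for the team's region.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def digest_due(now_utc: datetime, team_tz: tzinfo) -> bool:
    """True on weekdays during the 09:00 hour in the team's local time."""
    local = now_utc.astimezone(team_tz)
    return local.weekday() < 5 and local.hour == 9


def slo_digest_lines(budgets: dict[str, float], threshold: float = 0.5) -> list[str]:
    """List services whose remaining error budget is below the threshold, worst first."""
    low = sorted((b, s) for s, b in budgets.items() if b < threshold)
    return [f"{s}: {b:.0%} budget remaining" for b, s in low]


# Usage with a fixed UTC+2 offset standing in for a real team timezone
team_tz = timezone(timedelta(hours=2))
monday = datetime(2024, 6, 3, 7, 30, tzinfo=timezone.utc)
if digest_due(monday, team_tz):
    lines = slo_digest_lines({"checkout": 0.4, "search": 0.9, "auth": 0.12})
```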

5. **Time-window magic** — when surfacing dashboard data:
   - Default to "last 1 hour" for live commands
   - For incident-context commands (called from within an incident channel), default to "since incident started"
   - For deploy-related, default to "since last deploy"
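
The defaults above might reduce to a single helper like the one below. The precedence when a deploy command is run inside an incident channel is not stated above; this sketch assumes incident context wins. The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def default_window_start(now, kind="live", incident_started_at=None,
                         last_deploy_at=None):
    """Pick the start of the query window from the calling context."""
    # Assumption: incident context takes priority over every other default.
    if incident_started_at is not None:
        return incident_started_at
    if kind == "deploy" and last_deploy_at is not None:
        return last_deploy_at
    # Live commands fall back to the last hour.
    return now - timedelta(hours=1)


now = datetime(2024, 6, 3, 12, 0, tzinfo=timezone.utc)
incident = now - timedelta(minutes=25)
deploy = now - timedelta(hours=3)
```

Called with `incident_started_at=incident`, the helper returns the incident start; called with `kind="deploy"` and `last_deploy_at=deploy`, it returns the deploy time; otherwise it returns one hour before `now`.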

6. **Caching strategy**:
   - Aggregate metrics: 1-min cache (don't hammer the metrics backend)
   - Per-service status: 30-second cache
   - Top errors: 5-min cache (errors don't change that fast)
   - Pure URL generators (`/dash`): no cache
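
A small in-memory sketch of the per-kind TTLs above. The `TTLCache` class, the kind names, and the injectable clock are assumptions for illustration; a shared cache such as Redis would be a more likely choice when the app runs on several instances.

```python
import time

# TTLs in seconds, mirroring the list above; 0 means "never cache"
TTL_SECONDS = {"aggregate": 60, "status": 30, "top_errors": 300, "dash_url": 0}


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._store = {}     # (kind, key) -> (stored_at, value)

    def get_or_fetch(self, kind, key, fetch):
        """Return a fresh cached value, or call fetch() and store the result."""
        ttl = TTL_SECONDS[kind]
        now = self._clock()
        hit = self._store.get((kind, key))
        if ttl > 0 and hit is not None and now - hit[0] < ttl:
            return hit[1]
        value = fetch()
        if ttl > 0:
            self._store[(kind, key)] = (now, value)
        return value
```

A Refresh button pressed during a fast-moving incident could bypass the cache by calling the fetch function directly, which keeps the stale-data risk contained to unprompted reads.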

7. **Permission model**:
   - Read commands (`/status`, `/errors`): any engineer
   - Write/mutating (`/silence`, `/ack`): on-call only
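
The two-tier model above can be expressed as a deny-by-default check. Treating `/deploy`, `/slo`, `/dash`, and `/oncall` as read commands is an assumption of this sketch, as is the idea that the on-call user IDs come from a paging tool lookup.

```python
READ_COMMANDS = {"/status", "/errors", "/deploy", "/slo", "/dash", "/oncall"}
ONCALL_ONLY_COMMANDS = {"/silence", "/ack"}


def is_allowed(command: str, user_id: str, oncall_user_ids: set) -> bool:
    """Read commands are open to everyone; mutating ones need on-call status."""
    if command in READ_COMMANDS:
        return True
    if command in ONCALL_ONLY_COMMANDS:
        return user_id in oncall_user_ids
    return False  # unknown commands are denied by default


oncall = {"U123"}  # hypothetical Slack user IDs currently on call
```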

8. **Error handling**:
   - Metrics backend down → return a graceful "metrics unavailable, here's the dashboard link"
   - Slow response → return preliminary data with "refreshing..." indicator
   - User asks about non-existent service → fuzzy-match suggest similar names
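
Two of the cases above lend themselves to short sketches: fuzzy-matching an unknown service name with the standard library's `difflib`, and falling back to a dashboard link when the metrics backend is unreachable. The function names, the 0.6 similarity cutoff, and the choice of exceptions to catch are assumptions.

```python
import difflib


def suggest_services(name, known_services, limit=3):
    """Return up to `limit` known service names similar to the input."""
    return difflib.get_close_matches(name.lower(), known_services,
                                     n=limit, cutoff=0.6)


def unknown_service_message(name, known_services):
    matches = suggest_services(name, known_services)
    if matches:
        return f"Unknown service '{name}'. Did you mean: {', '.join(matches)}?"
    return f"Unknown service '{name}'."


def with_fallback(fetch, dashboard_url):
    """Run fetch(); on a backend failure, return a message with the link."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError):
        return f"Metrics unavailable, here's the dashboard link: {dashboard_url}"


services = ["checkout", "search", "auth"]
message = unknown_service_message("chekout", services)
```

Here `message` suggests `checkout` for the misspelled `chekout`.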

9. **Anti-patterns to avoid**:
   - Dumping all metrics in a single command
   - No link back to the full dashboard (people will want depth)
   - Slow responses that make Slack time out (a slash command must be acknowledged within 3 seconds)
   - Caching that surfaces stale data during a fast-moving incident
   - Replacing dashboards instead of complementing them

10. **Measure success** — % of incident-channel chatter that's "what's the URL for X dashboard" before vs after. % of engineers using slash commands during incidents. MTTR delta.

Output as: (a) slash command catalog with response formats, (b) Block Kit JSON for one command output, (c) scheduled post schedule + content, (d) time-window heuristics, (e) caching strategy, (f) permission matrix, (g) error handling, (h) success metrics.

Bias toward: bring the answer to where the question is asked; complement dashboards, don't replace them; fast > comprehensive.