Slack Canvas as Living Service Runbook Prompt

Design channel-tabbed Slack Canvases that serve as living operational runbooks per service — owners, dashboards, escalation, common failures, and copy-paste recovery steps that stay current instead of rotting in a wiki.

Target user: SRE and platform teams who want runbooks where on-call actually looks
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an SRE who has accepted that nobody opens the wiki at 3am — they look in the service's Slack channel. You will design a Slack Canvas runbook that lives where on-call already is and stays current.

I will provide:
- The service(s) and their channels
- Where runbook content lives today (Confluence, Notion, repo markdown)
- Our dashboards, alerting, and escalation tooling
- How often runbooks go stale and why

Your job:

1. **Channel canvas vs. standalone canvas** — recommend pinning a single channel canvas per service channel as the on-call entry point. Justify why the channel canvas beats a buried wiki link.

2. **Runbook canvas template** — design the exact sections, top to bottom for 3am usefulness: TL;DR (what this service does), Owners + escalation (with @usergroups), Dashboards (deep links), Health checks (copy-paste commands), Top 5 known failures (symptom → cause → fix), Dependencies (upstream/downstream), Deploy/rollback steps, and an Incident quick-start. Each section must be scannable in seconds.

3. **Copy-paste recovery blocks** — recovery steps must be literal, runnable commands in code blocks with placeholders clearly marked, not prose. Show the formatting.

4. **Freshness mechanism** — the hardest part. Define a "last reviewed" stamp, a scheduled bot reminder to the owning usergroup every N days, and a CI/automation hook that flags the canvas when a linked dashboard or deploy pipeline changes. Describe how you detect drift.

5. **Sync vs. source-of-truth** — decide whether the canvas is authored directly or generated/mirrored from repo markdown via the canvas API. Give the tradeoffs and a recommendation.

6. **Linking into incidents** — when an incident channel spins up, auto-link the relevant service runbook canvas into the incident channel canvas so responders start with context.

7. **Adoption metrics** — canvas view counts, time-since-last-review per service, % of incidents where the runbook was referenced, and stale-runbook backlog.

Output as: (a) the canvas section template with example content for one real service, (b) the canvas API calls to create/update it programmatically, (c) the freshness/reminder automation, (d) an incident-linking flow, (e) anti-patterns (prose walls, dead links, no owner). Bias toward terse, copy-pasteable, owner-accountable content.
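To make outputs (b) and (c) concrete, here is a minimal sketch of what a response to this prompt might build on. It is an illustration under stated assumptions, not a definitive implementation. The `Runbook` fields, the 30-day review interval, the `payments-api` service, the `@payments-oncall` usergroup, and the channel ID are all hypothetical. The payload shape follows Slack's `conversations.canvases.create` Web API method as I understand it (a `channel_id` plus a markdown `document_content`); check the field names against the current Slack API reference before relying on them. The sketch only builds the request body and does not call Slack.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Runbook:
    service: str          # hypothetical service name
    owners: str           # Slack usergroup handle for escalation
    last_reviewed: date   # the "last reviewed" stamp
    recovery_cmd: str     # literal command; placeholders written as <UPPER_CASE>


def is_stale(rb: Runbook, today: date, max_age_days: int = 30) -> bool:
    """True when the last-reviewed stamp is older than the review interval."""
    return (today - rb.last_reviewed).days > max_age_days


def render_markdown(rb: Runbook, today: date) -> str:
    """Render a terse canvas body: stamp first, then owners, then one recovery block."""
    flag = "STALE, needs review" if is_stale(rb, today) else "current"
    fence = "`" * 3  # built at runtime so this example stays a single code block
    return "\n".join([
        f"# {rb.service} runbook",
        f"Last reviewed: {rb.last_reviewed.isoformat()} ({flag})",
        "## Owners + escalation",
        rb.owners,
        "## Recovery",
        fence,
        rb.recovery_cmd,
        fence,
    ])


def channel_canvas_payload(channel_id: str, rb: Runbook, today: date) -> dict:
    """Build the request body assumed for conversations.canvases.create."""
    return {
        "channel_id": channel_id,
        "document_content": {
            "type": "markdown",
            "markdown": render_markdown(rb, today),
        },
    }


if __name__ == "__main__":
    example = Runbook(
        service="payments-api",
        owners="@payments-oncall",
        last_reviewed=date(2025, 1, 1),
        recovery_cmd="kubectl rollout undo deployment/<DEPLOYMENT> -n <NAMESPACE>",
    )
    payload = channel_canvas_payload("C0123456789", example, date(2025, 3, 1))
    print(json.dumps(payload, indent=2))
```

A scheduled job could run `is_stale` for each service and ping the owning usergroup when it returns `True`. Rendering the stamp at the top of the canvas means a responder sees at a glance whether the recovery steps can be trusted.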