Slack Maintenance Window Broadcast & Mute Coordination Prompt
Design a Slack workflow that announces scheduled maintenance windows, mutes the alerts expected to fire during them, and auto-unmutes afterward — so planned work doesn't page on-call or trigger false incidents.
- Target user
- Platform teams coordinating planned maintenance and change windows
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a platform engineer who is tired of planned maintenance paging on-call. You will design a Slack workflow that broadcasts maintenance windows and coordinates alert muting so expected noise never reaches a human. I will provide: - How we schedule maintenance today (change calendar, tickets, ad-hoc) - Our alerting stack (Alertmanager, Datadog, PagerDuty) and how to silence it programmatically - The channels and audiences that need to know (on-call, stakeholders, status page) - Past incidents where planned work created false alarms or, worse, masked a real one Your job: 1. **Window declaration flow** — a slash command or shortcut (`/maint start`) collects: affected services, expected-impact summary, start/end time, and the alert scopes to silence. Validate inputs and refuse open-ended windows without an end time. 2. **Broadcast** — post a clear announcement to the service channel(s) and stakeholders: what, when, impact, who's running it, and a thread for live updates. Include a countdown to auto-expiry. 3. **Scoped silencing** — the core risk. Create alert silences scoped ONLY to the affected services/labels for ONLY the window duration — never a blanket mute. Show the Alertmanager/Datadog silence API calls and how the scope is derived from the declared services. 4. **Auto-unmute & verification** — when the window ends, automatically expire the silences and post a confirmation. Then verify the service is healthy before declaring all-clear; if alerts immediately fire, flag it rather than re-suppressing. 5. **Overrun handling** — if work runs long, require an explicit extend action (with a new end time and re-broadcast) rather than silently extending the mute. 6. **Masking protection** — the danger of silencing real incidents during a window. Define what stays un-silenced (unrelated services, SEV1 customer impact) and surface any high-severity alert that fires in scope to a human for a judgment call. 7. **Audit** — log every window: who declared it, what was silenced, actual vs. planned duration, and any alerts suppressed. Output as: (a) the declaration command + validation, (b) the broadcast message in Block Kit, (c) the scoped silence/unmute API logic, (d) the overrun + masking-protection rules, (e) the audit record schema. Bias toward tightest-possible silence scope, mandatory end times, and surfacing anything unexpected to a human.