Building a Slack Status Bot: Real-Time Service Health Where Your Team Lives

Every team has a status dashboard. Almost nobody looks at it until something is already on fire — at which point twelve people pile into a channel asking “is it just me or is checkout down?” while someone hunts for the dashboard URL. The information existed; it just wasn’t where the team’s attention was. A Slack status bot fixes that by bringing live service health into the place your team already lives, on demand and on a schedule.

I’ve built a few of these, and the good ones share a property: people trust them enough to check the bot instead of asking the channel. That trust is the whole game, and it comes from accuracy and good presentation, not features. Here’s how to build one that earns it.

The two jobs of a status bot

A status bot does two distinct things, and it’s worth separating them:

On-demand status — someone types /status (or /status checkout) and gets the current health of one or all services, right now.
Proactive updates — the bot posts when a service changes state, so people don’t have to ask. This is the higher-value half: the best status bot answers “is checkout down?” before anyone types it.

Build the on-demand piece first because it’s simpler, then layer proactive notifications on top once you trust the health checks.
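Slack delivers whatever follows the command in the slash-command payload's text field, so the first piece of logic is deciding which services the user asked about. Here is a minimal sketch of that step; resolve_targets and its return shape are my own illustration, not part of any Slack SDK.

```python
def resolve_targets(text, known_services):
    """Map the text after /status to the service names to report on.

    Returns (targets, unknown): names to check, and names the user typed
    that are not in known_services.
    """
    requested = text.lower().split()
    if not requested:
        # Bare "/status" means "show everything".
        return sorted(known_services), []
    targets = [name for name in requested if name in known_services]
    unknown = [name for name in requested if name not in known_services]
    return targets, unknown
```

Returning the unrecognized names separately lets the bot reply "I don't know a service called billing" instead of silently ignoring a typo.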

Where the health data comes from

Resist the urge to reimplement monitoring inside the bot. The bot is a presenter, not a monitor. Pull from sources you already trust:

Existing health-check endpoints (/healthz, /readyz) for a direct liveness read.
Your monitoring system’s API (Datadog, Grafana, Prometheus) for the real SLO-grade signal.
Upstream provider status pages for dependencies you don’t control.

A simple health aggregator the bot can call:

import requests

SERVICES = {
    "checkout": "https://checkout.internal/healthz",
    "payments": "https://payments.internal/healthz",
    "search":   "https://search.internal/healthz",
}

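def check_all():
    """Return {service_name: "up" | "degraded" | "down"} for every service."""
    results = {}
    for name, url in SERVICES.items():
        try:
            # Short timeout so one hung service cannot stall the whole report.
            r = requests.get(url, timeout=3)
            # r.ok is True for status codes below 400.
            results[name] = "up" if r.ok else "degraded"
        except requests.RequestException:
            # Connection errors and timeouts: the endpoint did not answer.
            results[name] = "down"
    return results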

The bot calls this, formats the result, and posts it. The monitoring stays in the monitoring tools; the bot just surfaces it where people look.

Presentation is what builds trust

A status response people trust at a glance uses clear visual state and zero ambiguity. Block Kit with status emoji does this well:

{
  "blocks": [
    { "type": "header", "text": { "type": "plain_text", "text": "Service Status" } },
    { "type": "section", "text": { "type": "mrkdwn",
      "text": ":large_green_circle: *checkout* — operational\n:large_yellow_circle: *payments* — degraded\n:red_circle: *search* — down" } },
    { "type": "context", "elements": [
      { "type": "mrkdwn", "text": "Updated <!date^1718200000^{time}|just now> · /status for details" } ] }
  ]
}

Green/yellow/red is instantly readable. The “updated” timestamp matters more than it looks: a status bot that might be showing stale data is a status bot nobody trusts. Always stamp freshness, and if a health check itself failed to run, say “unknown” rather than guessing “up.”
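To connect the aggregator to that payload, here is one way to build the blocks from a results dictionary like the one check_all() returns. Treat it as a sketch: build_status_blocks, STATE_DISPLAY, and the choice of :white_circle: for "unknown" are my assumptions. Only the Block Kit structure and the date token come from the JSON above.

```python
import time

# Emoji and label per state; "unknown" covers checks that could not run
# and any state string this table does not recognize.
STATE_DISPLAY = {
    "up": (":large_green_circle:", "operational"),
    "degraded": (":large_yellow_circle:", "degraded"),
    "down": (":red_circle:", "down"),
    "unknown": (":white_circle:", "unknown"),
}

def build_status_blocks(results, checked_at=None):
    """Turn {service: state} into a Block Kit payload with a freshness stamp."""
    checked_at = int(checked_at if checked_at is not None else time.time())
    lines = []
    for name in sorted(results):
        emoji, label = STATE_DISPLAY.get(results[name], STATE_DISPLAY["unknown"])
        lines.append(f"{emoji} *{name}*: {label}")
    # Fallback text is shown by clients that cannot render the date token.
    fallback = time.strftime("%H:%M UTC", time.gmtime(checked_at))
    stamp = f"Updated <!date^{checked_at}^{{time}}|{fallback}> · /status for details"
    return {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": "Service Status"}},
            {"type": "section", "text": {"type": "mrkdwn", "text": "\n".join(lines)}},
            {"type": "context", "elements": [{"type": "mrkdwn", "text": stamp}]},
        ]
    }
```

Passing the time of the health check as checked_at, rather than the time of formatting, keeps the stamp honest when results are cached.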

Proactive updates without the spam

The notification half is where status bots go wrong. Post on every flicker and people mute the bot within a day; then it’s useless precisely when it matters. Four rules keep it disciplined:

Only post on state transitions. “checkout went down” and “checkout recovered” — not “checkout is still down” every 60 seconds.
Debounce. Require a state to hold for a couple of check cycles before announcing, so a single failed health check doesn’t cry wolf.
Update one message rather than posting a stream. Edit a pinned status message as state evolves so the channel isn’t a wall of notifications.
Route by severity. A degraded internal tool is a low-priority post; a down customer-facing service is a notify-everyone event.

Get this right and the bot becomes the channel’s trusted heartbeat instead of noise people have learned to ignore.
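The first two rules, transitions only and debouncing, fit in one small state machine. This is a sketch under assumptions: TransitionDetector and the hold parameter are names I made up, and "a couple of check cycles" is modeled as a consecutive-count threshold.

```python
class TransitionDetector:
    """Announce a state change only after it holds for `hold` consecutive checks."""

    def __init__(self, hold=2):
        self.hold = hold
        self.announced = {}  # service -> last state the channel was told about
        self.candidate = {}  # service -> (pending state, consecutive count)

    def observe(self, service, state):
        """Feed one check result; return (old, new) if a transition should be posted."""
        if service not in self.announced:
            # First sighting: record a baseline without posting.
            self.announced[service] = state
            return None
        if state == self.announced[service]:
            # Back to the announced state, so any pending flicker is discarded.
            self.candidate.pop(service, None)
            return None
        pending, count = self.candidate.get(service, (state, 0))
        count = count + 1 if pending == state else 1
        self.candidate[service] = (state, count)
        if count < self.hold:
            return None
        old = self.announced[service]
        self.announced[service] = state
        del self.candidate[service]
        return (old, state)
```

Call observe() once per service on every polling cycle and post only when it returns a pair. A single failed check returns None, and so does every repeat of a state that has already been announced.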

A public status bot variant

The same machinery can power a customer-facing status update in a shared or community channel. The difference is register: external status messages drop internal jargon, never speculate on root cause, and stick to impact and ETA. If you build this, keep the customer-facing copy generation tightly templated — and if you use AI to draft it, constrain the prompt hard (no internal hostnames, no root-cause guessing). A saved status-update prompt in your prompt library keeps that copy consistent under pressure.
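A tight template can be as simple as a fixed sentence with two slots, plus a guard that rejects anything resembling an internal hostname. The wording of the template, the render_public_update name, and the .internal pattern (borrowed from the example URLs earlier) are all assumptions for illustration.

```python
import re
from string import Template

# Fixed customer-facing wording: impact and ETA only, no root cause.
PUBLIC_UPDATE = Template(
    "We are investigating an issue affecting $impact. "
    "Next update by $next_update."
)

# Hypothetical pattern for internal hostnames such as checkout.internal.
INTERNAL_HOST = re.compile(r"\b[\w.-]+\.internal\b")

def render_public_update(impact, next_update):
    """Fill the fixed template, refusing values that leak internal hostnames."""
    for value in (impact, next_update):
        if INTERNAL_HOST.search(value):
            raise ValueError("public copy must not contain internal hostnames")
    return PUBLIC_UPDATE.substitute(impact=impact, next_update=next_update)
```

The same guard is worth running on AI-drafted copy before it is posted, since a prompt constraint alone does not guarantee the output is clean.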

Let AI explain, not decide

A nice enhancement: when a service goes degraded, have the bot attach a short AI-generated “what this might mean” note pulled from the recent alerts and deploy history. Keep it strictly advisory — the bot reports the fact (search is down) deterministically from the health check, and the AI adds context (a deploy went out 4 minutes ago) that a responder can act on. Never let the model’s interpretation override the measured health state; the green circle must come from the health check, not the LLM.
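One way to enforce that separation in code is to make the measured state the only input that can set the headline, with the AI note appended as clearly labeled text. A small sketch, where compose_status_line and the label wording are hypothetical:

```python
def compose_status_line(service, measured_state, advisory_note=None):
    """Build the line from the measured state; the note is appended, never consulted."""
    line = f"*{service}* is {measured_state}"
    if advisory_note:
        # The note adds context only and cannot change the reported state.
        note = advisory_note.strip()
        line += f"\n_Possible context (AI-generated, unverified): {note}_"
    return line
```

Because the function never inspects the note, a model that wrongly claims the service looks healthy still cannot turn a red circle green.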

Ship the on-demand version this week

Start small: a /status command that hits your existing health endpoints and posts a clean, color-coded, timestamped summary. That alone kills the “is it just me?” pile-ups. Once the team trusts the readout, add debounced transition notifications and you’ve got a status bot that answers the question before it’s asked.

The dashboard nobody opens becomes the bot everybody checks — because it’s finally where the team already is. For the surrounding patterns, from health-check alerting to Block Kit design, see our other AI for Slack guides.
