Build a RAG Runbook Bot That Answers Ops Questions in Slack

I have answered the same Slack question roughly forty times: “how do we rotate the Redis password in staging?” It’s documented. The doc exists. Nobody can find it. So when someone @-mentions our bot now and gets the answer with a link to the exact runbook section, it feels like I’ve reclaimed a small piece of my life. That’s the promise of a retrieval-augmented generation (RAG) runbook bot: it doesn’t know your ops; it looks things up and reads them back, grounded in documents you control. The grounding is the entire point, because an ungrounded LLM answering ops questions is a liability, not a teammate.

This post covers the working shape of that bot: an app_mention handler, a retrieval step over your docs, a prompt that forces the model to cite, and a Block Kit reply with source links. Plus the rules that keep it from confidently inventing a procedure that bricks your database.

Why RAG and not “just ask the model”

A base LLM will answer “how do we rotate the Redis password?” no matter what — it’ll generate a plausible-sounding procedure that may be completely wrong for your environment. An LLM without retrieval is a confident stranger guessing at your infrastructure. RAG fixes this by retrieving your actual runbook chunks and instructing the model to answer only from them, with citations. If nothing relevant is retrieved, a well-prompted bot says “I don’t have a runbook for that” instead of improvising.

The mental model I keep coming back to: the model is a fast junior engineer who’s great at summarizing the docs you hand it, and terrible at admitting when it hasn’t read them. Retrieval forces the docs into its hands. The prompt forces it to stay there.

Listening for the mention

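Slack delivers app_mention events over the Events API. Bolt verifies the request signature with your signing secret before invoking the handler, so you’re not hand-rolling HMAC. If you ever build the endpoint raw, though, you must compute an HMAC-SHA256, keyed with your signing secret, over v0:{X-Slack-Request-Timestamp}:{raw request body}, prefix the hex digest with v0=, and compare the result to the X-Slack-Signature header in constant time. Also reject requests whose timestamp is more than five minutes from your server’s clock, so a captured request can’t be replayed. Always verify webhook signatures; an open Events endpoint will get probed.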
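For the raw case, a minimal sketch of that check using only Python’s standard library might look like the following. The function name verify_slack_signature and the injectable now parameter are hypothetical conveniences for this example, not part of any Slack SDK; you’d call it with the two headers and the exact bytes of the request body, before parsing any JSON.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 5 * 60  # reject requests more than five minutes from local time


def verify_slack_signature(signing_secret: str, timestamp: str, raw_body: bytes,
                           signature: str, now: Optional[float] = None) -> bool:
    """Check Slack's v0 request signature against the raw (unparsed) request body."""
    now = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        return False  # missing or malformed timestamp header
    if abs(now - ts) > MAX_SKEW_SECONDS:
        return False  # stale or future-dated: treat as a possible replay
    basestring = b"v0:" + timestamp.encode() + b":" + raw_body
    digest = hmac.new(signing_secret.encode(), basestring, hashlib.sha256).hexdigest()
    expected = "v0=" + digest
    # compare_digest avoids leaking, via timing, how many leading bytes matched
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Bolt performs the equivalent check internally, which is why the handler that follows never touches the signature.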

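import os
import re

from slack_bolt import App

app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],  # Bolt verifies signatures
)

# Slack encodes mentions as <@U012ABCDEF> (older payloads: <@U012ABCDEF|name>)
MENTION_RE = re.compile(r"<@[A-Z0-9]+(?:\|[^>]*)?>")


def strip_mention(text):
    # Remove the <@BOTID> mention(s) so only the question text remains
    return MENTION_RE.sub("", text)


@app.event("app_mention")
def handle_mention(event, say):
    question = strip_mention(event.get("text", "")).strip()
    if not question:
        say(text="Ask me an ops question, e.g. `@runbook how do we rotate the Redis password?`")
        return

    chunks = retrieve(question, k=4)          # vector search over your docs
    if not chunks:
        say(text="I don't have a runbook covering that. Try #ops-help.")
        return

    answer, sources = answer_with_sources(question, chunks)
    say(blocks=render_answer(answer, sources), text=answer)  # text= is the fallback


if __name__ == "__main__":
    # Bolt's built-in server listens for Events API requests on /slack/events
    app.start(port=int(os.environ.get("PORT", 3000)))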

The retrieval step

Index your runbooks ahead of time: chunk them, embed each chunk, store the vectors plus metadata (title, URL, last-updated). At query time you embed the question and pull the top-k nearest chunks.
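A rough sketch of the chunking half, assuming your runbooks are Markdown files: the function name, the character-based max_chars/overlap sizes, and the metadata keys are all illustrative choices, and many pipelines size chunks in tokens instead. Splitting on headings first keeps most chunks inside a single runbook section, which also makes the cited source easier to check.

```python
import re


def chunk_runbook(markdown, title, url, updated, max_chars=1200, overlap=200):
    """Split one Markdown runbook into overlapping chunks that carry source metadata."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    # Split just before each Markdown heading so chunks tend to follow runbook sections
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", markdown) if s.strip()]
    chunks = []
    for section in sections:
        start = 0
        while True:
            end = min(start + max_chars, len(section))
            chunks.append({"text": section[start:end], "title": title,
                           "url": url, "updated": updated})
            if end == len(section):
                break
            # Step back by `overlap` so a step cut at a boundary appears in both chunks
            start = end - overlap
    return chunks
```

Each returned dict is what you’d embed (the text) and store alongside the vector (everything else).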

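# Similarity cutoff: model-dependent, and assumes higher score = closer match
# (e.g. cosine similarity). A store that returns distances needs the test flipped.
MIN_SCORE = 0.75

def retrieve(question, k=4, min_score=MIN_SCORE):
    # embed() and vector_store stand in for your embedding model and vector DB client
    q_vec = embed(question)
    hits = vector_store.search(q_vec, top_k=k)
    return [
        {
            "text": h.payload["text"],
            "title": h.payload["title"],
            "url": h.payload["url"],
            "score": h.score,
        }
        for h in hits
        if h.score > min_score  # drop weak matches so you fail closed
    ]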

Pro Tip: the score threshold is a feature, not a nuisance. A bot that returns nothing for an unindexed topic is far safer than one that pads a thin retrieval with the model’s imagination. Tune it to err toward “I don’t know.”
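One way to do that tuning, assuming you have hand-labeled the similarity scores from a set of real questions as relevant or not: pick the lowest cutoff whose kept matches still meet a precision target, and return nothing if no cutoff does. The lowest qualifying cutoff keeps the most relevant matches while still meeting the target. The function name, the input shape, and the 0.95 default are illustrative, not recommendations from any library.

```python
def pick_threshold(labeled_scores, min_precision=0.95):
    """Lowest score cutoff whose kept matches reach `min_precision`.

    labeled_scores: list of (score, is_relevant) pairs from a hand-labeled eval set.
    Returns None when no cutoff reaches the target, i.e. fail closed.
    """
    for cutoff in sorted({score for score, _ in labeled_scores}):
        kept = [relevant for score, relevant in labeled_scores if score >= cutoff]
        precision = sum(kept) / len(kept)  # kept is non-empty: cutoff is a real score
        if precision >= min_precision:
            return cutoff
    return None
```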

One non-negotiable: never index secrets. Your runbooks should reference where a credential lives (“in Vault under secret/redis/staging”), never the value. If a real token lands in your vector store, every answer the bot gives risks leaking it into a channel. Scrub the corpus before you embed it, and never hand the model real tokens or secrets as part of the context.
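A minimal pre-embedding scrub might look like this. The patterns are illustrative and deliberately narrow (a few well-known token shapes such as Slack xox… tokens, AWS access key IDs, and PEM private-key blocks); a dedicated secret scanner will cover far more formats, and the names SECRET_PATTERNS and scrub are hypothetical.

```python
import re

# Illustrative, high-signal patterns only; not an exhaustive secret scanner.
SECRET_PATTERNS = {
    "slack_token": re.compile(r"\bxox[a-z]-[A-Za-z0-9-]{10,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S
    ),
}


def scrub(text):
    """Replace likely secrets with a labeled placeholder; return (clean_text, findings)."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            findings.append((name, count))
    return text, findings
```

Treating any non-empty findings as a reason to skip indexing that document, and to fix the source runbook, is safer than silently embedding the redacted text. Note that references like “in Vault under secret/redis/staging” pass through untouched, which is exactly what you want.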

Constructing the grounded prompt

The prompt does the heavy lifting. It hands over the retrieved chunks, demands citations, and explicitly licenses “I don’t know.”

def answer_with_sources(question, chunks):
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i+1}] {c['title']}\n{c['text']}" for i, c in enumerate(chunks)
    )
    system = (
        "You are an ops assistant. Answer ONLY using the numbered sources below. "
        "Cite sources inline like [1]. If the sources do not contain the answer, "
        "say you don't have a runbook for it. Do not invent commands or values."
    )
    user = f"Sources:\n{context}\n\nQuestion: {question}"

    # call_model is a placeholder for your LLM client: system prompt + user message in, text out
    answer = call_model(system, user)
    return answer, chunks  # every retrieved chunk is shown as a source, cited or not

Because the system prompt instructs the model to answer only from the supplied sources, the output usually stays anchored to your real docs. An instruction is not a guarantee, though: it’s still a draft a human should sanity-check before following anything destructive. RAG reduces hallucination a great deal, but it doesn’t eliminate it. The model can still mis-summarize a step or stitch two runbooks together awkwardly. If you’re refining these prompts, a prompt workspace makes it easy to A/B the grounding instructions and test citation-style templates.
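A cheap post-check catches some of those failures, assuming the model follows the [1]-style format the prompt asks for: parse the citations out of the answer and refuse to post it if it cites nothing or cites a source number that was never supplied. check_citations is a hypothetical helper, and it only verifies that citations point at real sources, not that the cited text actually supports the claim.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")  # matches the [1]-style citations the prompt requests


def check_citations(answer, sources):
    """Return (ok, cited). ok is False if the answer cites nothing or cites a
    source number that was never supplied; cited is the sorted list of numbers."""
    cited = sorted({int(n) for n in CITATION_RE.findall(answer)})
    ok = bool(cited) and all(1 <= n <= len(sources) for n in cited)
    return ok, cited
```

In the handler, a failed check would post the same “I don't have a runbook covering that” reply; the model’s own honest refusal contains no citations, so it lands on that path too. Keep passing the full sources list to render_answer, since its [n] numbering comes from list position and has to match the numbers in the answer.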

Posting the answer with sources in Block Kit

The reply shows the answer and, critically, the source links so a human can verify. The sources are the receipts.

def render_answer(answer, sources):
    blocks = [
        {"type": "section", "text": {"type": "mrkdwn", "text": answer}},
        {"type": "divider"},
    ]
    src_lines = "\n".join(
        f"[{i+1}] <{s['url']}|{s['title']}>" for i, s in enumerate(sources)
    )
    blocks.append({
        "type": "context",
        "elements": [{"type": "mrkdwn", "text": f"*Sources:*\n{src_lines}"}],
    })
    blocks.append({
        "type": "context",
        "elements": [{"type": "mrkdwn", "text": "AI-generated from runbooks — verify before acting."}],
    })
    return blocks
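Two details are worth guarding before the answer reaches Block Kit. Slack treats &, <, and > as control characters in message text, so model output containing something like <!channel> would notify the whole channel unless it’s escaped; and a section block’s text is capped at 3,000 characters. A sketch of both guards, with hypothetical helper names:

```python
SECTION_TEXT_LIMIT = 3000  # Block Kit's maximum length for a section block's text


def escape_mrkdwn(text):
    """Escape the three characters Slack treats as control characters in message text."""
    # & must go first so the entities produced for < and > aren't double-escaped
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def safe_section_text(answer, limit=SECTION_TEXT_LIMIT):
    """Escape model output and trim it to fit a single section block."""
    text = escape_mrkdwn(answer)
    if len(text) > limit:
        cut = text[: limit - 1]  # leave room for the ellipsis character
        amp = cut.rfind("&")
        if amp != -1 and ";" not in cut[amp:]:
            cut = cut[:amp]  # don't leave a half-written entity like "&am" at the cut
        text = cut + "…"
    return text
```

You’d pass safe_section_text(answer) into render_answer and escape_mrkdwn(answer) as the text= fallback. Escaping also means any Slack-style links the model writes show up as plain text, which is acceptable here because the clickable links live in the sources block.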

That trailing “verify before acting” context block isn’t boilerplate. It sets the expectation that the answer is a starting point, not gospel, and the linked sources let anyone check the bot’s work in one click. This is the same posture I’d take with any AI assistant — whether it’s Claude or a smaller self-hosted model like Gemma, you ground it, you cite it, and a human verifies before the answer turns into an action.

Keeping it honest in production

A few operational habits keep the bot trustworthy. Log every question, the retrieved chunk IDs, and the answer, so you can audit when it got something wrong. Re-index on doc changes so stale runbooks don’t haunt you. Rate-limit per user. And have a human review the bot’s plumbing before it goes near a real workspace, especially the retrieval filter and the secret-scrubbing step, which are exactly the places an AI-scaffolded prototype tends to cut corners. When the bot’s answers feed into an actual incident, pipe them through your incident response flow so there’s a human owner, and set up monitoring alerts for retrieval-quality drift, such as a jump in the share of questions that return no chunks above the threshold.
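The logging and per-user rate limit can start very small. Below is a hypothetical in-process sketch: a sliding-window limiter keyed by Slack user ID (the app_mention event carries it as event["user"]) and a helper that builds one JSON audit line per answer. It assumes each chunk dict has an "id" key, which retrieve() as written doesn’t return, so you’d add the vector store’s point ID to it. The 5-per-60-seconds default is arbitrary, and state lives in memory, so multiple bot instances would each keep their own counts; enforcing one limit across them needs a shared store.

```python
import json
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding window: at most `limit` questions per user per `window` seconds."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.events = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id):
        now = self.clock()
        recent = self.events[user_id]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget requests that have aged out of the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def audit_record(user_id, question, chunks, answer):
    """One JSON line per answer: what was asked, what was retrieved, what was said."""
    return json.dumps({
        "ts": time.time(),
        "user": user_id,
        "question": question,
        "chunk_ids": [c.get("id") for c in chunks],  # assumes chunks carry an "id"
        "scores": [c.get("score") for c in chunks],
        "answer": answer,
    })
```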

Conclusion

A RAG runbook bot turns your scattered, unfindable documentation into something people actually use, without pretending the LLM knows your infrastructure. The discipline is simple to state and easy to skip: retrieve real docs, cite them, fail closed when retrieval is weak, keep secrets out of the index, and let a human verify before anyone acts on the answer. Do that and you get a bot that earns trust by showing its sources.