Build an AI On-Call Assistant Card for Microsoft Teams

At 3 a.m., the most valuable thing an on-call engineer can have isn’t another dashboard. It’s the one teammate who remembers that the payments service needs a cache flush after a deploy, and that teammate is asleep. That tribal knowledge lives in runbooks nobody reads under pressure and in Slack or Teams threads from past incidents. I wanted a bot that could answer “what do I do when checkout latency spikes?” in the channel, grounded in our actual runbooks, without me having to wake the one person who knows.

That’s a retrieval-augmented generation (RAG) bot, and it’s a genuinely good fit for Teams: the engineer asks in the channel, the bot retrieves relevant runbook content, an LLM drafts an answer, and it comes back as an Adaptive Card with the sources cited. The non-negotiable framing: this is a fast junior engineer handing you notes from the runbook. It does not execute anything, it cites its sources so you can verify, and a human decides what to actually do.

Retrieve first, then generate — never generate from memory

The whole value is grounding. An LLM answering from its training data about your infrastructure is a liability — it’ll confidently describe a runbook you don’t have. So the flow is retrieve-then-generate: the question goes to a vector search over your indexed runbooks, the top chunks become context, and the model answers only from that context.

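async def answer_question(question: str) -> dict:
    # vector_search, llm, and GROUNDING_PROMPT are your own retrieval/model helpers and config.
    chunks = await vector_search(question, k=5)   # top-k chunks from your indexed runbooks
    context = "\n\n".join(f"[{c.source}] {c.text}" for c in chunks)
    answer = await llm(
        system=GROUNDING_PROMPT,
        user=f"Question: {question}\n\nContext:\n{context}",
    )
    # Several chunks often come from the same runbook; dedupe while keeping rank order.
    sources = list(dict.fromkeys(c.source for c in chunks))
    return {"answer": answer, "sources": sources}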
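The snippet leaves `vector_search` to you. To make the ranking step concrete, here is a toy in-memory stand-in. It uses a hypothetical `Chunk` type and a `search` helper, and it swaps bag-of-words cosine similarity in for a real embedding model. It shows only the retrieve-and-rank mechanics, not production retrieval quality. Chunks with no overlap at all are dropped, so an unrelated question yields empty context, which is exactly when the grounding prompt's escape hatch should fire.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # runbook path or URL the chunk came from
    text: str


def _vectorize(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real index would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(index: list[Chunk], question: str, k: int = 5) -> list[Chunk]:
    """Return up to k chunks ranked by similarity, skipping chunks with no overlap at all."""
    q = _vectorize(question)
    scored = [(_cosine(q, _vectorize(c.text)), c) for c in index]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep index order
    return [c for _, c in scored[:k]]


index = [
    Chunk("runbooks/checkout.md", "If checkout latency spikes, check the payments cache hit rate."),
    Chunk("runbooks/payments.md", "After a payments deploy, flush the payments cache."),
    Chunk("runbooks/search.md", "Reindex search nightly."),
]
print([c.source for c in search(index, "What do I do when checkout latency spikes?", k=2)])
# ['runbooks/checkout.md']
```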

The grounding prompt is doing critical work:

Answer the on-call engineer's question using ONLY the provided context
from our runbooks. If the context doesn't contain the answer, say
"I don't have a runbook for that — escalate to the service owner."
Never invent commands, hostnames, or procedures. Cite the source of
each step. Do not suggest destructive commands; describe them and tell
the engineer to run them deliberately.

That “I don’t have a runbook for that” escape hatch is what stops the bot from confabulating procedures. An on-call bot that says “I don’t know” is trustworthy; one that always has a confident answer is dangerous.
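The prompt is the first line of defense. A cheap mechanical check after generation makes a useful second one. This is a minimal sketch, assuming the model cites steps with the same `[source]` tags that the context string puts in front of each chunk. The hypothetical `unknown_citations` helper lists any citation in the answer that doesn't match a retrieved chunk, which is a strong hint the model invented a step. What you do with a non-empty result is your call: replace the answer with the escape-hatch message, or mark the card as unverified.

```python
import re

# Bracketed citations like "[runbooks/checkout.md]": the same tag format the
# context string uses when it prefixes each chunk with its source.
CITATION = re.compile(r"\[([^\[\]]+)\]")


def unknown_citations(answer: str, retrieved_sources: list[str]) -> list[str]:
    """Citations in the answer that don't match any retrieved chunk's source."""
    allowed = set(retrieved_sources)
    return [c for c in CITATION.findall(answer) if c.strip() not in allowed]


sources = ["runbooks/checkout.md", "runbooks/payments.md"]
answer = ("Check the cache hit rate [runbooks/checkout.md], "
          "then restart the gateway [runbooks/gateway.md].")
print(unknown_citations(answer, sources))  # ['runbooks/gateway.md']
```

The escape-hatch reply contains no citations, so it passes this check cleanly. That is the behavior you want.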

Render the answer as a card with sources

The answer comes back as an Adaptive Card, with the sources as clickable links so the engineer can verify against the real runbook before acting. Citations aren’t decoration here — they’re the verification mechanism.

{
  "type": "AdaptiveCard",
  "version": "1.5",
  "body": [
    { "type": "TextBlock", "text": "🤖 On-call assistant (AI — verify before acting)",
      "weight": "Bolder", "color": "Accent" },
    { "type": "TextBlock", "text": "${answer}", "wrap": true },
    { "type": "TextBlock", "text": "**Sources**", "wrap": true, "spacing": "Medium" },
    { "type": "TextBlock", "$data": "${sources}", "text": "[${$data}](${$data})", "wrap": true, "isSubtle": true }
  ],
  "actions": [
    { "type": "Action.OpenUrl", "title": "Open full runbook", "url": "${primarySource}" },
    { "type": "Action.Submit", "title": "👎 Wrong answer",
      "data": { "verb": "flag_answer" } }
  ]
}
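If you'd rather not run a templating step, the same card can be built directly as a dict. This sketch assumes `result` has the `{"answer", "sources"}` shape that `answer_question` returns and that each source is a URL. `build_assistant_card` is a hypothetical name. The extra `question`/`answer`/`sources` fields on the Submit data are also an addition, included so a flag carries enough context to land in an evaluation set without a lookup.

```python
def build_assistant_card(question: str, result: dict) -> dict:
    """Build the assistant card as a plain dict instead of filling a template.

    Assumes result looks like {"answer": str, "sources": [url, ...]} and that
    each source is a URL.
    """
    answer = result["answer"]
    sources = result.get("sources", [])
    body = [
        {"type": "TextBlock", "text": "🤖 On-call assistant (AI: verify before acting)",
         "weight": "Bolder", "color": "Accent"},
        {"type": "TextBlock", "text": answer, "wrap": True},
    ]
    if sources:
        body.append({"type": "TextBlock", "text": "**Sources**", "wrap": True, "spacing": "Medium"})
        # One markdown link per source, so each is clickable in Teams.
        body += [{"type": "TextBlock", "text": f"[{s}]({s})", "wrap": True, "isSubtle": True}
                 for s in sources]

    actions = []
    if sources:  # Action.OpenUrl needs a real URL, so only offer it when one exists.
        actions.append({"type": "Action.OpenUrl", "title": "Open full runbook", "url": sources[0]})
    # Carry the question, answer, and sources so a flag can be logged without a lookup.
    actions.append({"type": "Action.Submit", "title": "👎 Wrong answer",
                    "data": {"verb": "flag_answer", "question": question,
                             "answer": answer, "sources": sources}})

    return {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.5",
        "body": body,
        "actions": actions,
    }


card = build_assistant_card(
    "what do I do when checkout latency spikes?",
    {"answer": "Flush the payments cache [https://wiki.example.com/runbooks/checkout].",
     "sources": ["https://wiki.example.com/runbooks/checkout"]},
)
# Bot Framework sends Adaptive Cards as attachments with this content type.
attachment = {"contentType": "application/vnd.microsoft.card.adaptive", "content": card}
```

The `wiki.example.com` URL is a placeholder.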

The “👎 Wrong answer” action matters more than it looks. It gives engineers a one-tap way to flag bad responses, which feeds your evaluation set. A RAG bot that can’t be corrected drifts into uselessness.
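When someone taps an Action.Submit button, the bot receives a message activity whose `value` holds the button's `data`, and the user is in `from.id`. Here is a minimal sketch of turning that into an evaluation-set entry, appended as one JSON line per flag. It assumes the Submit data also carries `question`, `answer`, and `sources`. The card JSON above sends only `verb`, so you'd add those fields or look the answer up by an ID of your own. `record_flag` and the JSONL file are hypothetical choices.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_flag(value: dict, user_id: str, path: Path) -> dict:
    """Append one flagged answer to a JSON Lines eval set and return the record."""
    if value.get("verb") != "flag_answer":
        raise ValueError("not a flag_answer submit")
    record = {
        "flagged_at": datetime.now(timezone.utc).isoformat(),
        "flagged_by": user_id,          # e.g. the activity's from.id
        "question": value.get("question"),
        "answer": value.get("answer"),
        "sources": value.get("sources", []),
    }
    # Append-only: each flag becomes one line you can later replay against the bot.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```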

Pro Tip: Index runbook chunks with their source URL and a “last reviewed” date in the metadata, and surface that date on the card. An answer pulled from a runbook last touched two years ago deserves more skepticism, and showing the date prompts the engineer to apply it.
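Here is a small sketch of that metadata and the caption it could produce. The `IndexedChunk` type, the `review_note` helper, and the one-year staleness threshold are all assumptions; pick a threshold that matches how often your runbooks are actually reviewed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexedChunk:
    source: str          # runbook URL, shown on the card
    text: str
    last_reviewed: date  # read from the runbook's metadata at index time


def review_note(chunk: IndexedChunk, today: date, stale_after_days: int = 365) -> str:
    """Caption for the card: the review date, plus a warning once it's past the threshold."""
    age = (today - chunk.last_reviewed).days
    note = f"Last reviewed {chunk.last_reviewed.isoformat()}"
    if age > stale_after_days:
        note += f" ({age} days ago; verify carefully)"
    return note


chunk = IndexedChunk("runbooks/checkout.md", "Flush the payments cache.", date(2022, 1, 1))
print(review_note(chunk, today=date(2024, 1, 1)))
# Last reviewed 2022-01-01 (730 days ago; verify carefully)
```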

Wire it into the bot, scoped to a channel

In the Bot Framework handler, trigger the assistant only on an @mention followed by a command word (“ask” here), so it doesn’t respond to every message. Reply with the card, and keep the bot’s scope tight.

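// Runs inside your ActivityHandler subclass; TurnContext comes from the "botbuilder" package.
this.onMessage(async (context, next) => {
  // Strip the bot's @mention. Card submits and attachments can arrive with no text at all.
  const text = (TurnContext.removeRecipientMention(context.activity) ?? "").trim();
  // Require "ask" as a whole word (case-insensitive) followed by the actual question.
  const match = /^ask\s+([\s\S]+)/i.exec(text);
  if (!match) return await next();

  const result = await answerQuestion(match[1].trim());
  await context.sendActivity({ attachments: [assistantCard(result)] });
  await next();
});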
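The handler checks the prefix but not where the message came from. A channel allowlist enforces the “scoped to a channel” part. The sketch below is in Python over the raw activity JSON; you would port it into the handler ahead of the prefix check. For channel messages, Teams sets `conversation.conversationType` to `"channel"` and puts the channel under `channelData.channel.id`, while personal and group chats have no channel. `in_allowed_channel` is a hypothetical helper, and the `19:…@thread.tacv2` ID below is made up.

```python
def in_allowed_channel(activity: dict, allowed_channel_ids: set[str]) -> bool:
    """True only for messages posted in one of the allowlisted Teams channels."""
    conversation = activity.get("conversation") or {}
    if conversation.get("conversationType") != "channel":
        return False  # 1:1 and group chats have no channel; reject them
    channel = (activity.get("channelData") or {}).get("channel") or {}
    return channel.get("id") in allowed_channel_ids


ONCALL_CHANNELS = {"19:oncall-example@thread.tacv2"}  # made-up channel ID
activity = {
    "conversation": {"conversationType": "channel"},
    "channelData": {"channel": {"id": "19:oncall-example@thread.tacv2"}},
}
print(in_allowed_channel(activity, ONCALL_CHANNELS))  # True
```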

The assistant describes actions — it never takes them

This is the line I won’t cross. The on-call assistant retrieves and explains. It does not run commands, restart services, or change anything. If a runbook step is “run kubectl rollout restart deploy/checkout,” the bot shows you that command and tells you to run it deliberately — it doesn’t offer a button that does it. The moment a bot can execute infrastructure changes off an LLM’s interpretation of a tired engineer’s question, you’ve built a way to take down production by accident.

If you do want action-taking ChatOps, that’s a separate, heavily gated system with explicit confirmation cards and authorization checks, not this read-only assistant. Keep the two firmly separated.
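One way to keep that line from eroding as the card evolves is a check that runs on every outgoing card. This is a guardrail in your own code, not an Adaptive Cards feature, and the names are hypothetical. The sketch walks the whole card, so it covers top-level actions, ActionSets, and `selectAction`s. It allows only links and the Submit carrying the `flag_answer` verb. Anything else, such as an `Action.Execute` that someone wires to a restart, raises before the card is sent.

```python
from collections.abc import Iterator

# What the read-only card may carry: links that only navigate, plus
# Action.Submit restricted to the feedback verb.
NAVIGATION_ACTIONS = {"Action.OpenUrl"}
FEEDBACK_VERBS = {"flag_answer"}


def _actions(node) -> Iterator[dict]:
    """Yield every action anywhere in the card tree."""
    if isinstance(node, dict):
        if str(node.get("type", "")).startswith("Action."):
            yield node
        for value in node.values():
            yield from _actions(value)
    elif isinstance(node, list):
        for item in node:
            yield from _actions(item)


def assert_read_only(card: dict) -> None:
    """Raise ValueError if the card offers anything beyond a link or the flag button."""
    for action in _actions(card):
        kind = action.get("type")
        if kind in NAVIGATION_ACTIONS:
            continue
        data = action.get("data")
        verb = data.get("verb") if isinstance(data, dict) else None
        if kind == "Action.Submit" and verb in FEEDBACK_VERBS:
            continue
        raise ValueError(f"assistant card must stay read-only; found {kind}")
```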

Keep credentials and customer data out of the model

The bot’s app secret, the vector DB connection, the Graph token if you index from Teams — none of it goes to the model. The model receives the question and the retrieved runbook chunks, and returns text. Be deliberate about what’s in your runbook index, too: if runbooks contain credentials (they shouldn’t), those chunks could surface in an answer, so scrub secrets out of indexed content. The retrieval context is effectively model input, and the rule against feeding the model real tenant credentials applies to it.
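Scrubbing belongs at index time, before a chunk is embedded or stored. Here is a rough sketch with a handful of illustrative regexes: an AWS access key ID shape, `key: value`-style passwords and tokens, bearer headers, and PEM private keys. The patterns and the `scrub` helper are assumptions and far from exhaustive. A dedicated secret scanner will catch much more, and a nonzero hit count is a reason to fix the source runbook, not just the index.

```python
import re

# Illustrative (pattern, replacement) pairs; extend or replace with a real scanner.
SECRET_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED]"),  # AWS access key ID shape
    (re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),                              # keep the key name, drop the value
    (re.compile(r"(?i)\bbearer\s+[a-z0-9\-._~+/]+=*"), "Bearer [REDACTED]"),
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
     "[REDACTED PRIVATE KEY]"),
]


def scrub(text: str) -> tuple[str, int]:
    """Redact likely secrets before a chunk is indexed; return the text and a hit count."""
    hits = 0
    for pattern, replacement in SECRET_PATTERNS:
        text, n = pattern.subn(replacement, text)
        hits += n
    return text, hits


clean, hits = scrub("db password: hunter2\nuse key AKIAABCDEFGHIJKLMNOP")
print(hits)  # 2
```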

For a hosted model, treat the retrieved runbook text as sensitive internal data and pick a model and data-handling posture that matches your policy.

What you end up with

A well-built on-call assistant turns “I have no idea, let me wake someone up” into “the bot found the runbook, I read it, I verified, I acted.” That’s a real reduction in 3 a.m. escalations, and the engineer stays in control the whole time because the bot cites sources and never acts on its own.

The framing holds: AI is a fast junior engineer handing you notes from the runbook, a human reviews and decides before touching the tenant, every answer cites sources so you can verify, the bot never executes, and you never hand the model real tenant credentials. For the bot mechanics and the broader picture, see the Microsoft Teams category and the incident-response dashboard. The prompt workspace is useful for tuning the grounding prompt, the prompt library has RAG starting points, and Claude handles the retrieve-then-ground pattern well with its large context window.