Generate Test Cases for Your Slack Bot Handlers With AI

Slack bots are deceptively hard to test. A handler looks like a tidy function, but in production it receives a sprawl of event shapes: mentions, edited messages, messages from other bots, payloads with missing fields, interactions from buttons that no longer exist. I kept shipping bots that worked in the happy path and fell over on the third weird event. So I started using AI to generate test cases for my handlers, and it surfaced edge cases I would never have thought to write. The discipline was reviewing every generated test, because a test that asserts the wrong thing is worse than no test.

Why bot handlers are testing traps

The Slack platform sends you a lot of shapes. An app_mention and a message event share fields but differ in subtle ways. A bot message has a bot_id; ignore it and you get infinite loops where your bot replies to itself. Interaction payloads from buttons carry stale value data if a message is old. The matrix of “what could arrive here” is large and boring to enumerate by hand, which is precisely why bugs hide in it. Enumerating boring matrices is something a model does tirelessly.
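
To make that matrix concrete, here is a rough sketch of a few shapes a handler might receive. The IDs and timestamps are made up, the objects show only the fields relevant here, and isFromBot is a hypothetical helper rather than anything from a Slack SDK; check real payloads against Slack's event reference before relying on the details.

```javascript
// Illustrative payloads only: IDs and timestamps are invented.
const sampleEvents = {
  // A user mentions the bot in a channel.
  appMention: {
    type: "app_mention", user: "U111", text: "<@U999> status?",
    channel: "C1", ts: "1700000000.000100",
  },
  // A plain user message.
  message: {
    type: "message", user: "U111", text: "hello",
    channel: "C1", ts: "1700000000.000200",
  },
  // A message posted by a bot carries bot_id.
  botMessage: {
    type: "message", bot_id: "B123", text: "hello from a bot",
    channel: "C1", ts: "1700000000.000300",
  },
  // An edit arrives as a message event with subtype "message_changed".
  // The new content is nested under `message`, so top-level `text` is absent.
  edited: {
    type: "message", subtype: "message_changed",
    channel: "C1", ts: "1700000000.000400",
    message: { user: "U111", text: "hello (edited)", ts: "1700000000.000200" },
  },
};

// Hypothetical helper: true when the event, or its nested edited message,
// was authored by a bot.
function isFromBot(event) {
  return Boolean(event.bot_id || (event.message && event.message.bot_id));
}
```

Note that the edited shape has no top-level text at all, which is exactly the kind of case a handler written against the happy path never sees in development.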

The handler under test

Here is a representative handler. It answers mentions but must ignore messages from bots, itself included, and handle missing text gracefully:

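// `respond` is the bot's own answer function, assumed to be defined elsewhere in this module.
async function handleMention(event, client) {
  if (event.bot_id) return; // don't reply to bots, including ourselves
  // Strip the first <@USERID> mention; a missing text field becomes "".
  const text = (event.text || "").replace(/<@[^>]+>/, "").trim();
  if (!text) {
    return client.chat.postMessage({ channel: event.channel, text: "Did you mean to ask me something?" });
  }
  const answer = await respond(text);
  // Reply in a thread anchored on the triggering message.
  return client.chat.postMessage({ channel: event.channel, thread_ts: event.ts, text: answer });
}

module.exports = { handleMention }; // the tests import this from "../bot"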

Three behaviors to verify: ignore bot messages, prompt on empty text, answer real questions in-thread. That is the spec I hand the model.

Asking AI for the test matrix

I give the model the handler and the spec and ask for a table of cases with inputs and expected behavior, not finished code yet.

const prompt = `Here is a Slack Bolt handler and its intended behavior.
List test cases as JSON: { name, event, expected }.
Include edge cases: bot_id present, missing text field, only a mention with no words,
unicode text, very long text, and event from a thread.

Handler:
${handlerSource}`;

The model is a fast junior QA engineer here. It reliably remembers the cases I forget, like the empty-text case and the bot_id case, and it adds ones I would not have, like unicode and thread replies. Asking for the matrix first, before code, lets me sanity-check the thinking before any test gets written.

Pro Tip: Ask for the case list before the test code. Reviewing a table of “input and expected” is fast; reviewing twenty generated test functions is slow, and it is where a wrong assertion sneaks past you.
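
Before reading the matrix by eye, a small structural check can catch malformed output early. This is a minimal sketch that assumes the model replies with a JSON array of { name, event, expected } objects, as the prompt requests; parseCaseMatrix is a hypothetical helper name, and it only checks shape, not whether the expectations are right.

```javascript
// Parse and structurally validate the model's case list.
// Assumes the reply is a JSON array of { name, event, expected } objects.
function parseCaseMatrix(raw) {
  const cases = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (!Array.isArray(cases)) {
    throw new Error("expected a JSON array of cases");
  }
  const seenNames = new Set();
  for (const c of cases) {
    if (c === null || typeof c !== "object") {
      throw new Error("each case must be an object");
    }
    if (typeof c.name !== "string" || c.name.trim() === "") {
      throw new Error("case is missing a name");
    }
    if (seenNames.has(c.name)) {
      throw new Error(`duplicate case name: ${c.name}`);
    }
    seenNames.add(c.name);
    if (c.event === null || typeof c.event !== "object" || Array.isArray(c.event)) {
      throw new Error(`case "${c.name}" needs an event object`);
    }
    if (!("expected" in c)) {
      throw new Error(`case "${c.name}" has no expected behavior`);
    }
  }
  return cases;
}
```

A check like this does not replace the human review; it just means the table you review is complete and well-formed before you spend attention on it.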

Turning cases into tests

Once I have approved the matrix, I let the model write the actual tests against my framework, with a mocked Slack client so nothing touches a real workspace:

const { handleMention } = require("../bot");

function mockClient() {
  return { chat: { postMessage: jest.fn().mockResolvedValue({ ok: true }) } };
}

test("ignores messages from bots", async () => {
  const client = mockClient();
  await handleMention({ bot_id: "B123", text: "hi", channel: "C1" }, client);
  expect(client.chat.postMessage).not.toHaveBeenCalled();
});

test("prompts when mention has no text", async () => {
  const client = mockClient();
  await handleMention({ text: "<@U1>", channel: "C1", ts: "1.1" }, client);
  expect(client.chat.postMessage).toHaveBeenCalledWith(
    expect.objectContaining({ text: "Did you mean to ask me something?" })
  );
});

The mock client is essential: tests must never hit a real Slack workspace. A test suite that posts to a channel is a test suite that spams your team.
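
Once the matrix is approved, its cases can drive the tests directly instead of being retyped. The following is a minimal sketch, not the setup used above: it swaps jest.fn() for a hand-rolled recording client so it runs without a test framework, inlines a copy of the handler with respond injected as a parameter, and assumes a hypothetical expected shape of { posts, text, thread_ts } that the matrix format does not actually specify.

```javascript
// Copy of the handler, with `respond` injected so the sketch is self-contained.
async function handleMention(event, client, respond) {
  if (event.bot_id) return;
  const text = (event.text || "").replace(/<@[^>]+>/, "").trim();
  if (!text) {
    return client.chat.postMessage({ channel: event.channel, text: "Did you mean to ask me something?" });
  }
  const answer = await respond(text);
  return client.chat.postMessage({ channel: event.channel, thread_ts: event.ts, text: answer });
}

// Recording stand-in for the Slack client: stores arguments, never touches the network.
function recordingClient() {
  const calls = [];
  return {
    calls,
    chat: { postMessage: async (args) => { calls.push(args); return { ok: true }; } },
  };
}

// Hypothetical approved cases. The `expected` shape is an assumption of this sketch.
const PROMPT = "Did you mean to ask me something?";
const approvedCases = [
  { name: "bot message", event: { bot_id: "B123", text: "hi", channel: "C1" }, expected: { posts: 0 } },
  { name: "mention only", event: { text: "<@U1>", channel: "C1", ts: "1.1" }, expected: { posts: 1, text: PROMPT } },
  { name: "missing text", event: { channel: "C1", ts: "1.2" }, expected: { posts: 1, text: PROMPT } },
  { name: "real question", event: { text: "<@U1> ping", channel: "C1", ts: "1.3" },
    expected: { posts: 1, text: "echo: ping", thread_ts: "1.3" } },
];

// Run one case and return a list of mismatches (an empty list means it passed).
async function runCase(testCase) {
  const client = recordingClient();
  const stubRespond = async (question) => `echo: ${question}`; // deterministic stand-in
  await handleMention(testCase.event, client, stubRespond);

  const failures = [];
  const { posts, text, thread_ts } = testCase.expected;
  if (client.calls.length !== posts) {
    failures.push(`expected ${posts} post(s), got ${client.calls.length}`);
  }
  const first = client.calls[0];
  if (first && text !== undefined && first.text !== text) {
    failures.push(`expected text "${text}", got "${first.text}"`);
  }
  if (first && thread_ts !== undefined && first.thread_ts !== thread_ts) {
    failures.push(`expected thread_ts ${thread_ts}, got ${first.thread_ts}`);
  }
  return failures;
}
```

In a Jest suite the same loop maps onto test.each over the approved cases, with jest.fn() in place of the recording client. Either way, the expectations come from the reviewed table rather than from whatever the handler currently does.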

Testing signature verification

The most important test is one the model often forgets unless prompted: that your endpoint rejects unsigned or replayed requests. I always ask explicitly for it.

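// postToEndpoint, body, and signWith are test helpers, assumed to be defined elsewhere in the suite.
test("rejects requests with an invalid signature", async () => {
  const res = await postToEndpoint("/slack/events", body, {
    // A fresh timestamp, so the signature is the only thing wrong with the request.
    "X-Slack-Request-Timestamp": String(Math.floor(Date.now() / 1000)),
    "X-Slack-Signature": "v0=bad",
  });
  expect(res.status).toBe(401);
});

test("rejects requests with a stale timestamp", async () => {
  const old = Math.floor(Date.now() / 1000) - 60 * 10; // 10 minutes old
  const res = await postToEndpoint("/slack/events", body, signWith(old));
  expect(res.status).toBe(401);
});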

This is not optional coverage. Your bot’s endpoint is internet-facing, and signature verification is the thing standing between you and a forged event. A test that proves you reject both a bad signature and a stale timestamp is worth ten happy-path tests.
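
For context, this is roughly the check those tests exercise. The sketch follows Slack's documented v0 signing scheme: an HMAC-SHA256 over v0:{timestamp}:{raw body} keyed with the signing secret, compared against the X-Slack-Signature header. The function names, the options object, and the five-minute tolerance constant are illustrative choices. Frameworks such as Bolt typically perform this check for you, so treat it as an explanation of what is being tested rather than code to ship.

```javascript
const crypto = require("node:crypto");

const MAX_AGE_SECONDS = 60 * 5; // reject requests more than five minutes old

// Compute the v0 signature expected for this timestamp and raw request body.
function computeSignature(signingSecret, timestamp, rawBody) {
  const base = `v0:${timestamp}:${rawBody}`;
  return "v0=" + crypto.createHmac("sha256", signingSecret).update(base).digest("hex");
}

// Return true only when the timestamp is fresh and the signature matches.
function isValidSlackRequest({ signingSecret, timestamp, signature, rawBody, nowSeconds }) {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || typeof signature !== "string") return false;
  if (Math.abs(nowSeconds - ts) > MAX_AGE_SECONDS) return false; // stale or replayed

  const expected = Buffer.from(computeSignature(signingSecret, timestamp, rawBody));
  const received = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Passing nowSeconds in, rather than reading the clock inside the function, is what makes the stale-timestamp case easy to test deterministically. The signature must be computed over the raw request body, before any JSON parsing.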

Reviewing every generated test

Here is the rule I never break: I read every generated test before it lands in CI. The model occasionally writes a test that asserts the current behavior rather than the correct behavior, which means it would happily lock in a bug. A green test you did not read is a false sense of safety. The AI drafts the suite fast; I review each assertion against what the handler should do, not just what it does. That is the same human-in-the-loop discipline I apply before shipping any bot logic to a real workspace.

Tuning and fit

I refined the test-generation prompt in the prompt workspace across several handlers, then saved the working version to prompts. The testing templates in the prompt packs include the signature-verification cases so you do not forget them. This pairs naturally with the review-focused tooling in code review, which catches the bugs the tests do not.

I generated the tests with Cursor and Claude, reviewing each one; the broader Slack handler patterns are in the Slack category.

Conclusion

AI-generated tests are the fastest way to cover the boring, bug-harboring matrix of Slack event shapes you would otherwise skip. Ask for the case list first, mock the Slack client so tests never hit a real workspace, always include signature and replay-rejection tests, and read every generated assertion before it lands in CI. The model enumerates tirelessly; you verify that each test asserts the right thing. That combination gives you a suite you actually trust before the next weird event arrives.