Natural-Language ChatOps: Parsing Slash Commands With AI

The first time someone on my team typed /ops restart the payments worker in staging into Slack and watched it actually happen, I felt two things at once: delight, and a cold spike of dread. Delight because nobody had to remember the exact kubectl rollout restart incantation. Dread because I’d just wired a large language model into a path that could touch production infrastructure. That tension is the whole story of natural-language ChatOps. The payoff is real, but the LLM is the least trustworthy component in the system, and you have to design around that fact from the very first line of code.

In this post I’ll walk through the pattern I landed on: a slash command captures free text, an LLM parses it into a structured action, the action is validated against an allow-list, the user confirms in a modal, and only then does anything execute. The model never runs commands. It only proposes them.

The mental model: the LLM is a fast junior engineer

Here’s the frame that keeps me honest. An LLM parsing your ops commands is like a fast, eager junior engineer who read the runbook once and is very confident. It’s genuinely useful for translating fuzzy human intent into structure. It is also capable of confidently producing nonsense, and it will happily “interpret” delete the staging database as a real instruction if you let it. So you treat its output the way you’d treat a junior’s first PR: a human (or a hard-coded allow-list) reviews before anything ships.

That means the model’s job is narrow. It does not decide whether an action is allowed. It only proposes which allow-listed action best matches the text, plus the parameters. Validation is plain code you wrote and reviewed.

Step 1: the slash command handler

Slack sends slash command invocations as an application/x-www-form-urlencoded POST. The text field holds everything after the command. With Bolt for JavaScript you ack() within three seconds, then do the slow work.

const { App } = require('@slack/bolt');

const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET, // Bolt verifies the signature for you
});

app.command('/ops', async ({ command, ack, client, respond }) => {
  await ack(); // must respond within 3 seconds

  const userText = command.text?.trim();
  if (!userText) {
    return respond('Try: `/ops restart the payments worker in staging`');
  }

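  // trigger_id expires 3 seconds after Slack issues it, and the LLM call can
  // easily take longer, so open a placeholder modal first and update it later
  const opened = await client.views.open({
    trigger_id: command.trigger_id,
    view: statusModal('Parsing your request...'),
  });

  // Parse intent with the LLM (never let it execute anything)
  const proposed = await parseIntent(userText);

  // Swap in the confirmation modal, or an error view. Nothing runs yet.
  await client.views.update({
    view_id: opened.view.id,
    view: proposed
      ? confirmationModal(proposed, command.user_id)
      : statusModal("I couldn't map that to a known action. Try rephrasing."),
  });
});

// Minimal text-only modal used while parsing and for parse failures
function statusModal(text) {
  return {
    type: 'modal',
    title: { type: 'plain_text', text: 'Ops' },
    close: { type: 'plain_text', text: 'Close' },
    blocks: [{ type: 'section', text: { type: 'plain_text', text } }],
  };
}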

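Note that Bolt verifies the request signature using your signing secret before your handler ever runs. If you build the endpoint by hand, you must do it yourself: compute an HMAC-SHA256 of v0:{x-slack-request-timestamp}:{raw_body} keyed with the signing secret, prefix the hex digest with v0=, and compare the result to the x-slack-signature header. Reject requests with a stale timestamp as well, so a captured request can’t be replayed. Never skip signature verification: an unverified slash command endpoint is an open door into your infrastructure.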
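
For the by-hand case, a minimal sketch of that check using Node's built-in crypto module might look like the following. The function name verifySlackSignature and its argument shape are my own for illustration and are not part of any Slack SDK. The five-minute replay window is an assumption you can tighten. The sketch also assumes you captured the raw request body before any body parser touched it.

```javascript
const crypto = require('crypto');

// Hypothetical helper: returns true only if the signature matches the raw body.
function verifySlackSignature({ signingSecret, timestamp, rawBody, signature, nowMs = Date.now() }) {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts)) return false;

  // Reject stale requests so a captured payload can't be replayed later
  if (Math.abs(nowMs / 1000 - ts) > 60 * 5) return false;

  const base = `v0:${timestamp}:${rawBody}`;
  const expected = 'v0=' + crypto.createHmac('sha256', signingSecret).update(base).digest('hex');

  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check length first
  const a = Buffer.from(expected);
  const b = Buffer.from(String(signature ?? ''));
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

The timestamp check matters as much as the HMAC: without it, anyone who records one valid request can resend it indefinitely.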

Step 2: parsing intent into an allow-listed action

The LLM call is constrained. I give it the exact set of actions it’s allowed to emit and force structured output. The model picks one of my actions; it cannot invent a new one.

const ACTIONS = {
  restart_worker: {
    services: ['payments', 'billing', 'notifications'],
    envs: ['staging', 'sandbox'], // note: prod is intentionally absent here
  },
  scale_deployment: {
    services: ['payments', 'billing'],
    envs: ['staging'],
  },
};

async function parseIntent(text) {
  const system = `You translate ops requests into one of these actions: ${JSON.stringify(ACTIONS)}.
Respond ONLY with JSON: {"action": "...", "service": "...", "env": "...", "confidence": 0-1}.
If nothing matches, return {"action": null}.`;

  const raw = await callModel(system, text); // your Claude/OpenAI/etc. wrapper
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }

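  // Hard validation in plain code. The LLM does NOT get the final say.
  if (parsed === null || typeof parsed !== 'object') return null;
  // Own-property check so inherited keys like "constructor" never match
  if (!Object.hasOwn(ACTIONS, parsed.action)) return null;
  const spec = ACTIONS[parsed.action];
  if (!spec.services.includes(parsed.service)) return null;
  if (!spec.envs.includes(parsed.env)) return null;
  // A missing or non-numeric confidence counts as a failure
  if (typeof parsed.confidence !== 'number' || parsed.confidence < 0.6) return null;

  // Return only the validated fields, not whatever else the model emitted
  return { action: parsed.action, service: parsed.service, env: parsed.env };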
}

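The validation block is the load-bearing wall. Even if the model hallucinates {"action": "rm_rf_everything"}, that name is not a key in ACTIONS and we bail. The allow-list is your security boundary, not the prompt. Prompts can be jailbroken; a lookup against a hard-coded object cannot, as long as it only matches the object’s own keys. (A bare ACTIONS[name] also resolves inherited names like constructor, which is why an own-property check such as Object.hasOwn is worth the extra line.)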
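
Because this check is the security boundary, it can be worth pulling into a pure function you can unit test without calling a model. Here is a sketch: validateProposal and the trimmed-down ACTIONS table are illustrative names that mirror the code above, not something the handlers require.

```javascript
// Sample allow-list mirroring the shape used above (trimmed for the example)
const ACTIONS = {
  restart_worker: { services: ['payments', 'billing'], envs: ['staging'] },
};

// Hypothetical pure validator: returns a clean action object or null
function validateProposal(actions, parsed, minConfidence = 0.6) {
  if (parsed === null || typeof parsed !== 'object') return null;
  if (typeof parsed.action !== 'string') return null;
  // Own keys only, so "constructor" or "__proto__" can't resolve to anything
  if (!Object.hasOwn(actions, parsed.action)) return null;
  const spec = actions[parsed.action];
  if (!spec.services.includes(parsed.service)) return null;
  if (!spec.envs.includes(parsed.env)) return null;
  if (typeof parsed.confidence !== 'number' || parsed.confidence < minConfidence) return null;
  // Copy only the vetted fields; extra keys from the model are dropped
  return { action: parsed.action, service: parsed.service, env: parsed.env };
}

// A few cases worth pinning down in tests
const good = { action: 'restart_worker', service: 'payments', env: 'staging', confidence: 0.9 };
console.log(validateProposal(ACTIONS, good));                               // clean action object
console.log(validateProposal(ACTIONS, { action: 'rm_rf_everything' }));     // null
console.log(validateProposal(ACTIONS, { ...good, env: 'prod' }));           // null
console.log(validateProposal(ACTIONS, { ...good, action: 'constructor' })); // null
```

The last case is the subtle one: every plain object inherits a constructor property, so a truthiness check on ACTIONS[name] alone would let that name through to the next line.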

Pro Tip: keep prod out of the LLM-reachable allow-list entirely for high-blast-radius actions. If something must touch production, route it to a separate, more heavily gated path with explicit approvals — don’t let a clever sentence get there.
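
One way to keep that promise from eroding over time is a startup check that refuses to boot if a production environment ever sneaks into the LLM-reachable table. This is a sketch under my own assumptions: assertNoProdEnvs is a made-up helper, it blocks production for every action rather than only the high-blast-radius ones, and the set of names that count as production is a guess you would replace with your own.

```javascript
// Assumed names for production environments; replace with your own
const FORBIDDEN_ENVS = new Set(['prod', 'production', 'live']);

// Hypothetical boot-time guard: throws if any action lists a production env
function assertNoProdEnvs(actions) {
  for (const [name, spec] of Object.entries(actions)) {
    for (const env of spec.envs) {
      if (FORBIDDEN_ENVS.has(String(env).toLowerCase())) {
        throw new Error(`Action "${name}" exposes forbidden env "${env}" to the LLM path`);
      }
    }
  }
}
```

Calling assertNoProdEnvs(ACTIONS) once at module load turns a risky config change into a failed deploy instead of a surprise at 2 a.m.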

If you want to iterate on the parsing prompt without redeploying, a prompt workspace is handy for testing edge phrasings, and our prompt library has starting points for intent-extraction tasks.

Step 3: confirm in a modal

Destructive or stateful actions get a Block Kit modal so the human sees exactly what’s about to happen. This is where the parsed intent becomes human-reviewable.

{
  "type": "modal",
  "callback_id": "ops_confirm",
  "title": { "type": "plain_text", "text": "Confirm action" },
  "submit": { "type": "plain_text", "text": "Run it" },
  "close": { "type": "plain_text", "text": "Cancel" },
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Action:* restart_worker\n*Service:* payments\n*Env:* staging"
      }
    },
    {
      "type": "context",
      "elements": [
        { "type": "mrkdwn", "text": "Parsed from your message. Review before running." }
      ]
    }
  ]
}

I stash the validated action in the modal’s private_metadata so the execution step doesn’t re-trust anything from the model:

function confirmationModal(proposed, userId) {
  return {
    type: 'modal',
    callback_id: 'ops_confirm',
    private_metadata: JSON.stringify({ ...proposed, userId }),
    title: { type: 'plain_text', text: 'Confirm action' },
    submit: { type: 'plain_text', text: 'Run it' },
    close: { type: 'plain_text', text: 'Cancel' },
    blocks: [/* blocks from above, rendered from proposed */],
  };
}
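
The blocks placeholder can be filled by a small renderer. Here is a sketch, where renderConfirmBlocks is a name of my own choosing. It assumes proposed has already passed the allow-list check, so the interpolated values are known strings from your own table rather than raw model text.

```javascript
// Hypothetical renderer: builds the confirmation blocks from a validated action
function renderConfirmBlocks(proposed) {
  const summary = [
    `*Action:* ${proposed.action}`,
    `*Service:* ${proposed.service}`,
    `*Env:* ${proposed.env}`,
  ].join('\n');

  return [
    { type: 'section', text: { type: 'mrkdwn', text: summary } },
    {
      type: 'context',
      elements: [{ type: 'mrkdwn', text: 'Parsed from your message. Review before running.' }],
    },
  ];
}
```

In confirmationModal, blocks: renderConfirmBlocks(proposed) would replace the placeholder. Keep in mind that private_metadata is capped at 3,000 characters, which is plenty for a three-field action but not for arbitrary payloads.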

Step 4: execute on confirmation

Only when the user clicks Run it does anything happen — and we re-validate, because defense in depth beats trust.

app.view('ops_confirm', async ({ ack, view, body, client }) => {
  await ack();
  const action = JSON.parse(view.private_metadata);

  // Re-validate against the allow-list one more time
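  // Own keys only, so inherited names like "constructor" never match
  const spec = Object.hasOwn(ACTIONS, action.action) ? ACTIONS[action.action] : null;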
  if (!spec || !spec.services.includes(action.service) || !spec.envs.includes(action.env)) {
    return; // refuse silently or DM the user
  }

  await runAllowlistedAction(action); // your real executor, audited and logged

  await client.chat.postMessage({
    channel: body.user.id,
    text: `Done: ${action.action} on ${action.service}/${action.env}`,
  });
});

Everything funnels through runAllowlistedAction, which is ordinary, reviewed, audit-logged code. The LLM never gets a shell, a token, or a kubeconfig.
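
As an illustration of what that executor might look like, here is a sketch that maps each action to a fixed argument list and runs it with execFile, so no shell ever parses the values. Everything specific here is an assumption: the "<service>-worker" deployment naming, using the environment name as the Kubernetes namespace, and the buildArgv helper itself. Audit logging is left out for brevity.

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile);

// Hypothetical: each allow-listed action maps to a fixed argv template.
// Assumes deployments are named "<service>-worker" and namespaces match env names.
const COMMANDS = {
  restart_worker: ({ service, env }) => [
    'rollout', 'restart', `deployment/${service}-worker`, '--namespace', env,
  ],
};

// Build the kubectl arguments for a validated action, or throw if none is registered
function buildArgv(action) {
  if (!Object.hasOwn(COMMANDS, action.action)) {
    throw new Error(`No executor registered for action: ${action.action}`);
  }
  return COMMANDS[action.action](action);
}

async function runAllowlistedAction(action) {
  const argv = buildArgv(action);
  // execFile hands argv straight to kubectl without a shell,
  // so values are never spliced into a command string
  const { stdout } = await execFileAsync('kubectl', argv, { timeout: 30_000 });
  return stdout;
}
```

Splitting buildArgv from the function that actually spawns the process means the mapping can be tested on a laptop with no cluster access.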

Secrets, review, and the things that bite

Three hard rules I won’t bend on. First, never hand the model real tokens or secrets. The LLM sees the user’s text and the action schema — nothing else. Your SLACK_BOT_TOKEN, cloud credentials, and signing secret live in environment variables the model never reads. Second, a human reviews the generated parsing code before it touches a real workspace. If you used an AI assistant to scaffold these handlers — totally reasonable — treat that output like any PR: read every line, especially the validation. Tools like Claude or Cursor will happily generate a handler that looks complete but quietly omits the signature check. Third, always verify webhook signatures.

For anything that overlaps with on-call response, wire your audit trail and approvals into a real workflow — our incident response dashboard is built around exactly this kind of human-in-the-loop gating.

Conclusion

Natural-language ChatOps is one of the most satisfying things you can build into a Slack workspace, but the magic is a thin layer over a very boring, very strict foundation: an allow-list, a confirmation step, signature verification, and a human who owns the result. The LLM translates intent. Your code decides what’s permitted. Keep those responsibilities separate and you get the convenience without betting the cluster on a model’s good mood. Start small, allow-list aggressively, and review every line before it meets a real workspace. More patterns live in the Slack category.