Writing Sigma Detection Rules with AI Without Drowning in False Positives

The day I realized our detection coverage was a pile of vendor-specific queries nobody could read across teams, I went looking for a portable format. Sigma is that format: a YAML-based, vendor-neutral way to write detection logic once and convert it to whatever your SIEM speaks, whether that’s Elastic, Splunk, or something self-hosted. The promise is detection-as-code you can version, review, and share. The reality is that writing a good Sigma rule, one that fires on real attacks and stays quiet otherwise, is a craft that takes iteration.

That iteration is where AI earns its keep. I treat the model as a fast junior detection engineer: it drafts the rule, suggests field mappings, and helps me reason about false positives. But I verify every rule against real (sanitized) log samples before it goes live, because a noisy rule that everyone learns to ignore is worse than no rule at all. This is defensive detection engineering only. I describe log structure and substitute placeholders rather than pasting raw production logs full of internal detail.

Start from the behavior, not the field names

The mistake I see most often in AI-drafted rules is jumping straight to field names that may not match your schema. I anchor the model in the behavior first:

Write a Sigma rule that detects a process spawning a shell from a web server parent process (a common web-shell signal). Use the standard Sigma process_creation logsource. Explain what each condition catches. Defensive detection only.

A clean draft looks like this:

title: Shell Spawned by Web Server Process
status: experimental
logsource:
  category: process_creation
  product: linux
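detection:
  selection:
    # Fields inside one selection are ANDed; values in a list are ORed.
    # The parent is a web server or PHP worker...
    ParentImage|endswith:
      - '/nginx'
      - '/apache2'
      - '/php-fpm'
    # ...and the child it spawned is a shell.
    Image|endswith:
      - '/sh'
      - '/bash'
  condition: selection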
falsepositives:
  - Legitimate CGI scripts that shell out
level: high

I read every line. The falsepositives block is not boilerplate; it’s the part that decides whether anyone trusts the rule. I make the AI fill it in honestly and I add the ones it misses from my own environment.

Tune against real noise before you trust it

A rule that looks correct on paper can be a pager nightmare in practice. The honest test is running the logic against a window of real, sanitized logs and counting hits. I export a sample, scrub it, and ask the AI to predict matches:

Here is a sanitized sample of process-creation events (no hostnames, no usernames, just the fields the rule uses). Walk through which events this Sigma rule would match and why. Identify benign patterns that would trigger it.

When it flags a benign cron job or a legitimate deploy script that shells out, I refine the selection or add a filter block:

detection:
  selection:
    ParentImage|endswith: '/php-fpm'
    Image|endswith: '/sh'
  filter_deploy:
    CommandLine|contains: '/opt/deploy/healthcheck.sh'
  condition: selection and not filter_deploy

Pro Tip: tune by excluding specific known-good paths and command lines, not by broadening the selection. Every time you loosen the trigger to silence one false positive, you risk silencing the real attack too. Filters are surgical; loosened selections are blunt.
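A minimal sketch of the replay step, assuming the sanitized export is already loaded as a list of dicts. The selection and filter are hand-copied from the rule fragment above rather than parsed from YAML, the sample events are made up, and the matching is deliberately simplified (lowercased suffix and substring checks). It is a quick way to count hits before deploying, not a replacement for running the converted query in the SIEM.

```python
from typing import Iterable

# Hand-simplified mirror of the rule fragment above (not parsed from YAML).
SELECTION = {
    "ParentImage": ("/php-fpm",),
    "Image": ("/sh",),
}
# Known-good command line excluded by filter_deploy.
FILTER_DEPLOY = "/opt/deploy/healthcheck.sh"


def matches_selection(event: dict) -> bool:
    """Every field must match (AND); any listed suffix may match (OR)."""
    return all(
        str(event.get(field, "")).lower().endswith(suffixes)
        for field, suffixes in SELECTION.items()
    )


def matches_filter(event: dict) -> bool:
    """True when the command line contains the known-good deploy script."""
    return FILTER_DEPLOY in str(event.get("CommandLine", "")).lower()


def replay(events: Iterable[dict]) -> dict:
    """Count hits the way 'selection and not filter_deploy' would."""
    counts = {"events": 0, "selected": 0, "filtered": 0, "alerts": 0}
    for event in events:
        counts["events"] += 1
        if not matches_selection(event):
            continue
        counts["selected"] += 1
        if matches_filter(event):
            counts["filtered"] += 1
        else:
            counts["alerts"] += 1
    return counts


# Made-up sanitized sample: only the fields the rule uses.
SAMPLE = [
    {"ParentImage": "/usr/sbin/php-fpm", "Image": "/bin/sh",
     "CommandLine": "sh -c /opt/deploy/healthcheck.sh"},
    {"ParentImage": "/usr/sbin/php-fpm", "Image": "/bin/sh",
     "CommandLine": "sh -c id"},
    {"ParentImage": "/usr/sbin/cron", "Image": "/bin/sh",
     "CommandLine": "sh -c /usr/local/bin/rotate"},
]

if __name__ == "__main__":
    print(replay(SAMPLE))
```

Comparing the "selected" and "alerts" counts before and after adding a filter shows how much noise the exclusion removed without touching the selection itself.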

Map fields to your actual schema

Sigma’s portability depends on field names matching what your pipeline produces. If your log shipper calls the parent process proc.pname instead of ParentImage, the rule converts to a query that matches nothing and fails silently, which is the worst kind of failure in detection. I give the AI my field taxonomy and have it produce a mapping:

Here is my SIEM’s field schema for process events. Map the Sigma standard fields used in this rule to my field names, and tell me which ones have no equivalent so I know the coverage gaps.

Then I convert with sigma-cli, passing the mapping as a processing pipeline file (the path below is a placeholder), and read the output query rather than assuming the conversion is correct:

sigma convert -t lucene -p pipelines/my_schema.yml rules/web_shell.yml

If a field didn’t map, it shows up in the query under its original Sigma name, and a query that can’t match anything is a bug to fix before the rule ships.
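The coverage-gap check can also be done mechanically before conversion. The sketch below assumes the rule’s detection section is already loaded as a dict (for example by a YAML loader) and that the schema mapping is a plain dict. Only proc.pname comes from the text above; proc.name is an invented example, and CommandLine is left unmapped on purpose to show a gap.

```python
from __future__ import annotations

# Detection section of the rule, as a YAML loader would hand it over.
DETECTION = {
    "selection": {
        "ParentImage|endswith": "/php-fpm",
        "Image|endswith": "/sh",
    },
    "filter_deploy": {"CommandLine|contains": "/opt/deploy/healthcheck.sh"},
    "condition": "selection and not filter_deploy",
}

# Hypothetical schema mapping (Sigma field -> pipeline field).
SCHEMA_MAPPING = {
    "ParentImage": "proc.pname",
    "Image": "proc.name",  # invented for illustration
}


def rule_fields(detection: dict) -> set[str]:
    """Collect field names used by the rule, with modifiers stripped."""
    fields: set[str] = set()
    for name, block in detection.items():
        if name == "condition":
            continue
        # A search identifier may hold one map or a list of maps.
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict):
                for key in item:
                    fields.add(key.split("|", 1)[0])
    return fields


def coverage(detection: dict, mapping: dict) -> tuple[dict, list]:
    """Return (mapped fields, sorted list of fields with no equivalent)."""
    fields = rule_fields(detection)
    mapped = {f: mapping[f] for f in sorted(fields) if f in mapping}
    gaps = sorted(fields - set(mapped))
    return mapped, gaps


if __name__ == "__main__":
    mapped, gaps = coverage(DETECTION, SCHEMA_MAPPING)
    print("mapped:", mapped)
    print("no equivalent:", gaps)
```

Any field reported as having no equivalent is a condition the converted query cannot evaluate, which is the silent failure described above.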

Cover the technique, not just one tool

A rule that detects exactly one binary is brittle; attackers swap tools. I ask the model to generalize toward the technique. For credential-dumping or persistence, that means thinking about the behavior an attacker can’t avoid rather than the specific utility they happened to use this week. The AI is genuinely useful here for enumerating variants, but I cross-check its suggestions against the MITRE ATT&CK technique it claims to cover, because models will occasionally invent a plausible-sounding mapping that doesn’t hold up.

Keep rules in version control and review them like code

Detection-as-code only pays off if you treat it like code. I keep rules in a repo, require review on changes, and run a Sigma linter in CI to catch malformed YAML before it reaches the SIEM. When the AI drafts or edits a rule, the diff goes through review like any other change. I route larger rule batches through the code review dashboard so the detection logic gets a second set of eyes and inline comments before merge, rather than the model committing anything itself.
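As an illustration of the kind of check a CI lint step can run, here is a small sketch that inspects an already-parsed rule (a dict) for the three required top-level keys and for condition identifiers that are never defined. The function name and messages are hypothetical, and this is far narrower than a real Sigma validator; it only shows the shape of the check.

```python
from __future__ import annotations

import re
from fnmatch import fnmatchcase

REQUIRED_KEYS = ("title", "logsource", "detection")
CONDITION_KEYWORDS = {"and", "or", "not", "of", "all", "them"}
# Identifiers and wildcard patterns such as selection_*; bare numbers are skipped.
TOKEN = re.compile(r"[A-Za-z_*][A-Za-z0-9_*]*")


def lint_rule(rule: dict) -> list[str]:
    """Return a list of human-readable problems; empty means nothing found."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in rule
    ]
    detection = rule.get("detection")
    if not isinstance(detection, dict):
        return problems
    condition = detection.get("condition")
    if condition is None:
        problems.append("detection has no condition")
        return problems
    # A condition may be a single string or a list of strings.
    conditions = condition if isinstance(condition, list) else [condition]
    defined = set(detection) - {"condition"}
    for expression in conditions:
        for token in TOKEN.findall(str(expression)):
            if token.lower() in CONDITION_KEYWORDS:
                continue
            # Exact names and wildcard patterns both go through fnmatchcase.
            if not any(fnmatchcase(name, token) for name in defined):
                problems.append(
                    f"condition references undefined identifier: {token}"
                )
    return problems


# Made-up broken rule: no title, and the filter block was never defined.
BROKEN_RULE = {
    "logsource": {"category": "process_creation", "product": "linux"},
    "detection": {
        "selection": {"Image|endswith": "/sh"},
        "condition": "selection and not filter_deploy",
    },
}

if __name__ == "__main__":
    for problem in lint_rule(BROKEN_RULE):
        print(problem)
```

A check like this fails the build on the mistakes that are easy to make when a model edits a rule, such as renaming a filter block without updating the condition.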

When a rule fires in production and we work the resulting alert, the triage and timeline live in the incident response dashboard, and the lessons feed back into rule tuning.

Build a tuning loop, not a one-shot

The rules that survive are the ones that go through cycles: draft, test against sanitized logs, deploy in alert-only mode, watch the false-positive rate, tighten. The AI accelerates each cycle but never closes the loop on its own. I keep the detection-engineering prompts in our prompts library, with the security set packaged in the DevOps security prompt pack. For the drafting and field-mapping work I’ve leaned on Claude, which handles structured YAML and explanation well.
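Watching the false-positive rate is easier with a number per rule. The sketch below assumes triage verdicts are exported as (rule title, verdict) pairs with verdicts "tp" or "fp"; that format, the second rule name, and the 0.5 threshold are all hypothetical choices to adapt, not recommendations.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def false_positive_rates(triage: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Share of alerts per rule that were triaged as false positives."""
    totals: Counter[str] = Counter()
    false_positives: Counter[str] = Counter()
    for rule_title, verdict in triage:
        totals[rule_title] += 1
        if verdict == "fp":
            false_positives[rule_title] += 1
    return {rule: false_positives[rule] / totals[rule] for rule in totals}


def rules_to_tighten(rates: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Rules whose false-positive rate exceeds the (hypothetical) threshold."""
    return sorted(rule for rule, rate in rates.items() if rate > threshold)


# Made-up triage export from an alert-only period.
TRIAGE = [
    ("Shell Spawned by Web Server Process", "fp"),
    ("Shell Spawned by Web Server Process", "fp"),
    ("Shell Spawned by Web Server Process", "fp"),
    ("Shell Spawned by Web Server Process", "tp"),
    ("Example Rule B", "tp"),
    ("Example Rule B", "tp"),
]

if __name__ == "__main__":
    rates = false_positive_rates(TRIAGE)
    print(rates)
    print(rules_to_tighten(rates))
```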

The takeaway

Sigma gives you portable, reviewable detection logic, and AI makes drafting and tuning those rules dramatically faster. Treat the model as the fast junior detection engineer that proposes rules and reasons about false positives, while you verify every rule against real sanitized logs, confirm the field mappings, and own what goes live. Keep raw production logs and secrets out of the prompts, keep the rules in version control, and tune by exclusion rather than by loosening. The rest of the security hardening category covers the logging and audit foundations your detections depend on.
