By James Joyner IV · 9 min read

Runtime Threat Detection with Falco: Catching the Breach as It Happens

Scanning catches bad images before they run. Falco catches bad behavior while they run. Here's how to deploy runtime detection that flags the breach in real time without alert fatigue.

  • #security
  • #hardening
  • #falco
  • #runtime
  • #detection
  • #kubernetes

Most security tooling looks at your systems before anything runs: image scanners, policy checks, IaC linters. That’s necessary, but it’s all pre-flight. It tells you nothing about what’s happening at 03:00 when an attacker who got past all of it is spawning a shell inside a container and reading /etc/shadow. For that you need runtime detection — eyes on actual behavior, in real time — and Falco is the open-source standard for it.

The pitch is simple: Falco watches the syscalls your workloads make and fires when behavior matches a rule. A container that suddenly spawns bash, writes to a binary directory, or opens an outbound connection to an unexpected host is doing something, and doing is where you catch the attack that signatures missed.

How Falco sees

Falco taps into the kernel, historically through a kernel module and now preferably through the modern eBPF probe, and streams syscall events from every process on the host. It enriches those events with container and Kubernetes context (which pod, namespace, image) so an alert says “the payments pod spawned an unexpected shell” instead of “PID 4471 called execve.”

Install it via Helm with the modern eBPF driver. The probe ships inside Falco itself, so there is no kernel module to compile across a heterogeneous fleet, though it does need a reasonably recent kernel with BTF support:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set driver.kind=modern_ebpf

The chart runs Falco as a DaemonSet, so within minutes it’s watching every node and emitting events against its default ruleset.
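If you would rather keep the install settings in version control than in shell history, the same choice can be written as a Helm values file. This is a minimal sketch: the filename is hypothetical, and the `tty` setting is an optional extra that is not part of the command above.

```yaml
# values-falco.yaml (hypothetical filename)
driver:
  kind: modern_ebpf   # same setting as --set driver.kind=modern_ebpf

# Optional: attach a tty so Falco flushes each alert to the pod log
# as soon as it is emitted instead of buffering.
tty: true
```

You would pass it with `-f values-falco.yaml` in place of the `--set` flag. Check the key names against the values reference for the chart version you actually deploy.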

The rules that earn their keep

Falco ships with a solid default ruleset, but the high-signal ones are worth knowing because they map directly to attacker behavior:

  • Shell in a container. Production containers should almost never spawn an interactive shell. When one does, it’s either a human debugging in prod (a process problem) or an intruder (a security problem). Either way you want to know.
  • Write below a read-only path. Writes to /bin, /usr/bin, or /etc inside a running container often mean someone’s planting a tool or backdoor.
  • Reading sensitive files. Access to /etc/shadow, cloud credential files, or service-account tokens by a process that has no business reading them.
  • Unexpected outbound connections. A web server suddenly talking to a random IP on a high port looks a lot like C2 or exfiltration.
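To make the last three behaviors concrete, here is a rough sketch of what such rules can look like. The default ruleset already covers these cases with its own, more thorough rules; this fragment spells the conditions out with raw fields so it does not depend on which macros your Falco version ships. The rule names, the `payments` namespace, and the port allowlist are illustrative assumptions.

```yaml
# Sketch only: names, namespace, and allowlist are assumptions.

- rule: Write below binary dir in container (sketch)
  desc: A file under a system binary directory was opened for writing in a container
  condition: >
    evt.type in (open, openat, openat2) and evt.is_open_write=true and
    container.id != host and
    fd.directory in (/bin, /sbin, /usr/bin, /usr/sbin)
  output: >
    Write below binary dir
    (file=%fd.name pod=%k8s.pod.name image=%container.image.repository
     cmd=%proc.cmdline user=%user.name)
  priority: ERROR
  tags: [container, filesystem, mitre_persistence]

- rule: Sensitive file read in container (sketch)
  desc: A process in a container opened /etc/shadow for reading
  condition: >
    evt.type in (open, openat, openat2) and evt.is_open_read=true and
    container.id != host and
    fd.name = /etc/shadow
  output: >
    Sensitive file read
    (file=%fd.name pod=%k8s.pod.name cmd=%proc.cmdline user=%user.name)
  priority: WARNING
  tags: [container, filesystem, mitre_credential_access]

# Hypothetical allowlist of server ports the payments pods may connect to.
- list: payments_allowed_ports
  items: [443, 5432]

- rule: Unexpected outbound connection from payments (sketch)
  desc: A payments pod connected to a server port outside the allowlist
  condition: >
    evt.type = connect and
    (fd.typechar = 4 or fd.typechar = 6) and
    container.id != host and
    k8s.ns.name = "payments" and
    not fd.sport in (payments_allowed_ports)
  output: >
    Unexpected outbound connection
    (connection=%fd.name pod=%k8s.pod.name cmd=%proc.cmdline)
  priority: NOTICE
  tags: [container, network, mitre_command_and_control]
```

Expect all three to be noisy until they are tuned against real traffic.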

A custom rule reads clearly enough to review like code:

- rule: Shell spawned in payments container
  desc: An interactive shell started in a production payments pod
  condition: >
    spawned_process and container and
    proc.name in (bash, sh, zsh) and
    k8s.ns.name = "payments"
  output: >
    Shell in payments container
    (pod=%k8s.pod.name image=%container.image.repository
     cmd=%proc.cmdline user=%user.name)
  priority: WARNING
  tags: [container, shell, mitre_execution]

Tagging rules with MITRE ATT&CK tactics, as mitre_execution does here, and with technique IDs such as T1059 where they apply, is worth doing consistently. It lets you talk about coverage in terms auditors and the security team already use.

Tuning before alert fatigue kills it

Here’s the failure mode that kills most Falco deployments: out of the box it’s noisy, the alerts fire constantly, the channel gets muted within a week, and now you have a detection tool that nobody reads. A muted alert is worse than no alert, because it gives you false confidence.

So tuning isn’t optional, it’s the whole job. The pattern:

  1. Run in observe mode first. That just means sending every alert to a log, not to the on-call channel, for a couple of weeks.
  2. Find your noise. Your CI runners legitimately spawn shells. Your backup job legitimately reads sensitive paths. Your service mesh legitimately makes odd connections. These are expected and need exceptions.
  3. Write macros for known-good behavior, then exclude them:
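
# Pods that are expected to spawn shells (the name prefixes are examples).
- macro: known_shell_callers
  condition: >
    (k8s.pod.name startswith "ci-runner" or
     k8s.pod.name startswith "debug-")

# spawned_process, container, and shell_binaries come from the default ruleset.
- rule: Unexpected shell in container
  desc: A shell started in a container that is not a known shell caller
  condition: >
    spawned_process and container and
    proc.name in (shell_binaries) and
    not known_shell_callers
  output: "Unexpected shell (pod=%k8s.pod.name cmd=%proc.cmdline)"
  priority: WARNING
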
  4. Only then route to humans. Once the noise is gone, a Falco alert means something, and people will act on it.
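The first step of that pattern can be expressed as Helm values. This is a sketch under assumptions: the filename is hypothetical, and it assumes your cluster's log pipeline already collects pod stdout, so the Falco pods' own logs serve as the observation log.

```yaml
# values-observe.yaml (hypothetical filename)
falco:
  json_output: true     # one JSON object per alert, easier to aggregate and count
  stdout_output:
    enabled: true       # alerts land in the Falco pod logs
  file_output:
    enabled: false      # no extra alert file on the node

falcosidekick:
  enabled: false        # nothing is routed to Slack or on-call during observation
```

Counting alerts per rule and per pod in those logs over the observation window is usually enough to show where the exceptions need to go.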

The goal is a channel where every alert is worth a human’s attention. That’s a high bar and it takes iteration, but it’s the only version of runtime detection that actually works.

From alert to response

Detection without response is just expensive logging. Wire Falco’s output through Falcosidekick to fan out to Slack, your SIEM, and an automated responder:

helm upgrade falco falcosecurity/falco \
  --namespace falco --reuse-values \
  --set falcosidekick.enabled=true \
  --set falcosidekick.config.slack.webhookurl="$SLACK_WEBHOOK"
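The same Falcosidekick settings can live in a values file, which also makes it easy to set a priority floor per output so low-severity events never reach the channel. A minimal sketch: the `warning` threshold is an assumption to adjust, and the webhook is deliberately left empty so the secret is supplied at deploy time and never committed.

```yaml
falcosidekick:
  enabled: true
  config:
    slack:
      webhookurl: ""              # supply at deploy time; do not commit the real URL
      minimumpriority: "warning"  # drop anything below WARNING for this output
```

Each Falcosidekick output takes its own `minimumpriority`, so a SIEM can receive everything while Slack sees only what a human should read.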

For the highest-severity rules — say, a write to a system binary in production — you can trigger automated containment: cordon the node, kill the pod, or apply a deny-all network policy to the affected workload while a human investigates. Be conservative here; auto-remediation that’s wrong causes its own outages, so reserve it for high-confidence rules and start with “alert loudly” before “act automatically.”
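As one illustration of containment, a deny-all network policy can be created ahead of time and keyed to a label, so that isolating a workload only requires labelling the pod. The policy name, the `payments` namespace, and the `quarantine` label are hypothetical, and the policy only takes effect if the cluster's CNI plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-deny-all      # hypothetical name
  namespace: payments            # assumed namespace
spec:
  podSelector:
    matchLabels:
      quarantine: "true"         # hypothetical label set by a responder or a human
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules are listed, so all traffic to and from
  # the selected pods is denied.
```

A responder, or an engineer during an incident, would then run something like `kubectl label pod <pod-name> -n payments quarantine=true`. The pod keeps running for forensics but can no longer talk to anything.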

Where it fits

Runtime detection is the layer that assumes your other controls failed. Image scanning, admission policy, and least-privilege all reduce the odds of a breach; Falco is what tells you when one happened anyway. It pairs naturally with the rest of a defense-in-depth posture — the broader security hardening guides cover the preventive layers, and reviewing new Falco rules through automated code review keeps a typo from silently disabling a detection.

Deploy it in observe mode this week, spend two weeks learning your environment’s normal, tune ruthlessly, then turn on the alerts you trust. The first time it catches a real shell in production, the effort pays for itself entirely.

Falco rules and remediation actions are starting points. Tune detections against your real workloads in observe mode before alerting, and test any auto-remediation carefully to avoid self-inflicted outages.
