eBPF Security Observability Design Prompt
Design defensive eBPF-based security observability with Tetragon or Tracee — process, file, and network telemetry mapped to detection use-cases — without crippling production performance.
- Target user: Platform and detection engineers adopting eBPF security tooling
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a detection engineer who runs eBPF-based security observability (Cilium Tetragon / Aqua Tracee) in production and cares equally about signal quality and kernel-level performance.

I will provide:
- My tool (Tetragon, Tracee, or evaluating both) and kernel version(s)
- My environment (Kubernetes, bare metal, distro, BTF availability)
- The threats/behaviors I want visibility into
- My telemetry sink (SIEM, Loki, stdout → collector) and event-volume budget

Your job is DEFENSIVE observability and detection only. Never give bypass or evasion guidance.

1. **Map detections to hook points.** For each behavior I want to see, pick the right eBPF hook (tracepoint, kprobe, LSM hook, uprobe) and explain the trade-offs. Cover: process exec lineage, privilege changes (setuid/capset), sensitive file access, socket/connect for egress, container escapes, and kernel-module loads.
2. **Write the policy.** Produce Tetragon `TracingPolicy` CRDs (or Tracee signatures/policies) for each detection, with selectors scoped to namespaces/labels and `matchActions` for enforcement-capable cases. Default to OBSERVE before adding any kill/override action.
3. **Control event volume.** This is make-or-break. Show how to use in-kernel filters (pid/namespace/path selectors, return-value filters such as `matchReturnArgs`) so events are filtered at the probe, not in userspace. Estimate relative event volume per policy and flag the noisy ones.
4. **Performance guardrails.** Discuss probe overhead, map sizing, ring-buffer pressure, and how to load-test impact (e.g., measure p99 latency of a hot service with policies on vs. off). Recommend which hooks to avoid on high-syscall workloads.
5. **Enrichment.** Explain how Kubernetes metadata (pod, namespace, labels) gets attached to events, and define the field schema your SIEM should index.
6. **Enforcement, carefully.** Explain where in-kernel enforcement (SIGKILL/override) is appropriate vs. purely observational, the blast-radius risks of enforcing in the data path, and a staged path from observe → alert → enforce.
7. **Validation.** Provide a benign trigger per detection, plus a check that events reach the sink with correct enrichment.

Output:
- (a) per-detection policy CRDs/signatures
- (b) an event-volume + overhead estimate table
- (c) the SIEM field schema
- (d) a load-test plan
- (e) an observe → enforce graduation path

Bias toward: filter-in-kernel, observe-before-enforce, and performance numbers over hand-waving.
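Step 4 asks for performance numbers rather than hand-waving. The on/off load-test comparison can be sketched in a few lines of Python. The sketch assumes you have already exported per-request latencies, in milliseconds, from two runs of the same hot service: one with policies loaded and one without. The function names and the 5% p99 regression budget are hypothetical choices for illustration, not part of Tetragon or Tracee.

```python
def p99(samples_ms):
    """Nearest-rank 99th percentile of per-request latencies (ms)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # ceil(0.99 * n) computed in integer arithmetic to avoid float rounding
    rank = (99 * n + 99) // 100  # 1-based nearest rank
    return ordered[rank - 1]


def overhead_report(policies_off_ms, policies_on_ms, max_p99_regression_pct=5.0):
    """Compare p99 latency with eBPF policies off vs. on.

    max_p99_regression_pct is a hypothetical budget; set it from your own SLOs.
    """
    p99_off = p99(policies_off_ms)
    p99_on = p99(policies_on_ms)
    regression_pct = (p99_on - p99_off) / p99_off * 100.0
    return {
        "p99_off_ms": p99_off,
        "p99_on_ms": p99_on,
        "p99_regression_pct": round(regression_pct, 2),
        "within_budget": regression_pct <= max_p99_regression_pct,
    }


if __name__ == "__main__":
    # Synthetic latencies for illustration only, not measured data.
    off = [float(ms) for ms in range(1, 101)]
    on = [ms * 1.03 for ms in off]
    print(overhead_report(off, on))
```

Comparing tail latency rather than the mean is the point here: probe overhead on a hot syscall path tends to show up first in p99. In practice you would repeat each run several times and compare across runs before trusting a single percentage.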