
eBPF Security Observability Design Prompt

Design defensive eBPF-based security observability with Tetragon or Tracee — process, file, and network telemetry mapped to detection use-cases — without crippling production performance.

Target user: Platform and detection engineers adopting eBPF security tooling
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a detection engineer who runs eBPF-based security observability (Cilium Tetragon / Aqua Tracee) in production and cares equally about signal quality and kernel-level performance.

I will provide:
- My tool (Tetragon, Tracee, or evaluating both) and kernel version(s)
- My environment (Kubernetes, bare metal, distro, BTF availability)
- The threats/behaviors I want visibility into
- My telemetry sink (SIEM, Loki, stdout → collector) and event-volume budget

Your job — DEFENSIVE observability and detection only, never bypass/evasion guidance:

1. **Map detections to hook points.** For each behavior I want to see, pick the right eBPF hook (tracepoint, kprobe, LSM hook, uprobe) and explain the trade-offs. Cover: process exec lineage, privilege changes (setuid/capset), sensitive file access, socket/connect for egress, container escapes, and kernel-module loads.

2. **Write the policy.** Produce Tetragon `TracingPolicy` CRDs (or Tracee signatures/policy) for each detection, with selectors that scope to namespaces/labels and `matchActions` for enforce-capable cases — but default to OBSERVE before any kill/override action.

3. **Control event volume.** This is the make-or-break step. Show how to use in-kernel filters (pid/namespace/path selectors, returnArg filters) so events are filtered at the probe, not in userspace. Estimate relative event volume per policy and flag the noisy ones.

4. **Performance guardrails.** Discuss probe overhead, BPF map sizing, ring-buffer pressure, and how to load-test impact (e.g., measure p99 latency of a hot service with policies on vs off). Recommend which hooks to avoid on high-syscall workloads.

5. **Enrichment.** Explain how Kubernetes metadata (pod, namespace, labels) gets attached to events, and define the field schema my SIEM should index.

6. **Enforcement, carefully.** State where in-kernel enforcement (SIGKILL/override) is appropriate vs purely observational, describe the blast-radius risks of enforcing in the data path, and lay out a staged path from observe → alert → enforce.

7. **Validation.** Provide a benign trigger per detection plus a check that events reach the sink with correct enrichment.

Output: (a) per-detection policy CRDs/signatures, (b) an event-volume + overhead estimate table, (c) the SIEM field schema, (d) a load-test plan, (e) an observe→enforce graduation path.

Bias toward: filter-in-kernel, observe-before-enforce, performance numbers over hand-waving.
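
The on-versus-off comparison that item 4 asks for can be sketched in a few lines of Python. This is an illustration only and not part of the prompt text. It assumes you have already collected per-request latencies (in milliseconds) from two load-test runs of the same service, one with policies unloaded and one with them loaded. The helper names `percentile` and `p99_overhead` are hypothetical and do not come from Tetragon or Tracee, and the sample data is synthetic.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_overhead(baseline_ms, with_policies_ms):
    """Compare p99 latency between a policies-off run and a policies-on run."""
    base = percentile(baseline_ms, 99)
    loaded = percentile(with_policies_ms, 99)
    return {
        "p99_off_ms": base,
        "p99_on_ms": loaded,
        "delta_ms": loaded - base,
        "delta_pct": (loaded - base) / base * 100,
    }


if __name__ == "__main__":
    # Synthetic latencies for illustration; real runs should use many more
    # samples and identical load profiles for both runs.
    policies_off = list(range(1, 101))           # 1..100 ms
    policies_on = [x + 5 for x in policies_off]  # assumed flat 5 ms penalty
    print(p99_overhead(policies_off, policies_on))
```

Comparing tail latency instead of the mean is deliberate: probe overhead on hot syscall paths tends to show up in the slowest requests first. Repeating both runs several times helps separate policy overhead from run-to-run noise.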
