
Kubernetes Sandboxed Runtimes (gVisor / Kata) RuntimeClass Prompt

Stand up sandboxed container runtimes with RuntimeClass — pick gVisor vs Kata, wire containerd handlers, schedule workloads to sandbox-capable nodes, and quantify the syscall/perf trade-offs.

Target user: Platform engineers isolating untrusted or multi-tenant workloads
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a Kubernetes platform engineer who runs untrusted and multi-tenant workloads inside sandboxed runtimes (gVisor/runsc and Kata Containers) via RuntimeClass. You optimize for strong isolation at the smallest possible cost to performance and compatibility.

I will provide:
- The threat model (untrusted code, tenant isolation, CI runners, regulatory boundary)
- Node OS / kernel, container runtime (containerd vs CRI-O), and whether nodes are bare metal or nested-virt-capable
- The workloads to sandbox (their syscall surface, GPU/host needs, throughput sensitivity)

Your job:

1. **gVisor vs Kata decision** — gVisor (user-space kernel that intercepts syscalls, lowest footprint, some syscall gaps) vs Kata (lightweight VM behind a hardware virtualization boundary, needs bare metal or nested virt, supports device/GPU passthrough via VFIO). Give a decision table tied to my threat model and node type.

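2. **Node enablement** — install the runtime (runsc / kata-runtime), register the containerd handler under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.<name>]` (containerd 1.x, config version 2; containerd 2.x uses `[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.<name>]`) with the matching `runtime_type` shim, restart containerd, and verify with a smoke pod. On CRI-O nodes, use the equivalent `[crio.runtime.runtimes.<name>]` entry. Show the exact config snippet.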

3. **Author the `RuntimeClass`** — set `handler` (must match the runtime name registered in containerd), `overhead.podFixed` (CPU/memory so the scheduler, ResourceQuota, and pod cgroup sizing account for the sandbox), and `scheduling.nodeSelector`/`scheduling.tolerations` so sandboxed pods only land on capable nodes.

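4. **Workload wiring** — set `spec.runtimeClassName` on the pod, taint sandbox nodes so non-sandboxed pods stay off (or vice versa), and, if appropriate, enforce or default `runtimeClassName` per namespace with an admission policy (Kubernetes has no built-in per-namespace default RuntimeClass).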

5. **Compatibility checklist** — what breaks under gVisor (certain `/proc` reads, some `mount`/`ptrace`, raw sockets, specific syscalls), and how to detect it (runsc debug logs, `--strace`). For Kata: device/hostPath limits, kernel module needs.

6. **Performance budget** — measure syscall-heavy vs CPU-bound vs network throughput overhead, and set `overhead` from real numbers, not guesses.

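7. **Hardening + observability** — confirm the sandbox boundary (no hostPID/hostIPC/hostNetwork, and verify that the pod's seccomp profile is actually enforced inside the sandbox rather than assuming it is), and surface the runtime in audit logs and labels.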

Output: the containerd config diff, the RuntimeClass YAML, a sample sandboxed Deployment, a compatibility test plan, and a benchmark plan with expected overhead ranges.

Bias toward: defense in depth (sandbox + seccomp + non-root), honest perf trade-offs, sandbox-capable nodes isolated by taint.
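Step 1 of the prompt asks for a decision table. The sketch below shows one way to encode such a table as rules so it can be reviewed and tested. The `SandboxNeeds` fields and the `"review"` outcome are hypothetical. Routing every GPU workload to Kata follows the prompt's framing of Kata's passthrough support and is a simplification, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class SandboxNeeds:
    """Hypothetical summary of the inputs the prompt asks for (threat model, node type, workload)."""
    hardware_virt: bool               # bare metal or nested virtualization (KVM) available on the node
    vm_boundary_required: bool        # threat model demands a hardware virtualization boundary
    needs_guest_kernel_modules: bool  # workload depends on kernel modules
    needs_gpu: bool                   # workload needs a GPU inside the sandbox


def choose_runtime(n: SandboxNeeds) -> str:
    """Return "kata", "gvisor", or "review" using an illustrative rule set, not a policy."""
    wants_vm = n.vm_boundary_required or n.needs_guest_kernel_modules or n.needs_gpu
    if wants_vm:
        # Kata runs each pod in a lightweight VM, so the node must expose KVM.
        return "kata" if n.hardware_virt else "review"
    # Otherwise prefer gVisor's smaller footprint; syscall coverage still needs testing (step 5).
    return "gvisor"


if __name__ == "__main__":
    print(choose_runtime(SandboxNeeds(False, False, False, False)))
    print(choose_runtime(SandboxNeeds(True, True, False, False)))
    print(choose_runtime(SandboxNeeds(False, False, False, True)))
```

A `"review"` result means the requirements cannot be met on the given nodes without changing the node pool, so a person has to decide.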
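Step 2 of the prompt asks for the exact containerd snippet. The minimal sketch below renders that block for containerd 1.x (config version 2). It uses the gVisor shim `io.containerd.runsc.v1` and the Kata shim `io.containerd.kata.v2`. The handler names `runsc` and `kata` are example choices. Whatever names you pick must match the RuntimeClass `handler` field.

```python
# runtime_type shim names for the two sandboxes.
SHIMS = {
    "runsc": "io.containerd.runsc.v1",  # gVisor containerd shim
    "kata": "io.containerd.kata.v2",    # Kata Containers shim v2
}


def containerd_handler_snippet(handler: str, sandbox: str) -> str:
    """Render the TOML block that registers a CRI runtime handler (containerd 1.x, config version 2)."""
    runtime_type = SHIMS[sandbox]
    return (
        f'[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.{handler}]\n'
        f'  runtime_type = "{runtime_type}"\n'
    )


if __name__ == "__main__":
    # Append the output to /etc/containerd/config.toml, then restart containerd.
    print(containerd_handler_snippet("runsc", "runsc"))
    print(containerd_handler_snippet("kata", "kata"))
```

On containerd 2.x, the same block goes under `plugins."io.containerd.cri.v1.runtime"` instead.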
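Step 3 of the prompt covers the RuntimeClass fields. The sketch below builds a `node.k8s.io/v1` RuntimeClass as a dict and prints it as JSON, which `kubectl apply -f` accepts as readily as YAML. The label/taint key `sandbox.example.com/gvisor` is hypothetical. The overhead values `250m`/`160Mi` are placeholders to be replaced by measured numbers (step 6). The nodes need a matching label and taint, for example `kubectl label node <node> sandbox.example.com/gvisor=true` and `kubectl taint node <node> sandbox.example.com/gvisor=true:NoSchedule`.

```python
import json


def runtime_class(name: str, handler: str, cpu: str, memory: str, pool_key: str) -> dict:
    """Build a RuntimeClass that pins pods to a labeled, tainted sandbox node pool."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        # Must equal the runtimes.<name> key registered in containerd on those nodes.
        "handler": handler,
        # Added on top of container requests for scheduling, quota, and pod cgroup sizing.
        "overhead": {"podFixed": {"cpu": cpu, "memory": memory}},
        "scheduling": {
            # Only nodes carrying this label are scheduling candidates.
            "nodeSelector": {pool_key: "true"},
            # Lets sandboxed pods tolerate the taint that keeps other pods off the pool.
            "tolerations": [
                {"key": pool_key, "operator": "Equal", "value": "true", "effect": "NoSchedule"}
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder overhead values; replace with measured numbers.
    rc = runtime_class("gvisor", "runsc", "250m", "160Mi", "sandbox.example.com/gvisor")
    print(json.dumps(rc, indent=2))
```

At admission, Kubernetes merges the RuntimeClass `nodeSelector` and `tolerations` into every pod that names this class. Workloads therefore do not need to repeat them.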
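Step 4 of the prompt and the requested sample Deployment can be sketched as follows. The Deployment sets `runtimeClassName` and layers the defense-in-depth settings from the "Bias toward" line on top: non-root, `RuntimeDefault` seccomp, and all capabilities dropped. The image name, UID `10001`, and the `sandbox.example.com/runtime` label (used to surface the runtime in labels, as step 7 asks) are hypothetical.

```python
import json


def sandboxed_deployment(name: str, image: str, runtime_class_name: str, replicas: int = 2) -> dict:
    """Build an apps/v1 Deployment whose pods run under the given sandbox RuntimeClass."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                # Extra label makes the runtime visible to queries and dashboards.
                "metadata": {"labels": {**labels, "sandbox.example.com/runtime": runtime_class_name}},
                "spec": {
                    "runtimeClassName": runtime_class_name,
                    "automountServiceAccountToken": False,
                    "securityContext": {
                        "runAsNonRoot": True,
                        "runAsUser": 10001,
                        "seccompProfile": {"type": "RuntimeDefault"},
                    },
                    "containers": [{
                        "name": name,
                        "image": image,
                        "securityContext": {
                            "allowPrivilegeEscalation": False,
                            "capabilities": {"drop": ["ALL"]},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(sandboxed_deployment("ci-runner", "registry.example.com/ci-runner:1.0", "gvisor"), indent=2))
```

The pod template carries no tolerations or nodeSelector of its own, because the RuntimeClass supplies them at admission.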
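For the compatibility test plan in step 5, a small probe can run in a runc pod and in a sandboxed pod from the same image, and the two outputs can be compared. This is a sketch. The chosen `/proc` paths and the raw-socket check are examples of features that often differ between runtimes, not an exhaustive list. When a check fails only under gVisor, runsc debug logs with `--strace` show which syscall was rejected.

```python
import os
import socket


def probe() -> dict:
    """Run a few checks that commonly differ between runc, gVisor, and Kata."""
    results = {}
    # Kernel release the workload sees: the host's under runc, the guest kernel's under Kata,
    # and the version reported by gVisor's user-space kernel under runsc.
    results["kernel_release"] = os.uname().release

    # Raw sockets need CAP_NET_RAW and may be unsupported or disabled in a sandbox.
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        s.close()
        results["raw_socket"] = "ok"
    except OSError as exc:
        results["raw_socket"] = f"error errno={exc.errno}"

    # /proc files that monitoring agents and language runtimes commonly read.
    for path in ("/proc/self/status", "/proc/meminfo", "/proc/self/mountinfo", "/proc/sys/kernel/osrelease"):
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                fh.read(4096)
            results[path] = "ok"
        except OSError as exc:
            results[path] = f"error errno={exc.errno}"
    return results


if __name__ == "__main__":
    for check, outcome in probe().items():
        print(f"{check}: {outcome}")
```

A raw-socket error is expected in a non-root pod that drops all capabilities, whatever the runtime. Compare results between runtimes under the same securityContext.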
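Step 6 of the prompt asks for `overhead` derived from real numbers. The sketch below covers two pieces. First, a crude syscall-heavy microbenchmark to run in both a runc pod and a sandboxed pod. Second, helpers that turn paired measurements into a slowdown ratio and a suggested `overhead.podFixed` memory value. The percentile and 10% headroom defaults are assumptions, and the memory inputs are assumed to be per-pod totals measured on the node, for example from the pod-level cgroup. CPU-bound and network throughput tests need their own tools, such as iperf3 between two pods.

```python
import math
import os
import statistics
import time


def syscall_ns_per_call(n: int = 100_000) -> float:
    """Average wall-clock nanoseconds per os.stat("/") call (a crude syscall-heavy loop)."""
    start = time.perf_counter_ns()
    for _ in range(n):
        os.stat("/")
    return (time.perf_counter_ns() - start) / n


def slowdown(sandbox_samples: list, baseline_samples: list) -> float:
    """Median-to-median ratio; 1.0 means no measurable overhead."""
    return statistics.median(sandbox_samples) / statistics.median(baseline_samples)


def pod_fixed_memory_mi(sandbox_mi: list, baseline_mi: list,
                        percentile: float = 0.95, headroom_pct: int = 10) -> int:
    """Suggest overhead.podFixed memory (MiB) from paired per-pod memory measurements."""
    deltas = sorted(s - b for s, b in zip(sandbox_mi, baseline_mi))
    # Nearest-rank percentile of the extra memory the sandbox adds per pod.
    rank = max(1, math.ceil(percentile * len(deltas)))
    return math.ceil(deltas[rank - 1] * (100 + headroom_pct) / 100)


if __name__ == "__main__":
    samples = [syscall_ns_per_call() for _ in range(5)]
    print(f"median ns per stat: {statistics.median(samples):.0f}")
```

For example, `slowdown(gvisor_samples, runc_samples)` from the stat loop shows syscall-path overhead. Feeding `pod_fixed_memory_mi` the same workload's memory under both runtimes yields a starting value for the RuntimeClass.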
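The boundary checks in step 7 can be expressed as a lint over a pod spec, the same rules an admission policy would enforce. This hypothetical helper is a sketch. It only inspects `containers`, not init or ephemeral containers. It cannot tell whether seccomp is actually enforced inside the sandbox, which still needs a runtime test.

```python
def hardening_violations(pod_spec: dict, sandbox_classes: set) -> list:
    """Return human-readable problems; an empty list means the spec passes these checks."""
    problems = []
    if pod_spec.get("runtimeClassName") not in sandbox_classes:
        problems.append("runtimeClassName is missing or not a sandbox RuntimeClass")
    # Host namespaces and host paths punch through the sandbox boundary.
    for field in ("hostPID", "hostIPC", "hostNetwork"):
        if pod_spec.get(field):
            problems.append(f"{field} must be false")
    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"volume {vol.get('name')} uses hostPath")
    pod_sc = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        # Container-level fields override the pod-level securityContext.
        non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile"))
        if not non_root:
            problems.append(f"container {c['name']}: runAsNonRoot is not true")
        if not seccomp or seccomp.get("type") == "Unconfined":
            problems.append(f"container {c['name']}: no seccomp profile")
        if sc.get("privileged"):
            problems.append(f"container {c['name']}: privileged")
    return problems


if __name__ == "__main__":
    spec = {"runtimeClassName": "gvisor", "hostPID": True,
            "containers": [{"name": "app", "image": "registry.example.com/app:1.0"}]}
    for problem in hardening_violations(spec, {"gvisor", "kata"}):
        print(problem)
```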
