Kubernetes Sandboxed Runtimes (gVisor / Kata) RuntimeClass Prompt
Stand up sandboxed container runtimes with RuntimeClass — pick gVisor vs Kata, wire containerd handlers, schedule workloads to sandbox-capable nodes, and quantify the syscall/perf trade-offs.
- Target user: Platform engineers isolating untrusted or multi-tenant workloads
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a Kubernetes platform engineer who runs untrusted and multi-tenant workloads inside sandboxed runtimes (gVisor/runsc and Kata Containers) via RuntimeClass. You optimize for strong isolation at the lowest cost to performance and compatibility.

I will provide:
- The threat model (untrusted code, tenant isolation, CI runners, regulatory boundary)
- Node OS / kernel, container runtime (containerd vs CRI-O), and whether nodes are bare metal or nested-virt-capable
- The workloads to sandbox (their syscall surface, GPU/host needs, throughput sensitivity)

Your job:
1. **gVisor vs Kata decision**: gVisor (user-space kernel, syscall interception, lower footprint, some syscall gaps) vs Kata (lightweight VM, hardware virtualization boundary, needs bare metal or nested virtualization, GPU passthrough story). Give a decision table tied to my threat model and node type.
2. **Node enablement**: install the runtime (runsc with `containerd-shim-runsc-v1`, or Kata with `containerd-shim-kata-v2`), register the containerd handler under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.<name>]` (the containerd 1.x, config version 2 path; containerd 2.x uses a different plugin ID), and verify with a smoke pod. Show the exact containerd config snippet, or the equivalent CRI-O runtime entry if I run CRI-O.
3. **Author the `RuntimeClass`**: set `handler`, `overhead.podFixed` (CPU/memory, so the scheduler and resource quotas account for the sandbox), and `scheduling.nodeSelector`/`scheduling.tolerations` so sandboxed pods only land on capable nodes.
4. **Workload wiring**: set `spec.runtimeClassName` on the pod and taint sandbox nodes so non-sandboxed pods stay off (or vice versa). Kubernetes has no built-in per-namespace default RuntimeClass, so if a namespace default is appropriate, set or enforce it with an admission policy such as a mutating webhook.
5. **Compatibility checklist**: what breaks under gVisor (certain `/proc` reads, some `mount`/`ptrace`, raw sockets, specific syscalls), and how to detect it (runsc `--debug` logs, `--strace`). For Kata: device/hostPath limits, kernel module needs.
6. **Performance budget**: measure the overhead for syscall-heavy, CPU-bound, and network-throughput workloads, plus the sandbox's own per-pod CPU/memory footprint, and set `overhead.podFixed` from measured numbers, not guesses.
7. **Hardening + observability**: confirm the sandbox boundary (no `hostPID`/`hostIPC`, seccomp still applies inside), and show how to surface the runtime in audit logs and labels.

Output: the containerd config diff, the RuntimeClass YAML, a sample sandboxed Deployment, a compatibility test plan, and a benchmark plan with expected overhead ranges.

Bias toward: defense in depth (sandbox + seccomp + non-root), honest perf trade-offs, sandbox-capable nodes isolated by taint.
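
As a reference point when reviewing the model's answer, the objects that steps 3 and 4 of the prompt ask for can be sketched as plain manifests. This is a minimal illustration under stated assumptions, not a definitive configuration: the label/taint key `sandbox.example.com/gvisor`, the image name, and the overhead figures are hypothetical placeholders, and the handler name `runsc` must match whatever runtime name is registered in the node's containerd config. The field names (`handler`, `overhead.podFixed`, `scheduling.nodeSelector`, `scheduling.tolerations`, `spec.runtimeClassName`) come from the `node.k8s.io/v1` RuntimeClass and core `v1` Pod APIs. The script emits JSON, which `kubectl apply -f -` accepts alongside YAML.

```python
import json


def runtime_class(name, handler, cpu, memory, node_key, node_value="true"):
    """Build a node.k8s.io/v1 RuntimeClass manifest as a plain dict."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        # Must equal the runtime name registered in the CRI runtime config.
        "handler": handler,
        # Fixed per-pod cost of the sandbox; replace with measured values.
        "overhead": {"podFixed": {"cpu": cpu, "memory": memory}},
        "scheduling": {
            # Merged into the pod's nodeSelector at admission time.
            "nodeSelector": {node_key: node_value},
            # Lets sandboxed pods onto the tainted sandbox node pool.
            "tolerations": [
                {
                    "key": node_key,
                    "operator": "Equal",
                    "value": node_value,
                    "effect": "NoSchedule",
                }
            ],
        },
    }


def sandboxed_pod(name, image, runtime_class_name):
    """Build a Pod that opts into the sandbox and keeps seccomp + non-root."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"runtime": runtime_class_name}},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "securityContext": {
                "runAsNonRoot": True,
                "runAsUser": 65534,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "securityContext": {"allowPrivilegeEscalation": False},
                }
            ],
        },
    }


# Hypothetical values for illustration only.
rc = runtime_class(
    name="gvisor",
    handler="runsc",
    cpu="100m",
    memory="64Mi",
    node_key="sandbox.example.com/gvisor",
)
pod = sandboxed_pod("smoke-test", "registry.example.com/smoke:latest", "gvisor")

# One JSON document per line.
for manifest in (rc, pod):
    print(json.dumps(manifest))
```

Building the manifests in code rather than by hand makes it easy to regenerate the RuntimeClass once the benchmark plan produces real overhead numbers, and to keep the node label, the taint key, and the toleration in sync from a single parameter.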