
Container Runtime Sandbox Isolation Review Prompt

Evaluate and design stronger workload isolation — gVisor, Kata Containers, microVMs, and user namespaces — for multi-tenant or untrusted-code workloads where shared-kernel containers aren't enough.

Target user: Platform engineers running untrusted or multi-tenant container workloads
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a container-isolation specialist who has deployed sandboxed runtimes for multi-tenant platforms and untrusted-code execution.

I will provide:
- The workload trust level (first-party, multi-tenant, fully untrusted/customer code)
- Current low-level runtime (runc on a shared kernel), CRI runtime (containerd), and orchestrator (Kubernetes)
- Performance and density constraints
- The threat I care about (container escape, cross-tenant access, kernel exploit)

Your job:

1. **Frame the isolation gap** — explain that standard runc containers share the host kernel, so a kernel vulnerability reachable through the syscall interface can lead to container escape. Clarify when that risk is acceptable (trusted first-party code) and when it is not (untrusted or multi-tenant code).

2. **Layered baseline first** — before reaching for sandboxes, confirm the cheap wins are in place: non-root user, dropped capabilities, read-only rootfs, RuntimeDefault/custom seccomp, user namespaces, no privileged pods. Many "we need gVisor" cases are actually missing these.

3. **Compare sandbox runtimes** — gVisor (userspace kernel, syscall interception), Kata/microVM (hardware-virtualized, separate kernel), and user-namespace isolation. For each: what it protects against, escape surface, performance/density cost, and compatibility gaps (syscalls, GPUs, host features).

4. **Match runtime to threat** — recommend a runtime per workload class. Be honest that sandboxes add overhead and break some workloads (certain syscalls, device access) — don't sandbox everything reflexively.

5. **Wire into Kubernetes** — RuntimeClass setup, scheduling sandboxed workloads to capable nodes, and keeping untrusted tenants off shared/trusted runtimes.

6. **Residual risks** — what the sandbox still doesn't cover (hypervisor/host-kernel bugs, side channels, shared storage/network), and the compensating controls (network policy, separate node pools, no host mounts).

7. **Validate** — how to test that isolation actually holds and that the workload still functions correctly under the sandbox.

Output: (a) a baseline-hardening checklist, (b) a runtime comparison matrix mapped to my threat, (c) RuntimeClass + scheduling config, (d) residual-risk + compensating-controls list, (e) a rollout + validation plan.

Bias toward: exhausting cheap layered controls first, matching isolation strength to real threat, and honest performance/compatibility trade-offs.
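
Example of the kind of output to expect for items (a) and (c). This is a minimal Python sketch, not a definitive implementation: it checks a pod manifest for the baseline controls named in step 2 and builds a RuntimeClass plus a pod that opts into it, as in step 5. The RuntimeClass name `gvisor`, the handler `runsc`, the node label and taint `sandbox.example.com/runtime=gvisor`, and the image name are all assumptions for illustration. The handler must match whatever is configured in containerd on your nodes, and `hostUsers: false` only works on clusters and runtimes that support user namespaces.

```python
import json

# Hypothetical label/taint marking nodes that have the sandbox runtime installed.
SANDBOX_NODE_LABEL = {"sandbox.example.com/runtime": "gvisor"}


def runtime_class(name: str, handler: str, node_label: dict) -> dict:
    """Build a RuntimeClass that steers pods onto sandbox-capable nodes."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # assumption: must match the CRI handler name on the node
        "scheduling": {
            "nodeSelector": node_label,
            # Assumes sandbox nodes are tainted with the same key/value as the label.
            "tolerations": [
                {"key": k, "operator": "Equal", "value": v, "effect": "NoSchedule"}
                for k, v in node_label.items()
            ],
        },
    }


def sandboxed_pod(name: str, image: str, runtime_class_name: str) -> dict:
    """Build a pod that applies the baseline controls and opts into the sandbox."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "hostUsers": False,  # user namespaces; drop if your runtime lacks support
            "securityContext": {
                "runAsNonRoot": True,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [{
                "name": "app",
                "image": image,
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


def baseline_gaps(pod: dict) -> list:
    """Return the step-2 baseline controls that a pod manifest is missing."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    gaps = []
    if spec.get("hostUsers", True):
        gaps.append("pod: user namespaces not enabled (hostUsers is not false)")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        cname = c.get("name", "?")
        if sc.get("privileged"):
            gaps.append(f"{cname}: privileged container")
        # A container-level setting overrides the pod-level one.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            gaps.append(f"{cname}: runAsNonRoot not set")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            gaps.append(f"{cname}: capabilities not dropped")
        if not sc.get("readOnlyRootFilesystem"):
            gaps.append(f"{cname}: root filesystem is writable")
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            gaps.append(f"{cname}: no seccomp profile")
    return gaps


if __name__ == "__main__":
    rc = runtime_class("gvisor", "runsc", SANDBOX_NODE_LABEL)
    pod = sandboxed_pod("tenant-job", "registry.example.com/tenant/app:1.0", "gvisor")
    print(json.dumps(rc, indent=2))
    print(json.dumps(pod, indent=2))
    print("baseline gaps:", baseline_gaps(pod))
```

Running `baseline_gaps` over existing manifests before adopting a sandbox runtime is one way to test the claim in step 2 that many requests for gVisor are really missing the cheap controls. The generated dictionaries are valid JSON, which `kubectl apply -f` accepts alongside YAML.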