Kubernetes Pod Overhead & RuntimeClass Accounting Prompt

Configure RuntimeClass pod overhead so the scheduler and node accounting reserve the sandbox/runtime tax, fixing over-commit and surprise OOM on sandboxed runtimes.

Target user: Operators running gVisor, Kata, or other sandboxed runtimes
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Kubernetes node engineer who runs sandboxed runtimes. You know that RuntimeClass `overhead` adds a fixed per-pod resource tax, separate from the containers' own requests, that the scheduler adds to the pod's requests and the kubelet includes when it admits the pod against node allocatable and sizes the pod-level cgroup.

I will provide:
- The RuntimeClass and runtime handler I use (gVisor/runsc, Kata, etc.)
- Measured per-pod baseline cost of the sandbox (memory and CPU it consumes empty)
- Symptoms I'm seeing (nodes over-committed, pods OOM-killed, scheduler packing too many pods per node)

Your job:

1. **Measure the tax** — explain how to determine real per-pod overhead by comparing node usage with N idle sandboxed pods vs zero, isolating the runtime's fixed cost.
2. **Set the overhead** — write the RuntimeClass `overhead.podFixed` for cpu and memory, and explain how it's added on top of container requests for scheduling and cgroup limits.
3. **Show the accounting math** — for an example pod, compute effective requests/limits = container sum + overhead, and how that reduces pods-per-node vs the naive count.
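4. **Verify enforcement** — confirm that admission copied the RuntimeClass overhead into each pod's `spec.overhead`, and that the `Allocated resources` totals in `kubectl describe node` include the overhead (node allocatable itself does not change).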
5. **Tune against symptoms** — if pods OOM, the memory overhead is too low; if nodes are underpacked, it's too high; give the adjustment loop.
6. **Guard quota interplay** — note how overhead counts toward ResourceQuota and LimitRange so namespace budgets stay honest.

Output as: (a) the RuntimeClass YAML with `overhead.podFixed`, (b) a worked per-pod accounting example, and (c) the verification and tuning commands.

Mark DESTRUCTIVE any change to overhead on a RuntimeClass already in use: running pods keep the overhead recorded at admission, but every new pod is scheduled with the new value, so a node that looked packable can suddenly refuse them.
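
As an illustration of the arithmetic that steps 1 and 3 ask for, here is a minimal Python sketch. Every number in it is hypothetical (node usage, container requests, node allocatable), and the `Resources` type and helper functions are made up for this example rather than taken from any Kubernetes client library. It derives the per-pod tax from the difference in node usage between N idle sandboxed pods and zero, adds that tax to the container requests, and compares pods-per-node with and without it. It ignores the kubelet max-pods cap and anything else already running on the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in MiB (illustrative type, not a Kubernetes API)."""
    cpu_millicores: int
    memory_mib: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(
            self.cpu_millicores + other.cpu_millicores,
            self.memory_mib + other.memory_mib,
        )


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer division rounded up, so the overhead is never under-estimated."""
    return -(-numerator // denominator)


def measured_overhead(usage_empty: Resources, usage_with_pods: Resources,
                      idle_pods: int) -> Resources:
    """Per-pod fixed cost = (usage with N idle pods - usage with zero) / N."""
    if idle_pods <= 0:
        raise ValueError("idle_pods must be positive")
    return Resources(
        ceil_div(usage_with_pods.cpu_millicores - usage_empty.cpu_millicores, idle_pods),
        ceil_div(usage_with_pods.memory_mib - usage_empty.memory_mib, idle_pods),
    )


def effective_requests(containers: list[Resources], overhead: Resources) -> Resources:
    """Effective pod requests = sum of container requests + RuntimeClass overhead."""
    total = overhead
    for container in containers:
        total = total + container
    return total


def pods_per_node(allocatable: Resources, per_pod: Resources) -> int:
    """Pods that fit by requests alone; the scarcer resource sets the limit."""
    return min(
        allocatable.cpu_millicores // per_pod.cpu_millicores,
        allocatable.memory_mib // per_pod.memory_mib,
    )


# Hypothetical measurements: node usage with 0 and with 20 idle sandboxed pods.
overhead = measured_overhead(
    usage_empty=Resources(400, 1200),
    usage_with_pods=Resources(1400, 4300),
    idle_pods=20,
)

# Hypothetical pod: an app container plus a small sidecar.
containers = [Resources(200, 448), Resources(50, 64)]
allocatable = Resources(7600, 14000)  # hypothetical node allocatable

no_overhead = Resources(0, 0)
naive = pods_per_node(allocatable, effective_requests(containers, no_overhead))
effective = effective_requests(containers, overhead)
with_overhead = pods_per_node(allocatable, effective)

print(f"overhead.podFixed ~ cpu={overhead.cpu_millicores}m memory={overhead.memory_mib}Mi")
print(f"effective requests: cpu={effective.cpu_millicores}m memory={effective.memory_mib}Mi")
print(f"pods per node: naive={naive}, with overhead={with_overhead}")
```

With these made-up inputs the measured tax is 50m CPU and 155Mi memory per pod, and the node fits 20 pods rather than the naive 27. Memory is the binding resource in this example, which is the pattern that tends to produce the OOM symptoms the prompt targets.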