Kubernetes Pod Overhead & RuntimeClass Accounting Prompt
Configure RuntimeClass pod overhead so the scheduler and node accounting reserve the sandbox/runtime tax, fixing over-commit and surprise OOM on sandboxed runtimes.
- Target user: Operators running gVisor, Kata, or other sandboxed runtimes
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes node engineer who runs sandboxed runtimes. You know that RuntimeClass `overhead` adds a fixed per-pod resource tax, separate from the containers' own requests, that the scheduler counts against node allocatable and the kubelet includes when sizing the pod-level cgroup.

I will provide:
- The RuntimeClass and runtime handler I use (gVisor/runsc, Kata, etc.)
- The measured per-pod baseline cost of the sandbox (the memory and CPU it consumes when empty)
- The symptoms I'm seeing (nodes over-committed, pods OOM-killed, scheduler packing too many pods per node)

Your job:
1. **Measure the tax**: explain how to determine real per-pod overhead by comparing node usage with N idle sandboxed pods against usage with zero, isolating the runtime's fixed cost.
2. **Set the overhead**: write the RuntimeClass `overhead.podFixed` for cpu and memory, and explain how it is added on top of container requests for scheduling and on top of container limits for the pod cgroup.
3. **Show the accounting math**: for an example pod, compute effective requests/limits = container sum + overhead, and show how that reduces pods-per-node compared with the naive count.
4. **Verify enforcement**: confirm that pod overhead is active (new pods carry a populated `spec.overhead`) and that the `Allocated resources` figures in `kubectl describe node` include the overhead.
5. **Tune against symptoms**: if pods OOM, the memory overhead is likely too low; if nodes are underpacked, it is likely too high; give the adjustment loop.
6. **Guard quota interplay**: explain how overhead counts toward ResourceQuota and how it interacts with LimitRange, so namespace budgets stay honest.

Output as: (a) the RuntimeClass YAML with `overhead.podFixed`, (b) a worked per-pod accounting example, and (c) the verification and tuning commands.

Mark DESTRUCTIVE any change to overhead on a RuntimeClass already in use: new pods are admitted and scheduled with the new value, so a packed node can suddenly refuse them.
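For reference, the accounting in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not a measurement: the node size, container requests, and overhead values are all hypothetical, the helper names are invented for this example, and the calculation ignores the node's max-pods limit and any requests from system pods.

```python
# Sketch of pod-overhead accounting: effective request = container sum + overhead.
# Every number here is a hypothetical placeholder, not a measured value.

def effective_request(container_requests, overhead):
    """Per-pod request the scheduler counts: container sum plus fixed overhead."""
    return sum(container_requests) + overhead

def pods_that_fit(allocatable, per_pod):
    """How many pods fit on a single resource dimension."""
    return allocatable // per_pod

# Hypothetical node allocatable: 16 GiB memory, 8000m CPU.
ALLOC_MEM_MIB = 16 * 1024
ALLOC_CPU_MILLI = 8000

# Hypothetical two-container pod and an assumed overhead.podFixed of 160Mi / 250m.
container_mem_mib = [256, 128]
container_cpu_milli = [250, 100]
OVERHEAD_MEM_MIB = 160
OVERHEAD_CPU_MILLI = 250

# Naive view: containers only, as if the sandbox were free.
naive_mem = effective_request(container_mem_mib, 0)
naive_cpu = effective_request(container_cpu_milli, 0)

# Overhead-aware view: what the scheduler counts once spec.overhead is set.
real_mem = effective_request(container_mem_mib, OVERHEAD_MEM_MIB)
real_cpu = effective_request(container_cpu_milli, OVERHEAD_CPU_MILLI)

# The tighter of the two dimensions decides pods-per-node.
naive_fit = min(pods_that_fit(ALLOC_MEM_MIB, naive_mem),
                pods_that_fit(ALLOC_CPU_MILLI, naive_cpu))
real_fit = min(pods_that_fit(ALLOC_MEM_MIB, real_mem),
               pods_that_fit(ALLOC_CPU_MILLI, real_cpu))

print(f"per-pod request without overhead: {naive_mem}Mi, {naive_cpu}m")
print(f"per-pod request with overhead:    {real_mem}Mi, {real_cpu}m")
print(f"pods per node: naive={naive_fit}, with overhead={real_fit}")
```

With these placeholder numbers the per-pod request grows from 384Mi / 350m to 544Mi / 600m, and the CPU-bound pod count drops from 22 to 13. The gap between the two counts is the over-commit that an unset overhead hides from the scheduler.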