Kubernetes QoS Class & Guaranteed Workload Design Prompt
Design pod requests and limits to land workloads in the right QoS class (Guaranteed, Burstable, BestEffort) so the most critical pods survive node memory pressure and eviction.
- Target user: Engineers tuning workload reliability under resource pressure
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes capacity engineer who has explained, more than once, why the "important" service got evicted first — because it was accidentally BestEffort.
I will provide:
- The workloads and their criticality ranking
- Current requests/limits (or that they're unset)
- Node sizes and how often nodes hit memory pressure
Your job:
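1. **Explain the three classes** — Guaranteed (every container, including init containers, sets cpu and memory limits, with requests equal to limits), Burstable (at least one container sets a cpu or memory request or limit, but the pod does not meet the Guaranteed criteria), BestEffort (no container sets any cpu or memory request or limit). Tie each to eviction order under node pressure: BestEffort pods and Burstable pods using more than their requests are evicted first, ranked by pod priority and then by how far usage exceeds requests; Guaranteed pods and Burstable pods within their requests go last.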
2. **Assign target classes** — map each workload to a QoS class by criticality. Critical/stateful → Guaranteed; bursty stateless → Burstable; truly throwaway → BestEffort.
3. **Write the specs** — for Guaranteed, set memory request == limit and cpu request == limit on every container (including sidecars — one unset container drops the whole pod's class).
4. **Memory vs CPU** — clarify CPU is throttled (not killed) over limit, while memory over limit is OOMKilled; this shapes how aggressive limits should be.
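5. **Eviction interaction** — show how the kubelet's eviction thresholds (`evictionHard` and `evictionSoft` in the kubelet configuration, for signals such as `memory.available`), the `MemoryPressure` condition reported by `kubectl describe node`, and the QoS class together decide who's reclaimed, and how PriorityClass is a separate setting from QoS that drives scheduler preemption and is also a factor in the kubelet's eviction ranking.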
6. **Validate** — confirm class with `kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'` and check no sidecar silently demotes a pod.
Output as: (a) a QoS assignment table per workload, (b) corrected resource blocks, (c) the verification command, (d) the top 3 mistakes that silently demote a pod's QoS class.
Don't set a Guaranteed pod's memory limit so tight it OOMKills under normal load — Guaranteed means it won't be evicted for *others'* pressure, not that it can't kill itself.
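
For reference, the classification rule from step 1 can be sketched in Python for checking proposed resource blocks before applying them. This is a minimal illustration, not the kubelet's implementation: the function name `qos_class` and the input shape (one dict per container with optional `requests` and `limits`) are hypothetical, and quantities are compared as plain strings, so `1Gi` and `1024Mi` would be treated as different even though Kubernetes considers them equal.

```python
from typing import Dict, List

QOS_RESOURCES = ("cpu", "memory")  # only these two resources affect the QoS class


def qos_class(containers: List[Dict]) -> str:
    """Approximate the QoS class Kubernetes would assign to a pod.

    `containers` (hypothetical shape) lists every app and init container as
    {"requests": {...}, "limits": {...}}; either key may be missing.
    Simplification: quantities are compared as strings, not parsed values.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits") or {}
        requests = dict(c.get("requests") or {})
        for res in QOS_RESOURCES:
            # A missing request defaults to the limit when a limit is set.
            if res not in requests and res in limits:
                requests[res] = limits[res]
            if res in requests or res in limits:
                any_set = True
            # Guaranteed needs a limit and an equal request for cpu AND memory.
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


app = {
    "requests": {"cpu": "500m", "memory": "1Gi"},
    "limits": {"cpu": "500m", "memory": "1Gi"},
}
sidecar = {}  # a sidecar with no resources set

print(qos_class([app]))           # Guaranteed
print(qos_class([app, sidecar]))  # Burstable
print(qos_class([sidecar]))       # BestEffort
```

The second call shows the sidecar case from step 3: one container without resources is enough to demote an otherwise Guaranteed pod to Burstable.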