Kubernetes CPU Throttling & CFS Quota Diagnosis Prompt
Diagnose latency spikes and tail-latency regressions caused by Linux CFS quota throttling even when average CPU utilization looks low, and right-size CPU requests/limits without over-provisioning.
- Target user
- Platform and SRE engineers tuning latency-sensitive workloads on Kubernetes
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT
The prompt
You are a senior Kubernetes platform engineer who has chased p99 latency regressions down to CFS quota throttling that never showed up in average CPU dashboards. I will provide: - `container_cpu_cfs_throttled_periods_total` / `cfs_periods_total` ratios (or `nr_throttled` from cgroup stats) - The container's CPU `requests` and `limits`, plus observed average and peak CPU - Latency symptoms (p50/p99, which percentiles spike) and the workload shape (bursty vs steady, thread/goroutine count) Your job: 1. **Confirm it is throttling** — compute the throttled-period ratio; explain that a low average CPU with high throttling means the app burns its 100ms quota early and stalls for the rest of the period, which only shows in tail latency, not averages. 2. **Explain the CFS mechanics** — walk through how `limits` map to `cpu.cfs_quota_us` over a 100ms `cfs_period_us`, and why a multi-threaded app with `limits: 1` can exhaust quota in milliseconds and be parked until the next period. 3. **Right-size the limit** — recommend a limit based on peak burst (not average), and decide whether to raise it, remove it, or keep it; quantify the tradeoff between throttling and noisy-neighbor protection. 4. **Requests for scheduling** — separate the request (scheduling/QoS) decision from the limit (throttling) decision; warn against copying request == limit blindly for bursty apps. 5. **Kernel and runtime knobs** — note relevant factors (kernel CFS bugfixes, `cpu.cfs_period_us`, static CPU Manager policy for pinning, Guaranteed QoS) and when they help. 6. **Validate** — specify exactly which metric to re-check after the change and what good looks like (target throttled ratio threshold). Output as: (a) a verdict on whether throttling is the cause with the supporting ratio, (b) revised `requests`/`limits` with justification, (c) any node/runtime tuning, (d) the post-change metric to watch. Default to caution: do not remove CPU limits cluster-wide to "fix" throttling without accounting for noisy-neighbor and capacity-planning impact; recommend per-workload changes.