Kubernetes Image GC and Disk Pressure Tuning Prompt
Tune kubelet image garbage collection and eviction thresholds to stop DiskPressure node taints, image-pull churn, and evictions caused by a full image/container filesystem.
- Target user: platform engineers operating self-managed kubelet configurations in production
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes node engineer who has tuned kubelet image garbage collection and eviction signals to keep nodes off DiskPressure without thrashing image pulls.

I will provide:
- The symptom (nodes tainted `node.kubernetes.io/disk-pressure`, frequent image re-pulls, pods evicted with `Reason: Evicted`)
- Current kubelet config (`imageGCHighThresholdPercent`, `imageGCLowThresholdPercent`, `evictionHard`/`evictionSoft`, `imageMinimumGCAge`)
- Disk layout (separate image/container filesystem vs shared, available capacity)

Your job:
1. **Map the signals**: distinguish image GC thresholds (`imageGCHighThresholdPercent`/`imageGCLowThresholdPercent`, expressed as percent disk usage on the filesystem that holds images: `imagefs`, or `nodefs` when the two are shared) from eviction signals (`nodefs.available`, `imagefs.available`, `nodefs.inodesFree`, expressed as free space or free inodes). Image GC deletes unused images; eviction terminates pods.
2. **Diagnose the churn**: explain that a GC high threshold set too low (e.g. 50%) makes the kubelet delete unused images aggressively, causing constant re-pulls and image-pull latency, while one set too high risks reaching the eviction threshold before GC has freed any space.
3. **Order GC before eviction**: `imageGCHighThresholdPercent` is a usage percentage and `imagefs.available` is a free-space threshold, so compare them on the same scale (usage = 100% minus available). Ensure GC starts at a usage level comfortably below the usage at which `imagefs.available` eviction fires, so disk is reclaimed by deleting unused images before pods get evicted.
4. **Handle the split-filesystem case**: note that with a separate `imagefs`, image GC acts on `imagefs` while eviction signals cover both `nodefs` and `imagefs`, so filling one disk does not trigger reclamation on the other; recommend the right thresholds per filesystem.
5. **Tune the safety margins**: set `imageMinimumGCAge` to avoid GC'ing freshly pulled images, and configure `evictionSoft` with a grace period (`evictionSoftGracePeriod`) plus `evictionHard` as the backstop.
6. **Watch inodes too**: flag `nodefs.inodesFree` because a node can exhaust inodes long before it runs out of bytes, and workloads that write many small files to logs or `emptyDir` volumes burn inodes quickly.
7. **Validate**: propose monitoring the `kubelet_volume_stats_*` metrics (these cover PersistentVolumeClaim-backed volumes only), `container_fs_usage_bytes`, and the node's `DiskPressure` condition to confirm the node stays out of DiskPressure.

Output as: a corrected kubelet config snippet, the reasoning for each threshold, and a monitoring/validation plan. Never set image GC thresholds so close to the eviction threshold that pods are evicted before unused images are reclaimed.
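
Example check (not part of the prompt): when reviewing the model's answer, it can help to put the GC and eviction thresholds on the same scale, as step 3 asks. The sketch below is one possible way to do that, not a kubelet API. The function name `check_thresholds` and the 5-point `margin` are illustrative assumptions, and it only handles an `imagefs.available` threshold given as a percentage.

```python
def check_thresholds(gc_high, gc_low, eviction_available_pct, margin=5.0):
    """Return a list of problems with a GC/eviction threshold combination.

    gc_high, gc_low: imageGCHighThresholdPercent / imageGCLowThresholdPercent
        (percent of disk USED on the filesystem that holds images).
    eviction_available_pct: the imagefs.available eviction threshold
        (percent of disk FREE), e.g. 15.0 for "imagefs.available<15%".
    margin: hypothetical safety gap in percentage points, chosen for this
        sketch; it is not a kubelet setting.
    """
    problems = []

    # The low threshold must sit below the high one, both within 0-100.
    if not (0 <= gc_low < gc_high <= 100):
        problems.append(
            "imageGCLowThresholdPercent must be below "
            "imageGCHighThresholdPercent, both within 0-100"
        )

    # Convert the free-space eviction threshold to a usage percentage so it
    # can be compared directly with the GC high threshold.
    eviction_usage_pct = 100.0 - eviction_available_pct
    if gc_high + margin > eviction_usage_pct:
        problems.append(
            f"image GC starts at {gc_high}% usage but eviction fires at "
            f"{eviction_usage_pct:.0f}% usage; gap is under {margin:.0f} points"
        )

    return problems


if __name__ == "__main__":
    # GC at 85% usage with eviction at 15% free: both fire at the same point.
    for problem in check_thresholds(85, 80, 15.0):
        print("WARN:", problem)
    # GC at 75% usage leaves a 10-point gap before eviction.
    print(check_thresholds(75, 70, 15.0))
```

The first call uses 85/80 with `imagefs.available<15%`, which to my understanding matches the kubelet defaults on Linux; verify against your kubelet version. In that combination GC and eviction start at the same usage level, which is the situation the prompt's final instruction warns against.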