
Kubernetes Image GC and Disk Pressure Tuning Prompt

Tune kubelet image garbage collection and eviction thresholds to stop DiskPressure node taints, image-pull churn, and evictions caused by a full image/container filesystem.

Target user: platform engineers operating self-managed kubelet configurations in production
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior Kubernetes node engineer who has tuned kubelet image garbage collection and eviction signals to keep nodes off DiskPressure without thrashing image pulls.

I will provide:
- The symptom (nodes tainted `node.kubernetes.io/disk-pressure`, frequent image re-pulls, pods evicted with `Reason: Evicted`)
- Current kubelet config (imageGCHighThresholdPercent, imageGCLowThresholdPercent, evictionHard/evictionSoft, imageMinimumGCAge)
- Disk layout (separate image/container filesystem vs shared, available capacity)

Your job:

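1. **Map the signals:** distinguish image GC thresholds (`imageGCHighThresholdPercent`/`imageGCLowThresholdPercent`, expressed as percent of disk *used* on the filesystem that stores images) from eviction signals (`nodefs.available`, `imagefs.available`, `nodefs.inodesFree`, `imagefs.inodesFree`, expressed as amount *free*); they trigger different behaviors: GC deletes unused images, while eviction sets DiskPressure and evicts pods.
2. **Diagnose the churn:** explain that a GC high threshold set too low (e.g. 50%) makes the kubelet delete unused images aggressively, causing constant re-pulls and slow pod starts; set too high, the node can cross the eviction threshold before GC runs.
3. **Order GC before eviction:** `imageGCHighThresholdPercent` is a used percentage while `imagefs.available` is a free amount, so compare them on the same scale: GC starts when free space falls below `100 - imageGCHighThresholdPercent` percent. Keep that point comfortably above the `imagefs.available` eviction threshold (for example, GC at 80% used, i.e. 20% free, against eviction at 10% free) so unused images are deleted before pods get evicted.
4. **Handle the split-filesystem case:** note that with a separate `imagefs`, image GC and the `imagefs.*` signals watch the image disk, while the `nodefs.*` signals watch the kubelet root disk (logs, `emptyDir` volumes); recommend the right thresholds per filesystem.
5. **Tune the safety margins:** set `imageMinimumGCAge` to avoid GC'ing freshly pulled images, and configure `evictionSoft` with a matching `evictionSoftGracePeriod` plus `evictionHard` as the backstop.
6. **Watch inodes too:** flag `nodefs.inodesFree` (and `imagefs.inodesFree` on split layouts) because a node can exhaust inodes while plenty of bytes remain free; workloads that write many small files to logs or `emptyDir` volumes burn inodes quickly.
7. **Validate:** propose monitoring `container_fs_usage_bytes`, node-level filesystem metrics (for example node_exporter's `node_filesystem_avail_bytes` and `node_filesystem_files_free`, if node_exporter is deployed), and node conditions to confirm the node stays out of DiskPressure. Note that the `kubelet_volume_stats_*` metrics cover PVC-backed volumes only, so they do not show image or root filesystem usage.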

Output as: a corrected kubelet config snippet, the reasoning for each threshold, and a monitoring/validation plan.

Never set image GC thresholds so close to the eviction threshold that pods are evicted before unused images are reclaimed.
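
For orientation, here is a minimal sketch of the kind of config snippet the prompt asks for. It assumes a node where images share the root filesystem (no separate `imagefs`), and every numeric value is an illustrative assumption picked to show the ordering of GC, soft eviction, and hard eviction, not a recommendation for any particular cluster.

```yaml
# Illustrative KubeletConfiguration (shared image/root filesystem assumed).
# All thresholds below are example values, not tuned recommendations.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

# Image GC thresholds are percent of disk USED.
imageGCHighThresholdPercent: 75   # GC starts at 75% used, i.e. 25% free
imageGCLowThresholdPercent: 65    # GC frees unused images until usage is back to 65%
imageMinimumGCAge: 10m            # minimum age before an unused image is eligible for GC

# Eviction signals are amount FREE. Soft thresholds sit below the GC
# trigger (25% free) and above the hard backstop.
evictionSoft:
  nodefs.available: "15%"
  imagefs.available: "15%"
  nodefs.inodesFree: "10%"
evictionSoftGracePeriod:          # every soft threshold needs a grace period
  nodefs.available: "2m"
  imagefs.available: "2m"
  nodefs.inodesFree: "2m"

# Hard thresholds evict immediately, with no grace period.
evictionHard:
  memory.available: "100Mi"       # restated so overriding evictionHard does not drop it
  nodefs.available: "10%"
  imagefs.available: "10%"
  nodefs.inodesFree: "5%"
```

Read on a single free-space scale, the sketch gives this order: image GC at 25% free, soft eviction at 15% free after a two-minute grace period, hard eviction at 10% free. On a split layout, the `imagefs.*` values would be sized against the image disk and the `nodefs.*` values against the kubelet root disk.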
