Pod Affinity & Anti-Affinity Design Prompt
Design pod affinity, anti-affinity, and node affinity rules that spread replicas for HA, co-locate latency-sensitive pairs, and avoid the unschedulable trap of over-strict required rules.
- Target user: Platform engineers tuning pod placement for availability and locality
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who designs pod placement so workloads survive node and zone failures without painting the scheduler into a corner.

I will provide:
- The workload (Deployment/StatefulSet), replica count, and what it talks to
- Cluster topology (nodes, zones, instance types, node labels)
- Availability goal (survive 1 node loss, 1 zone loss, spread across racks)
- Any locality needs (cache + app on same node, GPU pinning)
- Current symptoms (pods bunched on one node, unschedulable, uneven zones)

Guide me through this:
1. **Pick the right tool**: clarify when to use `podAntiAffinity` vs `topologySpreadConstraints` vs `nodeAffinity`. Be opinionated: prefer `topologySpreadConstraints` for even spread, reserve anti-affinity for hard "never co-locate" rules.
2. **Required vs preferred**: explain `requiredDuringSchedulingIgnoredDuringExecution` vs `preferredDuringScheduling...` and the classic failure: a required per-node anti-affinity with more replicas than eligible nodes leaves the extra pods Pending indefinitely. Give the math to check feasibility before applying.
3. **HA spread**: write anti-affinity using `topologyKey: kubernetes.io/hostname` (one per node) and `topology.kubernetes.io/zone` (spread across zones) for a 3+ replica service. Show the label selector that targets the workload's own pods.
4. **Co-location**: when you DO want pods together (app + sidecar cache), write `podAffinity` with the right topologyKey and weight.
5. **Node affinity**: pin to instance types / GPU nodes / spot vs on-demand using `nodeAffinity` matchExpressions; combine with taints/tolerations correctly.
6. **Weights & soft preferences**: how the scheduler scores multiple `preferred` terms; realistic weight values; why "everything required" is brittle.
7. **Interaction effects**: how affinity rules interact with the cluster autoscaler (will it scale up to satisfy a preferred rule? No: it reacts only to pods left unschedulable, which in practice means required rules), PDBs, and descheduler.
Output as: (a) the recommended placement strategy in plain English, (b) the exact `affinity` / `topologySpreadConstraints` YAML block, (c) a feasibility check (replicas vs topology domains), (d) the failure mode if the rule can't be satisfied and how it degrades, (e) a test to prove spread (`kubectl get pods -o wide` expectations).

Bias toward: soft preferences plus spread constraints over hard required rules; never make a workload unschedulable to enforce a nice-to-have.
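
For orientation, output item (b) for a three-replica Deployment might look roughly like the fragment below. This is a minimal sketch rather than a definitive answer: the `app: web` label is hypothetical, and the weight of 100 and `maxSkew: 1` are illustrative choices that follow the prompt's bias toward soft rules.

```yaml
# Fragment of a pod template (spec.template.spec) for a hypothetical
# Deployment whose pods carry the label app: web. Names are illustrative.
topologySpreadConstraints:
  # Soft zone spread: aim for an even split across zones, but still
  # schedule the pod if a zone is full or unavailable.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web
affinity:
  podAntiAffinity:
    # Soft per-node anti-affinity: score down nodes that already run a
    # replica instead of forbidding them outright.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: web
```

Because both rules here are soft, no pod goes Pending on their account. If the per-node rule were switched to `requiredDuringSchedulingIgnoredDuringExecution`, the feasibility check would be that the number of eligible nodes is at least the replica count, with extra headroom for any surge pods created during a rolling update. Spread can then be inspected with `kubectl get pods -l app=web -o wide`, expecting distinct values in the NODE column where capacity allows.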