AI for Kubernetes & Helm

Pod Affinity & Anti-Affinity Design Prompt

Design pod affinity, anti-affinity, and node affinity rules that spread replicas for HA, co-locate latency-sensitive pairs, and avoid the unschedulable trap of over-strict required rules.

Target user: Platform engineers tuning pod placement for availability and locality
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior platform engineer who designs pod placement so workloads survive node and zone failures without painting the scheduler into a corner.

I will provide:
- The workload (Deployment/StatefulSet), replica count, and what it talks to
- Cluster topology (nodes, zones, instance types, node labels)
- Availability goal (survive 1 node loss, 1 zone loss, spread across racks)
- Any locality needs (cache + app on same node, GPU pinning)
- Current symptoms (pods bunched on one node, unschedulable, uneven zones)

Guide me through this:

1. **Pick the right tool** — clarify when to use `podAntiAffinity` vs `topologySpreadConstraints` vs `nodeAffinity`. Be opinionated: prefer `topologySpreadConstraints` for even spread, reserve anti-affinity for hard "never co-locate" rules.

2. **Required vs preferred** — explain `requiredDuringSchedulingIgnoredDuringExecution` vs `preferredDuringScheduling...` and the classic failure: a required anti-affinity on `kubernetes.io/hostname` with more replicas than eligible nodes leaves the extra pods Pending until a node is added. Give the math to check feasibility before applying, and count the surge pods a rolling update creates.

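3. **HA spread** — write anti-affinity using `topologyKey: kubernetes.io/hostname` (at most one replica per node when required) and `topology.kubernetes.io/zone` (at most one replica per zone when required, so use the preferred form or a spread constraint once replicas exceed zones) for a 3+ replica service. Show the label selector that targets the workload's own pods.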

4. **Co-location** — when you DO want pods together (an app pod plus a cache that runs as its own pod; a true sidecar already shares the pod), write `podAffinity` with the right topologyKey and weight.

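5. **Node affinity** — pin to instance types / GPU nodes / spot vs on-demand using `nodeAffinity` matchExpressions; combine with taints/tolerations correctly (node affinity attracts pods to nodes, taints keep other pods off them, so dedicated nodes usually need both).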

6. **Weights & soft preferences** — how the scheduler scores multiple `preferred` terms; realistic weight values; why "everything required" is brittle.

7. **Interaction effects** — how affinity rules interact with the cluster autoscaler (it adds nodes only for pods that are Pending, so an unsatisfied required rule can trigger a scale-up while an unsatisfied preferred rule cannot), PodDisruptionBudgets, and the descheduler.

Output as: (a) the recommended placement strategy in plain English, (b) the exact `affinity` / `topologySpreadConstraints` YAML block, (c) a feasibility check (replicas vs topology domains), (d) the failure mode if the rule can't be satisfied and how it degrades, (e) a test to prove spread (`kubectl get pods -o wide` expectations).

Bias toward: soft preferences plus spread constraints over hard required rules; never make a workload unschedulable to enforce a nice-to-have.
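
For orientation, the kind of block the prompt asks for in output (b) might look like the sketch below. It is a minimal illustration of the "soft preferences plus spread constraints" bias, not a recommended production manifest. The workload name `web`, the labels `app: web` and `app: web-cache`, the instance types, the `dedicated=web` taint, and the image are all hypothetical placeholders.

```yaml
# Hypothetical example: every name, label, taint, and value below is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Even spread across zones. ScheduleAnyway keeps this a soft rule,
      # so losing a zone degrades the spread instead of leaving pods Pending.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      affinity:
        podAntiAffinity:
          # Soft "one per node": prefer nodes that do not already run app=web.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: web
        podAffinity:
          # Soft co-location with a cache that runs as separate pods.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: web-cache
        nodeAffinity:
          # Hard rule: schedule only onto the listed instance types.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values: ["m5.large", "m5.xlarge"]
      # Needed only if the target nodes carry a matching taint.
      tolerations:
        - key: dedicated
          operator: Equal
          value: web
          effect: NoSchedule
      containers:
        - name: web
          image: registry.example.com/web:1.0.0
```

In this sketch only the node affinity is a hard rule, so the feasibility check reduces to "at least one eligible node has room". If the hostname anti-affinity were switched to required, the check would become replicas plus rolling-update surge pods ≤ eligible nodes. Running `kubectl get pods -l app=web -o wide` after rollout should then show the replicas on different nodes wherever capacity allows.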
