DevOps AI ToolKit
AI for Kubernetes & Helm · By James Joyner IV · 8 min read

Spreading Pods Across Nodes and Zones With Topology Spread Constraints

Three replicas on one node is not high availability. Topology spread constraints force Kubernetes to distribute pods across failure domains.

  • #kubernetes
  • #scheduling
  • #availability
  • #topology
  • #affinity
  • #reliability

Adding replicas feels like buying availability, but the scheduler doesn’t owe you distribution. I’ve seen a five-replica deployment where four pods landed on the same node because that node had the most free capacity — and when it failed, 80% of the service went with it. Topology spread constraints are how you tell Kubernetes that where pods run matters as much as how many. They’re the cleaner successor to pod anti-affinity, and once you understand them you’ll reach for them on every important workload.

The failure domains you actually care about

“High availability” is meaningless without naming the failure domain. The ones that matter:

  • Node — a single machine dies (hardware, kernel panic, drain).
  • Zone — an entire availability zone goes down (power, network).
  • Region — a whole region outage (rare, but real).

Nodes carry standard topology labels — kubernetes.io/hostname and topology.kubernetes.io/zone — and spread constraints use those labels to define the domain you want to spread across.
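
As a rough illustration, the labels on a node look something like the excerpt below, which is the shape you would see in the output of kubectl get node <node-name> -o yaml. The node name, zone, and region values here are hypothetical. Your cloud provider or cluster installer sets the real ones, and self-managed clusters may have no zone label at all unless you add it.

```yaml
# Excerpt of a Node object; all values are made up for illustration.
metadata:
  name: node-a
  labels:
    kubernetes.io/hostname: node-a             # unique per node
    topology.kubernetes.io/zone: us-east-1a    # shared by every node in the zone
    topology.kubernetes.io/region: us-east-1   # shared by every node in the region
```

Nodes that lack the label named in topologyKey are not counted as a domain, so it is worth confirming the labels exist before relying on a constraint.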

A constraint that spreads across nodes

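apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 6
  # A Deployment requires a selector that matches the pod template labels.
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api   # the label the spread constraint's labelSelector counts
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                           # domains may differ by at most one pod
        topologyKey: kubernetes.io/hostname  # each node is its own domain
        whenUnsatisfiable: DoNotSchedule     # leave the pod Pending rather than exceed the skew
        labelSelector:
          matchLabels:
            app: api
      containers:
      - name: api
        image: api:1.4.0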

Three knobs do the work:

  • topologyKey — the label that defines a domain. Here, each node is a domain.
  • maxSkew — the maximum allowed difference in pod count between the most- and least-populated domains. 1 means the spread must be near-perfectly even.
  • whenUnsatisfiable — what to do if the constraint can’t be met.

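With maxSkew: 1 across hostnames, six replicas distribute as evenly as the node count allows instead of piling onto whichever node had room. On three eligible nodes that means 2/2/2; on four it means 2/2/1/1. A 3/2/1 layout would be rejected, because the fullest node is two pods ahead of the emptiest.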

DoNotSchedule vs. ScheduleAnyway

This is the decision that defines the behavior:

  • DoNotSchedule — a hard constraint. If spreading evenly isn’t possible, the pod stays Pending. Use this when distribution is non-negotiable and you’d rather have a pod wait than cluster up.
  • ScheduleAnyway — a soft preference. The scheduler tries to spread but will place the pod anyway if it can’t. Use this when you want best-effort distribution without risking stuck pods.

My rule: ScheduleAnyway for spreading across nodes (you rarely want a Pending pod just because nodes are unbalanced), and DoNotSchedule for spreading across zones (you really do want to refuse to stack everything in one zone). But it depends on whether you’d rather degrade availability or degrade capacity when the cluster is tight.

Spreading across zones

The high-value version. To keep replicas evenly distributed across availability zones:

      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api

With three zones and three replicas, this puts one pod in each. Now a full zone outage costs you a third of capacity, not all of it. This is the constraint that makes “multi-AZ” real instead of aspirational.

Combining node and zone spread

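You can stack constraints. Every DoNotSchedule constraint must be satisfied for a node to be eligible, while ScheduleAnyway constraints only influence how the remaining nodes are ranked. A common production pattern is a hard zone spread plus a soft node spread: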

      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api

This says: insist on even zone distribution, and prefer even node distribution within that. It’s the configuration I default to for tier-1 services.

Why this beats pod anti-affinity

You can approximate spreading with podAntiAffinity, and it was the old way. But anti-affinity is binary — “don’t co-locate” — and gets clumsy fast. With anti-affinity, “spread across 3 zones with 6 replicas” doesn’t express cleanly; you end up fighting the scheduler. Spread constraints express degree of evenness with maxSkew, scale naturally with replica count, and read far more clearly. Use spread constraints for distribution; reserve anti-affinity for genuine “these two must never share a node” rules.
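
For comparison, here is a sketch of the closest anti-affinity equivalent, assuming the same app: api label used in the examples above. A required rule keyed on the zone allows at most one matching pod per zone, so with three zones and six replicas, three pods would stay Pending.

```yaml
      # Pod template fragment (spec.template.spec), shown for comparison only.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: api
            # At most one app=api pod may run per zone.
            topologyKey: topology.kubernetes.io/zone
```

Switching to preferredDuringSchedulingIgnoredDuringExecution avoids the Pending pods but gives up any bound on how uneven the result can be, which is the gap maxSkew fills.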

The matchLabelKeys gotcha during rollouts

A subtle one: during a rolling update, old and new ReplicaSet pods share the same app label, so the scheduler counts both when evaluating skew. Skew is then computed over a mix of pods that are about to be terminated and pods that will stay, so new pods can land unevenly mid-rollout and remain uneven once the old ones are gone. Newer Kubernetes versions support matchLabelKeys (e.g. including pod-template-hash, the label the Deployment controller adds to each ReplicaSet's pods) so the constraint only considers pods from the same rollout. If your spread looks wrong only during deploys, this is usually why.
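
A minimal sketch of what that looks like, reusing the zone constraint from earlier. It assumes your cluster runs a Kubernetes version where the matchLabelKeys field is available, so check the API reference for your version before relying on it.

```yaml
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
        # Also match on the incoming pod's own value for this label, so only
        # pods from the same ReplicaSet (the same rollout) are counted.
        matchLabelKeys:
        - pod-template-hash
```

The scheduler takes the value of pod-template-hash from the pod being scheduled and combines it with the labelSelector, so pods from the previous ReplicaSet no longer count toward the skew.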

Verifying it works

Don’t trust the YAML — check the placement:

kubectl get pods -l app=api -o wide

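Look at the NODE column and confirm the pods are actually distributed. Then count pods per zone by looking up each pod's node and reading its zone label:

kubectl get pods -l app=api -o json | \
  jq -r '.items[].spec.nodeName // empty' | \
  xargs -I{} kubectl get node {} -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}' | \
  sort | uniq -c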

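If everything’s on one node despite the constraints, your whenUnsatisfiable is probably ScheduleAnyway and the cluster was tight when they scheduled. Also confirm that the labelSelector matches the pod labels, since a selector that matches nothing makes the constraint a no-op. Spread constraints are only evaluated at scheduling time, so tightening to DoNotSchedule does not move pods that are already running; restart the rollout (for example, kubectl rollout restart deployment/api) so they are rescheduled under the new rule.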

Before shipping spread constraints on critical services, get a review — an overly strict DoNotSchedule zone constraint on a single-zone cluster will pin every pod Pending, and that’s a confusing 2am discovery. Our AI code review catches constraints that can’t be satisfied by the cluster’s actual topology.

Replicas are a number; resilience is a distribution. Topology spread constraints are how you make the two line up. For more on scheduling and reliability, see the Kubernetes & Helm category.

Scheduling constraints interact with your cluster’s real node and zone layout. Verify topology labels and skew settings against your environment before applying to production.
