AI for Kubernetes & Helm · By James Joyner IV · 9 min read

Kubernetes Error Guide: 'StatefulSet has not progressed' Stuck Rollout

Fix a StatefulSet rollout that stalls because an ordered pod never becomes Ready, an OnDelete strategy is set, or a partition blocks the update.

  • #kubernetes-helm
  • #troubleshooting
  • #errors
  • #statefulset

Exact Error Message

A StatefulSet update appears to hang. A waiter such as kubectl rollout status or a CI gate reports that the StatefulSet is not making progress, and the status shows updated replicas stuck below the desired count:

$ kubectl rollout status statefulset/postgres
Waiting for 1 pods to be ready...

$ kubectl get statefulset postgres
NAME       READY   AGE
postgres   2/3     6d

$ kubectl describe statefulset postgres
Replicas: 3 desired | 3 total
Update:   RollingUpdate
Status:   currentReplicas=2 updatedReplicas=1 readyReplicas=2
Message:  StatefulSet postgres has not progressed: waiting for pod postgres-1 to be Ready

The headline is "has not progressed", with updatedReplicas frozen. A StatefulSet updates pods one at a time in reverse ordinal order, and it will not touch the next pod until the current one is Ready.

What the Error Means

A RollingUpdate StatefulSet is deliberately conservative. It deletes and recreates pods from the highest ordinal down to zero, and after recreating each pod it waits for that pod to become Ready before proceeding to the next ordinal. If any pod gets stuck — crash-looping, failing a readiness probe, unschedulable, or blocked on a volume — the rollout halts at that ordinal and never advances.

So "has not progressed" almost never means the controller is broken. It means one specific pod is not Ready and is acting as a barrier. The fix is to identify the blocking ordinal (the highest-numbered pod that is not yet updated-and-Ready, since updates run from the highest ordinal down) and resolve that pod’s problem. Two other configurations also cause an apparent stall: an OnDelete update strategy (no automatic updates at all) and a non-zero partition (intentionally pausing part of the fleet).

Common Causes

  • A pod never becomes Ready — the recreated pod crash-loops or fails its readiness probe, so the controller waits forever at that ordinal.
  • Pod unschedulable — the updated pod is Pending (FailedScheduling) and never starts, blocking the ordered rollout.
  • Volume / PVC problem — the pod’s PVC is unbound or its volume cannot attach, so the pod never runs.
  • updateStrategy: OnDelete — updates only happen when you manually delete a pod; the controller will never roll on its own.
  • Non-zero partition: setting rollingUpdate.partition to N intentionally updates only pods with ordinal >= N, leaving lower ordinals on the old revision.
  • Readiness probe too strict — the new image is healthy but the probe never passes, so Ready is never reported.

How to Reproduce the Error

Starting from a healthy three-replica StatefulSet, roll out a pod template whose readiness probe always fails, so the recreated pod is Running but never Ready:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels: { app: postgres }
  template:
    metadata:
      labels: { app: postgres }
    spec:
      containers:
        - name: db
          image: registry.k8s.io/pause:3.9
          readinessProbe:
            exec:
              command: ["sh", "-c", "exit 1"]
            periodSeconds: 5

kubectl apply -f postgres-sts.yaml
kubectl rollout status statefulset/postgres
Waiting for 1 pods to be ready...

The highest-ordinal pod is recreated, never passes readiness, and the rollout waits indefinitely.

Diagnostic Commands

# See updated vs ready counts and the update strategy
kubectl get statefulset <STS> -o wide
kubectl get statefulset <STS> -o jsonpath='{.spec.updateStrategy}'
kubectl get statefulset <STS> -o jsonpath='{.status}'

# Find the blocking pod: the highest ordinal that is not both updated and Ready
kubectl get pods -l app=<APP> -o wide

# Why is that pod not Ready?
kubectl describe pod <STS>-<ORDINAL> | grep -A10 Events
kubectl logs <STS>-<ORDINAL> --previous

# Check for a partition that pauses the rollout
kubectl get statefulset <STS> -o jsonpath='{.spec.updateStrategy.rollingUpdate.partition}'

The combination of updatedReplicas, the pod list, and the blocking pod’s events tells you exactly where the rollout stopped.
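
As a rough aid, the readings above can be folded into a small shell helper. This is a sketch under stated assumptions, not kubectl behaviour: the function name classify_rollout and its four output labels are invented for illustration, and it looks only at the strategy type, the replica counts, and the partition.

```bash
# classify_rollout STRATEGY REPLICAS UPDATED READY [PARTITION]
# Hypothetical helper: prints an illustrative label for why a rollout looks stalled.
classify_rollout() {
  local strategy="$1"
  local replicas="${2:-0}"
  local updated="${3:-0}"     # jsonpath prints nothing when a status field is unset
  local ready="${4:-0}"
  local partition="${5:-0}"

  if [ "$strategy" = "OnDelete" ]; then
    echo "ondelete"            # the controller never rolls pods on its own
  elif [ "$updated" -ge "$replicas" ] && [ "$ready" -ge "$replicas" ]; then
    echo "complete"
  elif [ "$partition" -gt 0 ] && [ "$ready" -ge "$replicas" ] \
       && [ "$updated" -ge $((replicas - partition)) ]; then
    echo "held-by-partition"   # every ordinal >= partition is already updated
  else
    echo "blocked-on-pod"      # some pod is not yet updated and Ready
  fi
}

# Example call (commented out because it needs a live cluster):
#   s=postgres
#   classify_rollout \
#     "$(kubectl get sts "$s" -o jsonpath='{.spec.updateStrategy.type}')" \
#     "$(kubectl get sts "$s" -o jsonpath='{.spec.replicas}')" \
#     "$(kubectl get sts "$s" -o jsonpath='{.status.updatedReplicas}')" \
#     "$(kubectl get sts "$s" -o jsonpath='{.status.readyReplicas}')" \
#     "$(kubectl get sts "$s" -o jsonpath='{.spec.updateStrategy.rollingUpdate.partition}')"
```

A result of blocked-on-pod means the next stop is the pod list and the blocking pod's events; the other labels point at configuration rather than a failing pod.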

Step-by-Step Resolution

1. Identify the blocking ordinal. List pods and find the highest-numbered one that is not both updated and Ready. Because updates run from the highest ordinal down, that pod is the barrier:

kubectl get pods -l app=<APP>
# postgres-2  1/1 Running   <- updated, ready
# postgres-1  0/1 Running   <- NOT ready: this is the barrier
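
On a larger set, picking the barrier by eye gets error-prone. The sketch below automates the same rule: walk from the highest ordinal down and stop at the first pod that is not on the target revision and Ready. The function name find_blocking_pod and its three-column input format are assumptions made for this example; the controller-revision-hash pod label and the StatefulSet's status.updateRevision field are what the commented usage relies on.

```bash
# find_blocking_pod TARGET_REVISION
# Reads "NAME REVISION READY" lines on stdin and prints the highest-ordinal pod
# that is not both on TARGET_REVISION and Ready (READY column equal to "True").
# Prints nothing when every pod is updated and Ready.
find_blocking_pod() {
  local target="$1"
  awk -v target="$target" '
    {
      ordinal = $1
      sub(/.*-/, "", ordinal)   # keep only the trailing ordinal of the pod name
      ordinal += 0
      ok = ($2 == target && $3 == "True")
      if (!ok && (best == "" || ordinal > bestOrd)) { best = $1; bestOrd = ordinal }
    }
    END { if (best != "") print best }
  '
}

# Example call (commented out because it needs a live cluster):
#   rev="$(kubectl get sts <STS> -o jsonpath='{.status.updateRevision}')"
#   kubectl get pods -l app=<APP> --no-headers \
#     -o custom-columns='NAME:.metadata.name,REV:.metadata.labels.controller-revision-hash,READY:.status.conditions[?(@.type=="Ready")].status' \
#     | find_blocking_pod "$rev"
```

Pods that have not yet reported a Ready condition show a placeholder in the READY column, which the helper treats as not Ready.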

2. Diagnose that pod. Use its events and logs to classify the failure — crash loop, failing readiness probe, Pending, or a volume error:

kubectl describe pod <STS>-1 | grep -A10 Events

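3. Fix a failing readiness probe. If the container is healthy but the probe never passes, correct the probe path/port/command or loosen initialDelaySeconds / failureThreshold in the pod template. With the default OrderedReady pod management policy, the controller keeps waiting on a pod that already runs the broken revision, so after applying the fix you may have to delete that pod yourself (kubectl delete pod <STS>-<ORDINAL>). It is recreated from the corrected template, and once it is Ready the rollout resumes.

4. Fix a crash loop or pull error. Resolve the underlying CrashLoopBackOff or ImagePullBackOff in the new revision and, as in step 3, delete the stuck pod if the controller does not replace it. Once the pod is Ready the controller proceeds to the next ordinal. See CrashLoopBackOff.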

5. Fix scheduling or volume blocks. If the pod is Pending, resolve the FailedScheduling cause. If its PVC is unbound, fix the storage. Both keep the pod from ever becoming Ready.

6. Handle OnDelete or a partition. If updateStrategy is OnDelete, the controller never rolls automatically — delete pods yourself to trigger updates. If a partition is set, lower it toward 0 step by step to release more ordinals:

kubectl get sts <STS> -o jsonpath='{.spec.updateStrategy}'
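
Lowering the partition is a patch to spec.updateStrategy. A minimal sketch follows, assuming a hypothetical helper named partition_patch that only builds the patch body; the kubectl patch calls are left as comments so the snippet has no side effects.

```bash
# partition_patch N
# Prints a patch body that sets rollingUpdate.partition to N.
# Rejects anything that is not a non-negative integer.
partition_patch() {
  case "$1" in
    ''|*[!0-9]*)
      echo "partition must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":%s}}}}\n' "$1"
}

# Example: release one more ordinal, wait, then finish the rollout.
#   kubectl patch statefulset <STS> -p "$(partition_patch 1)"
#   kubectl rollout status statefulset/<STS>
#   kubectl patch statefulset <STS> -p "$(partition_patch 0)"
```

Stepping the value down one ordinal at a time keeps the canary behaviour: each patch exposes exactly one more pod to the new revision.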

Prevention and Best Practices

  • Validate new images in a staging StatefulSet so a bad revision blocks a test, not production.
  • Make readiness probes match real readiness: a probe that is too strict stalls every rollout at the first pod it recreates.
  • Use partition deliberately as a canary mechanism and remember to drive it back to 0 to complete the rollout.
  • Alert on a StatefulSet where updatedReplicas < replicas for longer than your rollout SLA; ordered updates do not self-heal a stuck pod.
  • Keep PodDisruptionBudgets and probes consistent so one slow pod does not wedge the whole set. More in Kubernetes & Helm guides.
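
One way to enforce a rollout SLA in CI is to bound the wait with the --timeout flag of kubectl rollout status. The wrapper below is a sketch: the name rollout_gate, the 10m default, and the log messages are assumptions for illustration, and kubectl rollout status only supports the RollingUpdate strategy.

```bash
# rollout_gate STS [TIMEOUT]
# Returns non-zero if the StatefulSet rollout does not finish within TIMEOUT.
rollout_gate() {
  local sts="$1"
  local limit="${2:-10m}"   # assumed default; set it to your rollout SLA
  if kubectl rollout status "statefulset/$sts" --timeout="$limit"; then
    echo "rollout of $sts finished"
  else
    echo "rollout of $sts did not finish within $limit" >&2
    return 1
  fi
}

# Example (commented out because it needs a live cluster):
#   rollout_gate postgres 15m || exit 1
```

A failed gate is a signal to go and find the blocking pod; it does not repair the rollout.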

Frequently Asked Questions

Why does the rollout stop instead of skipping the bad pod? StatefulSets guarantee ordered, at-most-one-at-a-time updates to protect stateful identity. Skipping would break that guarantee, so the controller waits indefinitely for the current ordinal to become Ready.

My StatefulSet uses OnDelete and nothing updates — is that a bug? No. With updateStrategy: OnDelete, the controller only applies the new template when you manually delete a pod. It will never roll automatically. Switch to RollingUpdate for hands-off updates.
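
With OnDelete, the practical question is which pods still run the old template. A small sketch, assuming a hypothetical helper named stale_pods that reads "NAME REVISION" pairs and lists the pods that differ from the target revision, highest ordinal first:

```bash
# stale_pods TARGET_REVISION
# Reads "NAME REVISION" lines on stdin and prints the pods whose revision
# differs from TARGET_REVISION, ordered from highest ordinal to lowest.
stale_pods() {
  local target="$1"
  awk -v target="$target" '
    $2 != target { n = $1; sub(/.*-/, "", n); print n + 0, $1 }
  ' | sort -rn | awk '{ print $2 }'
}

# Example (commented out because it needs a live cluster):
#   rev="$(kubectl get sts <STS> -o jsonpath='{.status.updateRevision}')"
#   kubectl get pods -l app=<APP> --no-headers \
#     -o custom-columns='NAME:.metadata.name,REV:.metadata.labels.controller-revision-hash' \
#     | stale_pods "$rev"
#   # then delete each listed pod once the previous one is Ready again:
#   #   kubectl delete pod <POD>
```

Deleting from the highest ordinal down mirrors the order a RollingUpdate would use, which is a convention chosen here rather than a requirement of OnDelete.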

Which pod is actually blocking? The highest-ordinal pod that is not yet on the new revision and Ready. Updates flow from highest ordinal down, so the first not-ready pod in that descending sequence is the barrier.

I set partition to 2 on a 3-replica set and only one pod updated — correct? Yes. partition: 2 updates only ordinals >= 2, so just pod-2 rolls. Lower the partition toward 0 to update the rest. It is a pause control, not an error.
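
The partition arithmetic can be checked with a few lines of shell. This is a worked example only; the helper name updated_ordinals is made up here, and it simply enumerates the ordinals >= partition in the order a rolling update visits them.

```bash
# updated_ordinals NAME REPLICAS PARTITION
# Prints the pods a rolling update will touch for the given partition,
# from the highest ordinal down to the partition boundary.
updated_ordinals() {
  local name="$1"
  local replicas="$2"
  local partition="$3"
  local i=$((replicas - 1))
  while [ "$i" -ge "$partition" ] && [ "$i" -ge 0 ]; do
    echo "$name-$i"
    i=$((i - 1))
  done
}

updated_ordinals postgres 3 2   # prints: postgres-2
```

Calling it with a partition of 0 lists all three pods, and a partition equal to or above the replica count lists none, which matches a fully paused rollout.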
