Kubernetes Error Guide: 'FailedAttachVolume' Multi-Attach Stuck Pod
Fix the FailedAttachVolume Multi-Attach error: detach volumes stuck on a dead node, switch RWO to RWX where needed, and unblock pods that hang in ContainerCreating.
Exact Error Message
Warning FailedAttachVolume 3m12s attachdetach-controller
Multi-Attach error for volume "pvc-9f3c2a1e-7b44-4c0d-9a21-0d5f6e8b1c2a"
Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 62s (x9 over 11m) kubelet
Unable to attach or mount volumes: unmounted volumes=[data],
unattached volumes=[data kube-api-access-x5k2t]:
timed out waiting for the condition
What the Error Means
FailedAttachVolume with a Multi-Attach error means Kubernetes is trying to attach a ReadWriteOnce (RWO) persistent volume to a node while that same volume is still attached to a different node. RWO volumes — the default for most block storage like EBS, Azure Disk, GCE PD, and Ceph RBD — can only be mounted by one node at a time. The attach/detach controller refuses the second attach because the first node has not released it.
This almost always happens during pod rescheduling. Normally the old pod terminates, the volume detaches from the old node, and only then does the new pod attach it on the new node. When the old node dies abruptly or becomes unreachable, or the old pod gets stuck terminating, the detach does not complete, and the new pod hangs in ContainerCreating until the volume is released.
Common Causes
- A node went down hard. The kubelet on the dead node never reports the volume detached, so the controller waits the full 6-minute force-detach timeout (or forever, if the node object is not deleted).
- The old pod is stuck Terminating. A finalizer, a hung process, or a long terminationGracePeriodSeconds keeps the old pod alive, so the volume is never released.
- A Deployment with an RWO volume and a RollingUpdate strategy. The new pod is scheduled to a different node before the old pod releases the disk.
- Two replicas sharing one RWO PVC. A StatefulSet misconfiguration or a Deployment with replicas: 2 pointing at the same RWO PVC will contend whenever the replicas land on different nodes.
- A genuinely RWX workload using an RWO storage class. The application expects shared storage but the PVC was provisioned RWO.
How to Reproduce the Error
- Create an RWO PVC and a Deployment that mounts it on node A.
- Hard-stop node A (power off the VM, do not drain it) so the kubelet stops reporting.
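- After the node is marked NotReady and the pod is evicted (by default about five minutes later), the Deployment creates a replacement pod, which the scheduler places on node B.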
- Node B’s pod cannot attach the volume because node A still “owns” it, producing the Multi-Attach error.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
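The reproduction steps also need a Deployment that mounts the claim. The function below is a minimal sketch of one. The Deployment name multi-attach-demo, the busybox image, and the heartbeat command are illustrative assumptions, not part of the original setup. Only the claim name data and the volume name data are taken from the PVC and the error message above.

```shell
# Sketch: print a single-replica Deployment that mounts the "data" PVC.
# "multi-attach-demo" and the busybox image are hypothetical choices.
# The strategy is left unset, so the default RollingUpdate applies.
render_demo_deployment() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-attach-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multi-attach-demo
  template:
    metadata:
      labels:
        app: multi-attach-demo
    spec:
      containers:
        - name: writer
          image: busybox:1.36
          command: ["sh", "-c", "while true; do date >> /data/heartbeat; sleep 5; done"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
EOF
}

# Usage (requires a cluster):
#   render_demo_deployment | kubectl apply -f -
```

Printing the manifest from a function keeps it reviewable before anything is applied to a cluster.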
Diagnostic Commands
# See the pod stuck in ContainerCreating and its events
kubectl get pod <POD> -o wide
kubectl describe pod <POD> | grep -A4 -iE 'multi-attach|failedattach|failedmount'
# Find which node currently holds the volume
kubectl get volumeattachment | grep <PV_NAME>
kubectl describe volumeattachment <VA_NAME>
# Check whether the old pod is still terminating and on which node
kubectl get pods -o wide --all-namespaces | grep -i terminating
# Inspect the PV access mode and its bound PVC
kubectl get pv <PV_NAME> -o jsonpath='{.spec.accessModes}{"\n"}'
kubectl get pvc data -o wide
# Is the old node Ready, or has it gone NotReady?
kubectl get nodes
kubectl describe node <OLD_NODE> | grep -iE 'taint|condition|ready'
Step-by-Step Resolution
1. Identify the node still holding the volume. Use the VolumeAttachment object — it names both the PV and the node:
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached
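On a busy cluster that table can be long. The helper below is a small sketch that reads the table on stdin and prints only the node holding a given PV. The function name holder_node is hypothetical, and it assumes the exact four-column layout (NAME, PV, NODE, ATTACHED) produced by the command above.

```shell
# Sketch: read the custom-columns table from step 1 on stdin and print the
# node(s) whose VolumeAttachment for the given PV reports ATTACHED=true.
# Assumes columns in this order: NAME PV NODE ATTACHED (header on line 1).
holder_node() {
  awk -v pv="$1" 'NR > 1 && $2 == pv && $4 == "true" { print $3 }'
}

# Usage (requires a cluster):
#   kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached | holder_node <PV_NAME>
```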
2. If the old node is dead, let the force-detach run or delete the node object. When a node is NotReady, the controller waits ~6 minutes before force-detaching. If the node is permanently gone, delete it so the controller releases the volume immediately:
kubectl delete node <DEAD_NODE>
3. If the old pod is stuck Terminating, release it. Find the terminating pod that still references the PVC and let it finish or force-delete it once you confirm the process is truly gone:
kubectl delete pod <OLD_POD> --grace-period=0 --force
Only force-delete after confirming the old node/process is dead — force-deleting while the old pod still writes risks data corruption on the shared disk.
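One way to make that check harder to skip is to gate the force-delete on the node status. The sketch below uses a hypothetical helper, node_is_not_ready, which reads one line of kubectl get node output and succeeds only when the STATUS column starts with NotReady. Treat it as a first filter only: NotReady means the control plane lost contact with the kubelet, not that the workload has stopped writing, so still confirm at the infrastructure layer that the machine is off.

```shell
# Sketch: read one line of `kubectl get node <OLD_NODE> --no-headers` on stdin.
# Exit 0 if the STATUS column (field 2) starts with NotReady, non-zero otherwise.
# This is a necessary check, not a sufficient one.
node_is_not_ready() {
  awk '{ found = ($2 ~ /^NotReady/) } END { exit found ? 0 : 1 }'
}

# Usage (requires a cluster):
#   kubectl get node <OLD_NODE> --no-headers | node_is_not_ready \
#     && kubectl delete pod <OLD_POD> --grace-period=0 --force
```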
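4. For Deployments, switch to Recreate. With RollingUpdate, the replacement pod starts before the old one is gone, so an RWO volume blocks the rollout whenever the new pod lands on a different node. Change the strategy so the old pod is fully gone before the new one starts. The patch also clears rollingUpdate, because the API server rejects a Recreate strategy that still carries rollingUpdate parameters:
kubectl patch deployment <DEPLOY> -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'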
5. If the workload genuinely needs shared access, use RWX. Re-provision on a storage class that supports ReadWriteMany (NFS, CephFS, Azure Files, EFS). The Multi-Attach error disappears because multiple nodes are allowed.
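The access modes of an existing PVC cannot be edited in place, so step 5 in practice means creating a new claim and copying the data across. The function below is a minimal sketch that prints such a claim. The function name render_rwx_pvc, the claim name, and the storage class are placeholders you supply, and the 10Gi size is carried over from the reproduction example.

```shell
# Sketch: print a ReadWriteMany PVC manifest.
#   $1 = claim name (hypothetical, e.g. data-shared)
#   $2 = name of an RWX-capable storage class that exists in your cluster
render_rwx_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: $2
  resources:
    requests:
      storage: 10Gi
EOF
}

# Usage (requires a cluster):
#   render_rwx_pvc data-shared <RWX_STORAGE_CLASS> | kubectl apply -f -
```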
Prevention and Best Practices
- Use Recreate (not RollingUpdate) for any single-replica Deployment that mounts an RWO volume, so the disk is released before the replacement schedules.
- Prefer StatefulSets for stateful workloads: each replica gets its own PVC, eliminating contention.
- Set a realistic terminationGracePeriodSeconds and ensure your app handles SIGTERM, so pods do not linger in Terminating.
- Add node auto-repair/auto-replacement so dead nodes are removed and their VolumeAttachments are cleaned up automatically.
- Never point two replicas at the same RWO PVC; if you need shared storage, provision RWX from the start.
- Monitor for pods stuck in ContainerCreating longer than a few minutes as an early signal of attach problems.
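The monitoring advice can be approximated without extra tooling. The sketch below uses a hypothetical helper, stuck_creating, which filters kubectl get pods output for pods in ContainerCreating whose AGE exceeds a threshold in seconds. It assumes the default column layout with --all-namespaces (namespace first, STATUS in field 4, AGE last) and only understands the d, h, m and s age units. AGE counts from pod creation, so it slightly overstates the time spent waiting on the attach.

```shell
# Sketch: read `kubectl get pods --all-namespaces --no-headers` on stdin and
# print namespace/name for pods in ContainerCreating older than $1 seconds.
# Assumes STATUS is field 4 and AGE is the last field.
stuck_creating() {
  awk -v limit="$1" '
    # Convert an age such as 11m, 2m5s or 3h5m to seconds.
    function age_seconds(a,   total, n, unit) {
      total = 0
      while (match(a, /^[0-9]+[dhms]/)) {
        n = substr(a, 1, RLENGTH - 1) + 0
        unit = substr(a, RLENGTH, 1)
        if (unit == "d") total += n * 86400
        else if (unit == "h") total += n * 3600
        else if (unit == "m") total += n * 60
        else total += n
        a = substr(a, RLENGTH + 1)
      }
      return total
    }
    $4 == "ContainerCreating" && age_seconds($NF) > limit { print $1 "/" $2 }
  '
}

# Usage (requires a cluster), flagging pods older than five minutes:
#   kubectl get pods --all-namespaces --no-headers | stuck_creating 300
```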
Related Errors
- MountVolume.SetUp failed: the volume attached but the filesystem mount or CSI node-stage step failed.
- FailedScheduling ... volume node affinity conflict: the pod was scheduled to a zone where the volume cannot attach.
- PersistentVolumeClaim is not bound: provisioning never completed, so there is nothing to attach.
- Unable to attach or mount volumes: timed out waiting for the condition: the kubelet-side companion message to a stuck attach.
Frequently Asked Questions
Why does my volume stay attached to a node that is already gone? The attach/detach controller relies on the kubelet to confirm a detach. A hard-down node never sends that confirmation, so the controller waits for the force-detach timeout (about 6 minutes) or until you delete the node object. Deleting the dead node is the fastest safe fix.
Can I just delete the VolumeAttachment object to fix it? Avoid it. Deleting the VolumeAttachment manually can leave the underlying cloud volume attached at the infrastructure layer, causing a state mismatch. Delete the dead node or release the stuck pod instead, and let the controller reconcile.
Is Recreate strategy slower than RollingUpdate? Yes, there is a brief downtime because the old pod is removed before the new one starts. For RWO-backed single-replica workloads this is the correct trade-off: a rolling update stalls whenever the replacement pod lands on a different node, because only one node can hold the disk.
How do I make the volume mountable from multiple pods? Re-provision the PVC with accessModes: ["ReadWriteMany"] on an RWX-capable storage class. RWO volumes physically cannot be attached to more than one node, regardless of Kubernetes configuration.