Kubernetes Error Guide: 'rpc error: code = DeadlineExceeded' CSI Attach/Mount Timeout
Fix 'rpc error: code = DeadlineExceeded, context deadline exceeded' in Kubernetes CSI attach/mount: slow cloud APIs, throttling, and stuck VolumeAttachments.
Exact Error Message
When a CSI driver takes too long to attach or mount a volume, the gRPC call from the kubelet or attacher sidecar times out and the pod records a FailedAttachVolume or FailedMount event:
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Warning  FailedAttachVolume  2m1s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-9a2c..." : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount         9s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data]: timed out waiting for the condition
The pod stays in ContainerCreating. The string rpc error: code = DeadlineExceeded desc = context deadline exceeded is the gRPC layer reporting that the CSI operation did not finish within its timeout.
What the Error Means
DeadlineExceeded is gRPC status code 4. The CSI sidecars (csi-attacher, csi-provisioner) and the kubelet each call the driver with a context deadline. When the attacher's ControllerPublishVolume call (attach) or the kubelet's NodeStageVolume/NodePublishVolume call (mount) does not return before that deadline, the call is cancelled and surfaced as DeadlineExceeded.
Crucially, this is a timeout, not a rejection. The cloud operation may still be in progress or even succeed seconds later — but Kubernetes already gave up on that attempt and will retry. The error means “the driver was too slow,” and the usual reason is a slow or throttled cloud control-plane API, not bad configuration.
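The difference between a timeout and a rejection can be seen in a small shell analogy. This is only a sketch: GNU coreutils `timeout` stands in for the gRPC context deadline, `sleep` stands in for a slow CSI call, and `call_with_deadline` is a hypothetical helper, not part of any CSI tooling. One difference from the real system: `timeout` kills the command, whereas a cloud attach may carry on after the caller has given up.

```shell
# Analogy only: `timeout` plays the gRPC deadline, the wrapped command
# plays a CSI call. Assumes GNU coreutils `timeout` (exit code 124 on expiry).

# call_with_deadline SECONDS CMD...   (hypothetical helper)
# Prints "OK" if CMD finishes in time, "DeadlineExceeded" if the deadline
# fires first, and "Failed(<code>)" if CMD itself returns an error.
call_with_deadline() {
  local deadline="$1"
  shift
  local rc=0
  timeout "$deadline" "$@" || rc=$?
  case "$rc" in
    0)   echo "OK" ;;
    124) echo "DeadlineExceeded" ;;
    *)   echo "Failed($rc)" ;;
  esac
}

call_with_deadline 1 sleep 2   # slow "attach", short deadline -> DeadlineExceeded
call_with_deadline 2 true      # finishes in time              -> OK
call_with_deadline 2 false     # a real rejection              -> Failed(1)
```

The same slow operation would report OK under a longer deadline, which is why the error points at latency rather than at a bad request.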
Common Causes
- Cloud API throttling: many simultaneous attach/detach calls (e.g. mass pod rescheduling) get rate-limited by the provider, so each call slows past the deadline.
- Slow detach from a previous node: a volume must detach from an old node before attaching to a new one; if the old node is unreachable, detach stalls.
- Stuck VolumeAttachment objects: a leftover VolumeAttachment keeps the volume “attached” in the API even though the node is gone.
- Undersized CSI sidecar timeouts: --timeout on the attacher/provisioner is too short for the provider’s real latency.
- CSI driver pod under-resourced or crash-looping: the node plugin is CPU-starved or restarting, so NodeStage is slow.
- Filesystem operations on large volumes: formatting or fsck of a large/fragmented volume exceeds the mount deadline.
How to Reproduce the Error
Force many attaches at once by rescheduling a StatefulSet across nodes while the cloud API is busy:
# Drain a node hosting several stateful pods to trigger simultaneous detach+attach
kubectl drain <NODE> --ignore-daemonsets --delete-emptydir-data
kubectl get pods -o wide -w
Under throttling, several pods stick in ContainerCreating and kubectl describe pod shows rpc error: code = DeadlineExceeded. A simpler lab trigger is to set the attacher sidecar --timeout=1s, which makes nearly every attach exceed its deadline.
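Before changing the attacher's --timeout for a lab test, it helps to record the current value so it can be restored. The sketch below assumes the flag is written as `--timeout=<value>` in the container args; `attacher_timeout` is a hypothetical helper, and the Deployment name `ebs-csi-controller` in the usage comment is an assumption that will differ per driver.

```shell
# attacher_timeout (hypothetical helper): reads container args on stdin, in any
# layout (JSON array, YAML, plain text), and prints the value of the first
# --timeout=<value> flag it finds. Prints nothing if the flag is not set.
attacher_timeout() {
  grep -o -e '--timeout=[0-9a-z.]*' | head -n 1 | cut -d= -f2
}

# Against a live cluster (Deployment name is an assumption; adjust for your driver):
#   kubectl get deployment ebs-csi-controller -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-attacher")].args}' \
#     | attacher_timeout

# Offline demo with made-up args:
echo '["--csi-address=/csi/csi.sock","--v=2","--timeout=1s"]' | attacher_timeout
# prints: 1s
```

If nothing is printed, the flag is not set explicitly and the sidecar is running with its built-in default.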
Diagnostic Commands
# Read the attach/mount events on the stuck pod
kubectl describe pod <POD> | grep -A10 Events
# Is the volume stuck attached to an old node?
kubectl get volumeattachment | grep <PV-NAME>
kubectl describe volumeattachment <VA-NAME>
# Attacher sidecar logs: look for repeated DeadlineExceeded
kubectl logs -n kube-system <CSI-CONTROLLER-POD> -c csi-attacher --tail=80
# Node plugin logs for NodeStage/NodePublish latency
kubectl logs -n kube-system <CSI-NODE-POD> -c <DRIVER> --tail=80
# Check CSI pods are healthy and not restarting
# (label values shown are for the AWS EBS CSI driver; adjust the selector for your driver)
kubectl get pods -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node)' -o wide
If the same DeadlineExceeded repeats every few seconds across multiple volumes, suspect cloud-API throttling rather than a single bad volume.
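One rough way to apply that rule is to count how many distinct volumes appear on DeadlineExceeded lines. The sketch below assumes messages contain the volume name in the form `volume "<name>"`, as in the event shown earlier; `deadline_volumes` is a hypothetical helper and the volume names in the demo are made up.

```shell
# deadline_volumes (hypothetical helper): reads event or log text on stdin and
# prints each distinct volume name that appears on a DeadlineExceeded line.
deadline_volumes() {
  grep 'DeadlineExceeded' \
    | grep -o 'volume "[^"]*"' \
    | sed 's/^volume "//; s/"$//' \
    | sort -u
}

# Against a live cluster:
#   kubectl get events -A --field-selector reason=FailedAttachVolume | deadline_volumes

# Offline demo with made-up event messages:
sample='AttachVolume.Attach failed for volume "pvc-aaa" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
AttachVolume.Attach failed for volume "pvc-bbb" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
AttachVolume.Attach failed for volume "pvc-aaa" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
MountVolume.SetUp failed for volume "pvc-ccc" : rpc error: code = Internal desc = mount failed'
printf '%s\n' "$sample" | deadline_volumes
# prints:
#   pvc-aaa
#   pvc-bbb
```

A single name in the output points toward one bad volume; many names in the same time window are more consistent with throttling.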
Step-by-Step Resolution
1. Decide: transient or stuck? If the pod eventually starts after a minute or two, it was transient throttling and self-healed via retry. If it never starts, there is a stuck attachment or a persistent slowness to fix.
2. Clear a stuck VolumeAttachment. A ReadWriteOnce (RWO) volume can attach to only one node at a time. If a VolumeAttachment still references a dead node, new attach calls keep timing out. Verify the old node is truly gone, then remove the stale object:
kubectl get volumeattachment -o wide
# only after confirming the node no longer exists / volume is detached cloud-side
kubectl delete volumeattachment <VA-NAME>
3. Relieve cloud-API throttling. If logs show throttle/rate-limit errors, stagger workload moves instead of draining everything at once, and reduce churn. Some drivers expose tunables to lower concurrent attach calls per node.
4. Increase sidecar timeouts. If the provider’s real attach latency exceeds the configured --timeout, raise it on the attacher/provisioner deployment (commonly to 60s or more for slow regions), then restart the controller.
5. Right-size and stabilize the driver. Ensure the CSI node and controller pods are not crash-looping or CPU-throttled. Bump their resource requests if NodeStageVolume is slow, and confirm they are Running on every relevant node.
6. Verify recovery. After the fix, the next retry should attach cleanly:
kubectl get pod <POD> -o wide -w
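For step 2, candidates for a stale VolumeAttachment can be shortlisted by comparing each attachment's node against the nodes that still exist. This is a sketch under assumptions: `stale_attachments` is a hypothetical helper, the file names are arbitrary, and the demo data is made up. The output is only a list of candidates; the cloud-side check from step 2 still applies before deleting anything.

```shell
# stale_attachments VA_FILE NODE_FILE   (hypothetical helper)
#   VA_FILE:   lines of "<va-name> <node-name> <pv-name>"
#   NODE_FILE: one existing node name per line
# Prints the attachments whose node is not listed in NODE_FILE.
stale_attachments() {
  awk 'FILENAME == ARGV[1] { live[$1] = 1; next }   # first file: live nodes
       !($2 in live)       { print $1, $2, $3 }     # second file: attachments
      ' "$2" "$1"
}

# Against a live cluster:
#   kubectl get volumeattachment --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName > vas.txt
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > nodes.txt
#   stale_attachments vas.txt nodes.txt

# Offline demo with made-up names:
tmp=$(mktemp -d)
printf '%s\n' 'csi-va-1 node-a pvc-aaa' 'csi-va-2 node-gone pvc-bbb' > "$tmp/vas.txt"
printf '%s\n' 'node-a' 'node-b' > "$tmp/nodes.txt"
stale_attachments "$tmp/vas.txt" "$tmp/nodes.txt"
# prints: csi-va-2 node-gone pvc-bbb
rm -rf "$tmp"
```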
Prevention and Best Practices
- Avoid mass simultaneous drains; cordon and migrate stateful workloads in small batches to stay under cloud API rate limits.
- Set CSI sidecar timeouts generously for your region’s measured latency rather than relying on defaults.
- Monitor VolumeAttachment objects and alert on ones older than a few minutes that are not Attached.
- Keep CSI driver pods on guaranteed resources so node staging is never CPU-starved.
- Replace unhealthy nodes promptly so volumes can detach; a hung node is the classic cause of “attach waits on detach.”
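The VolumeAttachment monitoring suggestion above could be prototyped as a periodic check before wiring it into a real alerting system. The sketch assumes GNU `date` (for `-d` parsing of ISO 8601 timestamps); `unattached_older_than` is a hypothetical helper, the 300-second threshold is an arbitrary example, and the demo rows are made up.

```shell
# unattached_older_than NOW_EPOCH MAX_AGE_SECONDS   (hypothetical helper)
#   stdin: lines of "<va-name> <attached> <creationTimestamp>"
# Prints "<va-name> <age>s" for rows that are not attached and older than the limit.
# Assumes GNU date for parsing the timestamp.
unattached_older_than() {
  local now="$1" max_age="$2" name attached created age
  while read -r name attached created; do
    if [ "$attached" = "true" ]; then
      continue                      # healthy attachment, nothing to report
    fi
    age=$(( now - $(date -u -d "$created" +%s) ))
    if [ "$age" -gt "$max_age" ]; then
      echo "$name ${age}s"
    fi
  done
  return 0
}

# Against a live cluster:
#   kubectl get volumeattachment --no-headers \
#     -o custom-columns=NAME:.metadata.name,ATTACHED:.status.attached,CREATED:.metadata.creationTimestamp \
#     | unattached_older_than "$(date +%s)" 300

# Offline demo with made-up rows and a fixed "now":
now=$(date -u -d '2024-01-01T00:10:00Z' +%s)
printf '%s\n' \
  'csi-va-1 true 2024-01-01T00:00:00Z' \
  'csi-va-2 false 2024-01-01T00:02:00Z' \
  'csi-va-3 false 2024-01-01T00:08:00Z' \
  | unattached_older_than "$now" 300
# prints: csi-va-2 480s
```

Passing the current time in as an argument keeps the helper deterministic, which makes it easy to test against recorded output.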
Related Errors
- FailedAttachVolume Multi-Attach — the RWO volume is still attached elsewhere.
- timed out waiting for the condition — the generic kubelet timeout this often pairs with.
- MountVolume.SetUp failed — mount failures after a successful attach.
- failed to provision volume with StorageClass — failures at the create stage instead of attach.
Frequently Asked Questions
Does DeadlineExceeded mean the volume operation failed permanently? No. It means one attempt timed out. The controller retries with backoff, and the underlying cloud operation may even complete on its own. Watch the pod — if it starts shortly after, the retry succeeded.
Why does attach wait on a node that no longer exists? An RWO volume must be detached from its previous node before it can attach elsewhere. If that node became unreachable without a clean detach, the cloud keeps the volume “in use” and a stale VolumeAttachment blocks the new attach until it is force-detached or removed.
Is raising the sidecar timeout always safe? Usually. Raising it prevents premature cancellation of legitimately slow operations, and it does not mask real failures: a genuinely failing call returns a different error code, not DeadlineExceeded. The trade-off is that a call that is truly hung takes longer to surface as an error.
Could this be the CSI driver crashing rather than the cloud being slow? Yes. Check that the controller and node plugin pods are Running and not restarting. A crash-looping node plugin makes NodeStageVolume time out just like a slow cloud API.
How do I tell throttling from a single bad volume? Throttling shows the same DeadlineExceeded across many volumes at once and rate-limit messages in the attacher log. A single bad volume fails in isolation while other volumes attach normally.