AI for Kubernetes & Helm · By James Joyner IV · 9 min read

Kubernetes Error Guide: 'rpc error: code = DeadlineExceeded' CSI Attach/Mount Timeout

Fix 'rpc error: code = DeadlineExceeded, context deadline exceeded' in Kubernetes CSI attach/mount: slow cloud APIs, throttling, and stuck VolumeAttachments.

  • #kubernetes-helm
  • #troubleshooting
  • #errors
  • #storage

Exact Error Message

When a CSI driver takes too long to attach or mount a volume, the gRPC call from the kubelet or attacher sidecar times out and the pod records a FailedAttachVolume or FailedMount event:

Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Warning  FailedAttachVolume  2m1s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-9a2c..." : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount         9s    kubelet                  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data]: timed out waiting for the condition

The pod stays in ContainerCreating. The message rpc error: code = DeadlineExceeded desc = context deadline exceeded is the gRPC layer reporting that the CSI operation did not finish within its timeout.

What the Error Means

DeadlineExceeded is gRPC status code 4. The CSI sidecars (csi-attacher, csi-provisioner) and the kubelet each call the driver with a context deadline. When ControllerPublishVolume (attach) or NodeStageVolume/NodePublishVolume (mount) does not return before that deadline, the call is cancelled and surfaced as DeadlineExceeded.

Crucially, this is a timeout, not a rejection. The cloud operation may still be in progress or even succeed seconds later — but Kubernetes already gave up on that attempt and will retry. The error means “the driver was too slow,” and the usual reason is a slow or throttled cloud control-plane API, not bad configuration.
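
To make the timeout-versus-rejection distinction concrete, here is a minimal sketch that sorts an event message into one of three buckets. The function name classify_csi_error and the bucket labels are hypothetical, chosen only for illustration; the sketch assumes the message keeps the "code = <Status>" wording shown in the events above.

```shell
# Classify a CSI-related event message.
# classify_csi_error is a hypothetical helper, not part of kubectl or any CSI tool.
classify_csi_error() {
  case "$1" in
    # gRPC status 4: the attempt ran out of time; Kubernetes will retry it.
    *"code = DeadlineExceeded"*) echo "timeout" ;;
    # Any other gRPC status: the driver answered, and the answer was an error.
    *"rpc error: code = "*)      echo "driver-error" ;;
    # Not a gRPC error string at all (for example a kubelet wait message).
    *)                           echo "other" ;;
  esac
}

# Usage against a live cluster (assumes <POD> exists):
#   kubectl get events --field-selector involvedObject.name=<POD> \
#     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' |
#     while IFS= read -r msg; do classify_csi_error "$msg"; done
```

A stream of "timeout" results points at slowness; "driver-error" results mean the driver responded and the specific status code is worth reading.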

Common Causes

  • Cloud API throttling — many simultaneous attach/detach calls (e.g. mass pod rescheduling) get rate-limited by the provider, so each call slows past the deadline.
  • Slow detach from a previous node — a volume must detach from an old node before attaching to a new one; if the old node is unreachable, detach stalls.
  • Stuck VolumeAttachment objects — a leftover VolumeAttachment keeps the volume “attached” in the API even though the node is gone.
  • Undersized CSI sidecar timeouts: the --timeout flag on the attacher/provisioner is too short for the provider’s real latency.
  • CSI driver pod under-resourced or crash-looping — the node plugin is CPU-starved or restarting, so NodeStage is slow.
  • Filesystem operations on large volumes — formatting or fsck of a large/fragmented volume exceeds the mount deadline.

How to Reproduce the Error

Force many attaches at once by rescheduling a StatefulSet across nodes while the cloud API is busy:

# Drain a node hosting several stateful pods to trigger simultaneous detach+attach
kubectl drain <NODE> --ignore-daemonsets --delete-emptydir-data
kubectl get pods -o wide -w

Under throttling, several pods stick in ContainerCreating and kubectl describe pod shows rpc error: code = DeadlineExceeded. A simpler lab trigger is to set --timeout=1s on the attacher sidecar, which makes nearly every attach exceed its deadline.

Diagnostic Commands

# Read the attach/mount events on the stuck pod
kubectl describe pod <POD> | grep -A10 Events

# Is the volume stuck attached to an old node?
kubectl get volumeattachment | grep <PV-NAME>
kubectl describe volumeattachment <VA-NAME>

# Attacher sidecar logs: look for repeated DeadlineExceeded
kubectl logs -n kube-system <CSI-CONTROLLER-POD> -c csi-attacher --tail=80

# Node plugin logs for NodeStage/NodePublish latency
kubectl logs -n kube-system <CSI-NODE-POD> -c <DRIVER> --tail=80

# Check CSI pods are healthy and not restarting
# (the labels below assume an AWS EBS CSI driver install; substitute your driver's labels)
kubectl get pods -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node)' -o wide

If the same DeadlineExceeded repeats every few seconds across multiple volumes, suspect cloud-API throttling rather than a single bad volume.
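
One rough way to apply that rule of thumb is to count how many distinct volumes report the timeout. The sketch below reads event messages on stdin; count_timed_out_volumes is a hypothetical helper name, and it assumes the messages use the 'for volume "<name>"' wording seen in FailedAttachVolume events.

```shell
# Count distinct volumes whose events mention DeadlineExceeded.
# count_timed_out_volumes is a hypothetical helper; it reads messages on stdin.
count_timed_out_volumes() {
  grep 'code = DeadlineExceeded' |     # keep only timeout messages
    grep -o 'for volume "[^"]*"' |     # extract the volume reference
    sort -u |                          # one line per distinct volume
    wc -l |
    tr -d ' '                          # some wc builds pad the count with spaces
}

# Usage against a live cluster:
#   kubectl get events -A --field-selector reason=FailedAttachVolume \
#     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' | count_timed_out_volumes
```

A count of 1 suggests a single problem volume; a count that grows across many volumes at once is more consistent with throttling.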

Step-by-Step Resolution

1. Decide: transient or stuck? If the pod eventually starts after a minute or two, it was transient throttling and self-healed via retry. If it never starts, there is a stuck attachment or a persistent slowness to fix.

2. Clear a stuck VolumeAttachment. A ReadWriteOnce (RWO) volume can attach to only one node at a time. If a VolumeAttachment still references a dead node, new attach calls keep timing out until it is cleared. Verify the old node is truly gone, then remove the stale object:

kubectl get volumeattachment -o wide
# only after confirming the node no longer exists / volume is detached cloud-side
kubectl delete volumeattachment <VA-NAME>
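
Finding candidates by eye gets tedious on large clusters. As a sketch, the helper below cross-checks VolumeAttachments against the current node list and prints the ones that point at a node that no longer exists. stale_attachments and the two file names are hypothetical. It only flags candidates; confirming cloud-side that the volume is detached is still a manual step before any delete.

```shell
# Print VolumeAttachments whose node is not in the cluster's node list.
# stale_attachments is a hypothetical helper.
#   $1 = file with one live node name per line
#   $2 = file with "VA_NAME NODE PV_NAME" per line
stale_attachments() {
  awk 'FILENAME == ARGV[1] { live[$1] = 1; next }  # first file: record live nodes
       !($2 in live)       { print $1, $2, $3 }    # second file: node is unknown
      ' "$1" "$2"
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name > nodes.txt
#   kubectl get volumeattachment --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName > va.txt
#   stale_attachments nodes.txt va.txt
```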

3. Relieve cloud-API throttling. If logs show throttle/rate-limit errors, stagger workload moves instead of draining everything at once, and reduce churn. Some drivers expose tunables to lower concurrent attach calls per node.

4. Increase sidecar timeouts. If the provider’s real attach latency exceeds the configured --timeout, raise it on the attacher/provisioner deployment (commonly to 60s or more for slow regions), then confirm the controller pods roll out with the new value.
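
Before changing anything, it helps to see what the sidecar is currently running with. The sketch below extracts the --timeout value from a container's args string; sidecar_timeout is a hypothetical helper, and it assumes the flag is written in the --timeout=<value> form. Empty output means the flag is not set and the sidecar's built-in default applies.

```shell
# Extract the --timeout value from a container args string.
# sidecar_timeout is a hypothetical helper; it assumes the "--timeout=<value>" form
# and prints nothing when the flag is absent.
sidecar_timeout() {
  printf '%s\n' "$1" | sed -n 's/.*--timeout=\([0-9a-z.]*\).*/\1/p'
}

# Usage against a live cluster (the deployment name is a placeholder):
#   args=$(kubectl get deployment <CSI-CONTROLLER-DEPLOYMENT> -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-attacher")].args}')
#   sidecar_timeout "$args"
```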

5. Right-size and stabilize the driver. Ensure the CSI node and controller pods are not crash-looping or CPU-throttled. Bump their resource requests if NodeStageVolume is slow, and confirm they are Running on every relevant node.

6. Verify recovery. After the fix, the next retry should attach cleanly:

kubectl get pod <POD> -o wide -w

Prevention and Best Practices

  • Avoid mass simultaneous drains; cordon and migrate stateful workloads in small batches to stay under cloud API rate limits.
  • Set CSI sidecar timeouts generously for your region’s measured latency rather than relying on defaults.
  • Monitor VolumeAttachment objects and alert on ones older than a few minutes that are not Attached.
  • Keep CSI driver pods on guaranteed resources so node staging is never CPU-starved.
  • Replace unhealthy nodes promptly so volumes can detach; a hung node is the classic cause of “attach waits on detach.”
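
The first bullet, staggering drains, can be sketched as a small loop that drains one node at a time and pauses between nodes so attach/detach calls do not all hit the cloud API together. drain_staggered and the KUBECTL override variable are hypothetical names; the pause length is something to tune against your provider's observed limits, not a recommended value.

```shell
# Drain nodes one at a time, pausing between them to spread out attach/detach calls.
# drain_staggered is a hypothetical helper.
#   $1 = pause in seconds between nodes, remaining args = node names
# Set KUBECTL=echo for a dry run that only prints the commands.
drain_staggered() {
  pause=$1
  shift
  for node in "$@"; do
    # Stop at the first failed drain instead of pushing on to the next node.
    ${KUBECTL:-kubectl} drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
    sleep "$pause"
  done
}

# Usage:
#   drain_staggered 120 <NODE-1> <NODE-2> <NODE-3>
```

Waiting for the rescheduled pods to become Ready before moving to the next node would be a stricter variant; the fixed pause is just the simplest form.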

Frequently Asked Questions

Does DeadlineExceeded mean the volume operation failed permanently? No. It means one attempt timed out. The controller retries with backoff, and the underlying cloud operation may even complete on its own. Watch the pod — if it starts shortly after, the retry succeeded.

Why does attach wait on a node that no longer exists? An RWO volume must be detached from its previous node before it can attach elsewhere. If that node became unreachable without a clean detach, the cloud keeps the volume “in use” and a stale VolumeAttachment blocks the new attach until it is force-detached or removed.

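Is raising the sidecar timeout always safe? Usually, but not unconditionally. A longer timeout prevents premature cancellation of legitimately slow operations, and a call the driver actively rejects still returns its own error code rather than DeadlineExceeded. The trade-off is that a call that hangs outright now takes longer to fail and retry, so size the timeout to measured attach latency rather than setting it arbitrarily high.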

Could this be the CSI driver crashing rather than the cloud being slow? Yes. Check that the controller and node plugin pods are Running and not restarting. A crash-looping node plugin makes NodeStageVolume time out just like a slow cloud API.

How do I tell throttling from a single bad volume? Throttling shows the same DeadlineExceeded across many volumes at once and rate-limit messages in the attacher log. A single bad volume fails in isolation while other volumes attach normally.
