By James Joyner IV · 9 min read

Kubernetes Error Guide: 'failed to provision volume with StorageClass' RPC Error

Fix 'failed to provision volume with StorageClass: rpc error' in Kubernetes: decode CSI provisioner failures from zone/topology mismatch, quota, and IAM.

Exact Error Message

When a CSI external-provisioner tries to create a volume for a Pending PersistentVolumeClaim and the underlying driver rejects the request, the PVC records a ProvisioningFailed event:

Events:
  Type     Reason                Age   From                                   Message
  ----     ------                ----  ----                                   -------
  Warning  ProvisioningFailed    12s   ebs.csi.aws.com_ebs-csi-controller...  failed to provision volume with StorageClass "gp3": rpc error: code = Internal desc = Could not create volume "pvc-7f3a...": InvalidParameterValue: The volume size is invalid for gp3 volumes, or VolumeLimitExceeded: You have reached the maximum number of volumes in availability zone us-east-1a

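The PVC stays Pending, and any pod that mounts it stays Pending too. The sample above joins two alternative provider errors with “or” for illustration; a real event carries only one. The text after desc = is the driver’s message, which usually wraps the cloud provider’s storage API error verbatim, so it is the real clue.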

What the Error Means

Dynamic provisioning is a hand-off. The external-provisioner sidecar watches for unbound PVCs that reference a StorageClass, then calls the CSI driver’s CreateVolume gRPC method. The driver in turn calls the cloud API (AWS EBS, GCE PD, Azure Disk, Ceph, etc.) to allocate real storage. failed to provision volume with StorageClass means that gRPC call returned an error instead of a new volume.

The rpc error: code = <Code> desc = <message> envelope is the gRPC status. The desc carries the provider’s message — VolumeLimitExceeded, InvalidParameterValue, requested topology is not available, UnauthorizedOperation, or a quota message. Provisioning is retried with exponential backoff, so the event repeats while the root cause persists.
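To make the envelope concrete, here is a minimal sketch that splits an event message into its gRPC code and desc. The function name parse_rpc_error is hypothetical, and the pattern assumes the message follows the "rpc error: code = ... desc = ..." shape shown above.

```shell
#!/usr/bin/env bash
# Split a ProvisioningFailed message into its gRPC code and desc.
# Reads the event text on stdin and prints "code=<Code>" then "desc=<message>".
# parse_rpc_error is a hypothetical helper name, not part of kubectl.
parse_rpc_error() {
  local msg
  msg=$(cat)
  printf 'code=%s\n' "$(printf '%s\n' "$msg" \
    | sed -n 's/.*rpc error: code = \([A-Za-z]*\) desc = .*/\1/p')"
  printf 'desc=%s\n' "$(printf '%s\n' "$msg" \
    | sed -n 's/.*rpc error: code = [A-Za-z]* desc = \(.*\)/\1/p')"
}

# Possible usage against a live cluster:
#   kubectl describe pvc <PVC> | parse_rpc_error
```

Lines without an rpc error are ignored, so the full describe output can be piped in unfiltered.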

Common Causes

  • Topology / zone mismatch — the StorageClass or allowedTopologies requires a zone where the volume type is unavailable, or where no schedulable node exists.
  • Quota / volume limits — the account hit a per-zone volume count, total provisioned IOPS, or capacity quota.
  • Invalid StorageClass parameters — an unsupported type, iops, throughput, or fsType combination for that volume class.
  • Missing IAM / RBAC permissions — the CSI controller’s cloud credentials lack CreateVolume/CreateDisk rights.
  • Immediate binding in the wrong zone: with volumeBindingMode: Immediate, the volume is provisioned before the pod is scheduled, possibly in a zone the pod cannot land in.
  • Driver not installed or unhealthy — the CSI controller pod is crash-looping, so the gRPC call never reaches the cloud.
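The causes above can usually be told apart by keywords in the desc text. The sketch below is one rough way to triage a message; classify_desc and its category labels are hypothetical, and the keyword list is an assumption drawn from the messages quoted in this guide, so extend it for your provider.

```shell
#!/usr/bin/env bash
# Map a desc string to a likely cause category.
# The keywords are assumptions based on the messages quoted in this guide.
classify_desc() {
  case "$1" in
    *VolumeLimitExceeded*|*[Qq]uota*)
      echo "quota-or-limit" ;;
    *UnauthorizedOperation*|*[Aa]ccess\ [Dd]enied*|*AccessDenied*)
      echo "iam-or-rbac" ;;
    *InvalidParameterValue*)
      echo "invalid-parameters" ;;
    *topology*)
      echo "topology-mismatch" ;;
    *[Ee]rror\ while\ dialing*|*connection\ refused*)
      echo "driver-unhealthy" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example: classify_desc "requested topology is not available"
```

The quota check runs first on purpose: a limit message can also mention an availability zone, which would otherwise look like a topology problem.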

How to Reproduce the Error

Request a volume larger than the volume type or your quota allows, or request one in a constrained zone:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: too-big
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3
  resources:
    requests:
      storage: 64Ti

kubectl apply -f too-big.yaml
kubectl get pvc too-big

NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
too-big   Pending                                      gp3            20s

kubectl describe pvc too-big then shows ProvisioningFailed ... rpc error ... exceeds the maximum supported size.

Diagnostic Commands

# Read the full ProvisioningFailed message
kubectl describe pvc <PVC> | grep -A8 Events

# Inspect the StorageClass parameters and binding mode
kubectl get storageclass <NAME> -o yaml

# Confirm the CSI controller is running and healthy
kubectl get pods -n kube-system -l app=ebs-csi-controller -o wide

# Read provisioner-side errors (the real cloud API message)
kubectl logs -n kube-system <CSI-CONTROLLER-POD> -c csi-provisioner --tail=50

# List node zones to compare against allowedTopologies
kubectl get nodes -o custom-columns=NAME:.metadata.name,ZONE:.metadata.labels.'topology\.kubernetes\.io/zone'

The csi-provisioner sidecar log is the authoritative source — it logs the exact gRPC error and the parameters it sent.
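Comparing node zones against allowedTopologies by eye gets tedious on larger clusters. As a sketch, the hypothetical helper below prints every allowed zone that has no node; both arguments are plain whitespace-separated zone lists that you supply.

```shell
#!/usr/bin/env bash
# Print zones the StorageClass allows but where no node runs.
# $1: whitespace-separated allowed zones
# $2: whitespace-separated node zones (duplicates are fine)
# zones_without_nodes is a hypothetical helper name.
zones_without_nodes() {
  local zone nodes
  nodes=" $(echo $2) "   # collapse newlines and repeated spaces
  for zone in $1; do
    case "$nodes" in
      *" $zone "*) ;;            # at least one node lives in this zone
      *) echo "$zone" ;;
    esac
  done
}

# Possible usage against a live cluster:
#   node_zones=$(kubectl get nodes \
#     -o jsonpath='{.items[*].metadata.labels.topology\.kubernetes\.io/zone}')
#   zones_without_nodes "us-east-1a us-east-1b" "$node_zones"
```

Any zone it prints is a candidate for the topology mismatch described under Common Causes.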

Step-by-Step Resolution

1. Read the desc text. The gRPC desc is the provider’s own message. VolumeLimitExceeded, InvalidParameterValue, UnauthorizedOperation, and topology messages each route to a different fix below.

2. Fix quota / volume limits. If the message is a limit or quota error, you have hit a cloud account ceiling. Check used vs. allowed volumes per zone in the provider console and either request a quota increase or delete unused Released/orphaned volumes. Spreading PVCs across more zones also helps.

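3. Fix topology mismatch. If desc mentions topology or a zone where no nodes run, use a StorageClass with volumeBindingMode: WaitForFirstConsumer so the volume is created in the pod’s actual zone. The field is immutable on an existing StorageClass, so delete and recreate the class (or create a new one) rather than editing it in place: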

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer

This is the single most common fix for multi-zone clusters using Immediate binding.
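Before changing anything, it can help to confirm which binding mode a class currently uses. A minimal sketch, assuming the StorageClass YAML arrives on stdin; binding_mode is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the volumeBindingMode from StorageClass YAML read on stdin.
# Kubernetes defaults the field to Immediate when it is omitted.
binding_mode() {
  local mode
  mode=$(sed -n 's/^volumeBindingMode:[[:space:]]*\([A-Za-z]*\).*/\1/p' | head -n 1)
  echo "${mode:-Immediate}"
}

# Possible usage against a live cluster:
#   kubectl get storageclass gp3 -o yaml | binding_mode
```

The pattern is anchored to the start of the line so that copies of the manifest embedded in annotations are not matched.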

4. Fix invalid parameters. Validate type, iops, throughput, and fsType against the driver’s documentation. For example, gp3 caps provisioned IOPS relative to volume size and throughput relative to provisioned IOPS; an out-of-range value yields InvalidParameterValue. StorageClass parameters cannot be edited in place, so delete and recreate the StorageClass with corrected values, then delete and recreate the PVC so it provisions against the new class.
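A quick pre-flight check can catch ratio mistakes before the cloud API does. The sketch below is illustrative only: the numbers (a 3000 IOPS baseline, 500 IOPS per GiB, 0.25 MiB/s per provisioned IOPS) are assumptions about gp3 that should be verified against current AWS documentation, and gp3_ratio_check is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough sanity check for gp3 iops/throughput ratios.
# All limits below are ASSUMPTIONS; confirm them in the AWS EBS docs.
# $1: size in GiB, $2: iops, $3: throughput in MiB/s
gp3_ratio_check() {
  local size_gib=$1 iops=$2 throughput=$3 rc=0
  local baseline_iops=3000 iops_per_gib=500
  # Above the assumed baseline, IOPS is capped relative to volume size.
  if [ "$iops" -gt "$baseline_iops" ] && [ "$iops" -gt $((size_gib * iops_per_gib)) ]; then
    echo "iops $iops exceeds $iops_per_gib IOPS per GiB for $size_gib GiB"
    rc=1
  fi
  # Assumed cap of 0.25 MiB/s per provisioned IOPS, kept in integer math.
  if [ $((throughput * 4)) -gt "$iops" ]; then
    echo "throughput $throughput MiB/s exceeds 0.25 MiB/s per provisioned IOPS"
    rc=1
  fi
  return $rc
}

# Example: gp3_ratio_check 100 3000 125
```

It returns non-zero and prints one line per violated ratio, which makes it easy to call from a CI step that lints StorageClass values.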

5. Fix IAM / RBAC. If desc says UnauthorizedOperation or access denied, the CSI controller’s credentials lack create permissions. Attach the driver’s required policy (e.g. the EBS CSI managed policy) to the controller’s IAM role / service account, then restart the controller.

6. Confirm the driver is healthy. A crash-looping controller produces no events at all or repeated dial errors. Check the controller pods and the provisioner sidecar logs before chasing cloud-side causes.
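For the health check in step 6, one option is to filter the pod listing down to anything that is not fully ready. This is a sketch that assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...); unhealthy_pods is a hypothetical helper and the pod names in the comment are made up.

```shell
#!/usr/bin/env bash
# Print pods that are not Running or not fully ready.
# Reads default `kubectl get pods` output (with header) on stdin.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")          # READY column looks like 4/6
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
  }'
}

# Possible usage against a live cluster:
#   kubectl get pods -n kube-system -l app=ebs-csi-controller | unhealthy_pods
# A hypothetical row such as
#   ebs-csi-controller-def 4/6 CrashLoopBackOff 12 3d
# would be reported, while a 6/6 Running row would not.
```

Empty output means every listed controller pod is Running with all containers ready.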

Prevention and Best Practices

  • Default to volumeBindingMode: WaitForFirstConsumer for any cluster spanning multiple zones — it eliminates the most common topology failures.
  • Monitor cloud storage quotas and alert before you reach per-zone volume limits.
  • Pin StorageClass parameters to values you have validated against the driver version; review them on every driver upgrade.
  • Grant the CSI controller a least-privilege but complete IAM policy, and verify it in a non-prod cluster first.
  • Alert on PVCs Pending longer than a couple of minutes; dynamic provisioning should be near-instant.
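The last point can be approximated without a monitoring stack. The sketch below reads kubectl get pvc output and prints claims that have been Pending longer than a threshold. It assumes the name is the first column, the status the second, and the age the last; pending_too_long is a hypothetical helper, and the 120-second default is only an example.

```shell
#!/usr/bin/env bash
# Print PVCs that have been Pending longer than a threshold.
# Reads `kubectl get pvc` output on stdin; $1 is the limit in seconds.
pending_too_long() {
  local max_seconds=${1:-120}
  awk -v max="$max_seconds" '
    # Convert an AGE value such as 20s, 5m12s or 3d2h to seconds.
    function age_seconds(age,   total, n, unit) {
      total = 0
      while (match(age, /^[0-9]+[smhdy]/)) {
        n = substr(age, 1, RLENGTH - 1) + 0
        unit = substr(age, RLENGTH, 1)
        total += n * (unit == "s" ? 1 : unit == "m" ? 60 : unit == "h" ? 3600 : unit == "d" ? 86400 : 31536000)
        age = substr(age, RLENGTH + 1)
      }
      return total
    }
    $2 == "Pending" && age_seconds($NF) > max { print $1, $NF }
  '
}

# Possible usage against a live cluster:
#   kubectl get pvc | pending_too_long 120
```

Run from a cron job or CI step, any output is a signal to check the ProvisioningFailed events for the listed claims.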

Frequently Asked Questions

The PVC says Pending but there is no ProvisioningFailed event. Why? The external-provisioner may not be running, or the StorageClass provisioner field names a driver that is not installed. Check kubectl get pods -n kube-system for the CSI controller and confirm the provisioner name matches an installed driver.
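The provisioner-name check can be scripted. A minimal sketch, assuming you pass in the StorageClass provisioner and the list of installed CSIDriver objects; driver_installed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeed when the provisioner named in a StorageClass matches an
# installed CSIDriver object.
# $1: provisioner name
# $2: output of `kubectl get csidriver -o name`
driver_installed() {
  local entry
  for entry in $2; do
    # -o name prints entries like csidriver.storage.k8s.io/ebs.csi.aws.com
    [ "${entry##*/}" = "$1" ] && return 0
  done
  return 1
}

# Possible usage against a live cluster:
#   prov=$(kubectl get storageclass gp3 -o jsonpath='{.provisioner}')
#   driver_installed "$prov" "$(kubectl get csidriver -o name)" \
#     || echo "no CSIDriver named $prov"
```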

Does deleting and recreating the PVC fix it? Only if the cause was a transient cloud error or you changed the StorageClass. For quota, IAM, or topology problems, recreation just reproduces the same rpc error until you fix the underlying cause.

Why did it work yesterday and fail today? The most common reasons are hitting a quota as the cluster grew, a cloud credential or IAM policy change, or a zone running out of a specific volume type. Compare the desc text against recent infrastructure changes.

Can I provision into a specific zone on purpose? Yes — use allowedTopologies on the StorageClass, but make sure schedulable nodes exist in that zone or the pod will be Pending even after the volume is created.
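As an illustration, the sketch below prints a zone-pinned StorageClass manifest that can be reviewed and then piped to kubectl apply. The helper name storageclass_for_zone is hypothetical, and the topology key topology.kubernetes.io/zone is an assumption: some driver versions expect a driver-specific key, so check your driver's documentation.

```shell
#!/usr/bin/env bash
# Print a StorageClass manifest restricted to one zone.
# $1: StorageClass name, $2: zone
# The topology key below is an ASSUMPTION; verify it for your driver.
storageclass_for_zone() {
  local name=$1 zone=$2
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - ${zone}
EOF
}

# Possible usage:
#   storageclass_for_zone gp3-us-east-1a us-east-1a | kubectl apply -f -
```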

What does code = Internal versus code = ResourceExhausted mean? These are gRPC status codes from the driver. ResourceExhausted typically maps to quota/limit problems; Internal and InvalidArgument usually mean bad parameters or an unexpected cloud API error. Always read the desc for the specific reason.
