You are a senior Kubernetes storage engineer with deep experience operating CSI drivers (EBS, Persistent Disk, Ceph RBD, Longhorn, Portworx, NFS-CSI) in production.

I will provide:
- The symptom: PVC stuck in `Pending`, pod stuck in `ContainerCreating` waiting on volume, `Multi-Attach error`, `failed to provision`, slow I/O, or `volume in use cannot delete`
- PVC + PV YAML (`kubectl get pvc <p> -o yaml`, `kubectl get pv <pv> -o yaml`)
- StorageClass: `kubectl get sc <sc> -o yaml`
- Recent events on the PVC/pod
- CSI driver pods (logs from the affected node's `csi-node-*` and the cluster-wide `csi-provisioner` / `csi-attacher` / `csi-resizer`)
- Pod spec with the volume mount
- The CSI driver name + version

Your job:

1. **Walk the PVC lifecycle** to find the failing stage:
   - `Pending` waiting on provisioner → check `csi-provisioner` logs; usually quota, IAM, or invalid params
   - `Bound` but pod stuck `ContainerCreating` → attach phase (`csi-attacher`) or mount phase (`csi-node` on the affected node)
   - `Multi-Attach error for volume...` → `ReadWriteOnce` PV is being attached to a second node before the first node releases (often when pod moves)
   - `Bound` and mounted, but slow → backend issue, not Kubernetes
2. **Match access modes correctly**:
   - `RWO` → one node at a time; when a pod moves to another node, the volume must detach from the old node first (force-detach only if that node is dead)
   - `RWX` → multiple nodes; CSI driver must support it (NFS yes, EBS no, Ceph CephFS yes)
   - `ROX` → rare; multiple readers only
   - `RWOP` (beta in 1.27, GA in 1.29) → ReadWriteOncePod: one POD, even tighter than RWO
3. **Decode StorageClass parameters**:
   - `volumeBindingMode: WaitForFirstConsumer` vs `Immediate`
     - Immediate provisions at PVC create (may bind to wrong AZ on cloud)
     - WaitForFirstConsumer waits for pod scheduling to know zone; preferred for zonal disks
   - `reclaimPolicy: Delete` vs `Retain`: `Delete` removes the backend volume on PVC delete; `Retain` keeps it (orphaned)
   - Provisioner-specific params: `fsType`, `iops`, `throughput`, encryption, tags
4. **For stuck deletes**: check the finalizer list on PVC and PV. `kubernetes.io/pvc-protection` and `kubernetes.io/pv-protection` block deletion until referencers go away.
5. **For Multi-Attach errors**:
   - Old node went down hard; volume still "attached" in cloud API
   - Force-detach is risky: write-cache loss possible
   - Newer K8s (1.26+) + CSI handle this with the `node.kubernetes.io/out-of-service` taint on the dead node
6. **For CSI driver crashes**: check pod logs, restart counts, RBAC for the csi service account, volume attachment limits per node (`maxVolumesPerNode`; visible as `allocatable.count` on the `CSINode` object).
7. **For slow I/O**: establish whether the Kubernetes layer or the backend is at fault. Running `fio` / `dd` inside the pod via `kubectl exec` shows the bandwidth the workload actually gets.
8. Mark every DESTRUCTIVE action clearly: editing PV `reclaimPolicy` from Delete to Retain mid-flight (good idea, but timing matters), force-removing finalizers (orphans backend), deleting a Bound PVC.

---

CSI driver + version: [e.g., ebs.csi.aws.com v1.30]
Cluster context: [cloud provider / on-prem / k3s, etc.]
Symptom: [DESCRIBE]

PVC YAML:
```yaml
[PASTE]
```

PV YAML (if bound):
```yaml
[PASTE]
```

StorageClass YAML:
```yaml
[PASTE]
```

Events on PVC + pod:
```
[PASTE kubectl describe pvc + kubectl describe pod]
```

CSI logs (controller + node on affected node):
```
[PASTE]
```

Why this prompt works

Storage failures in Kubernetes cross at least three components: the PVC controller (kube-controller-manager), the CSI driver (cluster-wide + per-node), and the backend storage system. The visible state (Pending, ContainerCreating) doesn’t tell you which one failed. This prompt forces a stage-aware diagnosis and flags the destructive recovery actions.

How to use it

Always include the StorageClass YAML: a volumeBindingMode that does not suit a zonal disk is a frequent cause of PVCs and pods that stay stuck.
For multi-node clusters with zonal disks, mention the AZ of the pod’s target node. If the disk is in a different zone, no amount of waiting fixes it.
Include both controller-side and node-side CSI logs — the failure is often on the node, but the user only checks the controller.
For “Multi-Attach error”, mention what happened to the previous pod (node went down? rolling deploy?).
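The zone check mentioned above can be sketched as a small shell helper. This is an illustration only: `node_zone`, `pv_zones`, and `zone_check` are hypothetical names, and the sketch assumes the node carries the standard `topology.kubernetes.io/zone` label and that the PV's `nodeAffinity` values are zone names, which depends on the CSI driver.

```shell
# Sketch: compare a node's zone with the zones a PV is pinned to.
# Assumption: PV nodeAffinity values are zone names (driver-dependent).

# Print the zone label of a node.
node_zone() {
  kubectl get node "$1" \
    -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'
}

# Print every value listed in the PV's required node affinity.
pv_zones() {
  kubectl get pv "$1" \
    -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[*].matchExpressions[*].values[*]}'
}

# Usage: zone_check <node> <pv>
zone_check() {
  local nz pz
  nz="$(node_zone "$1")"
  pz="$(pv_zones "$2")"
  if [ -z "$nz" ] || [ -z "$pz" ]; then
    echo "unknown: node zone='$nz' volume zones='$pz'"
    return 2
  fi
  case " $pz " in
    *" $nz "*) echo "match: $nz" ;;
    *) echo "mismatch: node in $nz, volume restricted to $pz"; return 1 ;;
  esac
}
```

A mismatch line is worth pasting into the Symptom field, since it rules out the attach and mount stages straight away.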

Useful commands

# PVC + PV state
kubectl get pvc -A
kubectl describe pvc <pvc> -n <ns>
kubectl get pv <pv> -o yaml
kubectl describe pv <pv>

# StorageClass
kubectl get sc
kubectl describe sc <sc>

# CSI driver pods
kubectl get pods -n kube-system -l app.kubernetes.io/name=<csi-driver>
kubectl logs -n kube-system <csi-controller-pod> -c csi-provisioner --tail=200
kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher --tail=200
kubectl logs -n kube-system <csi-node-pod-on-affected-node> --all-containers --tail=200

# Volume attachments
kubectl get volumeattachment
kubectl get volumeattachment <va> -o yaml

# Pod-level mount errors
kubectl describe pod <pod> -n <ns>          # look for Events section
kubectl get events -n <ns> --field-selector involvedObject.name=<pod>

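# In-pod I/O test (replace /data with the volume's mount path; remove the test file afterwards)
kubectl exec -n <ns> <pod> -- df -h /data
kubectl exec -n <ns> <pod> -- dd if=/dev/zero of=/data/test bs=1M count=100 oflag=direct
kubectl exec -n <ns> <pod> -- rm -f /data/test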

# CSI driver capabilities
kubectl get csidriver
kubectl get csidriver <name> -o yaml

# Snapshots (if VolumeSnapshot enabled)
kubectl get volumesnapshot -A
kubectl get volumesnapshotclass

# Stuck delete: see what's holding it
kubectl get pvc <pvc> -n <ns> -o jsonpath='{.metadata.finalizers}'
kubectl get pv <pv> -o jsonpath='{.metadata.finalizers}'

Decision matrix

Symptom	Where to look first
PVC `Pending` immediately	`csi-provisioner` logs — IAM/quota/params
PVC `Pending` with a `WaitForFirstConsumer` event, no pod yet	`volumeBindingMode: WaitForFirstConsumer`: normal, binds once a consuming pod is scheduled
Pod `ContainerCreating` for >2m	`csi-attacher` (controller) and `csi-node` on the target node
`Multi-Attach error`	Previous attachment not released; check old pod’s node
`failed to provision volume`	CSI provisioner; check StorageClass params + cloud quota
Slow I/O inside pod	Backend, not K8s; test from another mount of same backend
`volume in use, cannot delete`	Finalizers; check `kubectl get pvc/pv -o yaml`
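
The first rows of the matrix can be sketched as a lookup function. This is a hypothetical helper (`first_look` is not part of kubectl or any CSI tooling), and it only covers the PVC phase and pod state combinations named above.

```shell
# Hypothetical helper mirroring the matrix: print where to look first.
# $1 = PVC phase (value of .status.phase), $2 = pod state (optional)
first_look() {
  local pvc_phase="$1" pod_state="${2:-}"
  case "$pvc_phase:$pod_state" in
    Pending:*)               echo "csi-provisioner logs, StorageClass params, cloud quota" ;;
    Bound:ContainerCreating) echo "csi-attacher logs and csi-node logs on the target node" ;;
    Bound:Running)           echo "backend storage; run an I/O test inside the pod" ;;
    *)                       echo "kubectl describe pvc and kubectl describe pod events" ;;
  esac
}
```

One way to feed it is `first_look "$(kubectl get pvc <pvc> -n <ns> -o jsonpath='{.status.phase}')" ContainerCreating`.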

Common findings this catches

Pod Pending with a volume node affinity conflict on a cloud cluster: volumeBindingMode: Immediate provisioned the zonal disk before scheduling, in a zone with no suitable node. Use WaitForFirstConsumer on a new or recreated StorageClass; the field cannot be edited in place.
Pod ContainerCreating after node failure: RWO volume still attached to the dead node, so the cloud API reports it as busy. Once the node is confirmed powered off, taint it with node.kubernetes.io/out-of-service=nodeshutdown:NoExecute to trigger CSI cleanup (K8s 1.26+, GA in 1.28).
PVC stuck Terminating: a pod is still using it. kubectl describe pvc <pvc> -n <ns> lists the holder under "Used By"; once that pod is removed, the finalizer releases.
PV stuck Released — reclaimPolicy: Retain left it; admin must kubectl patch pv <pv> -p '{"spec":{"claimRef": null}}' to reuse, or delete to clean up.
CSI controller missing IAM permissions — csi-provisioner logs show AccessDenied. Common after IAM role changes.
maxVolumesPerNode reached: on AWS, EBS has per-instance attach limits. New pods stuck ContainerCreating even with PVCs already bound. kubectl get csinode <node> -o yaml shows the limit the driver reports under allocatable.count.
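
For the dead-node finding above, a guarded version of the taint step might look like the sketch below. `mark_out_of_service` is a hypothetical wrapper; the taint key and the `nodeshutdown` value follow the Kubernetes non-graceful node shutdown documentation. The Ready check is only a sanity guard and an assumption of this sketch: it does not prove the machine is powered off, which still has to be confirmed out of band.

```shell
# Sketch: refuse to taint a node that still reports Ready.
# DESTRUCTIVE: the taint force-deletes pods and detaches volumes from the node.
# Usage: mark_out_of_service <node>
mark_out_of_service() {
  local node="$1" ready
  ready="$(kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
  if [ "$ready" = "True" ]; then
    echo "refusing: $node still reports Ready" >&2
    return 1
  fi
  kubectl taint nodes "$node" node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
}
```

After the workload has recovered, remove the taint again with the same key followed by a trailing `-`.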

Recovery patterns

Recover from stuck Terminating PVC after pod gone

# 1. Confirm no pod in the PVC's namespace uses it (no output means no holder)
kubectl get pods -n <ns> -o jsonpath='{range .items[*]}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{"\n"}{end}{end}' | grep -x <pvc>

# 2. If truly orphaned, finalizer should auto-clear once pod is gone.
#    If not (PVC stuck in Terminating after pod delete):
kubectl get pvc <pvc> -n <ns> -o jsonpath='{.metadata.finalizers}'   # confirm "kubernetes.io/pvc-protection"
# Manual removal (DESTRUCTIVE if backend not actually free):
kubectl patch pvc <pvc> -n <ns> -p '{"metadata":{"finalizers":null}}'
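
Before the manual removal above, it may be worth checking that no VolumeAttachment still references the bound PV. The sketch below is one way to do that; `pv_attachments` and `safe_to_unprotect` are hypothetical names, and a clean result only shows what Kubernetes believes, not what the backend reports.

```shell
# Sketch: list VolumeAttachments that reference a given PV.
# Prints "<attachment-name> <node>" per match.
pv_attachments() {
  kubectl get volumeattachment --no-headers \
    -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName \
    | awk -v pv="$1" '$2 == pv {print $1, $3}'
}

# Usage: safe_to_unprotect <pvc> <namespace>
safe_to_unprotect() {
  local pvc="$1" ns="$2" pv out
  pv="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')"
  if [ -z "$pv" ]; then
    echo "PVC is not bound to a PV"
    return 0
  fi
  out="$(pv_attachments "$pv")"
  if [ -n "$out" ]; then
    echo "still attached: $out"
    return 1
  fi
  echo "no VolumeAttachment references $pv"
}
```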

Switch reclaimPolicy on existing PVs

kubectl get pv --no-headers -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName \
  | awk '$2 == "<class-name>" {print $1}' \
  | xargs -I{} kubectl patch pv {} -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
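
A read-only check before and after the patch helps confirm which PVs are affected. In this sketch `pvs_in_class` and `pvs_not_retained` are hypothetical helper names; they only list volumes and change nothing.

```shell
# Sketch: print "<pv-name> <reclaim-policy>" for every PV in a StorageClass.
pvs_in_class() {
  kubectl get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName,POLICY:.spec.persistentVolumeReclaimPolicy \
    | awk -v c="$1" '$2 == c {print $1, $3}'
}

# Print the PVs of a class that are not yet on Retain.
# Usage: pvs_not_retained <class-name>
pvs_not_retained() {
  pvs_in_class "$1" | awk '$2 != "Retain" {print $1}'
}
```

An empty result from `pvs_not_retained <class-name>` after the patch indicates every PV in the class now carries `Retain`.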

Expand a PVC

# Requires `allowVolumeExpansion: true` in StorageClass
kubectl patch pvc <pvc> -n <ns> -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Some drivers only expand offline and need a pod restart; watch the FileSystemResizePending condition
kubectl describe pvc <pvc> -n <ns>
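
Instead of re-running describe by hand, the wait can be sketched as a polling loop. `wait_for_resize` and the `RESIZE_POLL_SECONDS` variable are hypothetical. The sketch also assumes the backend reports exactly the requested size; if it rounds up, the string comparison never matches and the loop times out.

```shell
# Sketch: poll until the PVC's reported capacity equals the requested size.
# Usage: wait_for_resize <pvc> <namespace> [max-tries]
wait_for_resize() {
  local pvc="$1" ns="$2" tries="${3:-30}" want have
  want="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.resources.requests.storage}')"
  while [ "$tries" -gt 0 ]; do
    have="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.capacity.storage}')"
    if [ "$have" = "$want" ]; then
      echo "resized to $have"
      return 0
    fi
    tries=$((tries - 1))
    sleep "${RESIZE_POLL_SECONDS:-10}"
  done
  echo "still at $have, requested $want"
  return 1
}
```

If the loop times out while the PVC shows FileSystemResizePending, restarting the pod is usually the next step.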

When to escalate

CSI driver pods crashing repeatedly — engage the CSI driver maintainer’s support (cloud provider or vendor); usually a quota or RBAC issue, but can be a driver bug.
Backend storage system in degraded state — fix the backend before retrying K8s operations.
Data integrity concerns after a forced detach — restore from snapshot rather than trusting fsck on a volume with potential cache loss.


Kubernetes PV / PVC / CSI Storage Troubleshooting Prompt
