AI for Kubernetes & Helm · By James Joyner IV · 9 min read

Persistent Storage in Kubernetes: PVCs, StorageClasses, and StatefulSets

Storage is where stateless Kubernetes intuition breaks down. Here's how PVs, PVCs, StorageClasses, and StatefulSets fit together, with AI help debugging stuck volumes.

  • #kubernetes
  • #storage
  • #pvc
  • #statefulset
  • #ai
  • #databases

Kubernetes makes stateless workloads feel easy, which is exactly why storage feels so hard the first time. Pods are ephemeral, they move between nodes, and yet your database needs its data to survive a reschedule. The objects that make that work — PersistentVolumes, PersistentVolumeClaims, StorageClasses, StatefulSets — fit together in a way that’s logical once you see it and baffling until you do.

Here’s the model, the common failure modes, and where AI helps when a volume is stuck.

The three objects and the claim-check pattern

Think of it like a coat check:

  • PersistentVolume (PV) — an actual piece of storage (an EBS volume, a Ceph RBD, an NFS export). The coat.
  • PersistentVolumeClaim (PVC) — a request for storage of a certain size and access mode. Your claim ticket.
  • StorageClass — defines how PVs get created when a PVC asks for one. The coat-check policy.

Your pod references a PVC. The PVC binds to a PV. You almost never create PVs by hand anymore — the StorageClass provisions them dynamically when a PVC appears.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
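
To show the other half of the claim check, here is a minimal sketch of a pod that mounts the claim above. The pod name, image, command, and mount path are placeholders chosen for illustration; only the claimName comes from the manifest above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-demo              # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox:1.36      # placeholder image, swap in your own
      # Append a timestamp to the volume, then idle so the pod stays up.
      command: ["sh", "-c", "date >> /data/heartbeat.log && sleep 3600"]
      volumeMounts:
        - name: data-vol
          mountPath: /data     # assumed mount path
  volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: data        # must match the PVC's metadata.name
```

The pod never names a PV or a backend. It only presents the ticket, and the binding decides which volume sits behind it.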

Access modes are not what they sound like

The single most misunderstood thing in Kubernetes storage:

  • ReadWriteOnce (RWO) — mountable read-write by one node at a time. Most block storage (EBS, GCE PD). Multiple pods can share it only if they’re on the same node.
  • ReadWriteMany (RWX) — read-write by many nodes. Requires a file storage backend (NFS, EFS, CephFS). Block storage cannot do this.
  • ReadOnlyMany (ROX) — read-only by many nodes.

People design a scale-out deployment with three replicas all writing to one RWO volume, then can’t understand why only the pods on one node start while the rest sit in ContainerCreating with a Multi-Attach error. RWO is per-node. If you need shared read-write across nodes, you need an RWX-capable backend, full stop.
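
As a rough sketch of the shared case, this is what a multi-replica workload on an RWX claim could look like. The StorageClass name shared-nfs is hypothetical and stands in for whatever RWX-capable class your cluster offers; the Deployment name, image, and mount path are placeholders too.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes: ["ReadWriteMany"]   # many nodes may mount read-write
  storageClassName: shared-nfs     # hypothetical RWX-capable class
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                      # replicas may land on different nodes
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27        # placeholder image
          volumeMounts:
            - name: shared
              mountPath: /usr/share/nginx/html
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data # every replica mounts the same claim
```

Swap the access mode back to ReadWriteOnce on a block-backed class and the same Deployment only works while all three replicas happen to share a node.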

WaitForFirstConsumer: the binding mode that matters

A StorageClass setting that prevents a whole class of bugs:

volumeBindingMode: WaitForFirstConsumer

With the default Immediate, the volume is provisioned the moment the PVC is created, possibly in a zone where your pod can’t schedule. The pod then sits Pending with a "volume node affinity conflict" scheduling event. WaitForFirstConsumer delays provisioning and binding until a pod that uses the PVC is being scheduled, so the scheduler picks a node first and the volume is created in that node’s zone. Use it for any zonal block storage. This one setting eliminates the most maddening Pending-pod-with-volume scenario.
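
For context, the one-line setting above lives inside a StorageClass. The sketch below assumes the AWS EBS CSI driver and gp3 volumes; the provisioner and parameters are assumptions that differ per backend, and fast-ssd is simply the class name used in this article's PVC examples.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com       # assumption: AWS EBS CSI driver is installed
parameters:
  type: gp3                        # assumption: backend-specific parameter
volumeBindingMode: WaitForFirstConsumer   # provision only once a pod is scheduled
allowVolumeExpansion: true         # permits growing PVCs later
reclaimPolicy: Delete              # the PV is removed when its PVC is deleted
```

With this class, a freshly created PVC stays Pending until a pod references it. That is the intended behavior, not a fault.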

StatefulSets: stable identity for stateful apps

A Deployment gives you interchangeable pods. A database needs the opposite — stable names, stable storage, ordered startup. That’s a StatefulSet:

  • Pods get stable ordinal names (postgres-0, postgres-1), not random hashes.
  • Each pod gets its own PVC via volumeClaimTemplates, named <template>-<pod> (data-postgres-0 for the template below). postgres-0 always reattaches to its volume, even after rescheduling.
  • Startup and scale-down are ordered.

volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: fast-ssd
    resources:
      requests:
        storage: 50Gi

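Critical gotcha: by default, deleting a StatefulSet does not delete its PVCs, and neither does scaling it down. That’s a safety feature: your data survives. But it means scaling down and back up reattaches old data, and cleaning up for real requires deleting the PVCs explicitly (or opting in to automatic deletion through the StatefulSet’s persistentVolumeClaimRetentionPolicy field on clusters that support it).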
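
To put the volumeClaimTemplates fragment in context, here is a sketch of a complete StatefulSet with the headless Service it needs. The image tag, Secret name, port, and PGDATA path are assumptions for illustration. The persistentVolumeClaimRetentionPolicy block spells out the default keep-the-PVCs behavior and only works on Kubernetes versions that have that field. Note that this runs two independent Postgres instances, not a replicated cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None                  # headless: gives each pod a stable DNS name
  selector:
    app: postgres
  ports:
    - name: pg
      port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres            # must reference the headless Service
  replicas: 2                      # creates postgres-0, then postgres-1
  selector:
    matchLabels:
      app: postgres
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain            # keep PVCs when the StatefulSet is deleted
    whenScaled: Retain             # keep PVCs when scaling down
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16       # assumed image tag
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret   # hypothetical Secret, create it first
                  key: password
            - name: PGDATA         # subdirectory avoids the volume's lost+found
              value: /var/lib/postgresql/data/pgdata
          ports:
            - name: pg
              containerPort: 5432
          volumeMounts:
            - name: data           # matches the claim template name below
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
```

This produces PVCs named data-postgres-0 and data-postgres-1. Those are the objects left behind, and the ones to delete by hand, when the StatefulSet goes away.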

Where AI helps debug stuck storage

Storage problems are quiet — a PVC sits Pending, a pod won’t start, and the real error is buried in a provisioner log or a describe event three objects away. Gather the evidence and ask:

“This PVC is stuck Pending and the pod won’t schedule. Here’s the PVC, the StorageClass, the pod’s describe events, and the CSI provisioner logs. What’s blocking the bind — access mode, zone, binding mode, or quota?”

The model is good at lining up the zone in the node, the binding mode in the StorageClass, and the access mode in the PVC to find the mismatch — tedious correlation across objects. Keep a few Kubernetes storage prompts on hand for these.

The failure modes you’ll actually hit

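  • PVC Pending, no PV. Usually no default StorageClass, a typo’d storageClassName, or a provisioner that is failing (quota, permissions, a missing CSI driver). Run kubectl get storageclass and check for the (default) marker, then kubectl describe pvc for provisioning events. A PVC on a WaitForFirstConsumer class stays Pending until a pod uses it, which is expected. Immediate binding into the wrong zone looks different: the PVC binds, but the pod stays Pending.
  • Pod stuck ContainerCreating with “Multi-Attach error.” An RWO volume is still attached to another node, which is common after an ungraceful node failure. The old attachment has to be released before the new pod can mount it.
  • Volume full. Pods don’t get evicted for a full PVC; the app just gets ENOSPC. Monitor kubelet_volume_stats_used_bytes against kubelet_volume_stats_capacity_bytes.
  • Resize stuck. allowVolumeExpansion: true must be on the StorageClass and the CSI driver must support expansion. If the backend only supports offline resize, the filesystem does not grow until the pod restarts.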
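
For the volume-full case in the list above, a utilization alert could look roughly like this. It is a sketch of a Prometheus rules file, assuming Prometheus already scrapes the kubelet metrics. The group name, alert name, 85% threshold, and 10-minute window are arbitrary choices to tune for your environment.

```yaml
groups:
  - name: pvc-capacity             # hypothetical group name
    rules:
      - alert: PersistentVolumeFillingUp
        # Fraction of each PVC's capacity in use, as reported by the kubelet.
        expr: |
          kubelet_volume_stats_used_bytes
            / kubelet_volume_stats_capacity_bytes > 0.85
        for: 10m                   # must hold for 10 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is over 85% full"
```

The kubelet only reports these metrics for volumes mounted by a running pod, so a PVC that nothing uses will not show up.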

Backups are not optional, and snapshots aren’t backups

A snapshot on the same backend dies with the backend. Use the VolumeSnapshot API for fast point-in-time copies, but ship them off-cluster (a tool like Velero) for real disaster recovery. The day you lose the storage backend is the day you learn whether your “backups” left the building.
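
A point-in-time copy through the VolumeSnapshot API can be requested with a manifest like the sketch below. It assumes the snapshot CRDs and snapshot controller are installed and that the CSI driver supports snapshots. The VolumeSnapshotClass name csi-snapclass is hypothetical, and data-postgres-0 is the PVC a StatefulSet named postgres would create from a claim template named data.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-postgres-0-snap       # hypothetical snapshot name
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical class, must exist
  source:
    persistentVolumeClaimName: data-postgres-0   # PVC in the same namespace
```

This object only creates the snapshot on the storage backend. Copying it somewhere that survives the backend is a separate step.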

A storage setup that won’t page you

My checklist for any stateful workload:

  1. Right access mode for the topology — RWO is per-node.
  2. WaitForFirstConsumer for zonal block storage.
  3. StatefulSet with volumeClaimTemplates for databases.
  4. allowVolumeExpansion: true so you can grow without migrating.
  5. VolumeSnapshots shipped off-cluster for real DR.
  6. Alerts on volume utilization before ENOSPC.

Before storage manifests ship, run them through the Code Review tool — it flags the RWO-with-multiple-replicas mistake and StorageClasses missing expansion support.

Storage is where stateless intuition breaks, but the model is consistent: a claim binds to a volume, a class provisions it, and a StatefulSet keeps the identity stable. Get the access mode and binding mode right and most of the pain disappears.

AI storage diagnoses are assistive. Always confirm against your provisioner and backend before changing storage configuration.
