Kubernetes CSI Driver Development Prompt
Design and build a custom CSI driver: the controller vs node plugin split, the gRPC Identity/Controller/Node services, sidecar wiring (provisioner, attacher, resizer, snapshotter), and the idempotency rules that keep volumes from leaking.
- Target user: Storage/platform engineers authoring a CSI driver
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a storage platform engineer who has authored and shipped a Container Storage Interface (CSI) driver. You know the spec's idempotency contracts cold, because violating them is how clusters leak volumes and orphan mounts.

I will provide:
- The backing storage (cloud disk API, NFS/NAS, SAN, distributed FS) and its provisioning/attach semantics
- Required features (dynamic provisioning, attach/detach, resize, snapshots, block vs filesystem, RWO/RWX/ROX)
- Target Kubernetes versions and node OS

Your job:
1. **Architecture split**: define the Controller plugin (Deployment, talks to the storage API) vs the Node plugin (DaemonSet, does mount/format on each node), and which CSI capabilities each advertises (`ControllerGetCapabilities`, `NodeGetCapabilities`).
2. **Implement the three gRPC services**: Identity (`GetPluginInfo`, `GetPluginCapabilities`, `Probe`), Controller (`CreateVolume`/`DeleteVolume`/`ControllerPublishVolume`/`ControllerUnpublishVolume`/`ControllerExpandVolume`/snapshot RPCs), and Node (`NodeStageVolume`/`NodePublishVolume`/`NodeUnstageVolume`/`NodeUnpublishVolume`). For each RPC, state the idempotency and error-code contract (when to return `OK` vs `ALREADY_EXISTS`, and when to return `FAILED_PRECONDITION`).
3. **Wire the sidecars**: `external-provisioner`, `external-attacher`, `external-resizer`, `external-snapshotter`, `node-driver-registrar`, and `livenessprobe`. State which run in the controller Deployment vs the node DaemonSet, and the RBAC each needs.
4. **CSIDriver object + StorageClass**: set `attachRequired`, `podInfoOnMount`, `fsGroupPolicy`, `volumeLifecycleModes`, and the StorageClass `parameters`/`allowVolumeExpansion`/`volumeBindingMode` (Immediate vs WaitForFirstConsumer, and why).
5. **Idempotency + leak prevention**: handle retries safely. A `CreateVolume` retry with the same name and a compatible capacity returns the existing volume, a partial create is cleaned up, and detach/unmount is safe to call twice. This is the section that matters most, so be exhaustive.
6. **Topology + access modes**: advertise topology keys for zone-aware scheduling, and implement RWX honestly (or refuse it).
7. **Test + certify**: run the `csi-sanity` suite and the Kubernetes external storage e2e tests; list the must-pass cases.

Output: the gRPC service skeleton with per-RPC idempotency notes, the controller Deployment + node DaemonSet manifests with sidecars and RBAC, the CSIDriver + StorageClass YAML, and the csi-sanity/e2e test plan.

Bias toward: idempotent RPCs above all, honest capability advertisement, cleanup on every partial failure.
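
As a rough illustration of the idempotency contract the prompt asks for in step 5, the sketch below models `CreateVolume` and `DeleteVolume` against a hypothetical in-memory backend. The `Backend` and `Volume` types and the `ErrAlreadyExists` sentinel are assumptions made for this example; a real driver would implement the CSI `ControllerServer` interface and return gRPC status codes, and the spec compares a capacity range and parameters rather than the single exact capacity used here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAlreadyExists is a hypothetical stand-in for the gRPC ALREADY_EXISTS
// status code that a real CSI driver would return.
var ErrAlreadyExists = errors.New("ALREADY_EXISTS")

// Volume is a simplified, assumed record of a provisioned volume.
type Volume struct {
	ID            string
	Name          string
	CapacityBytes int64
}

// Backend is a hypothetical in-memory storage backend keyed by the
// caller-supplied volume name, which is what makes retries detectable.
type Backend struct {
	mu     sync.Mutex
	byName map[string]*Volume
	nextID int
}

// NewBackend returns an empty backend.
func NewBackend() *Backend {
	return &Backend{byName: make(map[string]*Volume)}
}

// CreateVolume is idempotent on (name, capacity): a retry with the same
// arguments returns the existing volume instead of provisioning a second
// one. The same name with a different capacity is reported as a conflict.
func (b *Backend) CreateVolume(name string, capacityBytes int64) (*Volume, error) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if v, ok := b.byName[name]; ok {
		if v.CapacityBytes == capacityBytes {
			return v, nil // retry of an earlier successful call
		}
		return nil, fmt.Errorf("volume %q exists with capacity %d, requested %d: %w",
			name, v.CapacityBytes, capacityBytes, ErrAlreadyExists)
	}

	b.nextID++
	v := &Volume{
		ID:            fmt.Sprintf("vol-%d", b.nextID),
		Name:          name,
		CapacityBytes: capacityBytes,
	}
	b.byName[name] = v
	return v, nil
}

// DeleteVolume treats a missing volume as success, so a retried delete
// after a lost response does not fail. The error return is kept only to
// mirror the shape of a real backend call.
func (b *Backend) DeleteVolume(id string) error {
	b.mu.Lock()
	defer b.mu.Unlock()

	for name, v := range b.byName {
		if v.ID == id {
			delete(b.byName, name)
			return nil
		}
	}
	return nil // already gone: still OK
}
```

Keying the lookup on the request name rather than on a backend-generated ID is the design choice that matters here: the orchestrator retries with the same name, so the name is the only handle available for recognising a duplicate request.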