Integrating Cinder With NetApp and NFS Backends in OpenStack

The Cinder bug that took me longest to find wasn’t a Cinder bug at all. Volumes were timing out on attach, the Nova error pointed at the compute, and I burned an afternoon there before realizing the compute couldn’t even mount the NFS share — a NetApp export policy didn’t include that host’s IP. NFS and NetApp backends are wonderful for Cinder, but their failures hide in the storage export layer while surfacing as Nova attach errors. Here’s how I set these backends up, the pitfalls that bite, and how AI helps me debug across the OpenStack/storage seam without trusting its verdict blindly.

Configuring an NFS or NetApp Backend

Cinder backends are defined as stanzas in cinder.conf, one per backend. Each stanza name must also appear in enabled_backends under [DEFAULT], or Cinder will not start a volume service for it. A generic NFS backend looks like this:

[nfs-backend]
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config = /etc/cinder/nfs_shares
nfs_mount_options = vers=4.1,hard,timeo=600,retrans=2
volume_backend_name = nfs

A NetApp ONTAP backend over NFS uses the vendor driver and adds the management LIF and credentials:

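[netapp-nfs]
volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
netapp_storage_family = ontap_cluster
netapp_storage_protocol = nfs
# management LIF of the cluster or SVM
netapp_server_hostname = 10.0.0.10
netapp_transport_type = https
# credentials for the ONTAP API (placeholders, substitute your own)
netapp_login = <ontap-user>
netapp_password = <ontap-password>
netapp_vserver = svm_openstack
nfs_shares_config = /etc/cinder/netapp_shares
volume_backend_name = netapp-nfs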

The openstack category collects the broader Cinder storage playbooks. With the backend defined, create a volume type and map it to the backend’s volume_backend_name so users can target it:

openstack volume type create netapp-gold
openstack volume type set netapp-gold --property volume_backend_name=netapp-nfs
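
Before restarting cinder-volume, I like a quick sanity check that every name in enabled_backends really has a stanza and a volume_backend_name for the volume type to match. The sketch below is my own helper, not part of Cinder: check_backends is a hypothetical name, and it assumes the file parses as plain INI, which holds for simple configs but is not how oslo.config reads it.

```python
import configparser


def check_backends(conf_text):
    """Return a list of problems found in cinder.conf-style text."""
    parser = configparser.ConfigParser(
        strict=False,                 # tolerate repeated keys
        interpolation=None,           # leave '%' characters alone
        default_section="__unused__", # treat [DEFAULT] as an ordinary section
    )
    parser.read_string(conf_text)

    enabled = parser.get("DEFAULT", "enabled_backends", fallback="")
    names = [name.strip() for name in enabled.split(",") if name.strip()]

    problems = []
    if not names:
        problems.append("enabled_backends is empty or missing")
    for name in names:
        if not parser.has_section(name):
            problems.append(f"enabled backend '{name}' has no [{name}] stanza")
        elif not parser.has_option(name, "volume_backend_name"):
            problems.append(f"[{name}] does not set volume_backend_name")
    return problems


# Inline sample so the sketch runs anywhere; in practice, pass the
# contents of /etc/cinder/cinder.conf instead.
SAMPLE = """
[DEFAULT]
enabled_backends = nfs-backend, netapp-nfs

[nfs-backend]
volume_backend_name = nfs

[netapp-nfs]
netapp_storage_protocol = nfs
"""

for problem in check_backends(SAMPLE):
    print(problem)
```

An empty list means the names line up; it says nothing about whether the driver can actually reach the storage.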

Pitfall One: Export Policy and Reachability

The failure I opened with is the most common one. For NFS, every compute that might attach a volume must be able to mount the share — which for NetApp means the SVM’s export policy must include the compute’s IP, and the network path must exist. Before blaming Cinder, prove the mount works from the compute:

showmount -e <nfs-server>          # what's exported, to whom
sudo mkdir -p /mnt/test
sudo mount -t nfs -o vers=4.1 <nfs-server>:/vol/share /mnt/test
sudo umount /mnt/test              # clean up once the mount has succeeded

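If showmount doesn’t list the share to your host or the manual mount fails, the problem is the export policy, not OpenStack. Treat showmount as a first hint only: it relies on the NFSv3 mount protocol, and on ONTAP it typically lists exported paths without showing which clients each export-policy rule admits. The manual mount is the real test, and the rules on the SVM (vserver export-policy rule show, or vserver export-policy check-access for a single client) are the authoritative source.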

Prompt: “Here’s my showmount -e output from the NFS server, the export-policy rules from ONTAP, and the IPs of my 6 compute hosts. Build a matrix of which computes can mount each share and which are blocked by export-policy rules. Flag any compute that can’t mount a share its volumes might land on. This is read-only analysis — don’t propose ONTAP commands that change live export policies.”

Output: A reachability matrix that showed two newly-added computes missing from the export policy’s client list — exactly the hosts where attaches were timing out. It correctly framed this as the likely root cause rather than a Cinder driver issue.

That reachability matrix is the kind of cross-tabulation AI does in seconds and humans do in a spreadsheet. The model is a fast junior engineer here; I confirmed the two flagged computes really were missing from the live export policy before anyone touched it, because changing an export policy that backs running volumes can detach them everywhere.
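
When I want to cross-check the model’s matrix myself, a few lines of Python reproduce it. This is a minimal sketch under stated assumptions: the host names, IPs, and share names are made up, each share’s export-policy rules have already been reduced to a list of clientmatch strings, and only IP and CIDR entries are evaluated (hostname and netgroup matches are skipped, so a False here can be a false alarm).

```python
import ipaddress


def can_mount(client_ip, clientmatch_rules):
    """True if any IP or CIDR rule in the list admits client_ip."""
    ip = ipaddress.ip_address(client_ip)
    for rule in clientmatch_rules:
        try:
            network = ipaddress.ip_network(rule, strict=False)
        except ValueError:
            continue  # hostname or netgroup entry: not evaluated in this sketch
        if ip in network:
            return True
    return False


def reachability_matrix(computes, share_rules):
    """Map each compute host to {share: allowed?}."""
    return {
        host: {share: can_mount(ip, rules) for share, rules in share_rules.items()}
        for host, ip in computes.items()
    }


def blocked(matrix):
    """List (host, share) pairs where the export policy admits no match."""
    return [
        (host, share)
        for host, shares in matrix.items()
        for share, allowed in shares.items()
        if not allowed
    ]


# Hypothetical inventory for illustration only.
computes = {"compute-01": "10.0.1.11", "compute-07": "10.0.2.17"}
share_rules = {
    "/vol/gold": ["10.0.1.0/24"],
    "/vol/silver": ["10.0.1.11", "10.0.2.17"],
}

for host, share in blocked(reachability_matrix(computes, share_rules)):
    print(f"{host} is not admitted by any rule for {share}")
```

Like the model’s output, this is a lead: it reads rules you pasted in, not the live export policy.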

Pitfall Two: Mount-Option Drift

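Cinder mounts shares with nfs_mount_options, but the configured options and the options actually in effect drift apart. A share mounted before the config changed keeps its old options until it is remounted, and a version fallback can leave a mount on a different vers than you asked for. The compute-side mount is performed by Nova, which can also apply its own nfs_mount_options from the [libvirt] section of nova.conf. A soft mount where you expected hard, or the wrong NFS version, goes unnoticed until a network blip puts data at risk. Diff configured against actual:

# what's configured for Cinder
grep nfs_mount_options /etc/cinder/cinder.conf
# what Nova may add on the compute ([libvirt] section)
grep nfs_mount_options /etc/nova/nova.conf
# what's actually mounted on the compute
mount | grep nfs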

Pro Tip: Always run NFS volumes with hard mounts, not soft. A soft mount gives up once its retries are exhausted and returns an I/O error to the caller instead of waiting for the server, so a brief network blip can fail or corrupt an in-flight write. Have the AI diff your configured options against every active mount and flag any soft mount or mismatched vers. It’s a fast, high-value check.
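
The same diff is easy to script. The sketch below assumes you feed it the text of /proc/mounts from a compute; find_drift and the sample mount lines are my own illustration, and it only checks the two things the tip calls out (soft mounts and a vers mismatch), not every option.

```python
def parse_options(option_string):
    """Turn 'vers=4.1,hard' into {'vers': '4.1', 'hard': ''}."""
    options = {}
    for item in option_string.split(","):
        key, _, value = item.partition("=")
        if key:
            options[key] = value
    return options


def parse_nfs_mounts(proc_mounts_text):
    """Map mount point -> options dict for each NFS entry in /proc/mounts text."""
    mounts = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            mounts[fields[1]] = parse_options(fields[3])
    return mounts


def find_drift(configured, proc_mounts_text):
    """Return (mount_point, finding) pairs for soft mounts and vers mismatches."""
    wanted = parse_options(configured)
    findings = []
    for mount_point, actual in parse_nfs_mounts(proc_mounts_text).items():
        if "soft" in actual:
            findings.append((mount_point, "soft mount"))
        if "vers" in wanted and actual.get("vers") != wanted["vers"]:
            findings.append(
                (mount_point, f"vers={actual.get('vers')} but config wants vers={wanted['vers']}")
            )
    return findings


# Hypothetical /proc/mounts excerpt for illustration only.
SAMPLE_MOUNTS = """\
10.0.0.20:/vol/gold /var/lib/nova/mnt/aaa nfs4 rw,relatime,vers=4.1,hard,proto=tcp,timeo=600,retrans=2 0 0
10.0.0.20:/vol/silver /var/lib/nova/mnt/bbb nfs rw,relatime,vers=3,soft,proto=tcp,timeo=600,retrans=2 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

for mount_point, finding in find_drift("vers=4.1,hard,timeo=600,retrans=2", SAMPLE_MOUNTS):
    print(mount_point, "->", finding)
```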

Pitfall Three: Stale Handles After Failover

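NetApp LIF migrations and storage failovers can leave stale NFS file handles on the compute: the mount looks present but every operation returns “Stale file handle.” This is why attaches mysteriously fail on some hosts after a storage event. Check the mount points first. With default settings Nova mounts shares under /var/lib/nova/mnt on the compute, while /var/lib/cinder/mnt is where the cinder-volume host mounts them:

ls /var/lib/nova/mnt/<hash>     # on the compute: "Stale file handle" if affected
ls /var/lib/cinder/mnt/<hash>   # on the cinder-volume host

The fix is a coordinated remount, but never force it under load.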

When I’m sorting “which hosts have stale handles after the failover,” I’ll hand the mount state and the failover timeline to Claude and ask it to correlate which computes lost their handles to the LIF that migrated. That correlation saves real time; I drain attachments on the affected hosts before remounting, because forcing a remount with in-flight writes risks corruption. Reusable storage prompts live in the prompt workspace.
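
To collect the mount state I hand over, something like the following works per host. It is a sketch, not a monitoring tool: handle_state is a hypothetical helper, the probe is injectable so it can be exercised without a broken mount, and on a hard mount whose server is unreachable the real os.listdir call can block, so in practice I would run it under a timeout.

```python
import errno
import os


def handle_state(path, probe=os.listdir):
    """Classify a mount point as 'ok', 'stale', or 'error: <reason>'."""
    try:
        probe(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return "stale"
        return f"error: {exc.strerror}"
    return "ok"


def survey(mount_points, probe=os.listdir):
    """Map each mount point to its state so hosts can be compared."""
    return {path: handle_state(path, probe) for path in mount_points}


def fake_stale_probe(path):
    """Stand-in probe that simulates a stale handle, for demonstration."""
    raise OSError(errno.ESTALE, "Stale file handle")


# The current directory stands in for a healthy mount point.
print(survey([os.getcwd()]))
print(survey(["/var/lib/nova/mnt/<hash>"], probe=fake_stale_probe))
```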

Pitfall Four: Thin Provisioning and Reported Capacity

NetApp and NFS backends often report capacity that assumes thin provisioning and dedup, and Cinder schedules against that reported number. If the backend reports more free space than it can actually honor, you can over-commit and hit out-of-space errors at write time. Check what the driver reports versus what the backend actually has free, and set max_over_subscription_ratio conservatively until you trust the numbers.

Validate on One Compute First

Every change here — export policy, mount options, remounts — gets validated on a single compute before fleet rollout. Create a test volume on the new volume type, attach it to an instance on one host, write and read data, then detach cleanly. If that round-trips, the backend is wired right. Doing it on one host means a misconfiguration costs you one host, not a storage-wide outage.
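
I keep that round trip as a small script so it runs the same way every time. The sketch below only builds and prints the openstack CLI commands rather than executing them. The server name is a placeholder for an instance you already know is running on the one compute under test, and the write/read step happens inside the guest, so it appears only as a comment.

```python
import shlex


def smoke_test_plan(volume_type, server, volume_name="backend-smoke-test", size_gb=1):
    """Return the ordered CLI commands for a one-host attach round trip."""
    return [
        ["openstack", "volume", "create", "--type", volume_type,
         "--size", str(size_gb), volume_name],
        ["openstack", "server", "add", "volume", server, volume_name],
        # At this point, write and read test data inside the guest.
        ["openstack", "server", "remove", "volume", server, volume_name],
        ["openstack", "volume", "delete", volume_name],
    ]


# 'test-vm-on-compute-01' is a hypothetical instance name.
for command in smoke_test_plan("netapp-gold", "test-vm-on-compute-01"):
    print(shlex.join(command))
```

Printing first and running by hand is deliberate: each step should be confirmed before the next one touches the backend.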

Conclusion

Cinder NFS and NetApp backends fail in the storage layer and complain in the Nova layer, which is why these problems eat afternoons. The discipline is to start at reachability — can the compute even mount the share — and work toward the symptom, not the other way around. AI is genuinely fast at the cross-seam reading: export-policy reachability matrices, mount-option diffs, stale-handle correlation after a failover. Every one of those is a lead you verify against the live backend before acting, because changing an export policy or forcing a remount under load can detach or corrupt running volumes. The model reads across the seam; you verify and validate on one host. More Cinder prompts are in the prompts library.