Cinder Volume Troubleshooting Prompt
Diagnose stuck volumes, failed attachments, and backend issues (Ceph/LVM/iSCSI/NFS) in OpenStack Cinder using CLI output and service logs.
- Target user: OpenStack storage and platform engineers
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack storage engineer with deep experience operating Cinder against Ceph RBD, LVM-iSCSI, NFS, and vendor SAN drivers in production.

I will provide:
- A symptom (volume stuck in `creating`/`attaching`/`deleting`/`error_extending`, attach failures, "volume in use" loops, slow I/O, etc.)
- The Cinder backend (`rbd`, `lvm-iscsi`, `nfs`, `vmdk`, vendor-specific)
- The OpenStack release
- Output from `openstack volume show`, plus `cinder-volume`, `cinder-scheduler`, and `cinder-api` logs
- Hypervisor side: `nova-compute` and `libvirt` logs if the issue is attachment-side

Your job:
1. **Identify which service or layer** is the most likely failure point:
   - `cinder-api` (request never reached scheduler)
   - `cinder-scheduler` (no backend selected: capacity, weigher, filter)
   - `cinder-volume` (backend driver failed)
   - `nova-compute` (`os-brick` failed to attach on hypervisor)
   - `libvirt` / OS-level (block device not visible to QEMU)
2. **Walk the lifecycle**: create (API and quota check) → scheduler → driver → attach → detach. Pin where it broke.
3. **For each candidate**: identify the *specific log line* you'd need to confirm or rule it out. Be exact about which service and which host.
4. **Label DANGEROUS recovery actions** explicitly: `cinder-manage volume update_host`, direct DB state changes, `rbd rm`, removing volumes still attached.
5. **Recommend the safest recovery path** with rollback. Prefer reversible actions (reset-state to `available`) over irreversible ones (DB updates).

Common failure classes:
- "Stuck in creating" for >5 min → scheduler failed silently, or driver in retry loop (check `cinder-scheduler` then `cinder-volume`)
- "Stuck in attaching" → `os-brick` on compute failed; check `nova-compute` and `multipathd` logs
- "Stuck in deleting" → backend driver detach failed but DB says detached; needs reset-state then retry
- Volume "in use" but VM gone → orphaned attachment record; needs `volume attachment` cleanup
- Slow I/O after migration → multipath not converged, or RBD client cache off
- `error_extending` → backend lacks space, or LVM extent boundary issue
- Quota rejections → `quota_usages` table out of sync with actual usage; compare and resync

Backend: [rbd / lvm-iscsi / nfs / vmdk / vendor]
OpenStack release: [yoga / zed / antelope / bobcat / caracal / dalmatian / epoxy]
Symptom: [DESCRIBE]
Relevant output:
```
[PASTE]
```
Why this prompt works
Cinder issues span three different machines: the API node, the volume-service host (whose role and logs differ by backend), and the compute hypervisor where the attach happens. Models tend to suggest “restart cinder-volume” as a first response. This prompt forces lifecycle-aware diagnosis instead.
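As a rough illustration of lifecycle-aware triage, the sketch below maps a stuck volume status to the logs worth reading first. The function name `first_logs_for_status` is hypothetical, and the mapping is an assumption distilled from the failure classes in the prompt rather than anything Cinder itself provides.

```shell
# Hypothetical triage helper: given a stuck volume status, print the services
# whose logs are worth reading first, in order. The mapping is an assumption
# taken from the prompt's failure classes, not a Cinder API.
first_logs_for_status() {
  case "$1" in
    creating)        echo "cinder-scheduler cinder-volume" ;;
    attaching)       echo "nova-compute multipathd" ;;
    deleting)        echo "cinder-volume" ;;
    error_extending) echo "cinder-volume" ;;
    # Unknown or ambiguous status: start from the API and walk forward.
    *)               echo "cinder-api cinder-scheduler cinder-volume" ;;
  esac
}
```

For example, `first_logs_for_status attaching` prints `nova-compute multipathd`, which points at the hypervisor rather than the volume host.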
How to use it
- Always name the backend. A stuck volume on RBD is a totally different conversation than on LVM-iSCSI. The backend dictates which logs matter.
- Include `openstack volume show <id>` (full output, not summary): `os-vol-host-attr:host` tells you exactly which `cinder-volume` instance owns it.
- Include `openstack volume attachment list --volume-id <id>`: attachment-record orphans are extremely common.
- If attach-side: include `journalctl -u nova-compute -n 200` and the libvirt log for the affected VM.
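The `os-vol-host-attr:host` value can be checked against the service list before anything is pasted into the prompt. The sketch below assumes the usual `host@backend#pool` form for the volume's host and `host@backend` for the service entry; the function names `service_host_of` and `volume_service_state` are hypothetical.

```shell
# os-vol-host-attr:host looks like host@backend#pool, while the service list
# shows host@backend. Strip the "#pool" suffix to get a comparable value.
service_host_of() {
  printf '%s\n' "${1%%#*}"
}

# Read "Binary Host State" rows on stdin and print the state of the
# cinder-volume service for the given host, or "missing" if no row matches.
volume_service_state() {
  local want="$1"
  awk -v want="$want" '
    $1 == "cinder-volume" && $2 == want { print $3; found = 1 }
    END { if (!found) print "missing" }
  '
}

# Intended use (needs admin credentials; not executed here):
#   host="$(openstack volume show <volume-id> -f value -c os-vol-host-attr:host)"
#   openstack volume service list -f value -c Binary -c Host -c State \
#     | volume_service_state "$(service_host_of "$host")"
```

A result of `down` or `missing` suggests the owning `cinder-volume` service is the first thing to mention in the symptom description.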
Useful commands to gather first
# Cinder side
openstack volume show <volume-id>
openstack volume attachment list --volume-id <volume-id>
openstack volume service list
sudo journalctl -u cinder-scheduler -n 200 --no-pager
sudo journalctl -u cinder-volume -n 200 --no-pager # on the volume host
sudo journalctl -u cinder-api -n 100 --no-pager
# Compute / hypervisor side (on the host where the VM lives)
sudo journalctl -u nova-compute -n 200 --no-pager
sudo virsh list --all
sudo virsh dumpxml <instance-uuid> | grep -A2 disk
sudo multipath -ll # for iSCSI/FC backends
sudo iscsiadm -m session # for iSCSI
# Backend-specific
sudo rbd ls -p <cinder-pool> # RBD
sudo lvs <cinder-volumes-vg> # LVM
sudo showmount -e <nfs-host> # NFS
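To keep the gathered output pasteable, a small wrapper can label each command's output and continue when a command fails. This is a sketch only: `section` and `collect_cinder_evidence` are hypothetical helper names, and the wrapper covers just the CLI calls that run from any host with credentials. The `journalctl` commands still need to run on their respective hosts.

```shell
# Print a labelled header, run the command, and keep going on failure.
section() {
  local title="$1"; shift
  printf '===== %s =====\n' "$title"
  # Capture stderr too: a failing command is itself useful evidence.
  "$@" 2>&1 || printf '[exit status %d]\n' "$?"
  printf '\n'
}

# Hypothetical collector for the API-side commands. Defined but not called
# here; the attachment listing may need a recent volume API microversion.
collect_cinder_evidence() {
  local volume_id="$1"
  section "volume show"     openstack volume show "$volume_id"
  section "attachment list" openstack volume attachment list --volume-id "$volume_id"
  section "service list"    openstack volume service list
}

# Intended use: collect_cinder_evidence <volume-id> > cinder-evidence.txt
```

Recording the exit status alongside the output means a command that fails outright still shows up in the paste instead of silently disappearing.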
Common findings this catches
- `os-vol-host-attr:host` points to a dead `cinder-volume` service → migration of host pointer needed (`cinder-manage volume update_host`, only with the service stopped).
- Attachment record exists but no `<disk>` in libvirt XML → orphan; safe to delete attachment record after confirming VM-side.
- Scheduler accepted then volume errored → capacity filter mismatched (`reserved_percentage` too low, or backend reporting stale `total_capacity_gb`).
- Volume `in-use` but VM deleted long ago → instance race during shutdown; attachment record never cleared.
- `error_deleting` on RBD → snapshot dependency or watcher still holding the image.
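The orphaned-attachment finding can be cross-checked from the hypervisor with a simple text search. The sketch below is a heuristic, not a definitive test: it assumes an attached Cinder volume leaves its UUID somewhere in the domain XML (for example in `<serial>` or in the source path), and `libvirt_sees_volume` is a hypothetical name.

```shell
# Heuristic: report whether a volume UUID appears in the libvirt domain XML
# read from stdin. Assumes the UUID shows up in the <disk> element.
libvirt_sees_volume() {
  local volume_id="$1"
  if grep -qiF -- "$volume_id"; then
    echo "present"
  else
    echo "absent"
  fi
}

# Intended use (on the compute host; not executed here):
#   sudo virsh dumpxml <instance-uuid> | libvirt_sees_volume <volume-id>
```

An `absent` result while the attachment list still shows a record for the same volume is consistent with an orphan, but confirm on the VM side before deleting anything.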
When to escalate to your storage team
If the AI suggests:
- Editing Cinder DB tables directly
- `rbd rm` / `lvremove` while Cinder thinks the volume exists
- `cinder-manage volume update_host` while `cinder-volume` is running
…stop and pull in storage on-call. These are the operations that cause the next incident, not the one you’re trying to solve.
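A saved AI response can be scanned for these operations before anyone acts on it. The following is a minimal sketch: `flag_dangerous` is a hypothetical name, and the patterns (including the SQL ones aimed at the `volumes` and `volume_attachment` tables) are illustrative assumptions that will miss rephrased or obfuscated commands.

```shell
# Print every line of stdin, with its line number, that mentions one of the
# risky operations. Exit status is 0 if anything was flagged, 1 if nothing was.
flag_dangerous() {
  grep -nEi \
    -e 'cinder-manage[[:space:]]+volume[[:space:]]+update_host' \
    -e 'rbd[[:space:]]+(rm|remove)([[:space:]]|$)' \
    -e 'lvremove' \
    -e '(update|delete[[:space:]]+from)[[:space:]]+(cinder\.)?(volumes|volume_attachment)([[:space:]]|;|$)'
}

# Intended use: flag_dangerous < ai-response.txt && echo "Escalate before running."
```

The check is deliberately crude. A match is a reason to pause and escalate, and a clean result does not make a suggestion safe.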
Related prompts
- OpenStack Request-ID Log Trace Prompt: Correlate a single API request across services (nova-api → conductor → scheduler → compute → neutron → cinder) using OpenStack request IDs.
- OpenStack VM Troubleshooting Prompt: Diagnose Nova VM boot failures, networking issues, and stuck instances using nova/openstack CLI output.
- RabbitMQ Queue Investigation Prompt: Investigate backed-up queues, dead-letter spillover, and consumer issues in RabbitMQ clusters.