
Cinder Volume Troubleshooting Prompt

Diagnose stuck volumes, failed attachments, and backend issues (Ceph/LVM/iSCSI/NFS) in OpenStack Cinder using CLI output and service logs.

Target user: OpenStack storage and platform engineers
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack storage engineer with deep experience operating Cinder against Ceph RBD, LVM-iSCSI, NFS, and vendor SAN drivers in production.

I will provide:
- A symptom (volume stuck in `creating`/`attaching`/`deleting`/`error_extending`, attach failures, "volume in use" loops, slow I/O, etc.)
- The Cinder backend (`rbd`, `lvm-iscsi`, `nfs`, `vmdk`, vendor-specific)
- The OpenStack release
- Output from `openstack volume show`, plus the `cinder-volume`, `cinder-scheduler`, and `cinder-api` logs
- Hypervisor side: `nova-compute` and `libvirt` logs if the issue is attachment-side

Your job:

1. **Identify which Cinder service** is the most likely failure point:
   - `cinder-api` (request never reached scheduler)
   - `cinder-scheduler` (no backend selected — capacity, weigher, filter)
   - `cinder-volume` (backend driver failed)
   - `nova-compute` (`os-brick` failed to attach on hypervisor)
   - `libvirt` / OS-level (block device not visible to QEMU)
2. **Walk the lifecycle**: API request and quota reservation → scheduler → driver → attach → detach. Pin where it broke.
3. **For each candidate**: identify the *specific log line* you'd need to confirm or rule it out. Be exact about which service and which host.
4. **Label DANGEROUS recovery actions** explicitly: `cinder-manage volume update_host`, direct DB state changes, `rbd rm`, removing volumes still attached.
5. **Recommend the safest recovery path** with rollback. Prefer reversible actions (reset-state to `available`) over irreversible ones (DB updates).

Common failure classes:
- "Stuck in creating" for >5 min → scheduler failed silently, or driver in retry loop (check `cinder-scheduler` then `cinder-volume`)
- "Stuck in attaching" → `os-brick` on compute failed; check `nova-compute` and `multipathd` logs
- "Stuck in deleting" → backend driver detach failed but DB says detached; needs reset-state then retry
- Volume "in use" but VM gone → orphaned attachment record; needs `volume attachment` cleanup
- Slow I/O after migration → multipath not converged, or RBD client cache off
- `error_extending` → backend lacks space, or LVM extent boundary issue
- Quota rejections that don't match real usage → `quota_usages` table out of sync; compare it against actual volumes and resync

Backend: [rbd / lvm-iscsi / nfs / vmdk / vendor]
OpenStack release: [yoga / zed / antelope / bobcat / caracal / dalmatian / epoxy]
Symptom: [DESCRIBE]
Relevant output:
```
[PASTE]
```

Why this prompt works

Cinder issues span three different machines: the API node, the volume-service host (which may be very different per-backend), and the compute hypervisor where attach happens. Models love to suggest “restart cinder-volume” as a first response. This prompt forces lifecycle-aware diagnosis instead.
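The lifecycle-aware ordering can be sketched as a small lookup from a stuck status to the logs worth reading first. This is only an illustration: `logs_for_status` is a hypothetical helper, the `creating` and `attaching` rows follow the failure classes in the prompt, and the `deleting`, `error_extending`, and fallback rows are assumptions based on which service drives the backend.

```shell
#!/usr/bin/env bash
# Map a stuck volume status to the services whose logs to read first.
# Order matters: read the first service before moving to the next.
logs_for_status() {
  case "$1" in
    creating)        echo "cinder-scheduler cinder-volume" ;;
    attaching)       echo "nova-compute multipathd" ;;
    deleting)        echo "cinder-volume" ;;             # assumption: backend driver side
    error_extending) echo "cinder-volume" ;;             # assumption: backend driver side
    *)               echo "cinder-api cinder-scheduler cinder-volume" ;;  # assumed fallback
  esac
}

# Print (not run) the journalctl commands for a volume stuck in "creating".
for svc in $(logs_for_status creating); do
  echo "sudo journalctl -u $svc -n 200 --no-pager"
done
```

Each command still has to be run on the right machine: the scheduler and volume services rarely share a host with the hypervisor.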

How to use it

  1. Always name the backend. A stuck volume on RBD is a totally different conversation than on LVM-iSCSI. The backend dictates which logs matter.
  2. Include openstack volume show <id> (full output, not summary) — os-vol-host-attr:host tells you exactly which cinder-volume instance owns it.
  3. Include openstack volume attachment list --volume-id <id> — attachment-record orphans are extremely common.
  4. If attach-side: include journalctl -u nova-compute -n 200 and the libvirt log for the affected VM.
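To see which cinder-volume instance owns a volume, the os-vol-host-attr:host value can be split into its parts. The sketch below assumes the usual `host@backend#pool` layout; `split_cinder_host` and the sample value are hypothetical.

```shell
#!/usr/bin/env bash
# Split a Cinder host string (host@backend#pool) into its three parts.
# The backend and pool parts are treated as optional.
split_cinder_host() {
  local full="$1" host backend="" pool=""
  host="${full%%@*}"      # everything before the first '@'
  host="${host%%#*}"      # guard against a value with '#' but no '@'
  if [[ "$full" == *@* ]]; then
    backend="${full#*@}"  # everything after the first '@'
    backend="${backend%%#*}"
  fi
  if [[ "$full" == *#* ]]; then
    pool="${full#*#}"     # everything after the first '#'
  fi
  printf 'host=%s backend=%s pool=%s\n' "$host" "$backend" "$pool"
}

# Hypothetical value; in practice read it with:
#   openstack volume show <volume-id> -f value -c os-vol-host-attr:host
split_cinder_host "storage01@ceph-rbd#volumes"
# prints: host=storage01 backend=ceph-rbd pool=volumes
```

The host part tells you where to run journalctl for cinder-volume; the backend part tells you which section of cinder.conf applies.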

Useful commands to gather first

# Cinder side
openstack volume show <volume-id>
openstack volume attachment list --volume-id <volume-id>
openstack volume service list
sudo journalctl -u cinder-scheduler -n 200 --no-pager
sudo journalctl -u cinder-volume -n 200 --no-pager  # on the volume host
sudo journalctl -u cinder-api -n 100 --no-pager

# Compute / hypervisor side (on the host where the VM lives)
sudo journalctl -u nova-compute -n 200 --no-pager
sudo virsh list --all
sudo virsh dumpxml <instance-uuid> | grep -A2 disk
sudo multipath -ll  # for iSCSI/FC backends
sudo iscsiadm -m session  # for iSCSI

# Backend-specific
sudo rbd ls -p <cinder-pool>  # RBD
sudo lvs <cinder-volumes-vg>  # LVM
sudo showmount -e <nfs-host>  # NFS
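When pasting several outputs into the prompt, it helps to label each one with the command that produced it. The wrapper below is one possible way to do that; `collect` is a hypothetical helper, not part of any OpenStack tooling, and it should only be given commands you would run by hand anyway.

```shell
#!/usr/bin/env bash
# Run each command string, print a header naming it, and keep going on failure
# so one missing tool does not lose the rest of the evidence.
collect() {
  local cmd
  for cmd in "$@"; do
    printf '### %s\n' "$cmd"
    bash -c "$cmd" 2>&1 || printf '(command failed with exit %s)\n' "$?"
    printf '\n'
  done
}

# Example (commented out; placeholders must be filled in first):
# collect \
#   "openstack volume show <volume-id>" \
#   "openstack volume service list" \
#   "sudo journalctl -u cinder-scheduler -n 200 --no-pager" > cinder-evidence.txt
```

Because the commands run on different machines, expect one output file per host rather than a single combined one.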

Common findings this catches

  • os-vol-host-attr:host points to a dead cinder-volume service → the host pointer needs migrating (cinder-manage volume update_host, only with the service stopped).
  • Attachment record exists but no <disk> in libvirt XML → orphan; safe to delete attachment record after confirming VM-side.
  • Scheduler accepted then volume errored → capacity filter mismatched (reserved_percentage too low, or backend reporting stale total_capacity_gb).
  • Volume in-use but the VM was deleted long ago → the instance delete raced with the detach during shutdown, so the attachment record was never cleared.
  • error_deleting on RBD → snapshot dependency or watcher still holding the image.
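The orphan check in the second finding can be roughed out as a grep over the domain XML. This sketch assumes Nova writes the Cinder volume ID into the disk's `<serial>` element, which is worth confirming on your release; `volume_in_domain_xml` and `classify_attachment` are hypothetical names.

```shell
#!/usr/bin/env bash
# Return 0 if the libvirt domain XML on stdin contains a disk whose
# <serial> equals the given volume ID (assumption: Nova sets serial = volume ID).
volume_in_domain_xml() {
  grep -qF "<serial>$1</serial>"
}

# Classify an attachment record against the domain XML read from stdin.
classify_attachment() {
  if volume_in_domain_xml "$1"; then
    echo "present-in-libvirt"
  else
    echo "orphan-candidate"
  fi
}

# Example (commented out; run on the hypervisor that hosts the VM):
# sudo virsh dumpxml <instance-uuid> | classify_attachment <volume-id>
```

An "orphan-candidate" result is only a hint. Confirm from inside the guest or with `virsh domblklist` before deleting any attachment record.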

When to escalate to your storage team

If the AI suggests:

  • Editing Cinder DB tables directly
  • rbd rm / lvremove while Cinder thinks the volume exists
  • cinder-manage volume update_host while cinder-volume is running

…stop and pull in storage on-call. These are the operations that cause the next incident, not the one you’re trying to solve.
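One way to make that stop rule mechanical is to screen any suggested command before it reaches a shell. The patterns below are a rough heuristic covering only the three cases listed above; `is_dangerous` is a hypothetical helper and the table names in the SQL check are assumptions about which Cinder tables are most likely to be touched.

```shell
#!/usr/bin/env bash
# Return 0 if a suggested command matches one of the escalation cases.
is_dangerous() {
  local cmd="$1"
  case "$cmd" in
    *"cinder-manage volume update_host"*) return 0 ;;
    *"rbd rm "*)                          return 0 ;;
    *lvremove*)                           return 0 ;;
  esac
  # Direct SQL writes against Cinder tables (case-insensitive heuristic).
  if printf '%s' "$cmd" | grep -qiE \
      '(update|delete[[:space:]]+from)[[:space:]]+(cinder\.)?(volumes|volume_attachment|quota_usages)'; then
    return 0
  fi
  return 1
}

# Example: refuse to proceed on a match.
suggested="rbd rm volumes/volume-1234"   # hypothetical AI suggestion
if is_dangerous "$suggested"; then
  echo "STOP: escalate to storage on-call before running: $suggested"
fi
```

A clean result does not make a command safe; the check only catches the operations named here.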
