OpenStack Error Guide: Cinder Volume in 'error' / 'error_deleting' Status

Overview

A Cinder volume in an error state means a backend operation failed and the volume is no longer in a usable, well-defined condition. Cinder uses a set of terminal error statuses — error, error_deleting, error_extending, error_restoring — each tied to the operation that was in flight when the driver raised an exception. The volume row stays in the database, the quota stays consumed, and most follow-on actions (attach, snapshot, retype) are blocked until the status is cleared.

You will typically see the bad status in the output of openstack volume list or openstack volume show:

+--------------------------------------+----------+----------------+------+
| ID                                   | Name     | Status         | Size |
+--------------------------------------+----------+----------------+------+
| 7c1f9a2b-44d0-4e1a-9b22-aa1122334455 | data-vol | error          |   50 |
| 91aa3c5d-1122-4f8e-bb33-cc4455667788 | logs-vol | error_deleting |  100 |
+--------------------------------------+----------+----------------+------+

The status itself is only a symptom. The real cause is in the cinder-volume log for the host that owns the volume, where the driver (Ceph RBD, LVM/iSCSI, NetApp, etc.) recorded why the create, attach, delete, retype, or migrate failed. Resetting the state without reading that log often just hides a problem that will recur.
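To find every volume currently sitting in one of these statuses, the list output can be filtered on the status column. The following is a minimal sketch, not a definitive tool: the function name filter_error_volumes is hypothetical, and it assumes admin credentials are loaded and that each input line is an "ID STATUS" pair.

```shell
# Keep only volumes whose status starts with "error".
# Reads "ID STATUS" pairs on stdin (one volume per line) and prints the
# matching pairs unchanged.
filter_error_volumes() {
    awk '$2 ~ /^error/ { print $1, $2 }'
}

# Intended use (assumption: admin credentials are sourced; not run here):
#   openstack volume list --all-projects -f value -c ID -c Status | filter_error_volumes
```

Each ID this prints still needs its own look at the cinder-volume log; the filter only tells you where to start.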

Symptoms

openstack volume show reports status of error, error_deleting, or error_extending.
openstack volume create returns success but the volume lands in error seconds later.
openstack volume delete leaves the volume in error_deleting instead of removing it.
openstack volume set --size (extend) leaves error_extending.
An instance fails to attach the volume, or an attachment is stuck in attaching/detaching.

openstack volume show data-vol -c status -c type -c os-vol-host-attr:host -f value

error
ceph-ssd
hostgroup@ceph-ssd#ceph-ssd

Common Root Causes

1. Backend create failed (pool, quota, or capacity)

The scheduler picked a backend, but the driver could not actually carve out the volume — the Ceph pool is full or missing, the LVM volume group is out of space, or the SAN rejected the LUN.

# Kolla-Ansible (on the cinder-volume host)
docker logs cinder_volume 2>&1 | grep -i "7c1f9a2b" | tail -20
# Traditional packages
sudo journalctl -u openstack-cinder-volume | grep -i "7c1f9a2b" | tail -20

ERROR cinder.volume.flows.manager.create_volume Volume 7c1f9a2b... create failed:
rados.Error: error creating image: [errno 28] error creating image (No space left on device)

2. Stuck attachment / os-brick connector failure

A volume gets wedged in error when an attach or detach half-completes — the connection was exported on the backend but os-brick on the compute node could not connect or disconnect the iSCSI/RBD device.

openstack volume show data-vol -c attachments -f value
nova volume-attachments <SERVER_ID>

[{'server_id': 'a9b8...', 'attachment_id': 'd4c3...', 'device': '/dev/vdb'}]

A dangling attachment_id with the instance long gone is the classic stuck-attachment signature.

3. error_deleting — backend object could not be removed

The delete reached the driver, but the backend refused: the RBD image still has a snapshot or watcher, the LVM logical volume is open, or the iSCSI target is still exported.

ERROR cinder.volume.manager Cannot delete volume 91aa3c5d...: 
rbd.ImageBusy: error removing image: [errno 16] error removing image (Device or resource busy)

4. error_extending — online or offline grow failed

Extending failed because the backend could not grow the image (no capacity) or the running guest path failed to refresh, leaving Cinder unsure of the real size.

ERROR cinder.volume.manager Extend volume failed.
StorageError: Failed to extend volume <id>: LV vg-cinder/volume-... not large enough

5. Retype or migrate aborted mid-flight

A cinder retype --migration-policy on-demand or cinder migrate that fails partway can leave the source volume in error with a half-built copy on the destination backend.

ERROR cinder.volume.manager Migrate volume completion failed.
VolumeMigrationFailed: Volume migration failed: destination backend reported no capacity

6. cinder-volume host down with operations in flight

If the cinder-volume service crashes or its host reboots while operations are running, those volumes are left in a transitional status (creating, deleting) and are typically moved to an error status when the service recovers.

cinder service-list --binary cinder-volume

| cinder-volume | hostgroup@ceph-ssd | nova | enabled | down | 2026-06-24T09:01:14 |
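A quick way to spot this condition is to pull only the hosts whose state is down. This is a sketch under assumptions: the function name down_volume_hosts is hypothetical, and the input is expected as "HOST STATE" pairs, which is how the unified client prints the Host and State columns with -f value.

```shell
# Print the hosts whose cinder-volume service reports state "down".
# Reads "HOST STATE" pairs on stdin, one service per line.
down_volume_hosts() {
    awk '$2 == "down" { print $1 }'
}

# Intended use (assumption: admin credentials are sourced; not run here):
#   openstack volume service list --service cinder-volume -f value -c Host -c State | down_volume_hosts
```

Any host this prints is a candidate owner for volumes that were left mid-operation.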

Diagnostic Workflow

Step 1: Read the full status and owning host

openstack volume show <VOLUME_ID> \
  -c status -c type -c bootable -c attachments \
  -c "os-vol-host-attr:host" -f value

The os-vol-host-attr:host field (host@backend#pool) tells you exactly which cinder-volume host, backend, and pool to inspect next.
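The host string can be split into its parts with plain shell parameter expansion. This is an illustrative sketch: split_volume_host is a hypothetical helper, and it assumes the value has the full host@backend#pool form, with both separators present.

```shell
# Split a Cinder host string of the form host@backend#pool into its parts.
# Assumes both '@' and '#' are present in the input.
split_volume_host() {
    full=$1
    pool=${full##*#}            # text after the last '#'
    host_backend=${full%%#*}    # text before the first '#'
    backend=${host_backend##*@} # text after the last '@'
    host=${host_backend%%@*}    # text before the first '@'
    printf 'host=%s backend=%s pool=%s\n' "$host" "$backend" "$pool"
}

split_volume_host 'hostgroup@ceph-ssd#ceph-ssd'
```

For the example volume this separates the service host (hostgroup) from the backend and pool, which are both named ceph-ssd.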

Step 2: Confirm the backend service is up

cinder service-list

If the owning cinder-volume is down, fix that first: many “error” volumes are simply operations that were orphaned in flight when the service went away.

# Kolla-Ansible
docker restart cinder_volume
# Traditional packages
sudo systemctl restart openstack-cinder-volume

Step 3: Grep the cinder-volume log for this volume ID

# Kolla-Ansible
docker logs cinder_volume 2>&1 | grep -i "<VOLUME_ID>" | tail -30
# Traditional packages
sudo journalctl -u openstack-cinder-volume --no-pager | grep -i "<VOLUME_ID>" | tail -30

Look for the driver exception (rados.Error, rbd.ImageBusy, StorageError, iSCSI/os_brick errors). This is the actual root cause.
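When triaging many volumes, it can help to bucket the matching log lines by likely cause. The sketch below is a rough heuristic and no substitute for reading the traceback: classify_driver_error is a hypothetical helper, and its patterns are drawn only from the sample messages shown in this guide.

```shell
# Map one log line to a coarse cause bucket.
# The patterns come only from the example messages in this guide.
classify_driver_error() {
    case $1 in
        *"No space left on device"*|*"not large enough"*|*"no capacity"*)
            echo capacity ;;
        *rbd.ImageBusy*|*"Device or resource busy"*)
            echo busy ;;
        *os_brick*|*iSCSI*|*iscsi*)
            echo connector ;;
        *)
            echo unknown ;;
    esac
}

classify_driver_error 'rbd.ImageBusy: error removing image: [errno 16] error removing image (Device or resource busy)'
```

A result of unknown means only that the line matched none of these samples, so the full traceback still has to be read.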

Step 4: Check for stuck attachments

openstack volume show <VOLUME_ID> -c attachments -f value
openstack server show <SERVER_ID> -c volumes_attached -f value

If the volume claims an attachment but the instance does not (or no longer exists), the attachment record is stale and must be cleared before the volume can be made usable.
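The comparison itself is a set difference: server IDs the volume claims, minus server IDs that still exist. A minimal sketch, assuming both lists have already been written to files with one ID per line; find_stale_attachments is a hypothetical helper, and how the two files are produced is left out.

```shell
# Print server IDs that a volume claims to be attached to but that are
# absent from the list of servers that still exist.
#   $1: file with one attached server ID per line
#   $2: file with one existing server ID per line
find_stale_attachments() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when nothing is printed; that is not an error here.
    grep -F -x -v -f "$2" "$1" || true
}
```

Any ID this prints belongs to a server that is gone, so the matching attachment record is a candidate for cleanup.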

Step 5: Decide — reset-state or repair the backend

Only after Steps 3-4 do you touch the status. If the backend object is healthy and the error came from a transient/control-plane fault, reset-state is safe:

cinder reset-state --state available <VOLUME_ID>

If the driver log shows the backend object is busy, has snapshots, or is genuinely missing, fix the backend (or clean the orphan) before resetting — otherwise you create a phantom Cinder volume that points at nothing.

Important: cinder reset-state only edits the database row. It does NOT touch the backend. Use it to correct a status that is wrong, never to “force” a delete of an object the driver said it could not remove.
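One way to make this rule harder to skip is to wrap the reset in a guard that demands an explicit acknowledgement. The sketch below is purely illustrative: guarded_reset_state and the "backend-checked" token are hypothetical, and the function only prints the command rather than running it.

```shell
# Refuse to reset a volume's status unless the operator states that the
# backend object was verified. Dry run: prints the command, does not run it.
#   $1: volume ID
#   $2: must be the literal string "backend-checked"
guarded_reset_state() {
    volume_id=${1:-}
    ack=${2:-}
    if [ -z "$volume_id" ] || [ "$ack" != "backend-checked" ]; then
        echo "refusing: read the driver log and verify the backend first" >&2
        return 1
    fi
    echo "cinder reset-state --state available $volume_id"
}

guarded_reset_state 7c1f9a2b-44d0-4e1a-9b22-aa1122334455 backend-checked
```

Printing instead of executing keeps the sketch safe to paste; the acknowledgement token is only a reminder and cannot verify the backend for you.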

Example Root Cause Analysis

A volume logs-vol is stuck in error_deleting and returns to that status every time the user retries the delete.

The owning host is hostgroup@ceph-ssd#ceph-ssd. Grepping the cinder-volume log:

ERROR cinder.volume.manager Cannot delete volume 91aa3c5d...:
rbd.ImageBusy: error removing image: [errno 16] error removing image (Device or resource busy)

The RBD image is busy. Checking Ceph directly on the storage node:

rbd -p volumes snap ls volume-91aa3c5d-1122-4f8e-bb33-cc4455667788
rbd -p volumes status volume-91aa3c5d-1122-4f8e-bb33-cc4455667788

SNAPID NAME                 SIZE
   14 snapshot-3e2a...    100 GiB
Watchers: none

A leftover RBD snapshot (from a Cinder snapshot whose DB row was already gone) is pinning the image, so the driver cannot remove it. The fix is to clear the orphaned snapshot, then let the delete complete:

# remove the orphan snapshot on the backend
rbd -p volumes snap rm volume-91aa3c5d-1122-4f8e-bb33-cc4455667788@snapshot-3e2a...
# now reset and re-issue the delete through Cinder
cinder reset-state --state available 91aa3c5d-1122-4f8e-bb33-cc4455667788
openstack volume delete 91aa3c5d-1122-4f8e-bb33-cc4455667788

With the backend snapshot gone, the RBD image is no longer busy and Cinder removes it cleanly — releasing the 100 GB back to the project’s quota.
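If an image carries several leftover snapshots, their names can be pulled out of the snap ls listing before deciding what to remove. A small sketch, assuming the listing starts with a header line and has the snapshot name in the second column, as in the output above; list_rbd_snapshot_names is a hypothetical helper.

```shell
# Print snapshot names from `rbd snap ls` style output read on stdin.
# Assumes line 1 is the header and the name is the second column.
list_rbd_snapshot_names() {
    awk 'NR > 1 && NF >= 2 { print $2 }'
}

# Intended use (run on a node with Ceph client access; not run here):
#   rbd -p volumes snap ls volume-<VOLUME_ID> | list_rbd_snapshot_names
```

Check each name against Cinder's own snapshot list before removing anything, since a snapshot that still has a database row is not an orphan.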

Prevention Best Practices

Always read the cinder-volume log before running reset-state. The status is a symptom; resetting it without fixing the backend creates phantom volumes that fail again at the next operation.
Alert on cinder service-list showing any cinder-volume as down. Operations in flight during an outage are the most common source of mass error/error_deleting volumes.
Monitor backend capacity (Ceph pool %USED, LVM VG free, SAN pool) and keep reserved_percentage honest so creates fail at the scheduler, not half-way through the driver.
Treat error_deleting as a backend-cleanup task: confirm the RBD image/LV/LUN is actually gone before declaring the volume deleted, so quota and storage stay in sync.
Quota note: error-state volumes still count against gigabytes and volumes quota. Reconcile abandoned error volumes regularly or projects will hit limits on phantom usage.
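To put a number on the quota held by error-state volumes, the sizes can be summed from the list output. This is a sketch under assumptions: error_volume_gb is a hypothetical helper, and the input is expected as "STATUS SIZE" pairs with the size in GB.

```shell
# Sum the size (GB) of all volumes whose status starts with "error".
# Reads "STATUS SIZE" pairs on stdin; prints 0 when nothing matches.
error_volume_gb() {
    awk '$1 ~ /^error/ { total += $2 } END { print total + 0 }'
}

# Intended use (assumption: admin credentials are sourced; not run here):
#   openstack volume list --all-projects -f value -c Status -c Size | error_volume_gb
```

Tracking this figure over time shows whether abandoned error volumes are accumulating.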

Quick Command Reference

# Full status + owning backend host
openstack volume show <VOLUME_ID> -c status -c "os-vol-host-attr:host" -c attachments -f value

# Is the owning cinder-volume service up?
cinder service-list --binary cinder-volume

# Root cause: grep the driver log for this volume
docker logs cinder_volume 2>&1 | grep -i "<VOLUME_ID>" | tail -30
sudo journalctl -u openstack-cinder-volume | grep -i "<VOLUME_ID>" | tail -30

# Inspect / clear stuck attachments
openstack volume show <VOLUME_ID> -c attachments -f value

# Backend checks (Ceph RBD example)
rbd -p volumes status volume-<VOLUME_ID>
rbd -p volumes snap ls volume-<VOLUME_ID>

# Reset state ONLY after the backend is confirmed healthy/clean
cinder reset-state --state available <VOLUME_ID>
cinder reset-state --state error <VOLUME_ID>   # force back to error if needed

# Retry the delete once the backend is clean
openstack volume delete <VOLUME_ID>

# Restart the backend service
docker restart cinder_volume
sudo systemctl restart openstack-cinder-volume

Conclusion

A Cinder volume in error, error_deleting, or error_extending is the database recording that a backend operation failed. The path to recovery is always the same:

1. Identify the owning cinder-volume host and confirm the service is up.
2. Grep that host’s log for the volume ID to find the real driver exception.
3. Check for stuck os-brick attachments left by a failed attach/detach.
4. Fix the backend object (Ceph RBD busy/snapshot, LVM, iSCSI, SAN capacity) before touching status.
5. Only then use cinder reset-state to correct the row, never to mask a backend that genuinely failed.

Reset-state is a database-only tool. Used after a real fix it restores order; used as a shortcut it leaves phantom volumes that consume quota and break on the next operation. Read the driver log first, repair the backend, then reset.
