Cinder Generic Volume Group & Consistency Snapshot Design Prompt
Design Cinder generic volume groups and crash-consistent group snapshots so multi-volume applications (databases, clustered apps) can be snapshotted and restored as one atomic unit.
- Target user: Storage engineers designing application-consistent backup workflows in OpenStack
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior block-storage engineer who has built application-consistent group-snapshot workflows on Cinder for stateful workloads.
I will provide:
- Backend driver(s) and whether they advertise `consistent_group_snapshot_enabled`
- The application: which volumes belong together (data + WAL/redo + config) and the OS
- Current backup approach and why per-volume snapshots aren't atomic enough
- Cinder version and group-type / volume-type configuration
- RPO/RTO targets
Your job:
1. **Explain generic volume groups**: how Cinder's generic groups superseded the old consistency-groups API, what `group_type` and `group_specs` (`consistent_group_snapshot_enabled=<is> True`) control, and which backends actually guarantee crash consistency.
2. **Design group + volume types**: define a `group_type` and the matching `volume_type`(s), ensuring every volume in the group lands on a backend that supports group snapshots. Show the `openstack volume group type create/set` commands.
3. **Build the group**: commands to create the group, add the application's volumes, and verify membership and backend co-location.
4. **Crash- vs application-consistency**: explain what a group snapshot guarantees (all volumes at the same point in time) and what it does NOT (in-flight writes in the guest). Show how to quiesce/freeze the app or filesystem before snapshotting for true application consistency.
5. **Snapshot & restore runbook**: create a group snapshot, then create a new group (and volumes) from that snapshot, and attach to a recovery instance. Cover the ordering so the database recovers cleanly.
6. **Failure handling**: what to do when one volume in the group fails to snapshot, how partial states are reported, and how to clean up orphaned group snapshots.
7. **Automate & verify**: wrap the flow in a script with pre-snapshot freeze hooks, and a restore-test that boots the app from a group snapshot and asserts data integrity.
Output as: (a) a group-type/volume-type design table, (b) the full create/add/snapshot/restore command sequence, (c) freeze/thaw hook examples for a database, (d) a failure-cleanup runbook, (e) a periodic restore-test plan tied to the RPO/RTO targets. Be explicit about which backends silently fall back to non-atomic behavior, and never claim application consistency without an in-guest freeze.
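Example of the kind of answer to expect for step 7 and deliverable (c): a minimal Python sketch of the freeze, snapshot, thaw ordering. It is an illustration under stated assumptions, not a reference implementation. The function names (`run`, `frozen`, `snapshot_group`), the `wait` hook, the group name, and the mount points are all hypothetical. It assumes `fsfreeze` from util-linux is available where the script runs, and that the installed `openstack` client provides `volume group snapshot create`; check the flags and the required volume API microversion against your client version. It defaults to a dry run that only records the commands.

```python
from contextlib import contextmanager
import shlex
import subprocess


def run(cmd, dry_run, log):
    """Record the command; execute it only when dry_run is False."""
    log.append(cmd)
    if not dry_run:
        subprocess.run(cmd, check=True)


@contextmanager
def frozen(mountpoints, dry_run, log):
    """Freeze filesystems in order; always thaw them in reverse on exit."""
    done = []
    try:
        for mp in mountpoints:
            run(["fsfreeze", "-f", mp], dry_run, log)
            done.append(mp)
        yield
    finally:
        # Thaw only what was actually frozen, even if the snapshot failed.
        for mp in reversed(done):
            run(["fsfreeze", "-u", mp], dry_run, log)


def snapshot_group(group, snap_name, mountpoints, dry_run=True, wait=None):
    """Return the ordered command list for one frozen group snapshot.

    `wait` is a hypothetical hook: a callable that blocks until the group
    snapshot is usable. The create call is asynchronous, so thawing right
    after it returns may be too early on some backends.
    """
    log = []
    with frozen(mountpoints, dry_run, log):
        run(
            ["openstack", "--os-volume-api-version", "3.14",
             "volume", "group", "snapshot", "create",
             "--name", snap_name, group],
            dry_run, log,
        )
        if wait is not None:
            wait()
    return log


if __name__ == "__main__":
    # Hypothetical group name and mount points for a PostgreSQL guest.
    for cmd in snapshot_group("pg-group", "pg-snap-001",
                              ["/var/lib/pgsql/data", "/var/lib/pgsql/wal"]):
        print(shlex.join(cmd))
```

In practice the freeze runs inside the guest while the snapshot call runs wherever the OpenStack credentials live, so `run` would usually be swapped for an SSH or guest-agent call for the `fsfreeze` steps. A database-level flush or backup-mode hook would go before the filesystem freeze.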