Cinder Over-Subscription & Thin Provisioning Design Prompt
Tune Cinder thin provisioning and over-subscription ratios safely — capacity reporting, max_over_subscription_ratio, reserved space, and scheduler capacity filters — so you maximize density without risking backend full-disk events that freeze every volume.
- Target user: Storage engineers tuning Cinder capacity and density
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack storage engineer who has tuned Cinder over-subscription on Ceph, LVM, and vendor backends without ever letting a thin pool hit 100% and freeze I/O for an entire tenant.

I will provide:

- Backend type(s) and current `cinder.conf` driver sections
- Output of `cinder get-pools --detail` (capacity, provisioned, ratios)
- Current `max_over_subscription_ratio` and `reserved_percentage`
- Actual vs provisioned usage trend
- Incidents (backend-full events, scheduler placing volumes on full pools)

Your job:

1. **Explain the capacity math**: how each backend reports `free_capacity_gb` and `provisioned_capacity_gb`, how `max_over_subscription_ratio` and `reserved_percentage` enter the calculation, and how `CapacityFilter` uses all four to accept or reject a placement. Make the relationship between thin provisioning, real usage, and the ratio explicit.
2. **Audit current settings**: flag dangerous combinations, such as high over-subscription with no real-usage monitoring, a `reserved_percentage` too low for a thin backend, or multiple pools reporting stale stats.
3. **Recommend ratios per backend**: give concrete `max_over_subscription_ratio` and `reserved_percentage` values per backend type, justified by how each backend reports thin usage (Ceph RBD vs LVM thin vs vendor). Note where over-subscription is unsafe and the ratio should be 1.0.
4. **Scheduler behavior**: confirm that `CapacityFilter` is enabled in `scheduler_default_filters` and that each backend reports `thin_provisioning_support` correctly, so the scheduler never lands a volume on a pool that can't actually hold it.
5. **Guardrails**: alert thresholds on real backend utilization (not provisioned capacity), what to do at 80/90/95%, and how to halt new provisioning before a full-disk freeze.
6. **Capacity reclamation**: TRIM/discard, deleting orphaned volumes and snapshots, and reconciling the Cinder DB against actual backend usage.

Output as: (a) annotated capacity-math explainer, (b) per-backend ratio recommendations table, (c) the exact `cinder.conf` keys to change, (d) monitoring/alert thresholds keyed to real usage, (e) an emergency runbook for an imminent thin-pool-full event. Be conservative — a frozen thin pool is worse than wasted capacity.
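
To sanity-check the capacity-math explainer the model returns, it can help to have a small reference model of the placement check. The sketch below is a simplified approximation loosely modeled on the upstream `CapacityFilter`, not a copy of it: the exact arithmetic varies between Cinder releases, and the `PoolStats` class, the `pool_accepts` function, and all sample numbers are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Subset of the per-pool fields a backend reports (illustrative only)."""
    total_capacity_gb: float
    free_capacity_gb: float
    provisioned_capacity_gb: float
    max_over_subscription_ratio: float
    reserved_percentage: int
    thin_provisioning_support: bool


def pool_accepts(pool: PoolStats, requested_gb: float) -> bool:
    """Approximate the scheduler's capacity check for one pool.

    Assumption: reserved space is subtracted from free space first, then
    thin pools are checked against the over-subscription ratio. Real
    releases differ in detail, so treat this as a teaching model.
    """
    if pool.total_capacity_gb <= 0:
        return False

    # Space held back from scheduling by reserved_percentage.
    reserved_gb = math.floor(
        pool.total_capacity_gb * pool.reserved_percentage / 100
    )
    free_gb = pool.free_capacity_gb - reserved_gb

    if pool.thin_provisioning_support and pool.max_over_subscription_ratio >= 1:
        # Provisioned-to-total ratio if this request were accepted.
        projected_ratio = (
            pool.provisioned_capacity_gb + requested_gb
        ) / pool.total_capacity_gb
        if projected_ratio > pool.max_over_subscription_ratio:
            return False
        # Remaining real free space, scaled by the ratio, must cover the request.
        return free_gb * pool.max_over_subscription_ratio >= requested_gb

    # Thick pools: the request must fit in real free space.
    return free_gb >= requested_gb


if __name__ == "__main__":
    thin_pool = PoolStats(
        total_capacity_gb=1000,
        free_capacity_gb=400,
        provisioned_capacity_gb=1800,
        max_over_subscription_ratio=2.0,
        reserved_percentage=10,
        thin_provisioning_support=True,
    )
    print(pool_accepts(thin_pool, 100))  # True: projected ratio 1.9 <= 2.0
    print(pool_accepts(thin_pool, 300))  # False: projected ratio 2.1 > 2.0
```

Note that the check works from provisioned and reported-free capacity only, so a pool can pass it while real utilization is already high. That is why the prompt asks for alert thresholds keyed to real backend usage rather than to the scheduler's view.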