
Cinder Over-Subscription & Thin Provisioning Design Prompt

Tune Cinder thin provisioning and over-subscription ratios safely — capacity reporting, max_over_subscription_ratio, reserved space, and scheduler capacity filters — so you maximize density without risking backend full-disk events that freeze every volume.

Target user: Storage engineers tuning Cinder capacity and density
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack storage engineer who has tuned Cinder over-subscription on Ceph, LVM, and vendor backends without ever causing a thin pool to hit 100% and freeze I/O for an entire tenant.

I will provide:
- Backend type(s) and current `cinder.conf` driver sections
- Output of `cinder get-pools --detail` (capacity, provisioned, ratios)
- Current `max_over_subscription_ratio` and `reserved_percentage`
- Actual vs provisioned usage trend
- Incidents (backend-full events, scheduler placing on full pools)

Your job:

1. **Explain the capacity math** — how backends report `free_capacity_gb` and `provisioned_capacity_gb`, how the configured `max_over_subscription_ratio` and `reserved_percentage` are applied to them, and how `CapacityFilter` combines all four to accept or reject placement. Make the relationship between thin provisioning, real usage, and the ratio explicit.

2. **Audit current settings** — flag dangerous combinations: high over-subscription with no real-usage monitoring, `reserved_percentage` too low for a thin backend, multiple pools reporting stale stats.

3. **Recommend ratios per backend** — give concrete `max_over_subscription_ratio` and `reserved_percentage` values per backend type, justified by how each backend reports thin usage (Ceph RBD vs LVM thin vs vendor). Note where over-subscription is unsafe and should be 1.0.

4. **Scheduler behavior** — confirm `CapacityFilter` is enabled in `scheduler_default_filters` and that each pool reports `thin_provisioning_support` correctly, so the scheduler never lands a volume on a pool that can't actually hold it.

5. **Guardrails** — alert thresholds on real backend utilization (not provisioned), what to do at 80/90/95%, and how to halt new provisioning before a full-disk freeze.

6. **Capacity reclamation** — TRIM/discard, deleting orphaned volumes/snapshots, and reconciling Cinder DB vs backend actual usage.

Output as: (a) annotated capacity-math explainer, (b) per-backend ratio recommendations table, (c) the exact `cinder.conf` keys to change, (d) monitoring/alert thresholds keyed to real usage, (e) an emergency runbook for an imminent thin-pool-full event.

Be conservative — a frozen thin pool is worse than wasted capacity.
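
To sanity-check the model's capacity-math answer, it can help to have a rough picture of the check the prompt asks about. The sketch below is a simplified illustration in Python, not Cinder's actual scheduler code: `PoolStats`, `pool_accepts`, and `real_utilization_pct` are hypothetical names, the field names mirror the pool stats listed in the prompt, and the exact behavior may differ between Cinder releases and drivers.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Hypothetical container for the pool fields named in the prompt."""
    total_capacity_gb: float
    free_capacity_gb: float
    provisioned_capacity_gb: float
    max_over_subscription_ratio: float
    reserved_percentage: int
    thin_provisioning_support: bool


def pool_accepts(pool: PoolStats, requested_gb: float) -> bool:
    """Simplified capacity check for one new volume (illustrative only)."""
    if pool.total_capacity_gb <= 0:
        return False

    # Hold back the reserved share of raw capacity before anything else.
    reserved_gb = math.floor(pool.total_capacity_gb * pool.reserved_percentage / 100)
    usable_free_gb = pool.free_capacity_gb - reserved_gb

    if not pool.thin_provisioning_support:
        # Thick pool: the request must fit in real free space.
        return usable_free_gb >= requested_gb

    # Check 1: provisioned (virtual) capacity after this request must stay
    # within max_over_subscription_ratio times the total capacity.
    provisioned_ratio = (
        pool.provisioned_capacity_gb + requested_gb
    ) / pool.total_capacity_gb
    if provisioned_ratio > pool.max_over_subscription_ratio:
        return False

    # Check 2: real free space, scaled by the ratio, must cover the request.
    return usable_free_gb * pool.max_over_subscription_ratio >= requested_gb


def real_utilization_pct(pool: PoolStats) -> float:
    """Real (not provisioned) utilization, the number guardrails should watch."""
    used_gb = pool.total_capacity_gb - pool.free_capacity_gb
    return 100.0 * used_gb / pool.total_capacity_gb


# Example pool: 1000 GB raw, 400 GB really free, 1800 GB already provisioned.
pool = PoolStats(
    total_capacity_gb=1000,
    free_capacity_gb=400,
    provisioned_capacity_gb=1800,
    max_over_subscription_ratio=2.0,
    reserved_percentage=10,
    thin_provisioning_support=True,
)
print(pool_accepts(pool, 100))     # fits: provisioned ratio would be 1.9
print(pool_accepts(pool, 300))     # rejected: provisioned ratio would be 2.1
print(real_utilization_pct(pool))  # real usage, independent of the ratio
```

Note that neither check looks at how fast real usage is growing, which is why the prompt keys its alert thresholds to real backend utilization rather than to provisioned capacity.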
