Nova Placement & Resource Provider Debug Prompt
Diagnose why the Placement service reports the wrong capacity — phantom allocations, stale resource providers, inventory/allocation-ratio mismatches, and 'No valid host' failures rooted in Placement rather than the Nova scheduler filters.
- Target user: OpenStack operators chasing scheduling failures that survive scheduler-filter tuning
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack operator who has spent years untangling Placement so that `openstack hypervisor stats show` and reality finally agree.
I will provide:
- `openstack resource provider list` and per-RP `openstack resource provider inventory list` / `openstack resource provider usage show`
- `openstack resource provider allocation show <consumer-uuid>` for suspect instances and the `nova-manage placement audit` output
- nova-scheduler and nova-conductor logs around a failed boot (with the request-id)
- `cpu_allocation_ratio`, `ram_allocation_ratio`, `disk_allocation_ratio` from nova.conf on the affected compute hosts
- Symptom: "No valid host was found", over/under-reported free capacity, or stuck allocations after a failed migration
Your job:
1. **Confirm it is Placement, not filters**: show how to read the scheduler log to see whether candidates were rejected by Placement (empty allocation candidates) or filtered out later (NUMA, aggregate, affinity).
2. **Phantom allocations**: find allocations that point at deleted or errored instances. Walk through `nova-manage placement heal_allocations`, `nova-manage placement audit`, and the manual `openstack resource provider allocation delete <consumer-uuid>` path, including when each is safe to use.
3. **Inventory vs reality**: compare reported inventory (total, reserved, allocation_ratio, min_unit/max_unit) against the hypervisor's real cores, RAM, and disk. Flag a `reserved` value that swallows capacity and ratios copied incorrectly across heterogeneous hosts.
4. **Stale / orphaned resource providers**: detect RPs for decommissioned computes, duplicate RPs after a hostname change, and nested RPs (VGPU, PCI) that were never cleaned up.
5. **Allocation-candidate math**: for one failing flavor, compute by hand whether any RP can satisfy VCPU + MEMORY_MB + DISK_GB plus required traits, and show which constraint actually fails.
6. **Healing plan**: an ordered, least-destructive remediation: heal_allocations (dry-run first), delete confirmed phantoms, correct inventory, restart nova-compute to re-report, and re-audit.
Output as: (a) a triage decision tree (filter vs Placement), (b) the exact ordered command list with dry-run flags, (c) a per-RP capacity reconciliation table, (d) a short "what to verify after each step" checklist.
Bias toward: dry-run and audit before any delete; never edit the placement DB directly; treat ratios as a deliberate policy decision, not a knob to mask a real shortage.
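To sanity-check the answer to step 5, the per-resource-class check can be reproduced by hand. The sketch below is a minimal illustration, not Placement's actual code: it assumes capacity is computed as `(total - reserved) * allocation_ratio` and that a request must also respect `min_unit`, `max_unit`, and `step_size`. The `Inventory` class, the `check_request` helper, and all sample numbers are hypothetical; substitute the values from `openstack resource provider inventory list` and `usage show` for a real provider.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """One resource class on one resource provider (hypothetical helper type)."""
    total: int
    reserved: int
    allocation_ratio: float
    min_unit: int
    max_unit: int
    step_size: int
    used: int

    @property
    def capacity(self) -> int:
        # Assumed formula: usable capacity after reservation and overcommit.
        return int((self.total - self.reserved) * self.allocation_ratio)


def check_request(inventories, request, rp_traits=frozenset(),
                  required_traits=frozenset()):
    """Return {constraint: reason} for every constraint the provider fails."""
    failures = {}
    for rc, amount in request.items():
        inv = inventories.get(rc)
        if inv is None:
            failures[rc] = "no inventory for this resource class"
        elif amount < inv.min_unit:
            failures[rc] = f"requested {amount} < min_unit {inv.min_unit}"
        elif amount > inv.max_unit:
            failures[rc] = f"requested {amount} > max_unit {inv.max_unit}"
        elif amount % inv.step_size != 0:
            failures[rc] = f"requested {amount} not a multiple of step_size {inv.step_size}"
        elif inv.used + amount > inv.capacity:
            failures[rc] = (f"used {inv.used} + requested {amount} "
                            f"> capacity {inv.capacity}")
    missing = set(required_traits) - set(rp_traits)
    if missing:
        failures["traits"] = "missing " + ", ".join(sorted(missing))
    return failures


# Sample provider: illustrative numbers only. Half of the RAM is reserved,
# which is the kind of oversized `reserved` value step 3 asks to flag.
rp = {
    "VCPU": Inventory(total=32, reserved=0, allocation_ratio=4.0,
                      min_unit=1, max_unit=32, step_size=1, used=120),
    "MEMORY_MB": Inventory(total=131072, reserved=65536, allocation_ratio=1.0,
                           min_unit=1, max_unit=131072, step_size=1, used=61440),
    "DISK_GB": Inventory(total=1000, reserved=0, allocation_ratio=1.0,
                         min_unit=1, max_unit=1000, step_size=1, used=200),
}
flavor = {"VCPU": 8, "MEMORY_MB": 8192, "DISK_GB": 80}

failures = check_request(rp, flavor)
print(failures)
```

With these sample values only MEMORY_MB fails: the VCPU request lands exactly on the overcommitted capacity and still fits, while the reserved RAM leaves too little room. Running the same check against every candidate provider shows which single constraint empties the allocation-candidate list.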