Reading OpenStack Placement Resource Inventories with AI

The first time I had to debug a “No valid host was found” error at scale, I had 340 compute nodes and a Placement database that nobody on the team fully understood. I spent a weekend manually grepping through inventory dumps, building a spreadsheet by hand, and slowly losing my mind. These days I’d hand that dump to an AI and have a cross-tabulated summary in ninety seconds. But I want to be blunt up front: the AI is a fast junior engineer, not an oracle. It will confidently mislabel a reserved value as available capacity, and if you act on that you’ll wedge a scheduler. So let’s talk about how to actually do this well.

What Placement Is and Why Nova Lives or Dies by It

Placement is the service that tracks what resources exist and how much is left. Every compute node registers itself as a resource provider with an inventory: so many VCPUs, so much MEMORY_MB, so much DISK_GB. When Nova schedules an instance, it doesn’t guess; it asks Placement for a list of candidates that can satisfy the request, claims an allocation against one, and only then boots. If Placement’s view of the world drifts from reality, Nova makes bad decisions: it overpacks nodes, refuses to schedule onto empty ones, or throws “No valid host” with plenty of hardware sitting idle.

That’s why reading Placement correctly matters more than almost any other introspection task in OpenStack. Start with the inventory of providers:

openstack resource provider list
openstack resource provider list --resource VCPU=4

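The second form filters to providers that can currently allocate four VCPUs, which is your fastest “is there anywhere to land?” sanity check. The --resource filter needs Placement microversion 1.4 or later, so pass --os-placement-api-version explicitly if your client defaults to an older one. If you’re new to OpenStack capacity work, the OpenStack category on this site collects the related playbooks.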

Reading a Single Provider’s Inventory and Usage

Once you have a UUID, the two commands you’ll run constantly are inventory and usage:

openstack resource provider inventory list 8f3a...
openstack resource provider usage show 8f3a...

Inventory tells you the shape of the resource: total, reserved, min_unit, max_unit, step_size, and crucially allocation_ratio. Usage tells you how much is currently claimed. The math that actually determines available capacity is:

available = (total - reserved) * allocation_ratio - used

This is exactly where I see AI tools trip. If you paste an inventory dump into a model and ask “how much VCPU is free?”, a careless prompt gets you total - used, which ignores both the reserved headroom and the overcommit ratio. So I prompt explicitly: “Compute available per provider as (total minus reserved) times allocation_ratio minus used. Show the formula inputs in a table so I can check each row.” Forcing it to show inputs is what makes the answer auditable.
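To make the difference concrete, here is a minimal Python sketch of that formula next to the naive total minus used. The InventoryRow class and the sample numbers are illustrative assumptions, not output from a real cloud, and rounding the capacity down to an integer is a conservative choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRow:
    """One resource class on one provider (hypothetical container for this sketch)."""
    resource_class: str
    total: int
    reserved: int
    allocation_ratio: float
    used: int


def available(row: InventoryRow) -> int:
    """(total - reserved) * allocation_ratio - used, with capacity rounded down."""
    capacity = int((row.total - row.reserved) * row.allocation_ratio)
    return capacity - row.used


def naive_available(row: InventoryRow) -> int:
    """The answer a careless prompt tends to produce: total - used."""
    return row.total - row.used


if __name__ == "__main__":
    # Illustrative numbers only.
    rows = [
        InventoryRow("VCPU", total=64, reserved=4, allocation_ratio=4.0, used=200),
        InventoryRow("MEMORY_MB", total=262144, reserved=8192,
                     allocation_ratio=1.0, used=250000),
    ]
    for r in rows:
        print(f"{r.resource_class:<10} total={r.total} reserved={r.reserved} "
              f"ratio={r.allocation_ratio} used={r.used} "
              f"available={available(r)} naive={naive_available(r)}")
```

With these sample rows the naive formula reports negative free VCPU (-136 instead of 40) because it ignores overcommit, and overstates free memory by exactly the reserved 8192 MB (12144 instead of 3952). Both mistakes are easy to miss in a summary that hides its inputs.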

Pro Tip: Always make the AI emit the intermediate values, not just the conclusion. A summary you can’t spot-check against openstack resource provider usage show is a liability, not a shortcut.

Allocation Ratios and Reserved Values

cpu_allocation_ratio, ram_allocation_ratio, and disk_allocation_ratio are where operators encode their risk tolerance. A cpu_allocation_ratio of 16.0 means you’re selling sixteen vCPUs for every host CPU that Nova counts (each hardware thread counts as one); great for dev workloads, catastrophic for latency-sensitive ones. These can be set globally in nova.conf or overridden per provider in Placement. When the two disagree, the scheduler acts on whatever Placement currently holds, so read the ratio from the inventory rather than from the config file.

Reserved values are the other classic gotcha. Operators reserve memory for the hypervisor and host OS so Nova doesn’t pack a node until it OOMs. When you dump inventory across hundreds of providers, an AI is genuinely good at flagging outliers: “provider X reserves 0 MEMORY_MB while every other node reserves 8192” is the kind of pattern a human eye glazes over but a model surfaces instantly. I treat that as a lead to investigate, never a fact to act on. I’ll pull up Claude to do the cross-tabulation, then go verify the flagged nodes by hand.
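The same outlier check is simple enough to run yourself as a cross-check on whatever the model flags. The sketch below assumes you have already extracted one reserved value per provider for a single resource class; the function name and the sample hostnames are hypothetical.

```python
from collections import Counter


def reserved_outliers(reserved_by_provider: dict[str, int]) -> dict[str, int]:
    """Return providers whose reserved value differs from the fleet's most common one."""
    if not reserved_by_provider:
        return {}
    # The most frequent value is treated as the fleet "norm" (an assumption:
    # a fleet with several legitimate hardware classes needs grouping first).
    common_value, _count = Counter(reserved_by_provider.values()).most_common(1)[0]
    return {
        name: value
        for name, value in reserved_by_provider.items()
        if value != common_value
    }


if __name__ == "__main__":
    # Illustrative MEMORY_MB reserved values keyed by made-up hostnames.
    sample = {"cn-001": 8192, "cn-002": 8192, "cn-003": 0, "cn-004": 8192}
    print(reserved_outliers(sample))
```

Anything this returns is still only a lead: a node can legitimately differ from the rest of the fleet.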

Traits, Nested Providers, and the Hard Stuff

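Modern OpenStack clouds are not flat. A single compute node can be a tree of nested resource providers: the root node typically owns disk, while child providers carry the specialized inventory, such as NUMA nodes with VCPU and MEMORY_MB, PCI devices with PCI_DEVICE inventory, and a GPU card exposing VGPU resources. Which of these actually appear as separate providers depends on your release, virt driver, and configuration, so check your own tree rather than assuming this layout. List the traits and you’ll see the qualitative capabilities layered on top: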

openstack resource provider trait list 8f3a...
openstack trait list

Traits like HW_CPU_X86_AVX512F, COMPUTE_VOLUME_MULTI_ATTACH, or a custom CUSTOM_GOLD_TIER are how the scheduler matches workloads to capable hardware. When you’re debugging why a VGPU instance won’t land, you need to read the nested provider’s inventory, not the root’s. This is precisely the kind of multi-level structure where AI summarization shines and also where it most often hallucinates parent-child relationships. I make the model reconstruct the tree and then I diff its tree against:

openstack --os-placement-api-version 1.14 resource provider list --in-tree 8f3a...

If its reconstructed hierarchy doesn’t match, I don’t trust anything downstream of it. For building these structured introspection prompts, I keep templates in the prompt workspace and pull reusable ones from the prompt packs.
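That diff can be mechanical. The sketch below assumes provider records that carry uuid and parent_provider_uuid fields (Placement reports the parent from microversion 1.14 onward) and that you have asked the model to state its tree as (parent, child) pairs. The function names and the short placeholder identifiers are hypothetical.

```python
def parent_edges(providers: list[dict]) -> set[tuple[str, str]]:
    """Build (parent, child) pairs from provider records; roots have no parent."""
    return {
        (p["parent_provider_uuid"], p["uuid"])
        for p in providers
        if p.get("parent_provider_uuid")
    }


def diff_trees(actual: list[dict],
               claimed: set[tuple[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Compare the model's claimed edges with the edges Placement reports."""
    real = parent_edges(actual)
    return {
        "invented": claimed - real,  # edges the model made up
        "missed": real - claimed,    # edges the model left out
    }


if __name__ == "__main__":
    # Placeholder identifiers standing in for real provider UUIDs.
    actual = [
        {"uuid": "root", "parent_provider_uuid": None},
        {"uuid": "numa0", "parent_provider_uuid": "root"},
        {"uuid": "gpu0", "parent_provider_uuid": "root"},
    ]
    claimed = {("root", "numa0"), ("numa0", "gpu0")}
    print(diff_trees(actual, claimed))
```

An empty result in both buckets is the bar to clear before trusting any per-provider numbers the model derived from its tree.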

Finding Where Things Can Actually Land

The single most useful diagnostic command, and the one I wish I’d learned years earlier, is the allocation candidate query. It asks Placement the same question the scheduler asks:

openstack allocation candidate list --resource VCPU=4 --resource MEMORY_MB=8192

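If this returns rows, Placement can satisfy that shape somewhere, so any remaining “No valid host” failure comes from Nova’s filters and weighers rather than from raw capacity. If it returns nothing, you have a real capacity or trait problem, and now you know it’s Placement-side. The command needs Placement microversion 1.10 or later (1.17 for --required), so pass --os-placement-api-version if your client defaults to an older one. You can layer traits in too: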

openstack allocation candidate list \
  --resource VCPU=4 --resource MEMORY_MB=8192 \
  --required CUSTOM_GOLD_TIER

When I’m staring at a fleet-wide dump trying to answer “are we out of capacity, and where?”, I’ll have the AI cross-tabulate every provider’s free VCPU and MEMORY_MB and rank them. That ranked table is the genuine time-saver. But the ground truth is still the candidate list above, so I verify the AI’s “we’re full” conclusion by actually running the query for the flavor in question.
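A ranking like that is easy to reproduce locally as a sanity check on the model's table. This sketch assumes you already have free capacity per provider and resource class; it deliberately ignores max_unit, step_size, traits, and nested providers, which is exactly why the candidate query remains the ground truth. All names and numbers are illustrative.

```python
def instances_that_fit(free: dict[str, int], flavor: dict[str, int]) -> int:
    """How many copies of `flavor` fit, limited by the scarcest resource class."""
    per_class = [
        max(free.get(rc, 0), 0) // amount  # negative free capacity counts as zero
        for rc, amount in flavor.items()
        if amount > 0
    ]
    return min(per_class, default=0)


def rank_providers(free_by_provider: dict[str, dict[str, int]],
                   flavor: dict[str, int]) -> list[tuple[str, int]]:
    """Sort providers by how many instances of `flavor` they could still take."""
    counts = [
        (name, instances_that_fit(free, flavor))
        for name, free in free_by_provider.items()
    ]
    return sorted(counts, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    free_by_provider = {
        "cn-01": {"VCPU": 40, "MEMORY_MB": 3952},
        "cn-02": {"VCPU": 8, "MEMORY_MB": 65536},
        "cn-03": {"VCPU": 16, "MEMORY_MB": 32768},
    }
    flavor = {"VCPU": 4, "MEMORY_MB": 8192}
    for name, count in rank_providers(free_by_provider, flavor):
        print(f"{name}: {count}")
```

In the sample data cn-01 has the most free VCPU but fits zero instances because memory is the binding constraint, which is the kind of row worth checking in any AI-produced ranking.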

Detecting Capacity Exhaustion and Healing Drift

Capacity exhaustion rarely announces itself. It shows up as scheduling failures on some flavors while small instances still land fine, because one resource class (often MEMORY_MB or a scarce PCI device) is gone while VCPUs remain. An AI is great at spotting this asymmetry across a dump: “every provider has VCPU headroom but DISK_GB is exhausted on 90% of them.” Again, lead, not verdict.
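One way to check an asymmetry claim like that yourself is to count, per resource class, how many providers fall short of the smallest amount you care about. The sketch below is a rough illustration under that assumption; the thresholds, hostnames, and function name are hypothetical.

```python
def exhausted_fraction(free_by_provider: dict[str, dict[str, int]],
                       minimum: dict[str, int]) -> dict[str, float]:
    """Fraction of providers with less than `minimum[rc]` free, per resource class."""
    total = len(free_by_provider)
    result = {}
    for rc, needed in minimum.items():
        short = sum(
            1 for free in free_by_provider.values() if free.get(rc, 0) < needed
        )
        result[rc] = short / total if total else 0.0
    return result


if __name__ == "__main__":
    # Illustrative free-capacity numbers.
    free_by_provider = {
        "cn-01": {"VCPU": 32, "DISK_GB": 5},
        "cn-02": {"VCPU": 48, "DISK_GB": 0},
        "cn-03": {"VCPU": 16, "DISK_GB": 12},
        "cn-04": {"VCPU": 24, "DISK_GB": 400},
    }
    print(exhausted_fraction(free_by_provider, {"VCPU": 4, "DISK_GB": 40}))
```

A class with a high fraction while the others sit near zero is the asymmetry to chase, and the allocation candidate query for the failing flavor confirms or refutes it.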

The other thing that silently breaks Placement is drift: instances that exist in Nova but whose allocations are missing or wrong in Placement, usually after a botched migration or a database restore. The repair tool is:

nova-manage placement heal_allocations --dry-run
nova-manage placement heal_allocations --instance <uuid>

I cannot stress this enough: run --dry-run first, read every proposed change, and scope to a single --instance when you can. This is not a command to let an AI run, and it’s not a command to run fleet-wide because a model told you it’d be fine. The heal operation rewrites allocations; get it wrong and you’ll corrupt your scheduler’s view of reality.

Pro Tip: Never give an AI assistant your clouds.yaml, an admin token, or shell access to a cloud where it can run nova-manage. Let it read sanitized dumps you paste in and propose commands you execute yourself. The moment it can act, “fast junior engineer” becomes “fast junior engineer with root.”
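What “sanitized” means depends on your environment, but one small piece can be sketched: swapping UUIDs for stable placeholders before a dump leaves your terminal. This is a minimal illustration, not a complete scrubber. It does nothing about hostnames, IP addresses, or project names, and the function name and placeholder format are assumptions.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def pseudonymize_uuids(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct UUID with a stable placeholder; return text and mapping."""
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        original = match.group(0).lower()  # treat upper and lower case as the same UUID
        if original not in mapping:
            mapping[original] = f"uuid-{len(mapping) + 1:04d}"
        return mapping[original]

    return UUID_RE.sub(replace, text), mapping


if __name__ == "__main__":
    # Made-up UUIDs for illustration.
    dump = (
        "provider 11111111-2222-4333-8444-555555555555 "
        "consumer aaaaaaaa-bbbb-4ccc-8ddd-eeeeeeeeeeee "
        "provider 11111111-2222-4333-8444-555555555555"
    )
    cleaned, mapping = pseudonymize_uuids(dump)
    print(cleaned)
    print(mapping)
```

Keep the mapping on your side so you can translate the model's findings back to real providers before running any command yourself.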

Conclusion

AI has genuinely changed how fast I can read Placement. The real, daily wins are cross-tabulating inventory across hundreds of nested providers, flagging reserved-value outliers, ranking free capacity, and reconstructing provider trees. But every one of those wins is a summary of data you must still verify with openstack resource provider usage show and openstack allocation candidate list. Keep the model on the reading side of the airlock, keep your credentials on yours, and you get the speed without the foot-guns. For the structured prompts I use to do this, browse the prompts library.