OpenStack Error Guide: 'No allocation candidates returned'

Overview

No allocation candidates returned is the Placement API telling Nova that no resource provider can satisfy the requested combination of resource classes, required traits, and aggregate membership. Placement is the authoritative inventory ledger; the scheduler calls GET /allocation_candidates on it before any Nova filter runs. If Placement returns an empty set, the scheduler never gets a host list and the boot fails before filtering.
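To make the shape of that request concrete, here is a minimal sketch of how the query string for GET /allocation_candidates is put together. The function name and its arguments are illustrative, not part of Nova or Placement; the parameter names (resources, required, member_of) are the ones the Placement API accepts, and the "!" prefix for forbidden traits and the "in:" prefix for aggregates assume a reasonably recent Placement microversion.

```python
from urllib.parse import urlencode


def build_candidates_query(resources, required=(), forbidden=(), member_of=()):
    """Return the path and query string for GET /allocation_candidates.

    Illustrative helper, not a Nova or Placement function.
    """
    params = {
        # Resource classes and amounts, e.g. "MEMORY_MB:4096,VCPU:2".
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())
        ),
    }
    # Forbidden traits share the "required" parameter with a "!" prefix.
    traits = sorted(required) + [f"!{trait}" for trait in sorted(forbidden)]
    if traits:
        params["required"] = ",".join(traits)
    aggregates = sorted(member_of)
    if len(aggregates) == 1:
        params["member_of"] = aggregates[0]
    elif aggregates:
        # "in:" asks for providers that belong to any of the listed aggregates.
        params["member_of"] = "in:" + ",".join(aggregates)
    # Keep ":", "," and "!" readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(params, safe=":,!")


print(build_candidates_query({"VCPU": 2, "MEMORY_MB": 4096},
                             required=["CUSTOM_GPU_A100"]))
```

Every parameter narrows the result set, which is why an empty answer can come from the resources, the traits, or the aggregate scope.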

Nova surfaces it in the scheduler/conductor log:

INFO nova.scheduler.manager [req-...] Got no allocation candidates from the placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up.

And Placement itself logs the empty query at DEBUG:

DEBUG placement.objects.allocation_candidate [req-...] found 0 allocation candidates after filtering by required traits, aggregates and forbidden traits

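The instance ends in ERROR with a NoValidHost fault. This is distinct from “Filtering removed all hosts”: there the filters emptied a non-empty list, whereas here Placement returned nothing to filter. It can happen on any operation that asks the scheduler for a destination, such as boot, resize, evacuate, migrate, and unshelve. nova-manage placement heal_allocations does not query allocation candidates itself, but on a full or misconfigured cloud it fails to create allocations for the same underlying reasons.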

Symptoms

nova-scheduler logs Got no allocation candidates from the placement API.
openstack allocation candidate list --resource VCPU=2 returns zero rows.
Hypervisors look free in openstack hypervisor list, but Placement still returns nothing (inventory/usage drift or traits).

openstack server show api-01 -c fault -f value

{'code': 500, 'message': 'No valid host was found. There are not enough hosts available.', ...}

openstack allocation candidate list --resource VCPU=2 --resource MEMORY_MB=4096 --resource DISK_GB=20

(no output)

Common Root Causes

1. Inventory vs usage: the provider is genuinely booked

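Placement tracks total, reserved, and allocation_ratio per resource class, and tallies used from live allocations. Bookable capacity is (total - reserved) * allocation_ratio - used; note that reserved is subtracted before the ratio is applied. When that falls below the requested amount for any requested class on every provider, you get no candidates. A single request must also fit within the inventory's max_unit.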

openstack resource provider list
openstack resource provider inventory list <RP_UUID>
openstack resource provider usage show <RP_UUID>

# inventory
+----------------+------------------+----------+--------+
| resource_class | allocation_ratio | reserved |  total |
+----------------+------------------+----------+--------+
| VCPU           |             16.0 |        0 |     64 |
| MEMORY_MB      |              1.0 |     4096 | 257000 |
| DISK_GB        |              1.0 |        0 |   1800 |
+----------------+------------------+----------+--------+
# usage
+----------------+--------+
| resource_class | usage  |
+----------------+--------+
| MEMORY_MB      | 252904 |
+----------------+--------+

(257000 - 4096) * 1.0 - 252904 = 0 MEMORY_MB free. RAM at ratio 1.0 is the typical wall.
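The arithmetic above can be sketched as a small helper. This assumes Placement's capacity rule of (total - reserved) * allocation_ratio, rounded down; the function names are hypothetical and the numbers are the MEMORY_MB row from the tables above.

```python
import math


def bookable(total, reserved, allocation_ratio, used):
    """Units of one resource class still available on one provider."""
    # Reserved is taken off before the overcommit ratio is applied.
    capacity = math.floor((total - reserved) * allocation_ratio)
    return capacity - used


def fits(amount, total, reserved, allocation_ratio, used, max_unit=None):
    """True if a request for `amount` units could be placed on this provider."""
    # A single allocation may not exceed max_unit, whatever the free capacity.
    if max_unit is not None and amount > max_unit:
        return False
    return amount <= bookable(total, reserved, allocation_ratio, used)


# MEMORY_MB row from the example: nothing left to book.
print(bookable(total=257000, reserved=4096, allocation_ratio=1.0, used=252904))  # 0
```

A provider with 0 bookable MEMORY_MB is excluded even if its VCPU and DISK_GB are mostly idle, because every requested class has to fit on the same provider (or provider tree).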

2. Required traits not satisfied

A flavor (or image) can require traits via trait:<NAME>=required. Placement only returns providers that advertise every required trait and none of the forbidden ones.

openstack flavor show g1.gpu -c properties -f value
openstack resource provider trait list <RP_UUID>

{'trait:CUSTOM_GPU_A100': 'required', 'trait:HW_CPU_X86_AVX512F': 'required'}

If no provider’s trait list contains CUSTOM_GPU_A100, the candidate set is empty even with free VCPU/RAM. Confirm the trait was actually set on the host.
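As a rough illustration of that check, the sketch below splits a flavor's trait:* extra specs into required and forbidden sets and reports, per provider, what is missing. The function names and the sample provider names are made up for the example; the trait lists would come from openstack resource provider trait list.

```python
def split_trait_specs(extra_specs):
    """Separate trait:<NAME>=required|forbidden extra specs into two sets."""
    required, forbidden = set(), set()
    for key, value in extra_specs.items():
        if not key.startswith("trait:"):
            continue
        name = key[len("trait:"):]
        if value == "required":
            required.add(name)
        elif value == "forbidden":
            forbidden.add(name)
    return required, forbidden


def trait_mismatches(extra_specs, provider_traits):
    """Map each provider to the traits that keep it out of the candidate set."""
    required, forbidden = split_trait_specs(extra_specs)
    report = {}
    for provider, traits in provider_traits.items():
        traits = set(traits)
        report[provider] = {
            "missing": sorted(required - traits),
            "forbidden_present": sorted(forbidden & traits),
        }
    return report


flavor = {"trait:CUSTOM_GPU_A100": "required",
          "trait:HW_CPU_X86_AVX512F": "required"}
providers = {  # hypothetical provider names and trait lists
    "compute-1": ["HW_CPU_X86_AVX512F"],
    "compute-2": ["HW_CPU_X86_AVX512F", "CUSTOM_GPU_A100"],
}
for name, result in trait_mismatches(flavor, providers).items():
    print(name, result)
```

A provider is only eligible when both lists in its report are empty; here compute-1 is missing CUSTOM_GPU_A100 and compute-2 qualifies.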

3. Aggregate mismatch between Nova and Placement

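Nova host aggregates and Placement aggregates are separate objects, even though Nova normally mirrors a host aggregate into Placement under the same UUID. A request scoped to an availability zone or aggregate adds a member_of=<placement-agg-uuid> constraint (for availability zones this relies on the scheduler's AZ pre-filter, which some releases gate behind [scheduler] query_placement_for_availability_zone). If the compute’s resource provider was never added to the matching Placement aggregate, it is excluded.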

# Nova host aggregate
openstack aggregate show az-ssd -f yaml
# Placement aggregate membership for that compute's RP
openstack resource provider aggregate list <RP_UUID>

# Nova aggregate uuid: 3a9f...   but the RP lists no aggregates:
(empty)

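The host belongs to the Nova aggregate but its RP is not in the corresponding Placement aggregate, so AZ/aggregate-scoped requests find nothing. Re-sync with nova-manage placement sync_aggregates, which mirrors Nova host aggregates into Placement, or set the membership explicitly. Note that set replaces the provider's whole aggregate list, so include every aggregate it should keep:

nova-manage placement sync_aggregates --verbose
openstack resource provider aggregate set --aggregate 3a9f... <RP_UUID>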
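The comparison behind this check can be sketched as follows. It assumes the Placement aggregate carries the same UUID as the Nova host aggregate (which is how Nova mirrors them); the function name, host names, and UUID strings are placeholders for illustration.

```python
def providers_missing_aggregate(agg_uuid, nova_agg_hosts, rp_by_host, rp_aggregates):
    """List (host, rp_uuid) pairs that are in the Nova aggregate but whose
    resource provider is not a member of the Placement aggregate."""
    missing = []
    for host in nova_agg_hosts:
        rp_uuid = rp_by_host.get(host)
        if rp_uuid is None:
            # No resource provider found for this host at all.
            missing.append((host, None))
        elif agg_uuid not in rp_aggregates.get(rp_uuid, ()):
            missing.append((host, rp_uuid))
    return missing


# Placeholder data standing in for CLI output.
nova_agg_hosts = ["compute-ssd-1", "compute-ssd-2"]       # openstack aggregate show
rp_by_host = {"compute-ssd-1": "rp-1", "compute-ssd-2": "rp-2"}
rp_aggregates = {"rp-1": ["agg-uuid-ssd"], "rp-2": []}     # resource provider aggregate list

print(providers_missing_aggregate("agg-uuid-ssd", nova_agg_hosts,
                                  rp_by_host, rp_aggregates))
# [('compute-ssd-2', 'rp-2')]
```

Any pair in the result is a host that Nova considers part of the aggregate but that Placement will filter out of member_of queries.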

4. Stale allocations holding phantom capacity

Allocations that outlive their instances (failed deletes, DB drift) keep consuming used in Placement, so providers look full when they are not.

openstack resource provider show <RP_UUID> --allocations -f yaml
# Cross-check against live instances on the host
openstack server list --all-projects --host <HOST> -c ID -f value

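If Placement lists allocation consumer UUIDs that correspond to neither an existing server nor an in-progress migration (during a resize or migration the source allocation is held under the migration's UUID), those are stale and must be removed (see Step 5).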
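The cross-check is a set difference, sketched below. The function name and the sample UUID strings are hypothetical; the inputs would be the consumer UUIDs from the provider's allocations, the server IDs from openstack server list, and, as an assumption worth keeping, the UUIDs of any in-progress migrations so they are not mistaken for orphans.

```python
def suspect_consumers(allocation_consumers, live_server_ids, migration_ids=()):
    """Return consumer UUIDs that match neither a server nor a known migration."""
    known = set(live_server_ids) | set(migration_ids)
    return sorted(set(allocation_consumers) - known)


consumers = ["srv-a", "srv-b", "mig-1", "gone-1"]   # from --allocations
servers = ["srv-a", "srv-b"]                        # from server list --host
migrations = ["mig-1"]                              # in-progress migrations

print(suspect_consumers(consumers, servers, migrations))  # ['gone-1']
```

Treat the result as a list of candidates to investigate rather than a delete list: confirm each UUID really has no server in any cell before removing its allocation.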

5. heal_allocations and reshaper

nova-manage placement heal_allocations (re)creates missing allocations for existing instances. It writes them directly against the instance's current compute node provider rather than asking Placement for candidates, so it does not raise this error itself, but it fails to create the allocations if that provider is full or the flavor’s resources no longer fit. The reshaper migrates inventory between providers (e.g., moving VGPU to child RPs) and can transiently shift where capacity lives.

# Dry-run heal (preview) for a single instance
nova-manage placement heal_allocations --instance <INSTANCE_UUID> --dry-run --verbose

Inventory and allocations for instance <uuid> are good. No allocations to heal.

If heal cannot create the allocations, fix the underlying inventory or capacity problem on that provider first.

6. Reserved or zeroed inventory after a config change

Setting reserved equal to total, an allocation_ratio of 0, or losing inventory after a failed update_provider_tree leaves a provider with no bookable capacity even though the hypervisor is healthy.

openstack resource provider inventory list <RP_UUID>

| VCPU | allocation_ratio 0.0 | reserved 0 | total 64 |   -> 0 bookable

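An allocation_ratio of 0 gives (64 - 0) * 0.0 = 0 bookable VCPU. Reset it in nova.conf on the compute node (for example cpu_allocation_ratio) or directly in Placement with openstack resource provider inventory class set. If the ratio is pinned in nova.conf, nova-compute will overwrite a direct Placement change on its next update, so fix the config in that case.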
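A quick way to spot this class of problem is to scan inventory rows for zero capacity and say why. The sketch below assumes rows shaped like the columns of openstack resource provider inventory list (resource_class, total, reserved, allocation_ratio) and the capacity rule (total - reserved) * allocation_ratio; the function name is illustrative.

```python
import math


def zero_capacity_classes(inventories):
    """Return (resource_class, reason) for every inventory with no capacity."""
    flagged = []
    for inv in inventories:
        capacity = math.floor(
            (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
        )
        if capacity > 0:
            continue
        if inv["allocation_ratio"] == 0:
            reason = "allocation_ratio is 0"
        elif inv["reserved"] >= inv["total"]:
            reason = "reserved >= total"
        else:
            reason = "capacity rounds down to 0"
        flagged.append((inv["resource_class"], reason))
    return flagged


rows = [
    {"resource_class": "VCPU", "total": 64, "reserved": 0, "allocation_ratio": 0.0},
    {"resource_class": "MEMORY_MB", "total": 257000, "reserved": 4096, "allocation_ratio": 1.0},
    {"resource_class": "DISK_GB", "total": 1800, "reserved": 1800, "allocation_ratio": 1.0},
]
print(zero_capacity_classes(rows))
# [('VCPU', 'allocation_ratio is 0'), ('DISK_GB', 'reserved >= total')]
```

Unlike genuine exhaustion, these rows have zero capacity regardless of usage, so freeing instances will not help until the inventory itself is corrected.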

Diagnostic Workflow

Step 1: Reproduce the empty query against Placement

openstack allocation candidate list \
  --resource VCPU=2 --resource MEMORY_MB=4096 --resource DISK_GB=20

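Add the constraints the real request used to localize the cause. These commands need a Placement microversion of at least 1.10 (--required needs 1.17 and --aggregate-uuid needs 1.21), so pass --os-placement-api-version if your client defaults lower: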

openstack allocation candidate list --resource VCPU=2 --required CUSTOM_GPU_A100
openstack allocation candidate list --resource VCPU=2 --aggregate-uuid <PLACEMENT_AGG_UUID>

Remove constraints one at a time — whichever constraint, when dropped, makes candidates appear is your root cause (trait vs aggregate vs raw resource).
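The peel-off procedure can be sketched as a loop. Here query is any callable you supply that runs the candidate query and returns the number of rows (for example a wrapper around the CLI); that callable, the function name, and the fake backend used in the demo are all assumptions for illustration.

```python
def isolate_constraint(query, resources, constraints):
    """Drop one optional constraint at a time and report which removals
    make candidates appear.

    query(resources, constraints) -> number of candidates returned.
    """
    if query(resources, constraints) > 0:
        return []  # the full request already succeeds; nothing to isolate
    culprits = []
    for name in constraints:
        reduced = {k: v for k, v in constraints.items() if k != name}
        if query(resources, reduced) > 0:
            culprits.append(name)
    return culprits


def fake_query(resources, constraints):
    """Stand-in backend: pretend the aggregate scope excludes every provider."""
    return 0 if "member_of" in constraints else 3


print(isolate_constraint(
    fake_query,
    {"VCPU": 2},
    {"required": ["CUSTOM_GPU_A100"], "member_of": "agg-uuid-gpu"},
))  # ['member_of']
```

An empty result with a failing full query means no single constraint is to blame: either the raw resources do not fit anywhere, or two constraints are each sufficient to empty the set.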

Step 2: Read the scheduler and Placement logs

# Kolla-Ansible
docker logs nova_scheduler 2>&1 | grep -i "no allocation candidates" | tail -5
docker logs placement_api 2>&1 | grep -i "allocation_candidate" | tail -5
# Traditional packages
sudo grep -i "no allocation candidates" /var/log/nova/nova-scheduler.log | tail -5
sudo grep -i "allocation candidate" /var/log/placement/placement-api.log | tail -5

Step 3: Compare inventory, reserved, ratio, and usage

for rp in $(openstack resource provider list -f value -c uuid); do
  echo "== $rp =="
  openstack resource provider inventory list "$rp"
  openstack resource provider usage show "$rp"
done

Compute (total - reserved) * allocation_ratio - used per class. If the result is below the requested amount for some class on every provider, you have hit a capacity/config wall.
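Once the per-provider numbers are collected, the comparison across the fleet can be sketched like this. It assumes the capacity rule (total - reserved) * allocation_ratio and a nested dict you build yourself from the inventory and usage output; the function names and provider labels are hypothetical.

```python
import math


def blocking_classes(request, providers):
    """For each provider, list the requested classes that do not fit."""
    report = {}
    for rp_uuid, classes in providers.items():
        blocked = []
        for rc, amount in request.items():
            inv = classes.get(rc)
            if inv is None:
                blocked.append(rc)  # provider has no inventory of this class
                continue
            capacity = math.floor(
                (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
            )
            if capacity - inv["used"] < amount:
                blocked.append(rc)
        report[rp_uuid] = blocked
    return report


def common_wall(report):
    """Classes that block the request on every provider."""
    walls = [set(blocked) for blocked in report.values()]
    return sorted(set.intersection(*walls)) if walls else []


request = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}
providers = {
    "rp-1": {
        "VCPU": {"total": 64, "reserved": 0, "allocation_ratio": 16.0, "used": 100},
        "MEMORY_MB": {"total": 257000, "reserved": 4096, "allocation_ratio": 1.0, "used": 252904},
        "DISK_GB": {"total": 1800, "reserved": 0, "allocation_ratio": 1.0, "used": 200},
    },
    "rp-2": {
        "VCPU": {"total": 64, "reserved": 0, "allocation_ratio": 16.0, "used": 10},
        "MEMORY_MB": {"total": 128000, "reserved": 4096, "allocation_ratio": 1.0, "used": 122000},
        "DISK_GB": {"total": 500, "reserved": 0, "allocation_ratio": 1.0, "used": 490},
    },
}
report = blocking_classes(request, providers)
print(report)               # per-provider blockers
print(common_wall(report))  # ['MEMORY_MB']
```

A class that appears in common_wall is the fleet-wide limit; a provider with an empty blocker list fits on raw resources, so an empty candidate set would then point at traits or aggregates instead.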

Step 4: Verify traits and aggregate membership

openstack resource provider trait list <RP_UUID>
openstack resource provider aggregate list <RP_UUID>
openstack aggregate show <NOVA_AGGREGATE> -f yaml

Confirm the required traits exist on at least one provider and that the provider is in the Placement aggregate matching the Nova aggregate/AZ.

Step 5: Detect and heal stale allocations

# Preview healing for all instances on a host
for s in $(openstack server list --all-projects --host <HOST> -f value -c ID); do
  nova-manage placement heal_allocations --instance $s --dry-run --verbose
done
# Apply (without --dry-run) once previews look right
nova-manage placement heal_allocations --verbose

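For an allocation whose consumer is a deleted instance, remove it so the phantom usage clears (on releases that ship it, nova-manage placement audit can also list such orphaned allocations):

openstack resource provider show <RP_UUID> --allocations -f yaml
openstack resource provider allocation delete <STALE_CONSUMER_UUID>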

Example Root Cause Analysis

Boots into the az-gpu availability zone fail with Got no allocation candidates from the placement API, even though openstack hypervisor list shows the GPU node compute-gpu-1 at 10% utilization.

Reproducing with constraints isolates it:

openstack allocation candidate list --resource VCPU=4 --resource MEMORY_MB=8192        # returns rows
openstack allocation candidate list --resource VCPU=4 --aggregate-uuid 3a9f8b...        # returns NOTHING

Dropping the aggregate constraint brings candidates back, so the resource is fine — the aggregate scope is the problem. Checking membership:

openstack aggregate show az-gpu -f yaml          # uuid: 3a9f8b...
openstack resource provider aggregate list <compute-gpu-1-RP-uuid>

(empty)

The compute was added to the Nova az-gpu aggregate, but its Placement resource provider was never added to the matching Placement aggregate (the sync didn’t run after a manual DB edit). AZ-scoped requests therefore exclude it.

Fix: add the RP to the Placement aggregate (nova-manage placement sync_aggregates does the same for every Nova host aggregate; restarting nova-compute does not):

openstack resource provider aggregate set --aggregate 3a9f8b... <compute-gpu-1-RP-uuid>
nova-manage placement sync_aggregates --verbose   # optional: mirror all Nova host aggregates into Placement
openstack allocation candidate list --resource VCPU=4 --aggregate-uuid 3a9f8b...   # now returns rows

The next GPU boot reaches ACTIVE.

Prevention Best Practices

Monitor bookable capacity in Placement, not the hypervisor: alert when (total - reserved) * ratio - used for any class trends toward zero, especially MEMORY_MB at ratio 1.0.
Let Nova own the host/placement aggregate sync: add and remove hosts through the Nova aggregate API and avoid manual Placement DB edits; if the two drift, run nova-manage placement sync_aggregates so RP aggregate membership matches the Nova aggregate.
Audit allocations periodically: nova-manage placement heal_allocations --dry-run finds instances that are missing allocations, and nova-manage placement audit finds orphaned ones, so you catch drift before it strands capacity.
Treat required traits as code: keep the host-side trait declarations (provider config files / custom_traits) in version control so flavors never require a trait no host advertises.
Never set reserved equal to total or allocation_ratio to 0 by accident — both silently zero bookable capacity while the hypervisor reports healthy.

Quick Command Reference

# Reproduce the empty candidate query, then peel off constraints
openstack allocation candidate list --resource VCPU=2 --resource MEMORY_MB=4096 --resource DISK_GB=20
openstack allocation candidate list --resource VCPU=2 --required CUSTOM_GPU_A100
openstack allocation candidate list --resource VCPU=2 --aggregate-uuid <PLACEMENT_AGG_UUID>

# Scheduler + Placement logs
docker logs nova_scheduler 2>&1 | grep -i "no allocation candidates" | tail -5
sudo grep -i "no allocation candidates" /var/log/nova/nova-scheduler.log | tail -5

# Inventory math: (total - reserved) * ratio - used
openstack resource provider inventory list <RP_UUID>
openstack resource provider usage show <RP_UUID>

# Traits and aggregate membership
openstack resource provider trait list <RP_UUID>
openstack resource provider aggregate list <RP_UUID>
openstack aggregate show <NOVA_AGGREGATE> -f yaml

# Heal / clear stale allocations
nova-manage placement heal_allocations --instance <INSTANCE_UUID> --dry-run --verbose
nova-manage placement heal_allocations --verbose
openstack resource provider show <RP_UUID> --allocations -f yaml
openstack resource provider allocation delete <STALE_CONSUMER_UUID>

# Fix placement aggregate membership
openstack resource provider aggregate set --aggregate <AGG_UUID> <RP_UUID>

Conclusion

No allocation candidates returned means Placement — not the Nova filters — found no provider matching the requested resources, traits, and aggregates. The usual root causes:

Genuine exhaustion: (total - reserved) * ratio - used is below the requested amount (usually MEMORY_MB at ratio 1.0).
A required trait that no resource provider advertises.
A Nova host aggregate / Placement aggregate membership mismatch for AZ-scoped requests.
Stale allocations inflating used and masking real free capacity.
A failed reshaper/heal, or reserved == total / allocation_ratio == 0 zeroing inventory.

Reproduce the query with openstack allocation candidate list, peel off constraints one at a time to isolate resource vs trait vs aggregate, then fix the inventory, trait, or aggregate membership that Placement is filtering on.
