OpenStack Error Guide: 'Instance failed to spawn' Nova Stuck in BUILD/spawning

Overview

“Instance failed to spawn” is the error nova-compute raises when it has accepted a build, downloaded/prepared the image, and asked libvirt to create the domain — but the create did not succeed. The instance either flips to ERROR or sits in BUILD with task state spawning until a timeout fires.

The literal log line in nova-compute looks like:

ERROR nova.compute.manager [instance: 7c9e1a2b-3344-5566-7788-99aabbccddee] Instance failed to spawn: libvirt.libvirtError: internal error: process exited while connecting to monitor: qemu-system-x86_64: -accel kvm: failed to initialize kvm: No such file or directory

If the build instead hangs on networking you will see the spawn aborted by a VIF timeout:

ERROR nova.virt.libvirt.driver [instance: 7c9e...] Failed to allocate network(s)
nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

It occurs during the compute-side spawn phase of openstack server create (or rebuild). Unlike “No Valid Host Was Found,” the scheduler already picked a host — the failure is local to that compute node’s hypervisor, disk, image handling, or VIF plugging.

Symptoms

Instance stuck in BUILD / task state spawning, then drops to ERROR.
Fault message references libvirt, qemu, “Failed to allocate network(s)”, or “No space left on device”.
nova-compute log shows Instance failed to spawn on a specific compute host.

openstack server show app-09 -c status -c "OS-EXT-STS:task_state" -c fault -f value

ERROR
None
{'code': 500, 'message': "Build of instance 7c9e... aborted: Virtual Interface creation failed", 'details': '...VirtualInterfaceCreateException...'}

openstack server list --status BUILD --long -c Name -c Status -c "Task State" -c Host

+--------+--------+------------+------------+
| Name   | Status | Task State | Host       |
+--------+--------+------------+------------+
| app-09 | BUILD  | spawning   | compute-02 |
+--------+--------+------------+------------+

Common Root Causes

1. Libvirt/qemu cannot start the domain (no KVM / nested virt)

If the host lacks hardware virtualization (or kvm modules aren’t loaded) but virt_type = kvm, qemu fails to initialize KVM.

grep -E -c '(vmx|svm)' /proc/cpuinfo
lsmod | grep -E 'kvm_intel|kvm_amd'
docker exec nova_libvirt virsh list --all 2>/dev/null || sudo virsh list --all

A count of 0 means no virtualization flag is exposed — KVM init fails with “failed to initialize kvm”.
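The check above can be wrapped into a small preflight, sketched here under stated assumptions. The function names count_virt_cpus and kvm_preflight are hypothetical, and the optional file and device arguments exist only so the logic can be exercised against a fixture; on a real host they default to /proc/cpuinfo and /dev/kvm.

```shell
# Count logical CPUs whose "flags" line advertises Intel VT-x (vmx) or AMD-V (svm).
# $1 (optional) is the cpuinfo file to read; it defaults to /proc/cpuinfo.
count_virt_cpus() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Match the flag as a whole word so that longer flag names do not count.
    grep -E -c '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo"
}

# Report whether KVM looks usable: a virtualization flag is exposed and the
# KVM device node exists. $2 (optional) is the device path, default /dev/kvm.
kvm_preflight() {
    cpuinfo="${1:-/proc/cpuinfo}"
    kvm_dev="${2:-/dev/kvm}"
    n=$(count_virt_cpus "$cpuinfo")
    if [ "$n" -eq 0 ]; then
        echo "no vmx/svm flag in $cpuinfo"
        return 1
    fi
    if [ ! -e "$kvm_dev" ]; then
        echo "$kvm_dev missing (kvm module not loaded?)"
        return 1
    fi
    echo "ok: $n CPUs with virtualization flags"
}
```

Running kvm_preflight with no arguments on the compute node checks the live host; a non-zero exit status is a reasonable signal to keep the node out of the scheduler pool until it is fixed.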

2. Insufficient disk on the compute host

Image conversion, the ephemeral/root disk, or _base cache can fill the Nova state directory.

df -h /var/lib/nova /var/lib/docker
docker logs nova_compute 2>&1 | grep -i "No space left" | tail -3

Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb1       100G  100G   0G  100% /var/lib/nova

When /var/lib/nova is 100% full, disk creation fails with OSError: [Errno 28] No space left on device.
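A threshold check along these lines could run from cron or a monitoring agent. This is a minimal sketch: df_use_pct and check_disk are hypothetical names, and the default limit of 85 is an assumption to tune for your environment, not a value from Nova.

```shell
# Read `df -P` output on stdin and print the use percentage of the first
# filesystem as a bare integer (column 5, "Capacity", with the % removed).
df_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when a path is at or above a usage limit.
# $1 = path, $2 = limit in percent (assumed default: 85).
# Exit status: 0 below the limit, 1 at or above it, 2 if usage is unreadable.
check_disk() {
    path="$1"
    limit="${2:-85}"
    pct=$(df -P "$path" 2>/dev/null | df_use_pct)
    if [ -z "$pct" ]; then
        echo "UNKNOWN: cannot read usage for $path"
        return 2
    fi
    if [ "$pct" -ge "$limit" ]; then
        echo "WARN: $path at ${pct}% (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path at ${pct}%"
}
```

For example, check_disk /var/lib/nova 85 on the compute node prints one status line and sets the exit status for the caller to act on.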

3. Image / backing-file problems

A corrupt image, a wrong disk_format, or a qcow2 with an unreachable backing file makes qemu-img conversion fail.

openstack image show ubuntu-22.04 -c status -c disk_format -c size -f value
docker exec nova_compute qemu-img info \
  /var/lib/nova/instances/_base/<HASH> 2>/dev/null
docker logs nova_compute 2>&1 | grep -i "qemu-img" | tail -5

active
qcow2
0

A reported size of 0 or a qemu-img “Could not open backing file” error points at the image.

4. Neutron VIF plug timeout

Nova creates the domain but waits for Neutron to send a network-vif-plugged event. If the L2 agent is slow or the event never arrives, the spawn aborts after vif_plugging_timeout.

grep -E 'vif_plugging_(timeout|is_fatal)' /etc/nova/nova.conf
docker logs nova_compute 2>&1 | grep -i "Timeout waiting for .*vif" | tail -3

vif_plugging_timeout = 300
vif_plugging_is_fatal = True

WARNING nova.virt.libvirt.driver [instance: 7c9e...] Timeout waiting for [('network-vif-plugged', 'a1b2c3d4-...')] to be plugged.
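To see which ports Nova was waiting on, the port IDs can be pulled out of those warnings. The helper below is a sketch with a hypothetical name (vif_timeout_ports); it assumes the log line format shown above and relies on grep -o, which GNU and BSD grep both provide.

```shell
# Read nova-compute log text on stdin and print, de-duplicated, every port ID
# named in a "Timeout waiting for [('network-vif-plugged', '<port>')]" warning.
vif_timeout_ports() {
    grep 'Timeout waiting for' \
        | grep -o "'network-vif-plugged', '[^']*'" \
        | sed "s/^'network-vif-plugged', '//; s/'\$//" \
        | sort -u
}

# Possible usage on a Kolla-Ansible compute node (not executed here):
#   docker logs nova_compute 2>&1 | vif_timeout_ports | while read -r port; do
#       openstack port show "$port" -c status -c binding_vif_type -c binding_host_id
#   done
```

A port that reports binding_vif_type of binding_failed, or a status that stays DOWN, points at Neutron rather than at libvirt.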

5. SELinux / AppArmor denials

Mandatory access control can block libvirt/qemu from opening the instance disk or socket, even when permissions look correct.

sudo ausearch -m avc -ts recent 2>/dev/null | grep -iE 'qemu|libvirt|svirt' | tail -5
sudo getenforce
sudo dmesg | grep -i apparmor | grep -i denied | tail -5

type=AVC msg=audit(...): avc:  denied  { read } for  pid=... comm="qemu-system-x86" name="disk" dev="vdb1" ... scontext=system_u:system_r:svirt_t:s0:c12,c34 tcontext=system_u:object_r:default_t:s0
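Raw AVC records are hard to scan, so a one-line summary per denial can help. The sketch below uses a hypothetical function name (avc_summary) and assumes records shaped like the example above, with comm, scontext, and tcontext fields in that order.

```shell
# Read audit records on stdin and print one line per AVC denial:
#   <command> <permission> <source context> -> <target context>
avc_summary() {
    sed -n 's/.*denied  *{ \([^}]*\) }.*comm="\([^"]*\)".*scontext=\([^ ]*\) tcontext=\([^ ]*\).*/\2 \1 \3 -> \4/p'
}

# Possible usage (not executed here):
#   sudo ausearch -m avc -ts recent 2>/dev/null | avc_summary | sort | uniq -c
```

In the example record the target context carries the default_t type, which usually suggests the instance disk lost its expected SELinux label rather than that the policy itself is wrong.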

6. Unsupported CPU model / missing flags

A flavor or image requesting a CPU model the host doesn’t support, or cpu_mode/cpu_models misconfig, makes libvirt reject the domain.

grep -E '^(cpu_mode|cpu_models|cpu_model_extra_flags)' /etc/nova/nova.conf
docker logs nova_compute 2>&1 | grep -i "unsupported configuration\|CPU" | tail -5

ERROR ... libvirt.libvirtError: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: ...

Diagnostic Workflow

Step 1: Capture the fault and the host

openstack server show <SERVER> -c status -c "OS-EXT-STS:task_state" \
  -c "OS-EXT-SRV-ATTR:host" -c fault -f value

Note the host; every later command runs against that compute node.

Step 2: Read the nova-compute spawn log

# Kolla-Ansible
docker logs nova_compute 2>&1 | grep -A20 "Instance failed to spawn" | tail -40
# Traditional
sudo journalctl -u nova-compute --no-pager | grep -A20 "Instance failed to spawn" | tail -40

The traceback names the layer that failed: libvirt, qemu-img, VirtualInterfaceCreateException, or No space left.
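That mapping from traceback text to failing layer can be sketched as a small classifier. The function name classify_spawn_failure and the layer labels are hypothetical; the match strings are taken from the log lines quoted in this guide, and the order matters because one traceback can contain several of them.

```shell
# Read a nova-compute traceback on stdin and print the layer that most likely
# failed. More specific patterns come first; plain libvirtError is the fallback.
classify_spawn_failure() {
    log=$(cat)
    case "$log" in
        *"No space left on device"*)
            echo "disk" ;;
        *"VirtualInterfaceCreateException"*|*"Failed to allocate network"*)
            echo "network" ;;
        *"failed to initialize kvm"*)
            echo "kvm" ;;
        *"guest and host CPU are not compatible"*)
            echo "cpu" ;;
        *"Could not open backing file"*|*"qemu-img"*)
            echo "image" ;;
        *"libvirtError"*)
            echo "libvirt" ;;
        *)
            echo "unknown" ;;
    esac
}

# Possible usage (not executed here):
#   docker logs nova_compute 2>&1 | grep -A20 "Instance failed to spawn" \
#       | tail -40 | classify_spawn_failure
```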

Step 3: Check host capacity and hypervisor health

df -h /var/lib/nova
free -m
docker exec nova_libvirt virsh list --all 2>/dev/null || sudo virsh list --all
docker exec nova_libvirt virsh nodeinfo 2>/dev/null || sudo virsh nodeinfo

Rule out a full disk and a dead/over-subscribed libvirt before digging deeper.

Step 4: If it’s networking, correlate with Neutron

docker logs nova_compute 2>&1 | grep -i "vif" | tail -10
docker logs neutron_openvswitch_agent 2>&1 | tail -30
# Traditional
sudo journalctl -u neutron-openvswitch-agent --no-pager | tail -30
openstack network agent list --host <HOST>

A VIF timeout almost always means that the L2 agent on the host is slow or down, or that the port failed to bind.

Step 5: Check mandatory access control (SELinux/AppArmor) and CPU config

sudo ausearch -m avc -ts recent | grep -iE 'qemu|libvirt|svirt' | tail
sudo aa-status | grep -i libvirt
grep -E '^(virt_type|cpu_mode|cpu_models)' /etc/nova/nova.conf
grep -E -c '(vmx|svm)' /proc/cpuinfo
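A plain grep on nova.conf ignores which section an option sits in, and virt_type and cpu_mode only take effect under [libvirt]. The section-aware reader below is a sketch: ini_get is a hypothetical helper, it handles simple key = value lines only, and it skips commented-out options.

```shell
# Print the value of a key inside one section of an INI-style file.
# $1 = file, $2 = section name without brackets, $3 = key.
# Prints nothing if the key is absent from that section.
ini_get() {
    awk -F= -v section="$2" -v key="$3" '
        # Section header: remember its name with brackets and padding removed.
        /^[ \t]*\[/ {
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        current == section {
            k = $1
            gsub(/[ \t]/, "", k)
            if (k == key) {
                # Take everything after the first "=" and trim whitespace.
                v = substr($0, index($0, "=") + 1)
                sub(/^[ \t]+/, "", v)
                sub(/[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}

# Possible usage (not executed here):
#   ini_get /etc/nova/nova.conf libvirt virt_type
#   ini_get /etc/nova/nova.conf libvirt cpu_mode
```

If the first call prints kvm while the vmx/svm count above is 0, the configuration and the hardware disagree, which matches root cause 1.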

Example Root Cause Analysis

app-09 is stuck in BUILD/spawning on compute-02, then errors with “Virtual Interface creation failed.”

The nova-compute log:

WARNING nova.virt.libvirt.driver [instance: 7c9e...] Timeout waiting for [('network-vif-plugged', 'a1b2c3d4-1111-...')] to be plugged.
ERROR nova.compute.manager [instance: 7c9e...] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

So the domain was created, but Nova never received the network-vif-plugged event from Neutron. Checking the OVS agent on compute-02:

openstack network agent list --host compute-02

+---------+--------------------+------------+-------+-------+
| ID      | Agent Type         | Host       | Alive | State |
+---------+--------------------+------------+-------+-------+
| 3a2b... | Open vSwitch agent | compute-02 | XXX   | UP    |
+---------+--------------------+------------+-------+-------+

The agent is not heartbeating (Alive = XXX), so it never wired the tap interface and Nova’s VIF timeout fired. The agent log confirms it was stuck reconnecting to RabbitMQ.
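The same check can be scripted across all hosts. This is a sketch: dead_agent_hosts is a hypothetical name, and it assumes each input line holds a host name followed by the Alive value. The table view prints XXX for a dead agent; machine-readable output may print False instead, so both are treated as dead.

```shell
# Read "<host> <alive>" lines on stdin and print each host whose agent is not
# alive. Accepts both the table marker (XXX) and a boolean (False).
dead_agent_hosts() {
    awk '$NF == "XXX" || $NF == "False" { print $1 }' | sort -u
}

# Possible usage (not executed here; column order and Alive formatting under
# -f value are assumptions to verify against your client version):
#   openstack network agent list --agent-type open-vswitch \
#       -f value -c Host -c Alive | dead_agent_hosts
```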

Fix: restart the agent, confirm it heartbeats, then hard-reboot the instance so Nova re-creates the domain and plugs the VIF again:

docker restart neutron_openvswitch_agent     # on compute-02
openstack network agent list --host compute-02   # Alive should flip to :-)
openstack server reboot --hard app-09

The VIF plugs, Nova receives the event, and app-09 reaches ACTIVE.

Prevention Best Practices

Monitor /var/lib/nova (and /var/lib/docker for Kolla) free space; alert well before 90%. A full state dir is a top cause of spawn failures.
Validate KVM/nested virt on every compute node (grep -E -c '(vmx|svm)' /proc/cpuinfo) before adding it to the scheduler pool.
Alert on dead L2 agents with openstack network agent list; VIF timeouts are usually an agent that stopped heartbeating.
Keep vif_plugging_timeout realistic for your fabric and decide deliberately on vif_plugging_is_fatal.
Run SELinux/AppArmor in enforcing mode but ship the OpenStack policy modules, and watch ausearch -m avc after upgrades.
Pin cpu_mode/cpu_models to the lowest common host CPU when using live migration to avoid “CPU not compatible” spawn errors.

Quick Command Reference

# Fault, task state, and host
openstack server show <SERVER> -c status -c "OS-EXT-STS:task_state" -c "OS-EXT-SRV-ATTR:host" -c fault -f value

# Spawn traceback
docker logs nova_compute 2>&1 | grep -A20 "Instance failed to spawn" | tail -40
sudo journalctl -u nova-compute | grep -A20 "Instance failed to spawn" | tail -40

# Host capacity & hypervisor
df -h /var/lib/nova; free -m
docker exec nova_libvirt virsh list --all
docker exec nova_libvirt virsh nodeinfo

# KVM availability
grep -E -c '(vmx|svm)' /proc/cpuinfo; lsmod | grep kvm

# VIF / Neutron correlation
docker logs nova_compute 2>&1 | grep -i "vif" | tail -10
openstack network agent list --host <HOST>

# Mandatory access control (SELinux/AppArmor) & CPU
sudo ausearch -m avc -ts recent | grep -iE 'qemu|libvirt|svirt' | tail
grep -E '^(virt_type|cpu_mode|cpu_models)' /etc/nova/nova.conf

# Recover
openstack server reboot --hard <SERVER>

Conclusion

“Instance failed to spawn” is a host-local build failure after the scheduler has already chosen the compute node. The typical root causes:

Libvirt/qemu cannot start the domain (no KVM / missing virtualization flags).
The compute host’s /var/lib/nova (or Docker) volume is out of disk.
A corrupt image, wrong disk_format, or broken qcow2 backing file.
A Neutron VIF plug timeout, usually a slow or dead L2 agent.
SELinux/AppArmor denials blocking qemu’s access to the disk or socket.
An unsupported CPU model or missing CPU flags for the requested domain.

Read the nova-compute traceback first — it names the failing layer — then check disk and the L2 agent, which together account for most real-world spawn failures.
