OpenStack Error Guide: 'Instance failed to spawn' and Nova Stuck in BUILD/spawning
Fix Nova 'Instance failed to spawn' and instances stuck in BUILD/spawning: diagnose libvirt/qemu errors, disk space, VIF plug timeouts, SELinux, and CPU flags.
Overview
“Instance failed to spawn” is the error nova-compute raises when it has accepted a build, downloaded/prepared the image, and asked libvirt to create the domain — but the create did not succeed. The instance either flips to ERROR or sits in BUILD with task state spawning until a timeout fires.
The literal log line in nova-compute looks like:
ERROR nova.compute.manager [instance: 7c9e1a2b-3344-5566-7788-99aabbccddee] Instance failed to spawn: libvirt.libvirtError: internal error: process exited while connecting to monitor: qemu-system-x86_64: -accel kvm: failed to initialize kvm: No such file or directory
If the build instead hangs on networking you will see the spawn aborted by a VIF timeout:
ERROR nova.virt.libvirt.driver [instance: 7c9e...] Failed to allocate network(s)
nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
The error occurs during the compute-side spawn phase of openstack server create (or rebuild). Unlike “No Valid Host Was Found,” the scheduler has already picked a host, so the failure is local to that compute node’s hypervisor, disk, image handling, or VIF plugging.
Symptoms
- Instance stuck in BUILD / task state spawning, then drops to ERROR.
- Fault message references libvirt, qemu, “Failed to allocate network(s)”, or “No space left on device”.
- nova-compute log shows Instance failed to spawn on a specific compute host.
openstack server show app-09 -c status -c "OS-EXT-STS:task_state" -c fault -f value
ERROR
None
{'code': 500, 'message': "Build of instance 7c9e... aborted: Virtual Interface creation failed", 'details': '...VirtualInterfaceCreateException...'}
openstack server list --status BUILD --long -c Name -c Status -c "Task State" -c Host
+--------+--------+------------+------------+
| Name | Status | Task State | Host |
+--------+--------+------------+------------+
| app-09 | BUILD | spawning | compute-02 |
+--------+--------+------------+------------+
Common Root Causes
1. Libvirt/qemu cannot start the domain (no KVM / nested virt)
If the host lacks hardware virtualization (or kvm modules aren’t loaded) but virt_type = kvm, qemu fails to initialize KVM.
egrep -c '(vmx|svm)' /proc/cpuinfo
lsmod | grep -E 'kvm_intel|kvm_amd'
docker exec nova_libvirt virsh list --all 2>/dev/null || sudo virsh list --all
0
A count of 0 means no virtualization flag is exposed — KVM init fails with “failed to initialize kvm”.
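The same check can be wrapped in a small function for pre-flight scripts. This is a minimal sketch, not part of Nova: the names virt_flag_count and kvm_ready are hypothetical, and the optional path arguments are an assumption added so the check can run against a saved copy of cpuinfo. It counts only the flags lines, because some newer kernels also print a separate "vmx flags" line that a plain match on vmx would count as well.

```shell
# Count logical CPUs whose "flags" line advertises vmx (Intel) or svm (AMD).
# Usage: virt_flag_count [CPUINFO_PATH]   (defaults to /proc/cpuinfo)
virt_flag_count() {
  grep -Ec '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}

# Succeed only when at least one CPU exposes the flag AND the KVM device
# node exists. Usage: kvm_ready [CPUINFO_PATH] [KVM_DEVICE]
kvm_ready() {
  [ "$(virt_flag_count "$1")" -gt 0 ] && [ -e "${2:-/dev/kvm}" ]
}
```

On a compute node, `kvm_ready || echo "KVM not usable on $(hostname)"` gives a one-line answer before the host joins the scheduler pool.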
2. Insufficient disk on the compute host
Image conversion, the ephemeral/root disk, or _base cache can fill the Nova state directory.
df -h /var/lib/nova /var/lib/docker
docker logs nova_compute 2>&1 | grep -i "No space left" | tail -3
Filesystem Size Used Avail Use% Mounted on
/dev/vdb1 100G 100G 0 100% /var/lib/nova
A 100% /var/lib/nova produces OSError: [Errno 28] No space left on device during disk creation.
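To catch this before a spawn fails, the usage figure can be compared against a threshold. The sketch below is illustrative only: parse_use_pct and disk_over_threshold are hypothetical names, and the default of 85 is an example value, not a Nova setting. It uses df -P because the POSIX output format keeps each filesystem on one line.

```shell
# Read `df -P` output on stdin and print the capacity column of the first
# filesystem row as a bare integer (for example 100 for "100%").
parse_use_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the filesystem holding PATH is at or above THRESHOLD percent.
# Usage: disk_over_threshold PATH [THRESHOLD]   (threshold defaults to 85)
disk_over_threshold() {
  pct=$(df -P "$1" 2>/dev/null | parse_use_pct)
  [ -n "$pct" ] && [ "$pct" -ge "${2:-85}" ]
}
```

For example, `disk_over_threshold /var/lib/nova 85 && echo "nova state dir nearly full"` could run from cron or a monitoring agent.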
3. Image / backing-file problems
A corrupt image, a wrong disk_format, or a qcow2 with an unreachable backing file makes qemu-img conversion fail.
openstack image show ubuntu-22.04 -c status -c disk_format -c size -f value
docker exec nova_compute qemu-img info \
/var/lib/nova/instances/_base/<HASH> 2>/dev/null
docker logs nova_compute 2>&1 | grep -i "qemu-img" | tail -5
active
qcow2
0
A reported size of 0 or a qemu-img “Could not open backing file” error points at the image.
4. Neutron VIF plug timeout
Nova creates the domain but waits for Neutron to send a network-vif-plugged event. If the L2 agent is slow or the event never arrives, the spawn aborts after vif_plugging_timeout.
grep -E 'vif_plugging_(timeout|is_fatal)' /etc/nova/nova.conf
docker logs nova_compute 2>&1 | grep -i "Timeout waiting for .*vif" | tail -3
vif_plugging_timeout = 300
vif_plugging_is_fatal = True
WARNING nova.virt.libvirt.driver [instance: 7c9e...] Timeout waiting for [('network-vif-plugged', 'a1b2c3d4-...')] to be plugged.
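When neither option appears in nova.conf, the grep prints nothing and Nova falls back to its defaults (300 seconds and True). A small reader that applies a fallback makes the effective value explicit. This is a sketch under stated assumptions: conf_get is a hypothetical helper, it ignores INI section headers (both options live in [DEFAULT]), and it simply takes the last uncommented assignment it finds.

```shell
# Print the value of OPTION from an INI-style FILE, or DEFAULT when the
# option is absent or commented out. Section headers are not considered.
# Usage: conf_get FILE OPTION [DEFAULT]
conf_get() {
  val=$(sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" 2>/dev/null | tail -n 1)
  printf '%s\n' "${val:-$3}"
}
```

For example, `conf_get /etc/nova/nova.conf vif_plugging_timeout 300` prints the configured timeout, or 300 if the option is unset.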
5. SELinux / AppArmor denials
Mandatory access control can block libvirt/qemu from opening the instance disk or socket, even when permissions look correct.
sudo ausearch -m avc -ts recent 2>/dev/null | grep -iE 'qemu|libvirt|svirt' | tail -5
sudo getenforce
sudo dmesg | grep -i apparmor | grep -i denied | tail -5
type=AVC msg=audit(...): avc: denied { read } for pid=... comm="qemu-system-x86" name="disk" dev="vdb1" ... scontext=system_u:system_r:svirt_t:s0:c12,c34 tcontext=system_u:object_r:default_t:s0
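The three fields that matter in a denial are the process (comm), the source context, and the target context. The following sketch pulls them out of AVC lines so they can be compared at a glance; avc_summary is a hypothetical name, and the parsing assumes the field order shown in the sample line above. A target type such as default_t often suggests the file was never labelled for sVirt, though that reading should be confirmed against the host's policy.

```shell
# Read AVC denial lines on stdin and print "comm scontext tcontext" for each
# line that carries all three fields. Other lines are skipped.
avc_summary() {
  sed -n 's/.*comm="\([^"]*\)".*scontext=\([^ ]*\).*tcontext=\([^ ]*\).*/\1 \2 \3/p'
}
```

Typical use: `sudo ausearch -m avc -ts recent 2>/dev/null | avc_summary | sort | uniq -c`.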
6. Unsupported CPU model / missing flags
A flavor or image requesting a CPU model the host doesn’t support, or cpu_mode/cpu_models misconfig, makes libvirt reject the domain.
grep -E '^(cpu_mode|cpu_models|cpu_model_extra_flags)' /etc/nova/nova.conf
docker logs nova_compute 2>&1 | grep -i "unsupported configuration\|CPU" | tail -5
ERROR ... libvirt.libvirtError: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: ...
Diagnostic Workflow
Step 1: Capture the fault and the host
openstack server show <SERVER> -c status -c "OS-EXT-STS:task_state" \
-c "OS-EXT-SRV-ATTR:host" -c fault -f value
Note the host; every later command runs against that compute node.
Step 2: Read the nova-compute spawn log
# Kolla-Ansible
docker logs nova_compute 2>&1 | grep -A20 "Instance failed to spawn" | tail -40
# Traditional
sudo journalctl -u nova-compute --no-pager | grep -A20 "Instance failed to spawn" | tail -40
The traceback names the layer that failed: libvirt, qemu-img, VirtualInterfaceCreateException, or No space left.
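For repeated triage, the mapping from traceback text to failing layer can be scripted. This is a rough sketch, not an official Nova tool: classify_spawn_failure is a hypothetical helper, the patterns are only the log strings quoted in this guide, and SELinux/AppArmor denials are left out because they are found in the audit log rather than the nova-compute traceback.

```shell
# Read a nova-compute traceback on stdin and print one coarse category:
# kvm, disk, image, vif, cpu, or unknown. The first matching pattern wins.
classify_spawn_failure() {
  log=$(cat)
  match() { printf '%s\n' "$log" | grep -qiE "$1"; }
  if   match 'failed to initialize kvm';    then echo kvm
  elif match 'No space left on device';     then echo disk
  elif match 'Could not open backing file'; then echo image
  elif match 'VirtualInterfaceCreateException|Timeout waiting for .*vif'; then echo vif
  elif match 'unsupported configuration';   then echo cpu
  else echo unknown
  fi
}
```

Pipe either log command above into it, for example `docker logs nova_compute 2>&1 | grep -A20 "Instance failed to spawn" | tail -40 | classify_spawn_failure`. An unknown result means the traceback still needs to be read by hand.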
Step 3: Check host capacity and hypervisor health
df -h /var/lib/nova
free -m
docker exec nova_libvirt virsh list --all 2>/dev/null || sudo virsh list --all
docker exec nova_libvirt virsh nodeinfo 2>/dev/null || sudo virsh nodeinfo
Rule out a full disk and a dead/over-subscribed libvirt before digging deeper.
Step 4: If it’s networking, correlate with Neutron
docker logs nova_compute 2>&1 | grep -i "vif" | tail -10
docker logs neutron_openvswitch_agent 2>&1 | tail -30
# Traditional
sudo journalctl -u neutron-openvswitch-agent --no-pager | tail -30
openstack network agent list --host <HOST>
A VIF timeout almost always means the L2 agent on the host is slow, down, or the port failed to bind.
Step 5: Check mandatory access control (SELinux/AppArmor) and CPU config
sudo ausearch -m avc -ts recent | grep -iE 'qemu|libvirt|svirt' | tail
sudo aa-status | grep -i libvirt
grep -E '^(virt_type|cpu_mode|cpu_models)' /etc/nova/nova.conf
egrep -c '(vmx|svm)' /proc/cpuinfo
Example Root Cause Analysis
app-09 is stuck in BUILD/spawning on compute-02, then errors with “Virtual Interface creation failed.”
The nova-compute log:
WARNING nova.virt.libvirt.driver [instance: 7c9e...] Timeout waiting for [('network-vif-plugged', 'a1b2c3d4-1111-...')] to be plugged.
ERROR nova.compute.manager [instance: 7c9e...] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
So Nova started building the domain but never received the network-vif-plugged event for the port. Checking the OVS agent on compute-02:
openstack network agent list --host compute-02
+---------+--------------------+------------+-------+-------+
| ID | Agent Type | Host | Alive | State |
+---------+--------------------+------------+-------+-------+
| 3a2b... | Open vSwitch agent | compute-02 | XXX | UP |
+---------+--------------------+------------+-------+-------+
The agent is not heartbeating (Alive = XXX), so it never wired the tap interface and Nova’s VIF timeout fired. The agent log confirms it was stuck reconnecting to RabbitMQ.
Fix: restart the agent, confirm it heartbeats, then hard-reboot the instance so Nova re-plugs the VIF:
docker restart neutron_openvswitch_agent # on compute-02
openstack network agent list --host compute-02 # Alive should flip to :-)
openstack server reboot --hard app-09
The VIF plugs, Nova receives the event, and app-09 reaches ACTIVE.
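The agent check in this example can be automated by scanning the table output for rows marked XXX. The sketch below is one possible approach: dead_agents is a hypothetical name, and it locates the Alive, Host, and Agent Type columns from the header row, since a real deployment may print more columns than the trimmed table shown here.

```shell
# Read `openstack network agent list` table output on stdin and print
# "<host> <agent type>" for every row whose Alive column shows XXX.
dead_agents() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^\|/ {
      if (!alive) {                 # header row: find columns by name
        for (i = 1; i <= NF; i++) {
          name = trim($i)
          if (name == "Alive")      alive = i
          if (name == "Host")       host  = i
          if (name == "Agent Type") atype = i
        }
        next
      }
      if (trim($alive) == "XXX") print trim($host), trim($atype)
    }'
}
```

Running `openstack network agent list | dead_agents` across the whole cloud lists every agent that has stopped heartbeating, which is a natural input for an alert.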
Prevention Best Practices
- Monitor /var/lib/nova (and /var/lib/docker for Kolla) free space; alert well before 90%. A full state dir is a top cause of spawn failures.
- Validate KVM/nested virt on every compute node (egrep -c '(vmx|svm)' /proc/cpuinfo) before adding it to the scheduler pool.
- Alert on dead L2 agents with openstack network agent list; VIF timeouts are usually an agent that stopped heartbeating.
- Keep vif_plugging_timeout realistic for your fabric and decide deliberately on vif_plugging_is_fatal.
- Run SELinux/AppArmor in enforcing mode but ship the OpenStack policy modules, and watch ausearch -m avc after upgrades.
- Pin cpu_mode/cpu_models to the lowest common host CPU when using live migration to avoid “CPU not compatible” spawn errors.
Quick Command Reference
# Fault, task state, and host
openstack server show <SERVER> -c status -c "OS-EXT-STS:task_state" -c "OS-EXT-SRV-ATTR:host" -c fault -f value
# Spawn traceback
docker logs nova_compute 2>&1 | grep -A20 "Instance failed to spawn" | tail -40
sudo journalctl -u nova-compute | grep -A20 "Instance failed to spawn" | tail -40
# Host capacity & hypervisor
df -h /var/lib/nova; free -m
docker exec nova_libvirt virsh list --all
docker exec nova_libvirt virsh nodeinfo
# KVM availability
egrep -c '(vmx|svm)' /proc/cpuinfo; lsmod | grep kvm
# VIF / Neutron correlation
docker logs nova_compute 2>&1 | grep -i "vif" | tail -10
openstack network agent list --host <HOST>
# Mandatory access control (SELinux/AppArmor) & CPU
sudo ausearch -m avc -ts recent | grep -iE 'qemu|libvirt|svirt' | tail
grep -E '^(virt_type|cpu_mode|cpu_models)' /etc/nova/nova.conf
# Recover
openstack server reboot --hard <SERVER>
Conclusion
“Instance failed to spawn” is a host-local build failure after the scheduler has already chosen the compute node. The typical root causes:
- Libvirt/qemu cannot start the domain (no KVM / missing virtualization flags).
- The compute host’s /var/lib/nova (or Docker) volume is out of disk.
- A corrupt image, wrong disk_format, or broken qcow2 backing file.
- A Neutron VIF plug timeout, usually a slow or dead L2 agent.
- SELinux/AppArmor denials blocking qemu’s access to the disk or socket.
- An unsupported CPU model or missing CPU flags for the requested domain.
Read the nova-compute traceback first — it names the failing layer — then check disk and the L2 agent, which together account for most real-world spawn failures.