Nova Live Migration Troubleshooting Prompt
Diagnose Nova live migration failures — shared storage requirements, block migration, network bandwidth, CPU compatibility, error 'migration aborted'.
- Target user: OpenStack compute engineers handling live migrations
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack compute engineer who has debugged thousands of live migrations: shared-storage and block, across different libvirt versions, with various CPU models.

I will provide:
- The symptom (migration aborted, stuck, slow, post-copy issue, guest hangs)
- Source and destination compute info
- Storage backend (shared / block)
- libvirt + qemu versions on both
- nova-compute and libvirt logs

Your job:

1. **Verify prerequisites**:
   - Compute hostnames resolvable
   - SSH between computes if the libvirt migration URI uses `qemu+ssh`
   - libvirt TCP/TLS port reachable otherwise
   - CPU compatibility between source and destination (same model or compatible)
   - For block migration: enough disk on destination
2. **For shared storage** (Ceph, NFS):
   - Both computes see the same volume / image storage
   - No data transfer needed for disk, only memory + state
   - Faster than block migration
3. **For block migration**:
   - Disk content transferred during migration
   - Network bandwidth = bottleneck (large disks take time)
   - Set `--block-migration` flag
4. **For "migration aborted"**:
   - `nova-compute` log on source/dest
   - libvirt log: `/var/log/libvirt/qemu/<instance>.log`
   - Common: bandwidth too low, dirty rate too high (memory changes faster than transfer)
5. **For CPU model mismatch**:
   - `cpu_mode` and `cpu_models` in the `[libvirt]` section of nova.conf must be compatible across hosts
   - Use `host-passthrough` only when source and destination CPUs are identical
   - `host-model`, or `custom` with a specific model (e.g., `Westmere`), for portability
6. **For post-copy**:
   - Migration switches to post-copy mode if pre-copy is stuck
   - `live_migration_permit_post_copy = true` in nova.conf
   - Risk: if the network drops during post-copy, the guest can be lost, because its memory is split across both hosts
7. **For auto-converge**:
   - Throttles guest CPU to slow dirty rate
   - `live_migration_permit_auto_converge = true`
   - May cause perceived slowdown but makes completion more likely
8. **For tunable bandwidth**:
   - `live_migration_bandwidth` (MiB/s; 0 lets the hypervisor choose a default)
   - `live_migration_downtime` and `live_migration_downtime_steps`

Mark DESTRUCTIVE: cancelling in-flight migration (instance may end in unknown state), force-migration across incompatible CPU models (guest crash), live migration with insufficient bandwidth (instance can hang in post-copy).

---

Symptom: [DESCRIBE]
Source + dest compute: [DESCRIBE]
Storage: [shared / block]
libvirt + qemu versions:
```
[PASTE]
```
Nova / libvirt logs:
```
[PASTE]
```
Why this prompt works
Live migration combines compute, network, storage, and libvirt, so a failure can originate in any of these layers. This prompt walks through them in order.
How to use it
- Verify CPU compatibility between hosts.
- For block migration, plan bandwidth.
- For stuck migrations, check dirty rate.
- Test with non-critical VMs first.
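As a rough first pass at the CPU compatibility check, the flag sets of the two hosts can be compared directly. The sketch below assumes each host's flags have been saved to a text file (for example, the `flags` line from `/proc/cpuinfo`); the helper name `missing_flags` and the file names are illustrative, not part of Nova or libvirt. libvirt's own comparison is more thorough, so treat the result as a hint, not a verdict.

```shell
# Print CPU flags present in the source host's list but absent from the
# destination's. Both arguments are files of whitespace-separated flags.
# (missing_flags is a hypothetical helper, not a Nova or libvirt command.)
missing_flags() {
    src_sorted=$(mktemp) || return 1
    dst_sorted=$(mktemp) || { rm -f "$src_sorted"; return 1; }
    # One flag per line, sorted and de-duplicated, as comm(1) requires.
    tr -s ' \t' '\n\n' < "$1" | sed '/^$/d' | LC_ALL=C sort -u > "$src_sorted"
    tr -s ' \t' '\n\n' < "$2" | sed '/^$/d' | LC_ALL=C sort -u > "$dst_sorted"
    # comm -23 keeps only the lines unique to the first file.
    LC_ALL=C comm -23 "$src_sorted" "$dst_sorted"
    rm -f "$src_sorted" "$dst_sorted"
}
```

Any flag it prints is one the guest may already be using on the source but would not find on the destination, which is the situation a named CPU model is meant to avoid.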
Useful commands
# Start migration
openstack server migrate --live-migration --host <dest> <instance>
openstack server migrate --live-migration --block-migration <instance>
# Watch progress
openstack server show <instance>
openstack server migration list --server <instance>
# Cancel (if needed)
openstack server migration abort <instance> <migration-id>
# Compute config
sudo grep -E "live_migration|cpu_mode" /etc/nova/nova.conf
# Libvirt CPU compat
sudo virsh capabilities | grep -A20 cpu
sudo virsh domcapabilities
# Logs
sudo journalctl -u nova-compute -n 200 --no-pager
sudo tail /var/log/libvirt/qemu/<instance-name>.log
sudo journalctl -u libvirtd -n 100 --no-pager
# Bandwidth check (start `iperf3 -s` on the destination first)
ip -s link show <migration-network-iface>
iperf3 -c <dest-host>
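Grepping the whole of nova.conf can match commented-out lines or options in the wrong section. A minimal sketch, assuming the live migration options sit in the `[libvirt]` section and using a hypothetical helper named `ini_get`, reads one option from one section:

```shell
# Print the value of KEY in [SECTION] of an INI file; prints nothing if unset.
# Usage: ini_get FILE SECTION KEY   (ini_get is an illustrative helper)
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        /^[ \t]*\[/ {                           # section header
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim the value
            if (k == key) value = v             # keep the last assignment seen
        }
        END { if (value != "") print value }
    ' "$1"
}
```

For example, `ini_get /etc/nova/nova.conf libvirt live_migration_permit_auto_converge` prints the configured value, or nothing when the option is left at its default. Deployments that split configuration across several files would need each file checked.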
Common findings this catches
- CPU model mismatch → migration rejected at libvirt; use compatible model.
- Block migration timeout with high dirty rate → enable auto-converge or post-copy.
- Network bandwidth saturated → tune the live_migration_bandwidth cap or schedule off-peak.
- Destination disk insufficient for block migration → expand or use shared storage.
- Stuck in MIGRATING state → libvirt operation failed; check both sides.
- Post-migration instance error → resource race; check both nova-compute logs.
- libvirt version mismatch between source/dest — verify compatibility matrix.
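The dirty-rate and bandwidth findings come down to simple arithmetic: pre-copy can only finish if memory is sent faster than the guest dirties it. The sketch below uses a deliberately simplified model (constant dirty rate and bandwidth, both in MiB/s, ignoring compression, downtime thresholds, and disk transfer); `precopy_estimate` is a hypothetical helper name.

```shell
# Estimate pre-copy duration in seconds under a simplified model:
#   time is roughly memory / (bandwidth - dirty_rate)
# Usage: precopy_estimate MEM_MIB BANDWIDTH_MIBPS DIRTY_MIBPS
precopy_estimate() {
    mem=$1 bw=$2 dirty=$3
    if [ "$bw" -le "$dirty" ]; then
        # The guest dirties memory at least as fast as it can be sent.
        echo "will-not-converge"
        return 1
    fi
    # Integer division, rounded up.
    echo $(( (mem + bw - dirty - 1) / (bw - dirty) ))
}
```

With a 16 GiB guest, 1000 MiB/s of usable bandwidth, and a 200 MiB/s dirty rate, `precopy_estimate 16384 1000 200` prints `21`. When it prints `will-not-converge`, plain pre-copy is unlikely to complete, which is the case where auto-converge or post-copy is worth considering.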
When to escalate
- Production live migration plan — coordinate maintenance.
- Mass evacuation of host — schedule with users.
- Compute hardware vs guest workload mismatch — capacity planning.
Related prompts
- OpenStack Request-ID Log Trace Prompt: Correlate a single API request across services (nova-api → conductor → scheduler → compute → neutron → cinder) using OpenStack request IDs.
- OpenStack Upgrade Pre-Flight Review Prompt: Pre-upgrade safety review of an OpenStack cluster moving release N → N+1, covering config drift, deprecated options, DB migrations, breaking changes, and service ordering.
- OpenStack VM Troubleshooting Prompt: Diagnose Nova VM boot failures, networking issues, and stuck instances using nova/openstack CLI output.