Skip to content
CloudOps
Newsletter
All prompts
AI for OpenStack Difficulty: Advanced ClaudeChatGPT

Nova Live Migration Troubleshooting Prompt

Diagnose Nova live migration failures — shared storage requirements, block migration, network bandwidth, CPU compatibility, error 'migration aborted'.

Target user
OpenStack compute engineers handling live migrations
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior OpenStack compute engineer who has debugged thousands of live migrations — shared-storage and block, across different libvirt versions, with various CPU models.

I will provide:
- The symptom (migration aborted, stuck, slow, post-copy issue, guest hangs)
- Source and destination compute info
- Storage backend (shared / block)
- libvirt + qemu versions on both
- nova-compute and libvirt logs

Your job:

1. **Verify prerequisites**:
   - Compute hostnames resolvable
   - SSH between computes (for image transfer if block migration)
   - libvirt TCP/TLS port reachable
   - CPU compatibility between source and destination (same model or compatible)
   - For block migration: enough disk on destination
2. **For shared storage** (Ceph, NFS):
   - Both computes see the same volume / image storage
   - No data transfer needed for disk — only memory + state
   - Faster than block migration
3. **For block migration**:
   - Disk content transferred during migration
   - Network bandwidth = bottleneck (large disks take time)
   - Set `--block-migration` flag
4. **For "migration aborted"**:
   - `nova-compute` log on source/dest
   - libvirt log: `/var/log/libvirt/qemu/<instance>.log`
   - Common: bandwidth too low, dirty rate too high (memory changes faster than transfer)
5. **For CPU model mismatch**:
   - `cpu_model` setting in nova.conf must match
   - Use `host-passthrough` only when source == destination
   - `host-model` or specific model (e.g., `Westmere`) for portability
6. **For post-copy**:
   - Post-copy migration switches to post-copy mode if pre-copy stuck
   - `live_migration_permit_post_copy = true` in nova.conf
   - Risk: brief downtime if network drops during post-copy
7. **For auto-converge**:
   - Throttles guest CPU to slow dirty rate
   - `live_migration_permit_auto_converge = true`
   - May cause perceived slowdown but ensures completion
8. **For tunable bandwidth**:
   - `live_migration_bandwidth` (0 = unlimited)
   - `live_migration_downtime` and `live_migration_downtime_steps`

Mark DESTRUCTIVE: cancelling in-flight migration (instance may end in unknown state), force-migration across incompatible CPU models (guest crash), live migration with insufficient bandwidth (instance hung in post-copy).

---

Symptom: [DESCRIBE]
Source + dest compute: [DESCRIBE]
Storage: [shared / block]
libvirt + qemu versions:
```
[PASTE]
```
Nova / libvirt logs:
```
[PASTE]
```

Why this prompt works

Live migration combines compute, network, storage, and libvirt — failures span layers. This prompt walks them.

How to use it

  1. Verify CPU compatibility between hosts.
  2. For block migration, plan bandwidth.
  3. For stuck migrations, check dirty rate.
  4. Test with non-critical VMs first.

Useful commands

# Start migration
openstack server migrate --live-migration --host <dest> <instance>
openstack server migrate --live-migration --block-migration <instance>

# Watch progress
openstack server show <instance>
openstack server migration list --instance <instance>

# Cancel (if needed)
openstack server migration abort <instance> <migration-id>

# Compute config
sudo cat /etc/nova/nova.conf | grep -E "live_migration|cpu_model"

# Libvirt CPU compat
sudo virsh capabilities | grep -A20 cpu
sudo virsh domcapabilities

# Logs
sudo journalctl -u nova-compute -n 200 --no-pager
sudo tail /var/log/libvirt/qemu/<instance-name>.log
sudo journalctl -u libvirtd -n 100 --no-pager

# Bandwidth check
ip -s link show <migration-network-iface>
iperf3 -c <dest-host>

Common findings this catches

  • CPU model mismatch → migration rejected at libvirt; use compatible model.
  • Block migration timeout with high dirty rate → enable auto-converge or post-copy.
  • Network bandwidth saturated → tune live_migration_bandwidth cap or schedule off-peak.
  • Destination disk insufficient for block migration → expand or use shared storage.
  • Stuck in MIGRATING state → libvirt operation failed; check both sides.
  • Post-migration instance error → resource race; check both nova-compute logs.
  • libvirt version mismatch between source/dest — verify compatibility matrix.

When to escalate

  • Production live migration plan — coordinate maintenance.
  • Mass evacuation of host — schedule with users.
  • Compute hardware vs guest workload mismatch — capacity planning.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week