
Nova Live Migration Failure Debug Prompt

Debug failed or stuck Nova live migrations — pre-check rejections, instances stuck in MIGRATING, libvirt 'migration job' errors, and post-migration cleanup left on the source host — across shared and block (non-shared) storage scenarios.

Target user: OpenStack compute operators planning maintenance
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack compute engineer debugging live migration failures during a host-evacuation or maintenance drain. Work read-only and advisory: an instance stuck in MIGRATING is fragile, so be deliberate about abort vs. force-complete vs. wait.

I will provide:
- The migration: `openstack server migration list --server <id>` / `nova migration-list`, the instance status (MIGRATING / ERROR / ACTIVE-on-wrong-host), and source/dest hosts.
- nova-compute logs on BOTH source and destination around the migration, plus libvirt/qemu logs (`migration job`, `Lost connection`, "Operation not permitted").
- The storage model: shared (Ceph/NFS) vs block migration (local disk), and whether CPU models / NUMA / hugepages match between hosts.
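- The `[libvirt]` live-migration settings from nova.conf (`live_migration_tunnelled`, `live_migration_permit_post_copy`, `live_migration_permit_auto_converge`, `live_migration_completion_timeout`, `live_migration_timeout_action`, `live_migration_downtime`) and the network bandwidth between hosts.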

Your tasks:

1. **Classify the failure stage**: pre-check rejection (incompatible CPU, missing shared storage, destination full), in-flight stall (memory dirtied faster than it can be transferred, or a timeout), or post-migration cleanup failure (domain left on the source, port binding not moved).
2. **Explain the convergence problem**: if the job never completes, determine whether the guest dirties memory faster than the link can copy it, and whether auto-converge or post-copy is needed and actually enabled.
3. **Check host compatibility**: CPU model and flags, hugepages and NUMA topology, `cpu_mode`/`cpu_models` mismatches, and microcode or libvirt/QEMU version differences that cause hard pre-check failures.
4. **Recover the stuck instance**: recommend the safe action (wait, `nova live-migration-force-complete`, or `nova live-migration-abort`) and state the exact precondition for each, for example the migration status the API requires.
5. **Clean up**: verify that no libvirt domain is left on the source and that the Neutron port's `binding:host_id` now points to the destination.
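
For task 2, a rough sketch of the convergence arithmetic may help frame the analysis. It assumes a deliberately simplified pre-copy model in which each round resends whatever the guest dirtied during the previous round, at a constant dirty rate and constant bandwidth. `estimate_precopy`, `PrecopyEstimate`, and every parameter name are illustrative, not Nova or libvirt APIs, and real QEMU behavior (throttling, compression, post-copy) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PrecopyEstimate:
    converges: bool           # True if the final pause fits within max_downtime_ms
    rounds: int               # full copy rounds performed before the final pause
    transfer_seconds: float   # time spent copying while the guest keeps running
    final_downtime_ms: float  # estimated pause needed to send the remainder


def estimate_precopy(
    guest_ram_mib: float,
    dirty_rate_mib_s: float,
    bandwidth_mib_s: float,
    max_downtime_ms: float = 500.0,  # assumed budget, not a Nova default
    max_rounds: int = 30,            # assumed cut-off for this sketch
) -> PrecopyEstimate:
    """Estimate whether iterative pre-copy can converge (simplified model)."""
    if bandwidth_mib_s <= 0:
        raise ValueError("bandwidth_mib_s must be positive")

    remaining = guest_ram_mib
    elapsed = 0.0
    for round_no in range(1, max_rounds + 1):
        # Pause needed if the guest were stopped now and the rest sent at once.
        downtime_ms = remaining / bandwidth_mib_s * 1000.0
        if downtime_ms <= max_downtime_ms:
            return PrecopyEstimate(True, round_no - 1, elapsed, downtime_ms)

        round_seconds = remaining / bandwidth_mib_s
        elapsed += round_seconds
        # Memory dirtied while this round was on the wire, capped at guest RAM.
        remaining = min(guest_ram_mib, dirty_rate_mib_s * round_seconds)

    downtime_ms = remaining / bandwidth_mib_s * 1000.0
    return PrecopyEstimate(False, max_rounds, elapsed, downtime_ms)
```

In this model the remaining memory shrinks by the ratio of dirty rate to bandwidth each round, so a guest whose dirty rate is at or above the usable bandwidth never converges without throttling (auto-converge) or post-copy.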

Output: (a) failure stage + root cause, (b) evidence from both hosts, (c) the safe recovery action with its precondition, (d) maintenance-window guidance to avoid recurrence.
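
As an illustration of output item (c), the recovery decision can be sketched as a small lookup. This is one possible policy, not Nova's own logic: `recommend_action`, the `Action` enum, and the boolean inputs are hypothetical names. The preconditions in the returned strings reflect the Compute API as I understand it (force-complete and abort act on a migration in `running` status, and abort also accepts `queued` or `preparing` from microversion 2.65), so verify them against your release.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    WAIT = "wait"
    FORCE_COMPLETE = "force-complete"
    ABORT = "abort"
    INVESTIGATE = "investigate"


def recommend_action(
    migration_status: str,
    remaining_memory_shrinking: bool,
    guest_tolerates_pause: bool,
) -> Tuple[Action, str]:
    """Map observed migration state to an advisory action and its precondition."""
    status = migration_status.lower()

    if status in ("queued", "preparing"):
        # Nothing is in flight yet; abort is only possible on newer microversions.
        return Action.WAIT, "migration has not started copying; abort needs API >= 2.65"

    if status == "running":
        if remaining_memory_shrinking:
            return Action.WAIT, "remaining memory is still decreasing"
        if guest_tolerates_pause:
            # Force-complete pauses the guest (or switches to post-copy if enabled).
            return Action.FORCE_COMPLETE, "migration status is running; workload accepts a pause"
        return Action.ABORT, "migration status is running; guest stays on the source"

    # error, failed, completed, and so on: no in-flight job to steer via the API.
    return Action.INVESTIGATE, "no running migration; inspect logs and host state first"
```

For example, `recommend_action("running", False, True)` returns the force-complete action along with the precondition to confirm before anyone runs the command.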
