AI-Assisted Glance Image and Instance Boot Failure Troubleshooting
Why instances won’t boot from a Glance image (disk formats, image properties, virtio drivers, cloud-init) and how AI speeds up triage without ever touching your cloud credentials.
- #openstack
- #glance
- #nova
- #images
Nothing humbles you quite like an image that boots flawlessly on your laptop and then sits at ERROR the moment Nova tries to launch it. I’ve uploaded thousands of images into Glance across the years, and I can tell you the failure is almost never the bytes inside the image — it’s the metadata wrapped around them. The wrong disk_format, a missing hw_disk_bus, a Windows image with no virtio drivers — these are the things that turn a “working” image into a brick. AI has become a genuinely useful co-pilot for this triage, mostly because image-property debugging is a giant lookup table and the model is fast at lookup tables. Let me walk through how I do it.
Start by interrogating the image, not the instance
When an instance won’t boot, everyone’s instinct is to look at the instance. Wrong layer. Look at the image first:
openstack image show my-ubuntu-22.04
Read the disk_format, container_format, min_disk, min_ram, and the whole properties block. Then — and this is the step people skip — verify the actual file against what Glance claims it is:
qemu-img info downloaded-image.qcow2
If openstack image show says disk_format: raw but qemu-img info reports qcow2, you have a metadata lie, and Nova will try to treat a qcow2 file as raw bytes. The instance won’t boot, and the error will be unhelpful. I’ll paste both outputs into Claude side by side and ask it to flag mismatches between declared and actual format. It catches them instantly, which is the right job for it — a fast junior cross-checking two tables. It is not the thing that gets to fix the image registration; that’s me, after I’ve understood why the mismatch happened.
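The declared-versus-actual cross-check is also easy to script before anything gets pasted into a model. This is a minimal sketch, not a finished tool: the function names actual_format and compare_formats are hypothetical, and it assumes the human-readable qemu-img info output, which prints a "file format:" line.

```shell
#!/bin/sh
# Hypothetical helpers: compare the format Glance declares with what qemu-img sees.

# Read human-readable `qemu-img info` output on stdin and print the file format.
# "backing file format:" lines are ignored because the match is anchored at line start.
actual_format() {
    sed -n 's/^file format: //p'
}

# Print OK or MISMATCH for a declared format ($1) and an actual format ($2).
compare_formats() {
    declared=$1
    actual=$2
    if [ "$declared" = "$actual" ]; then
        echo "OK: both report $declared"
    else
        echo "MISMATCH: Glance declares $declared, file is $actual"
    fi
}

# Usage against a real cloud (image name and file path are placeholders):
#   declared=$(openstack image show my-ubuntu-22.04 -f value -c disk_format)
#   actual=$(qemu-img info downloaded-image.qcow2 | actual_format)
#   compare_formats "$declared" "$actual"
```

A MISMATCH line is a prompt to investigate how the image was registered, not a reason to re-tag it blindly.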
Pro Tip: qemu-img info also reports a virtual size. If min_disk is smaller than that virtual size, a request with a too-small flavor can pass the early checks and then fail late on the compute host, when the image turns out not to fit the flavor’s root disk. Set min_disk to at least the virtual size, rounded up to whole GB.
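The rounding is the part that is easy to get wrong by hand. A small sketch, assuming the "virtual size: ... (N bytes)" line that qemu-img info prints in its human-readable output; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: derive a safe min_disk value from qemu-img info output.

# Read `qemu-img info` output on stdin and print the virtual size in bytes.
virtual_size_bytes() {
    sed -n 's/^virtual size: .*(\([0-9][0-9]*\) bytes).*/\1/p'
}

# Round a byte count ($1) up to whole GiB, which is what min_disk should cover.
min_disk_gb() {
    bytes=$1
    gib=1073741824
    echo $(( (bytes + gib - 1) / gib ))
}

# Usage (image name and file path are placeholders):
#   bytes=$(qemu-img info downloaded-image.qcow2 | virtual_size_bytes)
#   openstack image set --min-disk "$(min_disk_gb "$bytes")" my-ubuntu-22.04
```

Rounding up rather than truncating matters: a 2.2 GiB virtual disk needs min_disk of 3, not 2.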
qcow2 versus raw, and the Ceph trap
This one bites every operator who moves to Ceph-backed storage. With local storage, qcow2 images are great — thin, compressed, fast to upload. But when Glance and Nova sit on Ceph (RBD), qcow2 is a performance and correctness landmine, because RBD wants raw images to do copy-on-write cloning. If you upload a qcow2 to a Ceph-backed cloud, Nova has to convert it on every boot, which is slow and can fail under memory pressure.
One option is to rely on force_raw_images = True in nova.conf (which is the default), so the image is converted to raw on the compute side. The better option is uploading already-raw images so Ceph can clone them instantly:
qemu-img convert -f qcow2 -O raw ubuntu.qcow2 ubuntu.raw
openstack image create ubuntu-raw --disk-format raw --container-format bare --file ubuntu.raw
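The convert-before-upload rule can be written down as a tiny decision helper so it gets applied the same way every time. This is only a sketch of the rule as described here; the function name upload_action and the backend labels (rbd, local) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: decide whether to convert an image before uploading it.
# $1 = storage backend label ("rbd" for Ceph, anything else treated as local)
# $2 = actual file format as reported by qemu-img info
upload_action() {
    backend=$1
    format=$2
    case "$backend:$format" in
        rbd:raw) echo "upload-as-is" ;;      # Ceph can clone raw images directly
        rbd:*)   echo "convert-to-raw" ;;    # avoid per-boot conversion on computes
        *)       echo "upload-as-is" ;;      # local storage is fine with qcow2
    esac
}

# Usage:
#   upload_action rbd qcow2    # prints: convert-to-raw
```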
When I’m deciding between converting on upload versus relying on force_raw_images, AI is a competent sounding board for the tradeoffs — conversion cost, storage footprint, clone speed. I’ve worked through more than one of these design debates in a prompt workspace where I can keep the storage backend facts pinned so the model stops “helpfully” assuming local LVM.
The hardware properties that decide whether the kernel finds its disk
Here is where most genuine boot failures live. The guest kernel has to find its root disk on a bus it has drivers for. If you set the wrong bus, the kernel boots, fails to find its root filesystem, and panics on the console with a message that has nothing obviously to do with Glance.
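# The root disk only uses hw_scsi_model when hw_disk_bus=scsi;
# with hw_disk_bus=virtio the root disk is attached as virtio-blk.
openstack image set my-image \
  --property hw_disk_bus=virtio \
  --property hw_scsi_model=virtio-scsi \
  --property hw_qemu_guest_agent=yes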
hw_disk_bus=virtio (or scsi with hw_scsi_model=virtio-scsi) tells Nova how to attach the root disk. A modern Linux image with virtio drivers wants virtio. An image built for IDE that you tag as virtio will panic. hw_qemu_guest_agent=yes enables the guest agent channel. Without it, features that depend on the agent, such as setting the admin password from outside the guest or quiescing filesystems for a snapshot, quietly don’t work.
When I’m matching properties to an image, I describe the OS and build process to the AI and ask which hw_* properties are required. It’s a strong recall engine for the property namespace, far faster than me grepping the Glance docs. Then I verify against the docs, because a confidently wrong property suggestion produces a panic that looks like a Glance bug for hours. I keep my validated property sets as reusable prompts so I’m not re-deriving the same Windows-versus-Linux property matrix every quarter.
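One consistency rule from the paragraph above is mechanical enough to lint before running image set: hw_scsi_model only shapes the root disk when hw_disk_bus=scsi. A minimal sketch follows; the function name check_disk_props is hypothetical, and it checks only this one rule, not whether the guest actually has the drivers.

```shell
#!/bin/sh
# Hypothetical lint: flag a hw_scsi_model that the root disk will not use.
# $1 = intended hw_disk_bus value, $2 = intended hw_scsi_model value (may be empty)
check_disk_props() {
    bus=$1
    scsi_model=$2
    if [ -n "$scsi_model" ] && [ "$bus" != "scsi" ]; then
        echo "WARN: root disk ignores hw_scsi_model=$scsi_model unless hw_disk_bus=scsi"
    else
        echo "OK"
    fi
}

# Usage:
#   check_disk_props virtio virtio-scsi   # warns
#   check_disk_props scsi virtio-scsi     # OK
```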
cloud-init, no-cloud-init, and the silent hang
A huge class of “the instance booted but I can’t log in” tickets is really a cloud-init mismatch. A cloud image expects a metadata source. If your image has cloud-init but neither the metadata service nor the config drive is reachable, cloud-init can block for minutes at boot waiting for a datasource that never answers. Conversely, an image without cloud-init will never inject your SSH key, and you’ll sit there wondering why your keypair “doesn’t work.”
openstack server show my-instance -c fault
openstack console log show my-instance
The console log is the truth serum here — cloud-init prints exactly where it’s stuck. I paste a console log into GitHub Copilot Chat or my terminal assistant and ask it to identify the cloud-init datasource failure, and it’s reliably good at spotting the “waiting for metadata” stall. That’s the fast-junior pattern again: it reads the log faster than I do and points; I decide what it means for this specific cloud.
Pro Tip: If your console log shows cloud-init retrying a datasource for 120 seconds before giving up, suspect your metadata service or config-drive setting first. The image is almost certainly fine.
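A quick grep can confirm the stall before the log goes anywhere else. This is a rough sketch: the function name is hypothetical, and the patterns are assumptions based on messages cloud-init commonly prints when the metadata address 169.254.169.254 does not answer. The exact wording varies between cloud-init versions, so treat a count of zero as inconclusive.

```shell
#!/bin/sh
# Hypothetical helper: count console-log lines that look like metadata retries.
# Reads the console log on stdin. The patterns are assumptions, not a stable format.
count_metadata_stalls() {
    grep -c -E '169\.254\.169\.254.*failed|Giving up on md' || true
}

# Usage (instance name is a placeholder):
#   openstack console log show my-instance | count_metadata_stalls
```

A high count alongside a fault-free server show output points at the metadata path rather than the image.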
Windows images and the virtio driver problem
Windows deserves its own paragraph because it fails differently. Windows has no virtio drivers out of the box. If you build a Windows image and tag hw_disk_bus=virtio, the installer or the booted OS won’t see the disk at all — instant blue screen or “no boot device.” You either inject the virtio drivers (from the virtio-win ISO) during image build, or you boot with hw_disk_bus=ide / sata for the install and switch to virtio after the drivers are present.
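# Assumes the virtio-win storage driver for virtio-scsi (vioscsi) is already installed in the image.
openstack image set windows-2022 \
  --property hw_disk_bus=scsi \
  --property hw_scsi_model=virtio-scsi \
  --property os_type=windows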
For signed images, where you’ve enabled image signature verification so Nova refuses to boot tampered images, remember that any change to the image content, or to the img_signature_* properties that describe the signature, breaks verification, and the boot will fail with a verification error that looks nothing like a driver problem. AI is useful for reminding me of that gotcha; it is not useful as a place to put my signing keys, which it never sees.
The boundary, as always
I’ll feed an AI: image show output, qemu-img info output, console logs, and property lists. I will not feed it my clouds.yaml, my image signing keys, or admin credentials. The model is a fast junior engineer with an excellent memory for property names and a complete inability to know which of those properties is right for your hypervisor and storage backend. Verify every property change before you set it, because a wrong hw_disk_bus doesn’t error at set time — it errors at boot, after you’ve shipped it to a hundred instances. I keep the durable image-debugging runbooks under the OpenStack category and the polished checklists in a prompt pack.
Conclusion
Glance boot failures are metadata failures ninety percent of the time. Interrogate the image before the instance, cross-check declared format against qemu-img info, respect the qcow2-versus-raw rules on Ceph, get your hw_* properties matched to the guest’s drivers, and read the console log for cloud-init stalls. AI makes the lookup-table parts of this dramatically faster — it’s the junior who’s read every property doc — but the verification and the credential boundary stay with you. Boot failures are unforgiving precisely because they fail late, so the speed AI gives you is only worth anything if you verify before you ship.