Managing Glance Images at Scale in OpenStack

Glance is the service nobody thinks about until it’s the problem. It stores your images, it works, and then one day you notice it’s holding two terabytes of forgotten snapshots, boots are slow because images copy across the network every time, and three teams each maintain their own subtly-different Ubuntu. Image management at scale is a discipline, not a default, and Glance gives you the tools — if you use them.

I’ve run Glance backing thousands of instances. Here’s how I keep it fast, lean, and trustworthy.

Pick the right backend, and it’s probably Ceph

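The single biggest performance decision is the Glance backend. If you run Ceph for Cinder and Nova, store Glance images in the same Ceph cluster and you unlock copy-on-write boots: when a volume or ephemeral disk is created from a Ceph-backed image, Cinder or Nova clones it inside Ceph instead of copying the whole image over the network. Two prerequisites are easy to miss: the image has to be stored in raw format (a qcow2 image is still downloaded and converted), and Glance typically has to expose image locations to the other services with show_image_direct_url = True in glance-api.conf.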

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

The difference is dramatic: a 2GB image that took 60+ seconds to copy on boot becomes a near-instant CoW clone. If you’re on the file or Swift backend and booting feels slow, this is almost always why: without CoW, the full image is copied whenever it isn’t already cached where the instance or volume lands.
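One thing worth checking before counting on the fast path: as far as I know, RBD copy-on-write cloning only applies to images stored in raw format. Here is a minimal sketch that flags images which would miss it. The record shape (`name` and `disk_format` keys) is a hypothetical reduction of whatever your image listing returns, not a fixed Glance schema.

```python
def non_cow_images(images):
    """Return the names of images that are not raw and so cannot be CoW-cloned."""
    return [img["name"] for img in images if img.get("disk_format") != "raw"]


# Hypothetical sample records; real data would come from your image listing.
images = [
    {"name": "ubuntu-22.04", "disk_format": "raw"},
    {"name": "rhel9", "disk_format": "qcow2"},
]

print(non_cow_images(images))  # ['rhel9']
```

Anything this returns is a candidate for conversion to raw and re-upload.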

Image properties are how you control scheduling

Glance images carry properties that flow into Nova scheduling and into how the image is handled. The ones I always set:

openstack image create ubuntu-22.04 \
  --file ubuntu-22.04.qcow2 \
  --disk-format qcow2 \
  --container-format bare \
  --property hw_disk_bus=scsi \
  --property hw_scsi_model=virtio-scsi \
  --property os_type=linux \
  --property hw_qemu_guest_agent=yes \
  --min-disk 10 \
  --min-ram 1024

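min-disk (in GB) and min-ram (in MB) stop someone booting your image on a flavor too small to run it. hw_qemu_guest_agent=yes exposes the QEMU guest agent channel to the instance; with the agent installed in the image, that enables admin password resets and filesystem quiescing during snapshots. hw_scsi_model=virtio-scsi only takes effect for disks attached with hw_disk_bus=scsi. Hardware properties like hw_disk_bus and required traits (trait:<NAME>=required) let you steer images to compatible hosts. Properties are free; set them at image-creation time and you prevent a whole class of “it booted but doesn’t work” tickets.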
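The min-disk and min-ram check is easy to reason about as a comparison against the flavor. Below is a simplified sketch of that comparison, not Nova’s actual implementation. The `disk` and `ram` flavor keys and the sample flavors are assumptions for illustration, and it ignores special cases such as zero-disk flavors.

```python
def flavor_fits(image, flavor):
    """True if the flavor meets the image's min_disk (GB) and min_ram (MB)."""
    return (
        flavor["disk"] >= image.get("min_disk", 0)
        and flavor["ram"] >= image.get("min_ram", 0)
    )


# Values mirror the image created above: 10 GB minimum disk, 1024 MB minimum RAM.
image = {"name": "ubuntu-22.04", "min_disk": 10, "min_ram": 1024}

# Hypothetical flavors.
tiny = {"name": "m1.tiny", "disk": 1, "ram": 512}
small = {"name": "m1.small", "disk": 20, "ram": 2048}

for flavor in (tiny, small):
    verdict = "ok" if flavor_fits(image, flavor) else "too small"
    print(flavor["name"], verdict)
```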

Public, shared, and community visibility

Image sprawl is mostly a visibility-discipline problem. Glance has four visibilities, and using them deliberately is half the battle:

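public: everyone sees it. Reserve for your blessed, maintained base images, owned by an images admin project.
shared: visible only to the owner and to projects explicitly added as members.
community: usable by every project but left out of default image listings; good for “available but unsupported.”
private: only the owning project. (Since Image API v2.5, an image created without an explicit visibility defaults to shared with an empty member list, which behaves the same until members are added.)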

The pattern that works: one team owns the public base images, everyone else builds from those, and nobody promotes their own one-off to public. Without that ownership, every team makes a public “ubuntu-final-v2-REAL” and you get the sprawl.

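# Share a hardened image with a specific project (the image needs shared visibility)
openstack image add project hardened-rhel9 <project-id>
# Accept the membership; normally run with the receiving project's credentials
openstack image set hardened-rhel9 --accept --project <project-id>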
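The ownership rule above is easy to audit. Here is a minimal sketch, assuming image records with `name`, `visibility`, and `owner` keys and a hypothetical project ID for the images admin project, that lists public images owned by anyone else:

```python
def rogue_public_images(images, images_admin_project):
    """Return names of public images not owned by the images admin project."""
    return [
        img["name"]
        for img in images
        if img["visibility"] == "public" and img["owner"] != images_admin_project
    ]


ADMIN_PROJECT = "images-admin-id"  # hypothetical project ID

# Hypothetical sample records.
images = [
    {"name": "ubuntu-22.04", "visibility": "public", "owner": ADMIN_PROJECT},
    {"name": "ubuntu-final-v2-REAL", "visibility": "public", "owner": "team-a-id"},
    {"name": "team-b-scratch", "visibility": "private", "owner": "team-b-id"},
]

print(rogue_public_images(images, ADMIN_PROJECT))  # ['ubuntu-final-v2-REAL']
```

Each hit is something to move to community or shared visibility, or to hand over to the images team.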

The image cache for non-Ceph backends

If you’re not on Ceph CoW, the Nova image cache saves you from re-copying the same image on every boot to a given host. The first boot of an image on a compute node copies it; subsequent boots reuse the cached copy. Tune the cache manager so it cleans up unused images:

# nova.conf on compute nodes; on Ussuri and later these options live under [image_cache]
[DEFAULT]
remove_unused_base_images = True
remove_unused_original_minimum_age_seconds = 86400

Both values match Nova’s defaults, so the point is to make sure nobody has turned cleanup off: without it the cache grows until compute nodes run out of local disk, a slow-motion outage I’ve cleaned up more than once.
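To make the two settings concrete, here is a simplified model of the decision they describe: a cached base image is removable once no instance on the host uses it and it has been unused for at least the minimum age. This is a sketch of the rule, not Nova’s cache manager code; the `image_id` and `last_used` fields (epoch seconds) are assumptions for illustration.

```python
def removable_cache_entries(entries, in_use_ids, now, min_age_seconds=86400):
    """Return IDs of cached images that are unused and older than the minimum age."""
    return [
        entry["image_id"]
        for entry in entries
        if entry["image_id"] not in in_use_ids
        and now - entry["last_used"] >= min_age_seconds
    ]


now = 1_000_000  # fixed "current time" in epoch seconds, for a repeatable example

# Hypothetical cache contents on one compute node.
cache = [
    {"image_id": "img-a", "last_used": now - 200_000},  # unused and old enough
    {"image_id": "img-b", "last_used": now - 10_000},   # unused but too recent
    {"image_id": "img-c", "last_used": now - 900_000},  # old, but still in use
]

print(removable_cache_entries(cache, in_use_ids={"img-c"}, now=now))  # ['img-a']
```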

The cleanup discipline that actually holds

Image sprawl is inevitable without a policy. Mine:

Version with properties, not names. Tag images with a version and build_date property so you can find stale ones programmatically instead of guessing from names.
Deactivate before deleting. openstack image set --deactivate makes an image un-bootable without deleting it, so you can retire a base image safely and confirm nothing breaks before the delete.
Audit on a schedule. Script a report of images not used by any instance and older than N days. The SDK makes this easy and it’s the only thing that keeps the store lean over years.

# Find images older than a date, deactivate candidates
openstack image list --long -f json | jq '...'
openstack image set old-base-image --deactivate
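The audit itself can be sketched as a pure function over data you have already collected. The version below assumes image records with `id`, `status`, and `created_at` keys (UTC timestamps like `2023-01-15T10:30:00Z`) and a set of image IDs referenced by instances. How you gather both is up to you, and the names are illustrative. Volumes and snapshots can reference images too, so treat the result as a list to review, not a list to delete.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse a UTC timestamp such as '2023-01-15T10:30:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def deactivation_candidates(images, referenced_ids, now, max_age_days):
    """Return IDs of active images that are unreferenced and older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        img["id"]
        for img in images
        if img["id"] not in referenced_ids
        and img.get("status") == "active"
        and parse_ts(img["created_at"]) < cutoff
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)  # fixed for a repeatable example

# Hypothetical sample records.
images = [
    {"id": "a", "status": "active", "created_at": "2023-01-15T10:30:00Z"},
    {"id": "b", "status": "active", "created_at": "2023-01-15T10:30:00Z"},
    {"id": "c", "status": "active", "created_at": "2024-05-20T08:00:00Z"},
]

print(deactivation_candidates(images, referenced_ids={"b"}, now=now, max_age_days=90))  # ['a']
```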

I keep an AI prompt that takes a Glance image list (with properties and sizes) plus the set of image IDs currently referenced by instances, and produces a safe deletion candidate list — deactivate-first, with the reasoning for each. It turns the quarterly cleanup from a nervous afternoon into a reviewed list. A few of these are in our prompt library.

Signing and trust at scale

On clouds where image provenance matters, enable image signature verification so Nova refuses to boot an image that doesn’t match a trusted signature (keys in Barbican). It’s extra setup, but on a multi-tenant cloud where anyone can upload an image, it’s the difference between “a base image” and “a base image you can actually trust.”
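Before turning verification on, it helps to know which images would fail it. Signed images carry four signature properties. This minimal sketch reports which of them are missing from an image’s property dict. It only checks presence, not cryptographic validity, and the sample values are placeholders.

```python
SIGNATURE_PROPS = (
    "img_signature",
    "img_signature_hash_method",
    "img_signature_key_type",
    "img_signature_certificate_uuid",
)


def missing_signature_props(image_props):
    """Return the signature properties that are absent or empty on an image."""
    return [prop for prop in SIGNATURE_PROPS if not image_props.get(prop)]


# Placeholder property sets for illustration only.
signed = {
    "img_signature": "<base64-signature>",
    "img_signature_hash_method": "SHA-256",
    "img_signature_key_type": "RSA-PSS",
    "img_signature_certificate_uuid": "<certificate-uuid>",
}
unsigned = {"os_type": "linux"}

print(missing_signature_props(signed))    # []
print(missing_signature_props(unsigned))
```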

Where to go next

Glance rewards a little discipline enormously. Put images on Ceph for copy-on-write boots, set properties at creation so scheduling and guest features just work, use visibility levels to stop sprawl, and run a scheduled audit-and-deactivate cycle so the store stays lean over years. Do that and Glance goes back to being the service you never think about — in the good way. For the Ceph, Nova, and Barbican services it integrates with, see the OpenStack category.

Image deletion is irreversible and images may be in use. Always deactivate and confirm an image is unreferenced before deleting it from production.