Planning OpenStack Upgrades Safely Without Downtime
OpenStack upgrades fail on the boring details: DB migrations, RPC version pinning, and ordering. Here's a battle-tested plan for upgrading without taking the cloud down.
- #openstack
- #upgrades
- #operations
- #database
- #rolling-upgrade
- #sre
OpenStack upgrades have a reputation, and it’s earned. The release cadence is fast, the services are tightly coupled through a shared message bus and databases, and a single skipped migration can wedge the control plane. After upgrading production clouds across many releases, I’ve learned the failures are almost never exotic — they’re boring details executed in the wrong order.
Here’s the plan I use to upgrade without taking the cloud down.
The rule: never skip a release
OpenStack supports upgrading one release at a time, and increasingly supports “skip-level” (SLURP) upgrades between designated releases. But unless you’re explicitly on a SLURP-to-SLURP path, upgrade sequentially. The RPC and object-version compatibility windows are designed for N to N+1. Caracal (2024.1) is itself a SLURP release, so the risky case is a non-SLURP starting point: jumping from Bobcat (2023.2) straight to two releases later is how you discover undocumented migration gaps the hard way.
Map your path before anything else: current release, target release, and every release in between.
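A minimal sketch of that mapping step, assuming releases are identified by version strings and that the first release of each year (the ".1" releases) is the SLURP one. The RELEASES list and the upgrade_path helper are illustrative names, not part of any OpenStack tool; check the official release series page before relying on the list.

```python
# Ordered release identifiers; extend this list to cover your own range.
RELEASES = ["2023.1", "2023.2", "2024.1", "2024.2", "2025.1"]


def is_slurp(release: str) -> bool:
    # Assumption: SLURP releases are the first release of each year ("YYYY.1").
    return release.endswith(".1")


def upgrade_path(current: str, target: str, allow_skip_level: bool = False) -> list[str]:
    """Return every release to land on, in order, between current and target."""
    start, end = RELEASES.index(current), RELEASES.index(target)
    if end <= start:
        raise ValueError("target must be newer than current")
    path, i = [], start
    while i < end:
        # A skip-level hop is only taken from one SLURP release to the next one.
        can_skip = (
            allow_skip_level
            and is_slurp(RELEASES[i])
            and i + 2 <= end
            and is_slurp(RELEASES[i + 2])
        )
        i += 2 if can_skip else 1
        path.append(RELEASES[i])
    return path
```

With the default allow_skip_level=False the helper always returns the sequential path, which matches the rule above; opt in to skip-level hops only when both ends are SLURP releases.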
Step 1: Read the release notes like they’re a contract
Every release has upgrade notes per service. They tell you about removed config options, required DB migrations, and deprecations that became removals. Skipping this step is the most expensive shortcut in OpenStack operations.
Build a per-service change list: Keystone first (everything depends on it), then Glance, Nova, Neutron, Cinder, and the rest. Note every config key that’s renamed or removed — those are silent breakers.
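One way to make the change list actionable is to check each service's config file against it. The sketch below assumes you have built the removed and renamed mappings by hand from the release notes; find_stale_options and the option names in the usage example are hypothetical placeholders, not real OpenStack options.

```python
import configparser


def find_stale_options(conf_text, removed, renamed):
    """List config keys that the change list marks as removed or renamed."""
    # default_section is set to a dummy name so [DEFAULT] is treated like any
    # other section instead of being merged into every section.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, default_section="__unused__"
    )
    parser.read_string(conf_text)
    findings = []
    for section in parser.sections():
        for key in parser[section]:
            if (section, key) in removed:
                findings.append(f"[{section}] {key}: removed")
            elif (section, key) in renamed:
                new_section, new_key = renamed[(section, key)]
                findings.append(
                    f"[{section}] {key}: renamed to [{new_section}] {new_key}"
                )
    return findings


# Hypothetical change list and config, for illustration only.
sample_conf = "[DEFAULT]\nold_flag = true\ndebug = false\n[api]\nlegacy_timeout = 30\n"
sample_removed = {("DEFAULT", "old_flag")}
sample_renamed = {("api", "legacy_timeout"): ("api", "timeout")}
stale = find_stale_options(sample_conf, sample_removed, sample_renamed)
```

Running this against every service config before deploying new code turns the silent breakers into a list you can review.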
Step 2: Understand the rolling-upgrade contract
Modern OpenStack services support rolling upgrades through three mechanisms you must respect:
- DB schema is expand/contract. New code reads old and new schema. You run schema expand migrations while old services still run, deploy new code, then run contract later.
- RPC version pinning. During the rollout, you pin the new services to speak the old RPC version so old and new agents coexist. For Nova:
[upgrade_levels]
compute = auto
Or pin explicitly to the previous release name during the transition, then remove the pin once every node is upgraded.
- Online data migrations. After deploying new code, you run background migrations to move data to new formats:
nova-manage db online_data_migrations
This must reach zero remaining before you start the next upgrade. A common failure: starting the next release while migrations from the last one are unfinished.
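A sketch of draining the online migrations in batches. It assumes the exit codes as I understand them from the nova-manage documentation (0 when nothing further can be migrated, 1 when a batch run with --max-count did work and more may remain, anything else an error); verify that against the docs for your release. The function names and the injectable runner are my own.

```python
import subprocess


def run_batch(max_count: int) -> int:
    # Runs one batch of the command from the text and returns its exit code.
    cmd = ["nova-manage", "db", "online_data_migrations", "--max-count", str(max_count)]
    return subprocess.run(cmd, check=False).returncode


def drain_online_migrations(run=run_batch, max_count=1000, max_batches=500) -> int:
    """Run batches until nothing remains; return the number of batches used."""
    for batch in range(1, max_batches + 1):
        rc = run(max_count)
        if rc == 0:  # assumed: no further updates possible
            return batch
        if rc != 1:  # assumed: 1 means this batch did work and more may remain
            raise RuntimeError(f"online_data_migrations failed with exit code {rc}")
    raise RuntimeError("migrations still pending after max_batches")
```

Batching keeps each run short, and failing loudly on an unexpected exit code stops you from starting the next release with migrations unfinished.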
Step 3: Upgrade order within the control plane
The ordering that’s bitten me when ignored:
- Back up every database. mysqldump all OpenStack schemas, verified restorable. Non-negotiable.
- Keystone first. Tokens must keep validating throughout.
- Sync schema (expand): keystone-manage db_sync --expand, then --migrate.
- Glance, Nova, Neutron, Cinder: each with db_sync, new code, RPC pin, online migrations.
- Compute nodes last, rolling, so the control plane (already new, RPC-pinned to old) keeps serving the still-old agents.
- Remove RPC pins, run contract migrations once every node is new.
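The ordering above can be written down as data so it is reviewed once and then followed mechanically. This is a rough sketch: build_plan and the step wording are my own, and the per-service steps are a simplification of what each project's upgrade notes actually require.

```python
CONTROL_PLANE = ["keystone", "glance", "nova", "neutron", "cinder"]


def build_plan(services=CONTROL_PLANE):
    """Return the upgrade steps as an ordered list of (target, action) pairs."""
    plan = [("all databases", "back up and verify the restore")]
    for svc in services:
        plan += [
            (svc, "expand schema"),
            (svc, "deploy new code with RPC pinned to the old version"),
            (svc, "run online data migrations"),
        ]
    # Computes come after the whole control plane; cleanup comes last of all.
    plan.append(("compute nodes", "rolling upgrade, one node or zone at a time"))
    plan.append(("all services", "remove RPC pins"))
    plan.append(("all services", "contract schema"))
    return plan
```

Printing the plan and ticking items off during the change window makes it harder to contract or unpin before the last compute node is done.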
Step 4: The compute-node dance
Compute nodes are where “no downtime” is won or lost. The control plane is upgraded and pinned to the old RPC version, so it can talk to both old and new nova-compute. Now roll the computes one (or one availability zone) at a time:
openstack server list --host <compute> --all-projects # know what's running
# live-migrate workloads off first if you want zero instance impact (evacuate is for hosts that are already down)
openstack compute service set --disable <compute> nova-compute
# upgrade the node, restart nova-compute
openstack compute service set --enable <compute> nova-compute
Disabling the service first stops the scheduler from placing new instances on a node you’re about to bounce.
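A sketch of the same loop driven from Python, assuming the openstack CLI is installed and authenticated. roll_computes and the upgrade_node callback are hypothetical; the callback stands in for whatever your deployment tooling does to upgrade packages and restart nova-compute.

```python
import subprocess


def run_cli(args):
    # Thin wrapper so the command runner can be swapped out in tests.
    subprocess.run(args, check=True)


def roll_computes(hosts, upgrade_node, run=run_cli):
    """Disable, upgrade, and re-enable each compute host in turn."""
    done = []
    for host in hosts:
        run(["openstack", "compute", "service", "set", "--disable", host, "nova-compute"])
        # upgrade_node is a hypothetical callback: packages, config, service restart.
        # If it raises, the host stays disabled and the loop stops.
        upgrade_node(host)
        run(["openstack", "compute", "service", "set", "--enable", host, "nova-compute"])
        done.append(host)
    return done
```

Stopping on the first failure is deliberate: a host that failed its upgrade stays out of scheduling, and the remaining hosts keep running the old version until you have looked at what went wrong.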
Step 5: Verify, then contract
After every node is new, don’t immediately rip out the compatibility scaffolding. Verify first:
openstack compute service list # all up, all new version
nova-manage db online_data_migrations # must report 0 remaining
Only then remove the RPC pins and run the --contract schema migrations that drop old columns. Contracting too early — while an old service still expects the old column — is a classic self-inflicted outage.
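A small gate I would put in front of the contract step, sketched under the assumption that the input is the output of openstack compute service list -f json and that each row carries "Binary", "Host" and "State" keys. It only checks that services are up; it does not prove every node runs the new version. The function name is my own.

```python
import json


def services_not_ready(service_list_json: str) -> list[str]:
    """Return 'binary@host' for every compute service that is not up."""
    # Assumption: rows come from `openstack compute service list -f json`
    # and contain "Binary", "Host" and "State" keys.
    rows = json.loads(service_list_json)
    return [
        f'{row["Binary"]}@{row["Host"]}'
        for row in rows
        if row.get("State") != "up"
    ]


# Illustrative input, not real output from a cloud.
sample = json.dumps([
    {"Binary": "nova-compute", "Host": "c1", "State": "up"},
    {"Binary": "nova-compute", "Host": "c2", "State": "down"},
])
blocked = services_not_ready(sample)
```

Proceed to unpinning and contracting only when the returned list is empty and the online migrations report nothing remaining.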
Using AI to de-risk the plan
Upgrade planning is reading-heavy and detail-heavy, which is exactly where an LLM earns its keep as a planning assistant. I paste the relevant release notes and ask:
“Here are the Nova and Neutron upgrade release notes between release A and release B. Produce an ordered checklist of breaking config changes, required db_sync steps, RPC pinning needed, and online migrations. Flag anything that requires action before I deploy new code. Do not invent steps not supported by these notes.”
That last constraint matters — left loose, models confidently hallucinate migration commands. Grounded in the actual notes, it’s excellent at turning prose into an ordered runbook. I keep these upgrade-planning prompts with my other OpenStack prompts.
Rehearse it, every time
The single highest-leverage practice: rehearse the upgrade in a staging cloud that mirrors production, with a snapshot of production’s database. Most upgrade failures are data-shaped — a migration that chokes on a row your test data never had. A dump-and-restore rehearsal surfaces those before they touch real workloads.
OpenStack upgrades are not scary once you internalize the contract: sequential releases, expand-deploy-migrate-contract, Keystone first and computes last, and nothing torn down until everything’s verified. Back up, rehearse, and order the steps deliberately. For more upgrade and operations prompts, browse our prompt library.
AI-generated upgrade checklists are assistive, not authoritative. Validate every step against the official release notes and rehearse in staging first.