Migrating Neutron to OVN Networking in OpenStack
Why OVN replaces the agent sprawl, how the migration actually works, and how to debug the OVN southbound DB when networking breaks in OpenStack.
- #openstack
- #ovn
- #neutron
- #networking
- #sdn
- #ovs
For years the OpenStack networking story was a sprawl of agents — the L3 agent, DHCP agent, metadata agent, and an OVS agent on every node, all coordinating over RabbitMQ. It worked, but it was a lot of moving parts to keep alive, and RabbitMQ became a bottleneck at scale. OVN (Open Virtual Network) changes the model: it pushes the logic into a distributed database and OpenFlow flows programmed directly onto each hypervisor’s Open vSwitch, killing most of those agents. After running both, I’m convinced OVN is where you want to be — but the migration and the debugging are genuinely different, and the old neutron agent-list muscle memory will mislead you. Here’s the real picture.
What changes when you move to OVN
With OVN, the ml2/ovs mechanism driver is replaced by ml2/ovn. Instead of per-node L3/DHCP/metadata agents talking over RabbitMQ, you get:
- A northbound DB (NB) that holds the logical network model Neutron writes to (logical switches, routers, ports, ACLs).
- A southbound DB (SB) that holds the physical realization — which chassis (hypervisor) hosts what, and the logical flows.
- ovn-controller on every compute/network node, which reads the SB and programs OpenFlow rules into local OVS.
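Routing becomes distributed (each hypervisor does its own east-west routing and, with distributed floating IPs enabled, its own floating-IP NAT), DHCP is served natively by OVN, metadata is handled by a lightweight OVN metadata agent on each compute node instead of the old central one, and RabbitMQ is largely out of the networking control path. Fewer agents, less message-bus load, faster failover. The cost: a new mental model and new debugging tools.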
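A quick way to confirm which mechanism driver Neutron is actually configured with is to read it out of the ML2 config. This is a minimal sketch: mechanism_drivers in the [ml2] section is the real option name, but the helper function and the file path in the usage comment are my own assumptions, and your deployment tool may keep the file elsewhere (or inside a container).

```shell
# Sketch: print the mechanism_drivers value from the [ml2] section of an
# ml2_conf.ini-style file read on stdin. mechanism_drivers() is a
# hypothetical helper name, not part of any OpenStack tooling.
mechanism_drivers() {
    awk -F= '
        /^[ \t]*\[/ { in_ml2 = ($0 ~ /^[ \t]*\[ml2\]/) }
        in_ml2 && $1 ~ /^[ \t]*mechanism_drivers[ \t]*$/ {
            gsub(/[ \t]/, "", $2)   # drop spaces around the value
            print $2
            exit
        }'
}

# Example (path is an assumption; adjust for your deployment):
#   mechanism_drivers < /etc/neutron/plugins/ml2/ml2_conf.ini
```

On a converted control plane I would expect this to print ovn; a value that still mentions openvswitch means the configuration has not been switched over yet.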
Step 1: How to migrate without a flag day
You don’t rip out OVS and bolt on OVN in one shot on a live cloud. The supported path uses the neutron-ovn-migration tooling (TripleO/OpenStack-Ansible ship playbooks for it), and the safe sequence is:
- Stand up the OVN NB/SB databases and ovn-northd on the controllers.
- Migrate configuration so Neutron writes to ml2/ovn while existing ports keep working.
- Roll ovn-controller onto hypervisors and migrate data-plane state node by node.
- Decommission the old L3/DHCP/metadata/OVS agents once each node is converted.
The single biggest piece of advice: do this in a maintenance window with a tested rollback, validate on a staging cloud first, and migrate hypervisors in small batches confirming connectivity after each. Network migrations are the one place “move fast” gets you a region-wide outage.
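To make "small batches" concrete, here is a minimal sketch that splits a hypervisor list into fixed-size groups you can hand to your playbooks one at a time. The batch_hosts name is hypothetical, the batch size is whatever you are comfortable losing at once, and the usage comment assumes the openstack CLI is configured on the host you run it from.

```shell
# Sketch: split a list of hypervisors into small migration batches.
# Reads one hostname per line on stdin; prints one batch per line,
# with at most $1 hosts per batch (space-separated). $1 must be > 0.
batch_hosts() {
    awk -v n="$1" '
        NF {
            printf "%s%s", (c % n ? " " : ""), $1
            c++
            if (c % n == 0) print ""
        }
        END { if (c % n) print "" }'
}

# Example (assumes a configured openstack CLI):
#   openstack compute service list --service nova-compute -f value -c Host \
#     | batch_hosts 3
```

Each output line is one batch; migrate it, confirm connectivity, then move to the next line.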
Step 2: Verify OVN is healthy
After migration, forget neutron agent-list for the data plane — query OVN directly. Check the databases and the chassis:
# Northbound: logical model Neutron wrote
ovn-nbctl show
# Southbound: physical realization + which chassis are up
ovn-sbctl show
ovn-sbctl list chassis
Every compute node should appear as a chassis in ovn-sbctl show. A node that’s missing means its ovn-controller isn’t registered — that node’s VMs will lose networking. Confirm the controller and northd:
systemctl status ovn-controller
systemctl status ovn-northd
ovn-nbctl get-connection
ovs-vsctl get open_vswitch . external_ids:ovn-remote
ovn-remote on each node must point at the SB DB; a wrong or stale value is the classic “this one node has no connectivity after migration” bug.
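Both checks in this step are easy to script. The sketch below assumes you keep a file of the hypervisors you expect (one hostname per line) and that those names are in the same form, short or FQDN, as the hostnames OVN reports. The function names are mine, and the 192.0.2.10 address in the comments is a placeholder, not a recommended value.

```shell
# Sketch: list expected hypervisors that are NOT registered as chassis.
#   $1 = file with expected hostnames, one per line
#   $2 = file with chassis hostnames, e.g. saved from:
#        ovn-sbctl --bare --columns=hostname list Chassis
# Blank lines (which --bare prints between records) are ignored.
missing_chassis() {
    awk -v chassis="$2" '
        BEGIN {
            while ((getline line < chassis) > 0)
                if (line != "") seen[line] = 1
        }
        $0 != "" && !($0 in seen)' "$1"
}

# Sketch: succeed only if the node's ovn-remote equals the SB endpoint
# you expect. ovs-vsctl prints the value in double quotes, so strip them.
#   $1 = expected value, e.g. tcp:192.0.2.10:6642 (placeholder address)
#   $2 = output of: ovs-vsctl get open_vswitch . external_ids:ovn-remote
ovn_remote_matches() {
    expected="$1"
    actual=$(printf '%s' "$2" | tr -d '"')
    [ "$actual" = "$expected" ]
}
```

An empty result from missing_chassis is what you want to see. For ovn_remote_matches, a mismatch on one node and not its neighbours is usually the stale-config case described above.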
Step 3: Debug a port that has no connectivity
When a specific VM can’t talk, trace it through the logical model. Find its logical port and confirm it’s bound to a chassis:
ovn-nbctl show | grep -A2 <port-or-network>
ovn-sbctl show | grep -A3 <chassis>
# Is the port actually bound to a hypervisor?
ovn-sbctl find Port_Binding logical_port=<neutron-port-id>
A Port_Binding with an empty chassis means OVN knows about the port logically but hasn’t bound it to a hypervisor — usually the VM’s host ovn-controller is down or the port was created while the node was offline. The killer feature here is ovn-trace, which simulates a packet through the logical pipeline without sending anything:
ovn-trace <logical-switch> 'inport=="<port>" && eth.dst==<mac> && ip4.dst==<dst-ip>'
That shows you exactly which logical flow drops or forwards the packet — security group ACL, router, or NAT — without tcpdump archaeology.
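If you do this often, two small helpers save typing. This is a sketch under assumptions: the function names are hypothetical, binding_state expects the chassis column printed with --bare, and trace_expr builds only the same three-field expression used above (real traces often need more fields, such as eth.src, to get past port security).

```shell
# Sketch: classify a port binding from the chassis column alone.
# Feed it the output of:
#   ovn-sbctl --bare --columns=chassis find Port_Binding logical_port=<neutron-port-id>
# Prints "bound <chassis-uuid>" or "unbound". Note: it also prints
# "unbound" when no Port_Binding row exists at all.
binding_state() {
    chassis=$(tr -d '[:space:]')
    if [ -n "$chassis" ]; then
        echo "bound $chassis"
    else
        echo "unbound"
    fi
}

# Sketch: build the ovn-trace match expression from port, MAC and IP.
#   $1 = logical inport, $2 = destination MAC, $3 = destination IPv4
trace_expr() {
    printf 'inport=="%s" && eth.dst==%s && ip4.dst==%s\n' "$1" "$2" "$3"
}

# Example:
#   ovn-trace <logical-switch> "$(trace_expr <port> <mac> <dst-ip>)"
```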
Step 4: Floating IPs and NAT under OVN
Floating IPs become distributed NAT (DNAT/SNAT) entries in OVN rather than something the L3 agent does centrally. If a floating IP doesn’t work after migration, check the NAT rules in the northbound DB:
ovn-nbctl lr-nat-list <logical-router>
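A missing or wrong NAT entry points at a Neutron-to-OVN sync issue. For gateway-bound traffic, confirm which chassis hosts the gateway port: in ovn-sbctl show, the router’s gateway shows up as a cr-lrp-<port-id> port binding under the chassis that currently hosts it. Even with distributed FIPs, traffic that is SNATed through the router (VMs without a floating IP) is still pinned to a gateway chassis, and if that chassis goes down with no other gateway chassis scheduled to take over, north-south traffic for those networks stops.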
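Checking for a floating IP in the NAT list is easy to script. This sketch assumes Neutron floating IPs appear as dnat_and_snat rows in lr-nat-list output, and it matches the IP against any column because the exact column layout differs between OVN versions. The has_fip_nat name is hypothetical.

```shell
# Sketch: succeed if the router's NAT list has a dnat_and_snat row that
# mentions the given floating IP. Reads on stdin the output of:
#   ovn-nbctl lr-nat-list <logical-router>
#   $1 = floating IP to look for
has_fip_nat() {
    awk -v ip="$1" '
        $1 == "dnat_and_snat" {
            for (i = 2; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example:
#   ovn-nbctl lr-nat-list <logical-router> | has_fip_nat <floating-ip> \
#     || echo "floating IP missing from NB NAT table"
```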
Step 5: Watch the databases at scale
The OVN SB DB is now critical infrastructure: if it’s unavailable, no ovn-controller can update flows. Run the NB/SB DBs as a Raft cluster across your controllers, monitor their disk usage and connection counts, and watch ovn-northd processing latency. A bloated SB DB or a northd that’s falling behind shows up as networking changes taking minutes to apply, the OVN equivalent of the old RabbitMQ backlog.
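For a clustered SB DB, the first thing I check is each server's Raft role. A minimal sketch, assuming cluster/status output contains a "Role:" line and that the control socket lives at the path in the usage comment. That path is an assumption that varies by packaging, and containerized deployments need the command run inside the DB container.

```shell
# Sketch: print this server's Raft role (e.g. leader or follower).
# Reads on stdin the output of the ovsdb-server cluster/status command.
# raft_role is a hypothetical helper name.
raft_role() {
    awk -F': *' '$1 == "Role" { print $2; exit }'
}

# Example (socket path is an assumption; adjust for your install):
#   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound \
#     | raft_role
```

Run it on every controller: in a healthy cluster exactly one server should report leader. Zero or several leaders is worth investigating before you blame slow flow updates on anything else.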
Where AI helps
OVN’s strength — pushing logic into databases and flows — is also what makes it opaque when it breaks. The ovn-nbctl show, ovn-sbctl show, and ovn-trace output is dense. I’ll paste those into a model and ask it to confirm the port is bound, find the chassis, and read the ovn-trace output to tell me which logical flow dropped the packet. It’s genuinely good at parsing a trace and pointing at “this ACL dropped it” faster than I’d find it by hand.
Keep a saved OVN triage prompt and pair it with our broader OpenStack guides, especially the Neutron deep-dive — OVN is Neutron’s future, but the concepts build on each other. The model reads the databases and the trace; you run every ovn-*ctl command and you migrate hypervisors in small, reversible batches.
Generated commands are assistive, not authoritative. Always verify against your own deployment and test the migration on staging before touching production networking.