OVN Control Plane Deep Dive Prompt
Debug OVN control plane — Northbound/Southbound databases, ovn-northd, ovn-controller, logical flows, raft cluster health.
- Target user: Senior network engineers running OVN-based OpenStack networking
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OVN engineer who has operated OVN at large scale: multi-DB raft clusters, distributed gateway routers, logical flow troubleshooting, network agent replacement.

I will provide:
- The symptom (logical flow not present, controller out of sync, NB/SB DB raft issue, scale problem with high flow count)
- Output of `ovn-nbctl show`, `ovn-sbctl show`, and `cluster/status` for both databases (`ovn-appctl -t <nb-db-ctl-socket> cluster/status OVN_Northbound` and the SB equivalent)
- ovn-northd / ovn-controller logs
- The cluster topology (number of DB cluster members, chassis, gateway chassis)

Your job:
1. **Understand the data flow**:
   - Neutron writes the logical topology (switches, routers, ports, ACLs) to the NB DB
   - ovn-northd reads NB, computes logical flows, and writes them to the SB DB
   - ovn-controller on each chassis reads SB and programs OpenFlow flows into OVS
2. **For "logical flow missing"**:
   - Verify the NB entity exists (`ovn-nbctl show`, `ovn-nbctl list <table>`)
   - Verify northd processed it (`ovn-sbctl lflow-list <datapath>`, `ovn-sbctl find port_binding logical_port=<name>`)
   - Verify the chassis processed it (OVS flows on the node)
   - Check the logs at each step
3. **For raft cluster issues**:
   - Run `cluster/status` against every member's DB control socket; it shows role, term, leader, and vote per DB
   - Lost quorum = no writes possible
   - Recovery: convert one surviving member to standalone (`ovsdb-tool cluster-to-standalone`), create a new cluster from it, then join the other members back
4. **For ovn-controller per-chassis issues**:
   - `ovs-vsctl show` for OVS state
   - `ovs-ofctl dump-flows br-int` for the actual flows
   - ovn-controller reads SB and programs OVS; if it is disconnected from SB, the chassis receives no updates
5. **For gateway chassis**:
   - East-west routing is distributed across compute nodes (DVR-style)
   - Centralized routing via gateway chassis (SNAT, north-south traffic)
   - HA gateway = multiple chassis with priorities; the highest-priority live chassis is active
6. **For scale**:
   - Large NB/SB DBs slow processing
   - ovn-northd may recompute logical flows on every change (recent releases handle many changes incrementally)
   - ovn-northd can build logical flows in parallel (check `ovn-northd(8)` for the thread option in your release)
7. **For DB compaction**:
   - DBs grow with every transaction
   - ovsdb-server compacts automatically; trigger it online with `ovsdb-server/compact` via `ovn-appctl`

Mark DESTRUCTIVE: forcing leader election on a healthy cluster, modifying NB/SB DBs outside OVN tools (corrupts), restarting ovn-controller on many nodes simultaneously (cluster-wide flow update).
---
Topology: [DESCRIBE]
Symptom: [DESCRIBE]

`ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound` (from each NB member):
```
[PASTE]
```
`ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound` (from each SB member):
```
[PASTE]
```
ovn-northd / ovn-controller logs:
```
[PASTE]
```
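The raft check in step 3 is easy to do by hand on one member but tedious across three or five. A minimal sketch, assuming the `Key: value` lines that `cluster/status` prints (Status, Role, Term, Leader); `raft_summary` is a hypothetical helper name, and field names may differ between OVS releases:

```shell
# raft_summary: read `cluster/status` output on stdin, print one summary
# line, and exit non-zero when this member cannot name a leader.
# Assumes the "Key: value" layout printed by ovsdb-server (hypothetical helper).
raft_summary() {
    awk -F': *' '
        $1 == "Status" { status = $2 }
        $1 == "Role"   { role = $2 }
        $1 == "Term"   { term = $2 }
        $1 == "Leader" { leader = $2 }
        END {
            printf "role=%s term=%s leader=%s status=%s\n", role, term, leader, status
            # "unknown" (or no Leader line at all) means this member sees no
            # leader, which is what lost quorum looks like from the inside.
            if (leader == "" || leader == "unknown") exit 1
        }'
}
```

For example, run `sudo ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | raft_summary` on each NB member (the socket path is an assumption; it varies by distro). Members that disagree on the term or the leader point at a split or a stuck election.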
Why this prompt works
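OVN is the modern Neutron backend, but its debugging tools differ from ML2/OVS: Neutron's L2/L3/DHCP agents are replaced by the NB/SB databases, ovn-northd, and ovn-controller. This prompt walks through each layer.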
How to use it
- Verify raft cluster health first: without quorum, writes fail and follower reads may be stale, so nothing downstream can be trusted.
- Walk NB → SB → OVS for missing flows.
- For scale issues, monitor northd processing time (for example, "Unreasonably long ... poll interval" warnings in its log).
- Stagger control plane operations to avoid storms.
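Staggering can be scripted so that a mass restart never hits every chassis at once. This is a sketch, not a vetted runbook: `staggered_restart` and the `SSH`, `PAUSE`, `RETRY_SLEEP`, and `MAX_TRIES` variables are hypothetical, and it assumes ssh plus sudo on each chassis, a systemd unit named `ovn-controller`, and that `ovn-appctl -t ovn-controller connection-status` prints `connected` once the SB session is back (confirm against ovn-controller(8) for your release).

```shell
# staggered_restart HOST...: restart ovn-controller on one chassis at a time,
# waiting for its SB connection to return before moving to the next host.
# Hypothetical helper; assumes ssh + sudo, a systemd unit "ovn-controller",
# and ovn-appctl installed on each chassis.
SSH=${SSH:-ssh}                  # override for dry runs or a jump host
PAUSE=${PAUSE:-30}               # seconds to let flows settle between hosts
RETRY_SLEEP=${RETRY_SLEEP:-5}    # seconds between connection checks
MAX_TRIES=${MAX_TRIES:-12}       # give up on a host after this many checks

staggered_restart() {
    for host in "$@"; do
        echo "restarting ovn-controller on $host"
        $SSH "$host" sudo systemctl restart ovn-controller || return 1
        tries=0
        # Poll the controller until it reports its SB session as connected.
        until [ "$($SSH "$host" sudo ovn-appctl -t ovn-controller connection-status)" = "connected" ]; do
            tries=$((tries + 1))
            if [ "$tries" -ge "$MAX_TRIES" ]; then
                echo "$host: still not connected to SB, stopping here" >&2
                return 1
            fi
            sleep "$RETRY_SLEEP"
        done
        sleep "$PAUSE"
    done
}
```

For example, `staggered_restart compute-1 compute-2 compute-3`. Gating each host on its SB connection, rather than on a fixed sleep alone, stops the rollout at the first chassis that fails to reconnect.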
Useful commands
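# Cluster status (run on each DB member; socket path varies by distro,
# e.g. /var/run/ovn/ or /var/run/openvswitch/)
sudo ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
sudo ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound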
# NB inspection
ovn-nbctl show
ovn-nbctl list logical_switch
ovn-nbctl list logical_router
ovn-nbctl list acl
ovn-nbctl list logical_switch_port
# SB inspection (computed by northd)
ovn-sbctl show
ovn-sbctl list chassis
ovn-sbctl list datapath_binding
ovn-sbctl lflow-list <datapath>
ovn-sbctl get-ssl   # SB SSL settings (operator-configured, not computed by northd)
# Trace a packet through OVN
ovn-trace <logical-switch> 'inport=="port-1" && eth.src==00:00:00:00:00:01 && eth.dst==00:00:00:00:00:02'
# Per-chassis (compute node)
sudo ovs-vsctl show
sudo ovs-ofctl dump-flows br-int | head
sudo ovs-appctl ofproto/trace br-int <packet-spec>
# Logs
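sudo journalctl -u ovn-northd -n 100 --no-pager
sudo journalctl -u ovn-controller -n 100 --no-pager # on each chassis
sudo journalctl -u ovsdb-server -n 100 --no-pager # local OVS DB on each chassis
sudo journalctl -u ovn-ovsdb-server-nb -u ovn-ovsdb-server-sb -n 100 --no-pager # NB/SB DB servers (Debian/Ubuntu unit names; others differ)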
# DB stats
ovsdb-tool show-log /etc/ovn/ovnnb_db.db | head # DB file path varies by distro
ovsdb-client list-dbs unix:/var/run/ovn/ovnnb_db.sock
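# Compact DB online on each member (ovsdb-server also compacts automatically;
# stopping ovn-northd does not release the DB file, so don't compact offline)
sudo ovn-appctl -t /var/run/ovn/ovnnb_db.ctl ovsdb-server/compact OVN_Northbound
sudo ovn-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound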
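The NB, SB, and per-chassis commands above can be chained into one walk for a single logical switch port. A sketch under stated assumptions: `walk_port` and the `NBCTL`/`SBCTL`/`SSH` override variables are hypothetical; it relies on Neutron naming the logical switch port after the Neutron port ID, on the VIF carrying `external_ids:iface-id=<port>` on the chassis, and on ssh plus sudo to the chassis hostname recorded in SB.

```shell
# walk_port LSP: follow one logical switch port from NB to SB to OVS.
# Hypothetical helper; returns 1-4 for the first layer that looks wrong.
NBCTL=${NBCTL:-ovn-nbctl}
SBCTL=${SBCTL:-ovn-sbctl}
SSH=${SSH:-ssh}

walk_port() {
    lsp=$1
    # Step 1, NB: the logical switch port must exist (written by Neutron).
    nb=$($NBCTL --bare --columns=_uuid find Logical_Switch_Port name="$lsp")
    if [ -z "$nb" ]; then echo "NB: $lsp missing"; return 1; fi
    echo "NB: ok ($nb)"
    # Step 2, SB: northd creates a Port_Binding; ovn-controller claims it
    # by filling in the chassis column.
    pb=$($SBCTL --bare --columns=_uuid find Port_Binding logical_port="$lsp")
    if [ -z "$pb" ]; then echo "SB: no Port_Binding (northd lagging?)"; return 2; fi
    ch=$($SBCTL --bare --columns=chassis list Port_Binding "$pb")
    if [ -z "$ch" ]; then echo "SB: Port_Binding not claimed by any chassis"; return 3; fi
    host=$($SBCTL --bare --columns=hostname list Chassis "$ch")
    echo "SB: bound to $host"
    # Step 3, OVS: on that chassis the VIF must carry external_ids:iface-id=<lsp>.
    ofport=$($SSH "$host" sudo ovs-vsctl --bare --columns=ofport find Interface external_ids:iface-id="$lsp")
    case "$ofport" in
        ''|-1) echo "OVS: no usable interface for $lsp on $host"; return 4 ;;
        *) echo "OVS: ofport $ofport on $host" ;;
    esac
}
```

Run `walk_port <neutron-port-id>` from a node that can reach the NB/SB databases; the first failing step, and its return code, tells you which layer to dig into.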
Common findings this catches
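- Raft cluster lost quorum → rebuild from a surviving member (`ovsdb-tool cluster-to-standalone`, then create a new cluster and re-join the others); while quorum still holds, remove dead members with `cluster/kick`.
- Logical flow missing → walk NB → SB; northd may not have processed the change.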
- ovn-controller disconnected → check SB connectivity (`ovs-vsctl get Open_vSwitch . external_ids:ovn-remote` on the chassis); restart the controller on that chassis.
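- Slow northd processing → check NB/SB size and northd poll-interval warnings; consider compacting, enabling parallel flow build, or splitting the deployment (e.g., OVN interconnection across availability zones).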
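- Gateway chassis failover not happening → priorities misconfigured (`ovn-nbctl lrp-get-gateway-chassis <port>`) or BFD down between gateway chassis (`ovs-appctl bfd/show`).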
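- OVS flows mismatch SB → controller out of sync; check chassis registration (`ovn-sbctl list chassis`) and force a full recompute with `ovn-appctl -t ovn-controller recompute`.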
- DB growth unbounded → automatic compaction not keeping up or failing; check the DB server logs and compact online.
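Both ovn-northd and ovn-controller log OVS's `Unreasonably long <N>ms poll interval` warning when a single main-loop iteration runs too long, which makes it a cheap proxy for processing time. A minimal sketch (`max_poll_interval` is a hypothetical helper) that counts those warnings and reports the worst one:

```shell
# max_poll_interval: read northd/controller log lines on stdin and print how
# many "Unreasonably long <N>ms poll interval" warnings appear and the largest N.
max_poll_interval() {
    # Keep only the millisecond value from matching lines.
    sed -n 's/.*Unreasonably long \([0-9][0-9]*\)ms poll interval.*/\1/p' |
        awk 'BEGIN { max = 0 }
             { n++; if ($1 + 0 > max) max = $1 + 0 }
             END { printf "count=%d max_ms=%d\n", n, max }'
}
```

For example, `sudo journalctl -u ovn-northd --since "1 hour ago" --no-pager | max_poll_interval`. A rising count or multi-second maximums after topology changes suggest northd is recomputing more than it can keep up with.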
When to escalate
- Major scale events — engage OVN upstream / vendor.
- Production raft failures — restoration from backup.
- OVS / OVN version mismatch — coordinated upgrade.
Related prompts
- Neutron Networking Debug Prompt: Diagnose Neutron networking failures (unreachable VMs, broken security groups, missing floating IPs, OVS/OVN flow issues) from CLI output and agent logs.
- OpenStack Request-ID Log Trace Prompt: Correlate a single API request across services (nova-api → conductor → scheduler → compute → neutron → cinder) using OpenStack request IDs.
- OpenStack VM Troubleshooting Prompt: Diagnose Nova VM boot failures, networking issues, and stuck instances using nova/openstack CLI output.