Tuning OVN Gateway Chassis and BFD for L3 Failover in OpenStack

The first time an OVN gateway chassis failed in one of my clouds, I learned exactly how long sixty seconds is. North-south traffic for every external network rode through that one node, BFD was running on lazy default timers, and floating IPs sat dark while OVN slowly noticed the chassis was gone. Nobody had ever load-tested the failover path. We had built a beautiful distributed control plane and then funneled all egress through a single point that nobody had tuned. Let me walk through how to avoid that, and how AI helps you reason about it without ever letting it touch the cluster.

Why Gateway Chassis Are the Choke Point

In OVN-backed Neutron, distributed routing (east-west) happens on every compute node, but north-south traffic to external/provider networks is handled by gateway chassis: a designated subset of nodes that own the logical router’s gateway port. SNAT for outbound traffic lives there, and so does the landing point for floating IPs unless distributed floating IPs are enabled. If you have one gateway chassis, you have one failure domain for all egress. If you have several, OVN schedules each router’s gateway port to a chassis and can fail it over, but only as fast as it detects the loss.

Start by seeing what you actually have:

ovn-nbctl list Logical_Router_Port | grep -A2 gateway_chassis
ovn-sbctl list Chassis | grep -E 'hostname|name'
ovs-appctl -t ovn-controller list-commands

The gateway_chassis entries on each router port show the priority-ordered list of chassis that can host it. If every router lists the same single chassis, you have a design problem, not a tuning problem. The openstack category collects the related OVN playbooks if you’re newer to this.
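One quick way to see whether the candidates really collapse onto one node is to count how often each chassis appears as a gateway candidate. This is a minimal sketch, assuming you feed it one chassis name per line (for example from the chassis_name column of the NB Gateway_Chassis table); candidate_counts is a made-up helper name, not an OVN tool.

```shell
# Hypothetical helper: reads one chassis name per line on stdin and prints
# "chassis count", i.e. how many gateway-port candidate slots each chassis holds.
candidate_counts() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Possible read-only usage on a node with NB access:
#   ovn-nbctl --format=csv --no-headings --data=bare \
#     --columns=chassis_name list Gateway_Chassis | candidate_counts
```

If the output is a single line, every router port shares the same lone candidate, which is the design problem described above.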

Spreading Gateway Ports Across Chassis

You want enough gateway chassis that the loss of any one redistributes a tolerable fraction of routers, and you want OVN to spread gateway ports rather than pile them on one node. The number of candidate chassis per gateway port is capped on the Neutron side (upstream ML2/OVN uses a MAX_GW_CHASSIS limit of 5; confirm how your release and deployment tooling expose it). With three or more gateway chassis and a sane cap, OVN assigns each router a priority list so failover has somewhere to go.

Check the actual distribution, because the config intent and reality often diverge:

ovn-sbctl --columns=logical_port,chassis list Port_Binding \
  | grep -i cr-lrp

The cr-lrp (chassis-redirect logical router port) bindings tell you which chassis is currently serving each router’s gateway. If they cluster on one node, you have a hot spot that will hurt at failover time even if the config looks balanced.
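The same count can be done locally before any model sees the data. The sketch below assumes the bindings have been exported as "logical_port,chassis" CSV rows; crlrp_per_chassis is a hypothetical helper, and the 1.5x-of-mean threshold is just the one used in the prompt that follows, computed over the chassis that appear in the input.

```shell
# Hypothetical helper: reads "logical_port,chassis" CSV rows on stdin and
# prints "chassis count flag", flagging chassis above 1.5x the mean as HOT.
# Rows with an empty chassis column are grouped under "unbound".
crlrp_per_chassis() {
  awk -F, '
    NF >= 2 { c = ($2 == "" ? "unbound" : $2); n[c]++; total++ }
    END {
      k = 0; for (c in n) k++
      if (k == 0) exit
      mean = total / k
      for (c in n)
        printf "%s %d %s\n", c, n[c], (n[c] > 1.5 * mean ? "HOT" : "ok")
    }' | sort -k2,2nr -k1,1
}

# Possible read-only usage:
#   ovn-sbctl --format=csv --no-headings --data=bare \
#     --columns=logical_port,chassis find Port_Binding type=chassisredirect \
#     | crlrp_per_chassis
```

The chassis column comes back as a row UUID, so map it to hostnames with the Chassis table if you want a readable report.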

Prompt: “Here are my cr-lrp Port_Binding rows mapping router gateway ports to chassis, and my list of 4 gateway chassis. Cross-tabulate how many active gateway ports each chassis currently serves, flag any chassis serving more than 1.5x the mean, and tell me how many routers would migrate if the busiest chassis failed. Show the per-chassis counts as a table. Do not suggest any ovn-nbctl write commands.”

Output: A table showing chassis-3 holding 41% of active gateway ports while chassis-1 held 12%, plus the count of routers that would migrate on a chassis-3 loss. It correctly noted this was a distribution observation, not a recommendation to rebalance live.

That cross-tabulation is the genuine time-saver. The AI is a fast junior engineer here: it reads the dump and ranks the hot spots in seconds. But I verify its router-migration count against the actual priority lists before I believe the blast-radius number, because it’s the kind of thing a model will confidently estimate from incomplete data.
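Checking the migration count against the priority lists can itself be scripted. This is a rough sketch under stated assumptions: the input is "port,chassis,priority" CSV rows that you assemble yourself from the gateway_chassis entries, the highest-priority candidate is treated as the active one, and failover_targets is an invented helper name. It ignores ports that have already failed over to a lower-priority chassis.

```shell
# Hypothetical helper: given a failed chassis name and "port,chassis,priority"
# rows on stdin, prints "port next-chassis" for every port whose top-priority
# candidate is the failed chassis (NONE if it has no other candidate).
failover_targets() {
  awk -F, -v failed="$1" '
    {
      port = $1; ch = $2; pr = $3 + 0
      if (!(port in top) || pr > top_pr[port]) {
        # new best candidate: demote the previous best to second place
        if (port in top) { sec[port] = top[port]; sec_pr[port] = top_pr[port] }
        top[port] = ch; top_pr[port] = pr
      } else if (!(port in sec) || pr > sec_pr[port]) {
        sec[port] = ch; sec_pr[port] = pr
      }
    }
    END {
      for (port in top)
        if (top[port] == failed)
          printf "%s %s\n", port, ((port in sec) ? sec[port] : "NONE")
    }' | sort
}
```

Counting the output lines gives the number of routers that would move, and any NONE line is a router with no failover target at all.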

BFD Timers: The Knob That Actually Decides Failover Speed

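OVN uses BFD between chassis to detect tunnel-endpoint liveness, and BFD is what determines how fast a dead gateway chassis is noticed. The defaults are conservative. You can inspect the current settings and tune the interval and multiplier per interface. Keep in mind that ovn-controller manages BFD on the tunnel interfaces it creates, so per-interface values may be overwritten; the bfd-min-rx, bfd-min-tx, and bfd-mult options on NB_Global are the cluster-wide way to set them.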

ovs-vsctl list interface | grep -A5 bfd
ovs-vsctl set interface <tunnel-iface> \
  bfd:enable=true bfd:min_rx=100 bfd:min_tx=100 bfd:mult=3

With min_rx and min_tx both at 100 ms and mult=3, you detect a failure in roughly 300 ms instead of the roughly 3 s you get from Open vSwitch’s default min_rx of 1000 ms. But faster is not free: aggressive BFD timers on a congested or jittery underlay produce false positives, which cause gateway ports to flap between chassis and create more outages than they prevent. There is no universally correct value; it depends on your underlay’s latency and stability.

Pro Tip: Never let an AI pick your BFD timers from first principles. Have it lay out the detection-time math (detection ≈ max(local min_rx, peer min_tx) × peer mult, which is simply min_rx × mult when both ends use the same values) and the false-positive tradeoff, then choose values you’ve validated against your own underlay’s measured jitter. A timer that’s great in a lab will flap a busy spine.
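The detection-time arithmetic is small enough to keep as a shell function. This sketch follows the asynchronous-mode rule from RFC 5880 (peer multiplier times the larger of the local min_rx and the peer min_tx); bfd_detect_ms is a hypothetical name, and the second call assumes the Open vSwitch defaults of min_rx=1000, min_tx=100, mult=3.

```shell
# Nominal BFD detection time in milliseconds:
# peer_mult * max(local_min_rx, peer_min_tx). Ignores transmit jitter.
bfd_detect_ms() {
  local_min_rx="$1"; peer_min_tx="$2"; peer_mult="$3"
  if [ "$local_min_rx" -gt "$peer_min_tx" ]; then
    interval="$local_min_rx"
  else
    interval="$peer_min_tx"
  fi
  echo $(( interval * peer_mult ))
}

bfd_detect_ms 100 100 3     # tuned values from the text: prints 300
bfd_detect_ms 1000 100 3    # assumed OVS defaults: prints 3000
```

It tells you nothing about false positives; that half of the tradeoff only comes from measuring jitter on your own underlay.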

Validating Failover Before You Need It

The single most important thing you can do is test failover on purpose, in a window, before a leaf switch tests it for you. Pick a low-traffic router, start a continuous ping to its floating IP, and disable the gateway chassis it’s bound to:

# from a client, keep this running
ping -i 0.2 <floating-ip>

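# on the gateway chassis, simulate loss
# (a clean ovn-controller stop may deregister the chassis and fail over faster
# than a crash would; downing the tunnel iface exercises BFD detection)
sudo systemctl stop ovn-controller   # or down the tunnel iface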

Count the dropped packets. That number — not your config — is your real failover time. Then watch the gateway port rebind:

ovn-sbctl --columns=logical_port,chassis list Port_Binding | grep cr-lrp
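To turn the saved ping output into a failover time, it helps to look at the largest run of missing sequence numbers rather than the total loss, since stray drops elsewhere in the test would inflate the figure. A minimal sketch, assuming Linux iputils-style reply lines containing "bytes from" and "icmp_seq="; largest_ping_gap is an invented helper and the interval argument must match the -i value you pinged with.

```shell
# Hypothetical helper: reads ping output on stdin, prints
# "<largest run of missing replies> <approx gap in ms>".
# $1 is the ping interval in ms (200 matches `ping -i 0.2`).
largest_ping_gap() {
  interval_ms="${1:-200}"
  awk -v iv="$interval_ms" '
    /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
      # "icmp_seq=" is 9 characters; the digits follow it
      seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (seen && seq - prev - 1 > worst) worst = seq - prev - 1
      prev = seq; seen = 1
    }
    END { printf "%d %d\n", worst, worst * iv }'
}

# Possible usage:
#   ping -i 0.2 <floating-ip> | tee ping.log    # stop with Ctrl-C after the test
#   largest_ping_gap 200 < ping.log
```

Six missing replies at a 200 ms interval comes out as "6 1200", which is a floor on the outage rather than an exact measurement.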

When I’m reading the before/after binding tables and the ping-loss output, I’ll have Claude summarize “router X moved from chassis-2 to chassis-4, 6 packets lost, ~1.2s gap” across a batch of test routers. That batch summary is the win. The verdict — whether 1.2s is acceptable for a given workload — stays with me. For the structured prompts I reuse to drive these tests, I keep templates in the prompt workspace.
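The "router X moved from chassis-2 to chassis-4" part of that summary is a plain diff of two snapshots, so it can be cross-checked without a model. A sketch, assuming each snapshot is a file of "logical_port,chassis" CSV rows taken before and after the test; binding_moves is a hypothetical helper name.

```shell
# Hypothetical helper: compares two "logical_port,chassis" CSV snapshots and
# prints one line per port whose chassis changed between them.
binding_moves() {
  awk -F, '
    NR == FNR { before[$1] = $2; next }     # first file: remember bindings
    ($1 in before) && before[$1] != $2 {    # second file: report changes
      printf "%s %s -> %s\n", $1, before[$1], $2
    }' "$1" "$2" | sort
}

# Possible usage:
#   binding_moves before.csv after.csv
```

Ports that appear in only one snapshot are ignored here, so a router created or deleted mid-test will not show up as a move.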

Reading the Failure Modes

Two failure modes recur. The first is flapping: BFD timers too aggressive for the underlay, so gateway ports bounce between chassis and connections reset repeatedly. The symptom is intermittent egress drops with no node actually down. The second is stranding: a chassis that’s up at the OVN level but has lost its external uplink, so it keeps the gateway port but can’t actually forward north-south. BFD on the tunnel mesh won’t catch an external-uplink failure — that’s a different liveness check entirely, and assuming BFD covers it is a classic mistake.

AI is good at helping you reason through which mode you’re in from logs and binding history, but it cannot tell you whether your external uplinks are healthy unless you give it that data. So I feed it the OVN side and the uplink/interface state and ask it to distinguish “OVN thinks the chassis is fine but it can’t egress” from “OVN failed it over correctly.” Keeping both data sources in the prompt is what makes the diagnosis trustworthy.
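A first-pass stranding check can pair the two data sources directly on the chassis. This is only a sketch: it assumes the external uplink is a kernel network interface whose operstate under /sys/class/net is a fair proxy for uplink health, which will miss failures further upstream. uplink_verdict and the SYSFS_NET override are invented for illustration.

```shell
# Hypothetical helper: $1 is the uplink interface name, $2 is "yes" if this
# chassis currently holds a gateway port (from the cr-lrp bindings), else "no".
# SYSFS_NET may be overridden for testing; it defaults to /sys/class/net.
uplink_verdict() {
  iface="$1"; holds_gw="$2"
  state_file="${SYSFS_NET:-/sys/class/net}/$iface/operstate"
  state="$(cat "$state_file" 2>/dev/null || echo missing)"
  if [ "$holds_gw" = "yes" ] && [ "$state" != "up" ]; then
    echo "possible-stranding $iface=$state"
  else
    echo "no-stranding-signal $iface=$state"
  fi
}

# Possible usage on a gateway chassis:
#   uplink_verdict <external-uplink-iface> yes
```

A "no-stranding-signal" result is not proof of a healthy uplink; it only means this one local check found nothing.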

Conclusion

OVN gives you a distributed router and then quietly concentrates north-south traffic on a handful of gateway chassis whose failover speed you have to earn through deliberate tuning. Spread the gateway ports, size BFD timers against your real underlay’s jitter, and — above all — test failover on purpose and measure the packet loss. AI accelerates every reading step: cross-tabulating gateway-port distribution, summarizing failover tests, laying out BFD math. None of those summaries are ground truth until you’ve verified them against ovn-sbctl and a real ping test. Keep the model on the reading side and your hands on the writes. For more OVN diagnostic prompts, browse the prompts library.
