DevOps AI ToolKit
AI for OpenStack · By James Joyner IV · 11 min read

Debugging Neutron Floating IPs and NAT in OpenStack

Floating IPs that don't route, DNAT that silently drops, and SNAT egress failures. Here's how to trace OpenStack L3 NAT through routers and namespaces, with AI help.

  • #openstack
  • #neutron
  • #floating-ip
  • #nat
  • #networking

Floating IP problems are some of the most frustrating in OpenStack because the IP looks perfectly assigned in the API and yet nothing routes. The reason is that a floating IP is not a thing that lives on the instance — it is a NAT rule inside a Neutron router, several layers away from the VM. When traffic to a floating IP disappears, you are debugging DNAT, SNAT, network namespaces, and the L3 agent, not the instance. This guide is the trace path I follow so I stop blaming the guest OS.

Confirm the binding before anything else

Start by proving the association exists and points where you think. The API view is cheap and rules out the dumb mistakes.

openstack floating ip show <floating-ip>
openstack floating ip list --port <instance-port-uuid>
openstack server show <instance> -f value -c addresses

The port_id and fixed_ip_address on the floating IP must match the instance’s actual port and internal IP. I have lost time on a floating IP that was still associated with a stale port from a rebuilt instance: the API happily showed it “assigned” to nothing useful.
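
To make that comparison mechanical, here is a minimal sketch. check_fip_binding is a hypothetical helper of mine, not part of the openstack CLI, and it assumes an unassociated floating IP reports its port_id as None or as an empty string.

```shell
# Sketch: compare a floating IP's binding with the port you expect.
# check_fip_binding is a hypothetical helper, not an openstack subcommand.
check_fip_binding() {
  # $1 / $2: port_id and fixed_ip_address reported by the floating IP
  # $3 / $4: port UUID and internal IP the instance actually uses
  if [ -z "$1" ] || [ "$1" = "None" ]; then
    echo "unassociated: floating IP has no port_id"
    return 1
  fi
  if [ "$1" != "$3" ]; then
    echo "stale port: floating IP -> $1, instance -> $3"
    return 1
  fi
  if [ "$2" != "$4" ]; then
    echo "fixed IP mismatch: floating IP -> $2, instance -> $4"
    return 1
  fi
  echo "binding ok"
}

# Example wiring (one column per call, so value order is never ambiguous):
#   fip_port=$(openstack floating ip show <floating-ip> -f value -c port_id)
#   fip_fixed=$(openstack floating ip show <floating-ip> -f value -c fixed_ip_address)
#   check_fip_binding "$fip_port" "$fip_fixed" <instance-port-uuid> <instance-fixed-ip>
```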

Find the router and its namespace

The NAT lives in a Neutron router, which on most deployments runs inside a network namespace on a network node. You cannot debug the NAT without getting into that namespace.

openstack router list
openstack port list --router <router-uuid> --device-owner network:router_gateway
# On the network/L3 node:
ip netns | grep <router-uuid>
ip netns exec qrouter-<router-uuid> ip addr

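If the namespace does not exist on any node, no L3 agent is hosting the router. That is your bug, and it is a scheduling or agent problem, not a NAT problem. Check openstack network agent list for a down L3 agent, and openstack network agent list --router <router-uuid> to see which agents, if any, host the router.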
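
When there are several network nodes to check, a small filter saves squinting at grep output. This is a sketch only: router_ns_present is a hypothetical helper that reads ip netns output on stdin, and it assumes the namespace name is the first field on each line (newer iproute2 appends an "(id: N)" suffix after it).

```shell
# Sketch: succeed if `ip netns` output (stdin) lists the router's namespace.
# router_ns_present is a hypothetical helper name.
router_ns_present() {
  # Exact match on the first field, so a UUID prefix cannot match by accident.
  awk -v ns="qrouter-$1" '$1 == ns { found = 1 } END { exit (found ? 0 : 1) }'
}

# On each network node:
#   ip netns | router_ns_present <router-uuid> && echo "router is hosted here"
```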

Pro Tip: With OVN, there is no qrouter namespace. NAT lives in the OVN northbound database. Run ovn-nbctl lr-nat-list <router> instead of poking namespaces; with the Neutron OVN driver the logical router is normally named neutron-<router-uuid>. Knowing which backend you run determines the entire debug path, so confirm it first.

Read the actual NAT rules

Inside the router namespace, the DNAT and SNAT rules are plain iptables. Reading them tells you whether Neutron actually programmed the translation.

ip netns exec qrouter-<router-uuid> iptables -t nat -S | grep -E 'DNAT|SNAT'

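You want to see a DNAT rule mapping your floating IP to the instance’s fixed IP, and a matching SNAT rule that rewrites the fixed IP to the floating IP for connections the instance initiates. Replies to inbound connections are reversed by connection tracking, not by a separate rule. If the DNAT rule is missing, the L3 agent failed to apply the floating IP: check its log first, and restart the agent if it does not resync. If the rule is present but traffic still fails, the problem is outside the NAT table: the external network, the gateway, or a security group.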
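
Eyeballing a long rule dump is error-prone, so here is a sketch that pulls out just the translation targets. dnat_target, snat_source, and nat_lookup are hypothetical helpers, and the rule layout they expect (-d <floating-ip>/32 ... -j DNAT --to-destination <fixed-ip>) is my assumption about what the L3 agent writes; check it against your own output before trusting an empty result.

```shell
# Sketch: extract translation targets from `iptables -t nat -S` output (stdin).
# nat_lookup, dnat_target and snat_source are hypothetical helper names.
nat_lookup() {
  # $1: match flag (-d or -s)   $2: address to match (a /32 is assumed)
  # $3: jump target             $4: option whose value gets printed
  awk -v flag="$1" -v addr="$2/32" -v jump="$3" -v opt="$4" '
    {
      m = ""; j = ""; v = ""
      for (i = 1; i < NF; i++) {
        if ($i == flag) m = $(i + 1)
        if ($i == "-j") j = $(i + 1)
        if ($i == opt)  v = $(i + 1)
      }
      if (m == addr && j == jump && v != "") print v
    }' | sort -u
}

# Fixed IP that a floating IP is translated to.
dnat_target() { nat_lookup -d "$1" DNAT --to-destination; }
# Floating IP that a fixed IP is translated to on the way out.
snat_source() { nat_lookup -s "$1" SNAT --to-source; }

# ip netns exec qrouter-<router-uuid> iptables -t nat -S | dnat_target <floating-ip>
```

If dnat_target prints anything other than the instance’s current fixed IP, the binding and the programmed rule disagree, which is the stale-target case.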

Don’t forget the security group

A perfectly NATed packet still dies if the instance’s security group drops it. This catches everyone at least once because the NAT is invisible and the security group feels unrelated.

openstack port show <instance-port-uuid> -f value -c security_group_ids
openstack security group rule list <sg-uuid>

DNAT rewrites the destination to the fixed IP, then the security group on that port evaluates the packet. The source address is untouched, so remote IP prefixes in the rules are matched against the real client address. If there is no ingress rule covering the service (say, TCP 443 from anywhere), the NAT worked and the firewall dropped it. Add the rule and traffic flows.
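
Port ranges are where I misread rules most often, so here is a small sketch for checking one. port_in_range is a hypothetical helper; it assumes the range is written as "min:max" (for example 443:443) and treats an empty range as "all ports", which may not match how your client version prints it.

```shell
# Sketch: does a rule's port range cover the service port?
# port_in_range is a hypothetical helper name.
port_in_range() {
  port=$1
  range=$2
  # An empty range is treated as "every port" (assumption).
  if [ -z "$range" ]; then
    return 0
  fi
  min=${range%%:*}   # text before the first colon
  max=${range##*:}   # text after the last colon
  [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

# port_in_range 443 "443:443" && echo "ingress rule covers 443"
```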

Trace egress (SNAT) separately

Inbound floating IP and outbound internet access are different NAT directions and fail independently. If instances cannot reach the internet but the floating IP inbound works, you have an SNAT or gateway problem.

ip netns exec qrouter-<router-uuid> ping -c2 <external-gateway-ip>
ip netns exec qrouter-<router-uuid> ip route

If the router namespace cannot ping its own external gateway, the external network or the physical uplink is the issue, and no amount of floating IP fiddling will fix egress.
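
If you do not know the gateway address offhand, it can be read from the namespace’s own routing table. A minimal sketch, where default_gw is a hypothetical helper that reads ip route output on stdin:

```shell
# Sketch: print the next hop of the default route from `ip route` output (stdin).
# default_gw is a hypothetical helper name.
default_gw() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "via") { print $(i + 1); exit }
       }'
}

# gw=$(ip netns exec qrouter-<router-uuid> ip route | default_gw)
# [ -n "$gw" ] || echo "no default route in the namespace"
# ip netns exec qrouter-<router-uuid> ping -c2 "$gw"
```

An empty result is itself a finding: with no default route in the namespace, egress cannot work regardless of the SNAT rules.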

Where AI accelerates the trace

This is a multi-layer trace, and an AI assistant works well as a fast junior engineer for correlating the outputs. I paste the floating ip show output, the namespace iptables -t nat -S, and the security group rules, and ask it to confirm whether the DNAT target matches the instance’s fixed IP and whether an ingress rule exists for the service port. It reliably catches the “DNAT points at the old fixed IP” class of bug.

I keep it sanitized — I scrub public IPs and never hand it credentials or a working clouds.yaml. The model reasons about the rules and tells me which layer is broken; I run the fix, especially anything that restarts the L3 agent, because that briefly disrupts every router on the node. The incident response dashboard is where I run this during an outage, and the prompt library has network-trace prompts. For the OVN side, the networking prompt pack includes ovn-nbctl triage prompts.

ip netns exec qrouter-<router-uuid> iptables -t nat -S \
  | sed -E 's/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/IP/g'
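
One caveat with the sed above: it turns every address into the same IP token, so the model can no longer tell whether the DNAT target matches the fixed IP. A variant that keeps the relationships is to give each distinct address its own stable placeholder. This is a sketch; pseudonymize_ips is a hypothetical helper, and it only handles IPv4.

```shell
# Sketch: replace each distinct IPv4 address with a stable token (IP_1, IP_2, ...)
# so matching addresses still match after scrubbing.
# pseudonymize_ips is a hypothetical helper name; it reads text on stdin.
pseudonymize_ips() {
  awk '
    {
      out = ""
      rest = $0
      while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
        ip = substr(rest, RSTART, RLENGTH)
        if (!(ip in seen)) seen[ip] = "IP_" (++n)   # first sighting gets the next token
        out = out substr(rest, 1, RSTART - 1) seen[ip]
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}

# ip netns exec qrouter-<router-uuid> iptables -t nat -S | pseudonymize_ips
```

Tokens are only stable within one run, so pipe everything you intend to paste through a single invocation.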

A tool like Warp makes capturing and re-running these namespace commands less painful, but it is the human who restarts agents.

ARP and the gateway: the layer below NAT

Even with perfect NAT rules, a floating IP fails if the upstream network does not know where to send packets for it. The router answers ARP for floating IPs on the external network, and if that ARP is stale or suppressed, traffic never reaches the namespace to be translated. This is the failure that survives every iptables check and drives people to despair.

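# Neighbor cache as the router sees it (is the upstream gateway resolved?)
ip netns exec qrouter-<router-uuid> ip neigh
# Send gratuitous ARP for the floating IP (iputils arping; -U expects no replies)
ip netns exec qrouter-<router-uuid> arping -U -c3 -I qg-<id> <floating-ip>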

After a router moves between network nodes — an L3 HA failover, or a manual reschedule — the upstream switches may still cache the old MAC for the floating IP until a gratuitous ARP corrects them. Neutron sends those GARPs, but a switch with ARP suppression or a slow aging timer can hold the stale entry long enough to look like a NAT bug. When inbound dies right after a failover but the rules are perfect, suspect ARP at the physical layer before you touch Neutron again.
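
To test the stale-MAC theory, compare what the upstream side has cached with what the router’s gateway interface actually uses. The sketch below assumes you can run ip neigh on a Linux host on the external segment, which many sites cannot; on a physical switch you would read its ARP table instead. neigh_mac and link_mac are hypothetical helpers.

```shell
# Sketch: compare a cached neighbor MAC with an interface MAC.
# neigh_mac and link_mac are hypothetical helper names.

# Reads `ip neigh` output on stdin, prints the MAC cached for one IP.
neigh_mac() {
  awk -v ip="$1" '$1 == ip {
         for (i = 2; i < NF; i++) if ($i == "lladdr") print $(i + 1)
       }'
}

# Reads `ip link show dev <interface>` output on stdin, prints its MAC.
link_mac() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "link/ether") { print $(i + 1); exit } }'
}

# On a Linux host on the external network (assumption):
#   cached=$(ip neigh | neigh_mac <floating-ip>)
# On the node hosting the router:
#   actual=$(ip netns exec qrouter-<router-uuid> ip link show dev qg-<id> | link_mac)
# If the two differ, upstream is still sending to the old MAC.
```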

DVR changes where the NAT lives

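If you run Distributed Virtual Routing, this whole trace shifts. With DVR, floating IP NAT happens on the compute host where the instance lives, not on a central network node. The DNAT and SNAT rules sit in a qrouter-<router-uuid> namespace on that compute host, and a companion fip-<external-network-id> namespace carries the translated traffic to the external network and answers ARP for the floating IP. Default SNAT for instances without a floating IP stays centralized, in an snat-<router-uuid> namespace on a network node. Debugging DVR on the wrong host is a classic time-sink.

# On the compute host running the instance, with DVR:
ip netns | grep -E 'fip-|qrouter-'
# Floating IP NAT rules live in the router namespace on this host
ip netns exec qrouter-<router-uuid> iptables -t nat -S | grep -E 'DNAT|SNAT'
# The fip namespace should hold a route for the floating IP toward the router
ip netns exec fip-<external-network-id> ip route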

The lesson is the same as with OVN: confirm your L3 topology — centralized, DVR, or OVN — before you start tracing, because each puts the NAT in a different place and a trace on the wrong node finds nothing and tells you nothing. Five seconds of openstack network agent list and a config check saves an hour of poking empty namespaces.
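
That topology check can be reduced to a rough classifier. This is a sketch under assumptions: l3_layout is a hypothetical helper, it expects OVN deployments to list an agent type containing "OVN Controller", and it trusts the router’s distributed field to tell DVR from centralized. Treat the answer as a hint and confirm it against your Neutron configuration.

```shell
# Sketch: guess the L3 layout from agent types plus the router's
# "distributed" flag. l3_layout is a hypothetical helper name.
l3_layout() {
  # $1: value of the router's "distributed" field (True/False)
  # stdin: one agent type per line
  agents=$(cat)
  case "$agents" in
    *"OVN Controller"*) echo "ovn"; return 0 ;;   # agent type string is an assumption
  esac
  case "$1" in
    [Tt]rue) echo "dvr" ;;
    *)       echo "centralized" ;;
  esac
}

# openstack network agent list -f value -c "Agent Type" \
#   | l3_layout "$(openstack router show <router-uuid> -f value -c distributed)"
```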

Conclusion

A floating IP is a NAT rule in a router, not an address on a VM, and every effective debug starts from that fact. Confirm the binding, find the router namespace, read the DNAT and SNAT rules, then check the security group and egress path separately. An AI assistant is a capable, fast junior engineer for correlating those layered outputs and spotting a stale NAT target. Keep public IPs and credentials out of the prompt, verify its conclusion against the actual iptables output, and run agent restarts yourself. More Neutron and networking guides live under the OpenStack category.
