Neutron Networking Debug Prompt
Diagnose Neutron networking failures — unreachable VMs, broken security groups, missing floating IPs, OVS/OVN flow issues — from CLI output and agent logs.
- Target user: OpenStack network engineers and platform operators
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack network engineer with deep experience running Neutron on both ML2/OVS and ML2/OVN backends in production. You can read OVS flows, OVN logical flows, and Neutron agent logs fluently.

I will provide:
- A symptom (VM unreachable, floating IP not working, security group not enforcing, DHCP failing, intermittent packet loss, etc.)
- The Neutron backend (ML2/OVS + L3 agent + DHCP agent, or ML2/OVN)
- Output from `openstack network/port/router/security group/floating ip show`
- Relevant agent logs (`neutron-server`, `neutron-openvswitch-agent`, `neutron-l3-agent`, or `ovn-controller`, `ovn-northd`)

Your job:
1. **Map the path** the affected packet should take, named hop by hop:
   - VM tap → linux bridge (if OVS hybrid) → integration bridge (`br-int`) → tunnel/provider bridge → wire → return path
   - For OVN: VM logical port → logical switch → logical router → distributed gateway → physical
2. **Identify which hop fails** based on the data provided.
3. **List the *minimum* additional commands** you need (be exact about which host and which command; don't ask for "more logs").
4. **Label any DANGEROUS command** that would touch shared infrastructure (`ovs-vsctl del-br`, `systemctl restart neutron-server`, `iptables -F`, etc.).
5. **Give a root-cause hypothesis** with reasoning grounded in the evidence.
6. **Recommend the fix** as a concrete diff or command, with a rollback.

Common failure classes to consider:
- Port binding failed / port DOWN: agent ↔ neutron-server connectivity, MTU mismatch, missing `bridge_mappings`
- Security group not enforcing: `firewall_driver` mismatch, OVS-fw vs iptables-hybrid, conntrack zone issues
- Floating IP not reachable: DNAT rule missing on L3 agent, `gateway_ip` misconfig, BGP/dynamic-routing agent down
- VXLAN/Geneve tunnel down: `local_ip` wrong, MTU, missing tunnel between compute and network nodes
- East-west drops: security group default-deny, missing ALLOW between project networks
- DHCP not serving: dnsmasq process gone, namespace missing, dnsmasq lease conflict

Backend: [ML2/OVS / ML2/OVN]
OpenStack release: [yoga / zed / antelope / bobcat / caracal / dalmatian / epoxy]
Symptom: [DESCRIBE]
Relevant output:
```
[PASTE]
```
Why this prompt works
Neutron failures look identical from the outside (“VM can’t ping”) but have wildly different root causes depending on which hop fails. This prompt forces the model to think in terms of the actual packet path rather than guessing at agent restarts.
The backend (OVS vs OVN) is critical because the diagnostic commands are entirely different. ML2/OVS uses `ovs-ofctl dump-flows`, network namespaces, and L3-agent logs. OVN uses `ovn-trace`, logical flows, and `ovn-controller` logs. Telling the model the backend up front saves an entire round trip.
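As a rough illustration of how far the two command sets diverge, the helper below prints a first-pass command list for each backend. It is only a sketch: the function name `diag_commands` and the `ovs`/`ovn` argument values are invented for this example, and the commands it prints are the ones already named on this page.

```shell
# Print the first-pass diagnostic commands for a Neutron backend.
# "diag_commands" and the "ovs"/"ovn" keywords are illustrative names,
# not part of any OpenStack tooling.
diag_commands() {
  case "$1" in
    ovs)
      printf '%s\n' \
        'ovs-vsctl show' \
        'ovs-ofctl dump-flows br-int' \
        'ip netns list' \
        'journalctl -u neutron-l3-agent -n 200 --no-pager'
      ;;
    ovn)
      printf '%s\n' \
        'ovn-nbctl show' \
        'ovn-sbctl lflow-list' \
        'journalctl -u ovn-controller -n 200 --no-pager'
      ;;
    *)
      # Anything else is rejected so a typo does not silently print nothing.
      echo "unknown backend: $1 (expected ovs or ovn)" >&2
      return 1
      ;;
  esac
}

diag_commands ovn
```

The two lists share no commands, which is why stating the backend first avoids a wasted exchange.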
How to use it
- Specify the backend (OVS or OVN). This single fact changes 80% of the diagnostic commands.
- Always include `openstack port show <port-id>` for the affected VM port; it’s the single most information-dense command.
- For OVN: paste `ovn-nbctl show` (logical topology) and `ovn-sbctl lflow-list` filtered to the relevant logical switch.
- For OVS: paste `ovs-vsctl show` and `ip netns exec qrouter-<id> ip a` (or `qdhcp-<id>`).
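One possible way to keep the pasted evidence attributable is to wrap each command in a labelled section before pasting it into the prompt. The sketch below assumes a hypothetical `PORT_ID` environment variable and a hypothetical output file name `neutron-evidence.txt`; the `section` helper is likewise invented for illustration.

```shell
# Run one command and print its output under a header line.
# A failing command is recorded in the output instead of aborting the run.
section() {
  printf '### %s\n' "$*"
  "$@" 2>&1 || printf '(command exited with status %s)\n' "$?"
  printf '\n'
}

# Only collect when PORT_ID (a hypothetical variable) has been set.
if [ -n "${PORT_ID:-}" ]; then
  {
    section openstack port show "$PORT_ID"
    section sudo ovs-vsctl show
  } > neutron-evidence.txt
fi
```

Because each block starts with the exact command that produced it, the model can cite which output supports which conclusion.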
Useful commands to gather first
# Common to both backends
openstack port show <port-uuid>
openstack server show <vm-uuid>
openstack security group rule list <sg-uuid>
openstack floating ip show <fip-uuid>
openstack router show <router-uuid>
# OVS-specific (run on the compute node; the qrouter namespace and L3 agent log are on the network node unless DVR is in use)
sudo ovs-vsctl show
sudo ovs-ofctl dump-flows br-int | grep <mac-or-ip>
sudo ip netns list
sudo ip netns exec qrouter-<id> ip route
sudo journalctl -u neutron-openvswitch-agent -n 200 --no-pager
sudo journalctl -u neutron-l3-agent -n 200 --no-pager
# OVN-specific (ovn-nbctl, ovn-sbctl and ovn-trace on an OVN central node; ovn-controller logs on the compute node)
sudo ovn-nbctl show
sudo ovn-sbctl show
sudo ovn-sbctl lflow-list <logical-switch>
sudo ovn-trace <logical-switch> 'inport=="<lport>" && eth.src==<mac> && ...'
sudo journalctl -u ovn-controller -n 200 --no-pager
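When searching `ovs-vsctl show` output for the affected port, it helps to know the device name to look for. The sketch below relies on the common Neutron convention that a port's device is named with a short prefix (`tap`, or `qbr`/`qvb`/`qvo` with the iptables-hybrid firewall) followed by the first 11 characters of the port UUID. Treat that convention as an assumption to confirm on your deployment; the helper name `dev_name` and the UUID are made up for the example.

```shell
# Derive a Neutron device name from a prefix and a port UUID.
# Assumes the "<prefix> + first 11 characters of the port UUID" convention.
dev_name() {
  printf '%s%.11s\n' "$1" "$2"
}

# Example UUID, invented for illustration.
port_uuid='3f1b2c4d-5e6f-4a7b-8c9d-0e1f2a3b4c5d'

dev_name tap "$port_uuid"   # VM-side tap device
dev_name qvo "$port_uuid"   # br-int side of the veth pair in OVS hybrid mode

# Possible use on the compute node:
#   sudo ovs-vsctl show | grep "$(dev_name tap "$port_uuid")"
```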
Common findings this catches
- Port stuck in DOWN even though VM is ACTIVE → `neutron-openvswitch-agent` not reporting to neutron-server (RabbitMQ issue) or `bridge_mappings` mismatch in agent config.
- Floating IP works inbound, fails outbound → SNAT rule missing, often after L3 agent restart without proper failover.
- Security group reload “succeeded” but rules not applied → OVS-fw driver hot-reload race; `openvswitch-agent` needs full reload.
- Cross-AZ traffic drops → MTU mismatch between tunnel networks; jumbo frames configured on one node only.
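For the MTU case, a quick confirmation is a ping with the Don't Fragment bit set, sized to exactly fill the network MTU. The sketch below only does the arithmetic: an IPv4 ICMP echo carries 20 bytes of IP header and 8 bytes of ICMP header, so the largest payload is the MTU minus 28. The helper name `icmp_payload` and the MTU values are illustrative; take the real MTU from `openstack network show`.

```shell
# Largest ICMP echo payload that fits an IPv4 MTU without fragmentation:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload() {
  echo $(( $1 - 28 ))
}

icmp_payload 1500
icmp_payload 1450

# Possible use from inside the VM (Linux iputils ping; -M do sets DF):
#   ping -M do -s "$(icmp_payload 1450)" -c 3 <peer-ip>
```

If the full-size ping fails while a smaller one succeeds, some hop on the path has a lower MTU than the network advertises.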
When to ask a human
If the model suggests anything involving `ovs-vsctl del-` commands (such as `del-br`), `ovn-nbctl destroy`, or restarting `ovn-northd` or the OVN northbound/southbound databases, stop and verify with your network team. These actions hit shared infrastructure and are hard to undo once applied.
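If suggested commands are pasted into a runbook or wrapper script, a simple pattern check can flag the risky ones before anyone runs them. This is a minimal sketch, not a safety mechanism: the function name `is_dangerous` is invented, and the pattern list covers only the examples named on this page.

```shell
# Return 0 when a command line matches one of the risky patterns named above.
# The list is illustrative and deliberately short; extend it for your site.
is_dangerous() {
  case "$1" in
    *'ovs-vsctl del-'*|*'ovn-nbctl destroy'*|*'iptables -F'*|*'systemctl restart neutron-server'*)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

for cmd in 'sudo ovs-vsctl show' 'sudo ovs-vsctl del-br br-int'; do
  if is_dangerous "$cmd"; then
    echo "REVIEW FIRST: $cmd"
  else
    echo "ok: $cmd"
  fi
done
```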
Related prompts
- OpenStack Request-ID Log Trace Prompt: Correlate a single API request across services (nova-api → conductor → scheduler → compute → neutron → cinder) using OpenStack request IDs.
- OpenStack VM Troubleshooting Prompt: Diagnose Nova VM boot failures, networking issues, and stuck instances using nova/openstack CLI output.
- RabbitMQ Queue Investigation Prompt: Investigate backed-up queues, dead-letter spillover, and consumer issues in RabbitMQ clusters.