AI for OpenStack · Difficulty: Advanced · Claude, ChatGPT

Neutron Networking Debug Prompt

Diagnose Neutron networking failures — unreachable VMs, broken security groups, missing floating IPs, OVS/OVN flow issues — from CLI output and agent logs.

Target user: OpenStack network engineers and platform operators
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack network engineer with deep experience running Neutron on both ML2/OVS and ML2/OVN backends in production. You can read OVS flows, OVN logical flows, and Neutron agent logs fluently.

I will provide:
- A symptom (VM unreachable, floating IP not working, security group not enforcing, DHCP failing, intermittent packet loss, etc.)
- The Neutron backend (ML2/OVS + L3 agent + DHCP agent, or ML2/OVN)
- Output from `openstack network/port/router/security group/floating ip show`
- Relevant agent logs (`neutron-server`, `neutron-openvswitch-agent`, `neutron-l3-agent`, or `ovn-controller`, `ovn-northd`)

Your job:

1. **Map the path** the affected packet should take, named hop by hop:
   - VM tap → linux bridge (if OVS hybrid) → integration bridge (`br-int`) → tunnel/provider bridge → wire → return path
   - For OVN: VM logical port → logical switch → logical router → distributed gateway → physical
2. **Identify which hop fails** based on the data provided.
3. **List the *minimum* additional commands** you need (be exact about which host and which command — don't ask for "more logs").
4. **Label any DANGEROUS command** that would touch shared infrastructure (`ovs-vsctl del-br`, `systemctl restart neutron-server`, `iptables -F`, etc.).
5. **Give a root-cause hypothesis** with reasoning grounded in the evidence.
6. **Recommend the fix** as a concrete diff or command, with a rollback.

Common failure classes to consider:
- Port binding failed / port DOWN — agent ↔ neutron-server connectivity, MTU mismatch, missing `bridge_mappings`
- Security group not enforcing — `firewall_driver` mismatch, OVS-fw vs iptables-hybrid, conntrack zone issues
- Floating IP not reachable — DNAT rule missing on L3 agent, `gateway_ip` misconfig, BGP/dynamic-routing agent down
- VXLAN/Geneve tunnel down — `local_ip` wrong, MTU, missing tunnel between compute and network nodes
- East-west drops — security group default-deny, missing ALLOW between project networks
- DHCP not serving — dnsmasq process gone, namespace missing, dnsmasq lease conflict

Backend: [ML2/OVS / ML2/OVN]
OpenStack release: [yoga / zed / antelope / bobcat / caracal / dalmatian / epoxy]
Symptom: [DESCRIBE]
Relevant output:
```
[PASTE]
```

Why this prompt works

Neutron failures look identical from the outside (“VM can’t ping”) but have wildly different root causes depending on which hop fails. This prompt forces the model to think in terms of the actual packet path rather than guessing at agent restarts.

The backend (OVS vs OVN) is critical because the low-level diagnostic commands are entirely different. ML2/OVS uses `ovs-ofctl dump-flows`, network namespaces, and L3-agent logs. OVN uses `ovn-trace`, logical flows, and `ovn-controller` logs. Telling the model the backend up front saves an entire round trip.
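If you are unsure which backend a cloud runs, the agent list usually gives it away. The sketch below reads the text of `openstack network agent list` on stdin and guesses the backend. The function name `detect_backend` is hypothetical, and the agent-type strings it matches ("OVN Controller", "Open vSwitch agent") are assumptions about how your release labels its agents, so check them against your own output.

```sh
# Guess the Neutron backend from the text of `openstack network agent list`.
# Reads the agent table on stdin; prints ML2/OVN, ML2/OVS, or unknown.
# The matched agent-type strings are assumptions about the table's wording.
detect_backend() {
    agents=$(cat)
    case "$agents" in
        *"OVN Controller"*)     echo "ML2/OVN" ;;
        *"Open vSwitch agent"*) echo "ML2/OVS" ;;
        *)                      echo "unknown" ;;
    esac
}
```

Typical use would be `openstack network agent list | detect_backend`, with the result pasted into the `Backend:` field of the prompt.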

How to use it

  1. Specify the backend (OVS or OVN). This single fact changes most of the diagnostic commands.
  2. Always include `openstack port show <port-id>` for the affected VM port. It is the most information-dense single command.
  3. For OVN: paste `ovn-nbctl show` (logical topology) and `ovn-sbctl lflow-list` filtered to the relevant logical switch.
  4. For OVS: paste `ovs-vsctl show` and `ip netns exec qrouter-<id> ip a` (or `qdhcp-<id>`).
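Steps 3 and 4 depend on knowing how Neutron names things on the host. A minimal sketch, assuming the usual conventions (OVN logical switches named `neutron-<network-uuid>`, namespaces named `qrouter-<router-uuid>` and `qdhcp-<network-uuid>`): the hypothetical helper `step_commands` only prints the commands with the IDs filled in, so you can review them before running anything.

```sh
# Print the backend-specific commands from steps 3 and 4 with IDs filled in.
# Assumes Neutron's usual naming: OVN logical switches are
# "neutron-<network-uuid>", and the OVS-side agents create
# "qrouter-<router-uuid>" / "qdhcp-<network-uuid>" namespaces.
# Nothing is executed; the commands are only echoed.
step_commands() {
    backend=$1
    network_id=${2:-}
    router_id=${3:-}
    case "$backend" in
        ovn)
            echo "sudo ovn-nbctl show"
            echo "sudo ovn-sbctl lflow-list neutron-$network_id"
            ;;
        ovs)
            echo "sudo ovs-vsctl show"
            echo "sudo ip netns exec qrouter-$router_id ip a"
            echo "sudo ip netns exec qdhcp-$network_id ip a"
            ;;
        *)
            echo "usage: step_commands ovn|ovs <network-uuid> [router-uuid]" >&2
            return 1
            ;;
    esac
}
```

For example, `step_commands ovs <network-uuid> <router-uuid>` prints three lines you can copy to the relevant node.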

Useful commands to gather first

# Common to both backends
openstack port show <port-uuid>
openstack server show <vm-uuid>
openstack security group rule list <sg-uuid>
openstack floating ip show <fip-uuid>
openstack router show <router-uuid>

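# OVS-specific (run on the compute node hosting the VM)
# The ip netns and neutron-l3-agent commands belong on the node running the L3 agent
# (the network node, unless DVR places routers on compute nodes).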
sudo ovs-vsctl show
sudo ovs-ofctl dump-flows br-int | grep <mac-or-ip>
sudo ip netns list
sudo ip netns exec qrouter-<id> ip route
sudo journalctl -u neutron-openvswitch-agent -n 200 --no-pager
sudo journalctl -u neutron-l3-agent -n 200 --no-pager

# OVN-specific (run on OVN central + compute)
sudo ovn-nbctl show
sudo ovn-sbctl show
sudo ovn-sbctl lflow-list <logical-switch>
sudo ovn-trace <logical-switch> 'inport=="<lport>" && eth.src==<mac> && ...'
sudo journalctl -u ovn-controller -n 200 --no-pager
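When several of these outputs are pasted together, it helps if each one is labelled with the command that produced it. The following is a minimal sketch of that idea, not a full collection tool: the hypothetical helper `run_section` appends a header and the command's output to a file, and records a failure instead of aborting.

```sh
# Run one diagnostic command and append its output under a labelled header,
# so the paste keeps track of which command produced which text.
# Usage: run_section <outfile> <command> [args...]
# A failing or missing command is noted in the file rather than stopping
# the rest of the collection.
run_section() {
    outfile=$1
    shift
    {
        printf '### %s\n' "$*"
        "$@" 2>&1 || printf '(command failed or is not installed)\n'
        printf '\n'
    } >> "$outfile"
}
```

A call such as `run_section neutron-debug.txt openstack port show <port-uuid>` adds one section per command. Pipelines like the `grep` filter above would need wrapping, for example in `sh -c '...'`, because the helper runs a single command.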

Common findings this catches

  • Port stuck in DOWN even though VM is ACTIVE → `neutron-openvswitch-agent` not reporting to `neutron-server` (RabbitMQ issue) or `bridge_mappings` mismatch in agent config.
  • Floating IP works inbound, fails outbound → SNAT rule missing, often after L3 agent restart without proper failover.
  • Security group reload “succeeded” but rules not applied → OVS-fw driver hot-reload race; `neutron-openvswitch-agent` needs a full restart.
  • Cross-AZ traffic drops → MTU mismatch between tunnel networks; jumbo frames configured on one node only.
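The MTU finding is easy to check with arithmetic before involving the model. The sketch below assumes an IPv4 underlay, 50 bytes of VXLAN overhead, and 58 bytes for Geneve (a 20-byte IP header plus a 38-byte `max_header_size`, the value commonly used with OVN). The function names `tenant_mtu` and `df_ping_size` are hypothetical, and the overheads should be adjusted if your deployment uses IPv6 tunnels or a different header size.

```sh
# Largest tenant-network MTU for a given underlay MTU and tunnel type.
# Overheads assume an IPv4 underlay: VXLAN adds 50 bytes; Geneve is taken as
# 58 bytes (20-byte IP header + 38-byte max_header_size). These are
# assumptions; adjust both if your deployment differs.
tenant_mtu() {
    underlay=$1
    case "${2:-}" in
        vxlan)  overhead=50 ;;
        geneve) overhead=58 ;;
        *) echo "usage: tenant_mtu <underlay-mtu> vxlan|geneve" >&2; return 1 ;;
    esac
    echo $((underlay - overhead))
}

# ICMP payload size that exactly fills an IPv4 packet of the given MTU
# (20-byte IP header + 8-byte ICMP header), for a don't-fragment ping.
df_ping_size() {
    echo $(($1 - 28))
}
```

With a 1500-byte underlay, `tenant_mtu 1500 vxlan` gives 1450, and `df_ping_size 1450` gives 1422. On Linux, `ping -M do -s 1422 <peer-ip>` from inside the VM should then succeed, while a payload one byte larger should fail if the path MTU is really 1450.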

When to ask a human

If the model suggests anything involving `ovs-vsctl del-*`, `ovn-nbctl destroy`, or restarting `ovn-northd` or the OVN northbound/southbound databases, stop and verify with your network team. These actions have no undo.
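A quick mechanical check can back up that habit. The sketch below scans a saved model reply for the destructive commands named in this article. The helper name `flag_dangerous` is hypothetical, the pattern list is illustrative rather than exhaustive, and the systemd unit names are assumptions that vary by distribution.

```sh
# Print every line of a model reply (read from stdin) that mentions one of
# the destructive commands named in this article, prefixed with its line
# number. Exit status 0 means at least one match: stop and ask a human.
# Patterns are illustrative; unit names such as ovn-northd are assumptions.
flag_dangerous() {
    grep -n -E \
        -e 'ovs-vsctl.* del-' \
        -e 'ovn-nbctl.* destroy' \
        -e 'systemctl +restart +(neutron-server|ovn-northd)' \
        -e 'iptables +-F'
}
```

For example, `flag_dangerous < reply.txt && echo "review with the network team first"` prints the offending lines and the reminder only when something matched. A clean result does not prove the reply is safe; it only catches the commands listed here.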
