
Neutron OVN Southbound DB Bloat & Compaction Debug Prompt

Diagnose a bloated or slow OVN Southbound/Northbound database in a Neutron OVN deployment — runaway size, slow ovsdb-server, chassis churn — and compact it safely without disrupting the dataplane.

Target user: OpenStack operators running Neutron with the OVN ML2 backend
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack networking engineer who has rescued OVN control planes where the Southbound DB grew until ovsdb-server crawled and port binding stalled.

I will provide:
- OVN topology (number of chassis/compute nodes, NB/SB DB deployment mode: RAFT cluster, active/standby, or standalone)
- DB sizing (NB/SB database file sizes, `ovsdb-server` memory use, chassis and port counts from `ovn-sbctl show`)
- Symptoms (slow port binding, ovn-controller lag, high CPU on ovsdb-server, DB file growth)
- Logs from ovsdb-server, ovn-northd, and ovn-controller

Your job:

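1. **Measure the bloat**: distinguish on-disk transaction-log growth (which compaction fixes) from genuine data-volume growth (too many ports or logical flows). Compare the DB file size against per-table row counts and, for RAFT, the `cluster/status` output; `ovsdb-tool db-version` reports only the schema version and says nothing about size.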

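2. **Find the churn source**: identify stale Chassis, Port_Binding, and MAC_Binding rows; look for flapping chassis that keep re-registering; and check whether the MAC_Binding table keeps growing because learned ARP/ND entries never age out (MAC binding aging exists only in newer OVN releases and has to be enabled).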

3. **Compact safely**: explain compaction for RAFT clusters (each server compacts its own log automatically, and `ovs-appctl -t <ctl-socket> ovsdb-server/compact <db>` can trigger it on demand, one member at a time) versus standalone or active/standby DBs; warn against `ovsdb-tool compact` on a running or clustered DB and give the correct online method.

4. **Logical-flow scale**: check the Logical_Flow row count produced by ovn-northd and whether features such as ACLs or distributed routing are multiplying flows; recommend `ovn-nbctl --print-wait-time --wait=hv sync` (the flag only takes effect together with `--wait=sb` or `--wait=hv`) and ovn-northd timing to spot the bottleneck.

5. **Stale-data cleanup**: safely remove orphaned Chassis entries for decommissioned nodes (for example with `ovn-sbctl chassis-del`) and clear stale MAC_Binding rows, confirming first that no live Port_Binding or running ovn-controller still references them.

6. **Validate**: confirm the DB file size dropped on every DB server, port-binding latency recovered, ovn-controller reconnected on all chassis, and no logical switch or router lost connectivity.

Output as: (a) bloat root-cause statement (log growth vs data growth), (b) the safe compaction procedure for the actual DB topology, (c) stale-row cleanup commands with pre-checks, (d) a scale/flow-count assessment, (e) validation and rollback steps.

Never run offline `ovsdb-tool compact` against a live or clustered DB. Use the online `ovsdb-server/compact` command through `ovs-appctl`, and take a backup of the DB first (for example with `ovsdb-client backup`).
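
Companion sketch

Outside the prompt itself, the compaction rules it describes can be written down as a small planning helper. The Python below is a minimal sketch, not an official tool: it only builds the `ovs-appctl` command lines for review and never executes them. The control-socket paths are assumed defaults that differ between distributions and containerized deployments, the function names are hypothetical, and the 50% threshold used to separate log growth from data growth is an arbitrary illustration rather than a documented value.

```python
"""Illustrative helpers for planning an OVN DB compaction. Nothing here runs a command."""

# Database names as served by ovsdb-server in an OVN deployment.
DB_NAMES = {"sb": "OVN_Southbound", "nb": "OVN_Northbound"}

# ASSUMPTION: typical control-socket paths. They vary by distro and container
# layout, so pass ctl_socket explicitly when your deployment differs.
DEFAULT_CTL = {
    "sb": "/var/run/ovn/ovnsb_db.ctl",
    "nb": "/var/run/ovn/ovnnb_db.ctl",
}


def online_compaction_commands(db, clustered, ctl_socket=None):
    """Return the ovs-appctl command lines for an online compaction, in order."""
    name = DB_NAMES[db]
    base = ["ovs-appctl", "-t", ctl_socket or DEFAULT_CTL[db]]
    commands = []
    if clustered:
        # Check RAFT health first; repeat the plan on one member at a time.
        commands.append(base + ["cluster/status", name])
    commands.append(base + ["ovsdb-server/compact", name])
    return commands


def offline_compact_allowed(clustered, server_running):
    """ovsdb-tool compact is only an option for a stopped, standalone DB."""
    return not clustered and not server_running


def classify_bloat(bytes_before, bytes_after, threshold=0.5):
    """Label the root cause from the share of the file a compaction reclaimed.

    ASSUMPTION: threshold=0.5 is an arbitrary cut-off chosen for illustration.
    """
    if bytes_before <= 0:
        raise ValueError("bytes_before must be positive")
    reclaimed = (bytes_before - bytes_after) / bytes_before
    return "log growth" if reclaimed >= threshold else "data growth"


if __name__ == "__main__":
    # Print the plan for a clustered Southbound DB so an operator can review it.
    for cmd in online_compaction_commands("sb", clustered=True):
        print(" ".join(cmd))
```

Keeping the helper side-effect free means the generated commands can be pasted into a change ticket and reviewed before anyone touches a DB server. Feeding `classify_bloat` the file sizes recorded before and after compaction gives a first draft of the root-cause statement the prompt asks for in output (a).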