
Nova NUMA, CPU Pinning & Huge-Page Flavor Design Prompt

Design Nova flavors and compute configuration for NFV and other high-performance workloads (CPU pinning, NUMA topology, huge pages, and emulator-thread isolation) without stranding capacity.

Target user: OpenStack operators running NFV, telco, or latency-sensitive compute
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack compute architect who has built NFV-grade Nova flavors for telco clouds where a single mis-scheduled vCPU causes packet loss.

I will provide:
- Compute node hardware (sockets, cores/socket, hyperthreading, RAM per NUMA node)
- Current `nova.conf` (`cpu_dedicated_set`, `cpu_shared_set`, `reserved_host_*`)
- Workload profiles (VNFs, DPDK, latency SLOs, packet rates)
- Existing flavors and `hw:` extra-specs
- Symptoms (noisy-neighbor jitter, stranded cores, failed scheduling)

Your job:

1. **Map the hardware.** Draw the NUMA topology: which cores belong to which socket, the hyperthread sibling pairs, and which NUMA node each NIC/PCI device is attached to. Identify the NIC-local NUMA node for DPDK.

2. **Pinned vs. shared partitioning.** Recommend the `cpu_dedicated_set`/`cpu_shared_set` split, which cores stay reserved for the host OS, and the kernel isolation parameters (`isolcpus`/`nohz_full`/`rcu_nocbs`) for the pinned cores; explain why pinned guests must never share cores with the host OS.

3. **Flavor extra-specs.** Write out the exact extra-specs: `hw:cpu_policy=dedicated`, `hw:cpu_thread_policy` (`prefer`/`isolate`/`require`), `hw:numa_nodes`, `hw:mem_page_size=1GB`, and `hw:emulator_threads_policy=isolate`. Explain each choice and the capacity tax it imposes.

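4. **Huge pages.** Specify the kernel cmdline (`default_hugepagesz=1G hugepagesz=1G hugepages=N`), how the pages end up reserved on each NUMA node, and how a mismatch between the flavor's `hw:mem_page_size` and the page sizes the host actually has reserved produces no clear error, surfacing only as a generic "No valid host".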

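5. **Scheduling correctness.** Confirm `NUMATopologyFilter` and `PciPassthroughFilter` are listed in `[filter_scheduler] enabled_filters`; show how to read the resource-provider inventory (for example `openstack resource provider inventory list <provider-uuid>`, which reports `PCPU` and `VCPU` separately) to prove pinned capacity isn't stranded.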

6. **Anti-patterns.** Mixing pinned and floating guests on one host without `cpu_shared_set`; `hw:cpu_thread_policy=require` on a host with hyperthreading disabled; forgetting emulator-thread isolation, so QEMU's emulator threads steal time from a guest's pinned cores.

7. **Validation.** `virsh vcpupin`, `virsh emulatorpin`, and `numactl` checks on a booted guest, a noisy-neighbor jitter test, and a capacity-stranding audit.
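Steps 1, 2, and 6 largely reduce to set arithmetic over CPU ids. The Python below is a hypothetical sketch, not Nova code (`parse_cpu_spec` and `check_partition` are illustrative names): it expands CPU lists written in a comma/range/`^`-exclusion syntax similar to `cpu_dedicated_set`, then flags overlapping sets, hyperthread pairs that are only partly pinned, a host with no CPUs left outside the dedicated set, and a NIC-local node with no pinned CPUs. It assumes the inputs were already collected, for example from `/sys/devices/system/node/nodeN/cpulist`, `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, and `/sys/class/net/<iface>/device/numa_node`.

```python
from __future__ import annotations


def parse_cpu_spec(spec: str) -> set[int]:
    """Expand a CPU list such as "2-7,^5,12" into a set of CPU ids.

    Comma-separated ids and ranges; "^N" drops a CPU added by another entry.
    """
    include: set[int] = set()
    exclude: set[int] = set()
    for raw in spec.replace(" ", "").split(","):
        if not raw:
            continue
        target = exclude if raw.startswith("^") else include
        item = raw.lstrip("^")
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-", 1))
            if lo > hi:
                raise ValueError(f"descending range: {raw!r}")
            target.update(range(lo, hi + 1))
        else:
            target.add(int(item))
    return include - exclude


def check_partition(dedicated: str, shared: str, node_cpus: dict[int, str],
                    siblings: list[tuple[int, int]],
                    nic_node: int | None = None) -> list[str]:
    """Return findings for a proposed split; an empty list means none found."""
    ded, shr = parse_cpu_spec(dedicated), parse_cpu_spec(shared)
    # CPU id -> NUMA node, e.g. built from /sys/devices/system/node/nodeN/cpulist.
    node_of = {cpu: node for node, spec in node_cpus.items()
               for cpu in parse_cpu_spec(spec)}
    host_cpus = set(node_of)
    findings: list[str] = []
    if ded & shr:
        findings.append(f"dedicated/shared overlap: {sorted(ded & shr)}")
    if (ded | shr) - host_cpus:
        findings.append(f"CPUs not on this host: {sorted((ded | shr) - host_cpus)}")
    # A pinned thread whose sibling runs other work shares one physical core.
    for a, b in siblings:
        if (a in ded) != (b in ded):
            findings.append(f"sibling pair {(a, b)} is only partly pinned")
    # This sketch treats every CPU outside cpu_dedicated_set as host-OS capacity.
    if not host_cpus - ded:
        findings.append("every CPU is pinned; nothing left for the host OS")
    if nic_node is not None and not any(node_of.get(c) == nic_node for c in ded):
        findings.append(f"no pinned CPUs on NIC-local NUMA node {nic_node}")
    return findings


if __name__ == "__main__":
    # Hypothetical 2-socket, 4-core, SMT host: CPU N and N+8 are siblings.
    nodes = {0: "0-3,8-11", 1: "4-7,12-15"}
    sibs = [(i, i + 8) for i in range(8)]
    print(check_partition("2-3,5-7,10-11,13-15", "1,9", nodes, sibs, nic_node=0))
    print(check_partition("2-7,9-15", "0-1,8", nodes, sibs, nic_node=1))
```

The example topology is invented; real CPU numbering differs between platforms, which is why step 1 maps it before anything is partitioned.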

Output as: (a) an annotated NUMA topology diagram (ASCII), (b) `nova.conf` and kernel cmdline diffs, (c) two or three ready-to-create flavor definitions with `openstack flavor create` and `openstack flavor set` commands, (d) a pre-flight checklist, and (e) a rollback plan if guests fail to schedule.
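For output (c), generating the commands from one extra-spec dictionary keeps each flavor consistent with the specs it is justified by. This is a hypothetical helper (`flavor_commands`, its allowed-value table, and the flavor name `nfv.pinned.8c16g` are illustrative, not an OpenStack API): it rejects values outside the documented choices for `hw:cpu_policy`, `hw:cpu_thread_policy`, and `hw:emulator_threads_policy`, refuses a `hw:numa_nodes` value that does not divide vCPUs and RAM evenly unless `hw:numa_cpus.N` is given explicitly (Nova's automatic split is an even one), and then prints `openstack flavor create` and `openstack flavor set --property` command lines. Other keys, such as `hw:mem_page_size`, pass through unchecked.

```python
from __future__ import annotations

import shlex

# Documented choices for the policy extra-specs this sketch checks.
ALLOWED = {
    "hw:cpu_policy": {"shared", "dedicated", "mixed"},
    "hw:cpu_thread_policy": {"prefer", "isolate", "require"},
    "hw:emulator_threads_policy": {"share", "isolate"},
}


def flavor_commands(name: str, vcpus: int, ram_mb: int, disk_gb: int,
                    specs: dict[str, str]) -> list[str]:
    """Return shell-quoted `flavor create` and `flavor set` command lines."""
    for key, allowed in ALLOWED.items():
        if key in specs and specs[key] not in allowed:
            raise ValueError(f"{key}={specs[key]!r}; expected one of {sorted(allowed)}")
    nodes = int(specs.get("hw:numa_nodes", "1"))
    if nodes < 1:
        raise ValueError("hw:numa_nodes must be at least 1")
    # Without explicit hw:numa_cpus.N, Nova splits vCPUs and RAM evenly.
    explicit = any(key.startswith("hw:numa_cpus.") for key in specs)
    if not explicit and (vcpus % nodes or ram_mb % nodes):
        raise ValueError("vCPUs and RAM must divide evenly across hw:numa_nodes")
    create = ["openstack", "flavor", "create", "--vcpus", str(vcpus),
              "--ram", str(ram_mb), "--disk", str(disk_gb), name]
    set_props = ["openstack", "flavor", "set"]
    for key, value in sorted(specs.items()):
        set_props += ["--property", f"{key}={value}"]
    set_props.append(name)
    return [shlex.join(create), shlex.join(set_props)]


if __name__ == "__main__":
    nfv_specs = {
        "hw:cpu_policy": "dedicated",
        "hw:cpu_thread_policy": "prefer",
        "hw:emulator_threads_policy": "isolate",
        "hw:mem_page_size": "1GB",
        "hw:numa_nodes": "1",
    }
    for line in flavor_commands("nfv.pinned.8c16g", 8, 16384, 20, nfv_specs):
        print(line)
```

Validating before printing means a typo such as `hw:cpu_thread_policy=isolated` fails at design time instead of appearing later as a scheduling failure.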

Bias toward: deterministic placement over packing density, every extra-spec justified, explicit capacity accounting.
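The explicit capacity accounting this bias calls for can be done per NUMA node. A minimal sketch under stated assumptions: `hw:numa_nodes=1` (each instance takes all of its pCPUs and pages from one node), one pCPU per vCPU (so `hw:cpu_thread_policy=isolate` on an SMT host is not modeled), one extra pCPU per instance for `hw:emulator_threads_policy=isolate`, and flavor RAM in whole GiB backed by 1 GiB pages. The class and function names and the node sizes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

PAGE_MB = 1024  # 1 GiB huge pages (hw:mem_page_size=1GB)


@dataclass(frozen=True)
class NumaNode:
    node_id: int
    pinned_pcpus: int   # CPUs from cpu_dedicated_set on this node
    free_pages: int     # 1 GiB huge pages available to guests on this node


@dataclass(frozen=True)
class Flavor:
    vcpus: int
    ram_mb: int
    emulator_isolate: bool = True  # hw:emulator_threads_policy=isolate


def demand(flavor: Flavor) -> tuple[int, int]:
    """Return (pCPUs, 1 GiB pages) one instance consumes on its NUMA node."""
    if flavor.ram_mb % PAGE_MB:
        raise ValueError("sketch only handles RAM that is a whole number of GiB")
    # Emulator-thread isolation adds one dedicated pCPU per instance.
    pcpus = flavor.vcpus + (1 if flavor.emulator_isolate else 0)
    return pcpus, flavor.ram_mb // PAGE_MB


def audit(nodes: list[NumaNode], flavor: Flavor) -> dict[int, dict[str, int]]:
    """Per node: how many instances fit and what is left stranded."""
    need_cpu, need_pages = demand(flavor)
    report: dict[int, dict[str, int]] = {}
    for node in nodes:
        # The scarcer resource on the node decides how many instances fit.
        fit = min(node.pinned_pcpus // need_cpu, node.free_pages // need_pages)
        report[node.node_id] = {
            "instances": fit,
            "stranded_pcpus": node.pinned_pcpus - fit * need_cpu,
            "stranded_pages": node.free_pages - fit * need_pages,
        }
    return report


if __name__ == "__main__":
    # Hypothetical two-node host: 22 pinned pCPUs and 120 x 1 GiB pages per node.
    host = [NumaNode(0, 22, 120), NumaNode(1, 22, 120)]
    for vcpus in (8, 10):
        print(vcpus, audit(host, Flavor(vcpus=vcpus, ram_mb=16384)))
```

With these made-up numbers, an 8-vCPU flavor needs 9 pCPUs with emulator isolation, so two instances fit per node and 4 pinned pCPUs sit idle; a 10-vCPU flavor (11 pCPUs) tiles a 22-pCPU node exactly. In both cases CPU is the binding constraint and 88 huge pages per node stay unused, which is the kind of stranding the audit in step 7 should surface.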
