DevOps AI ToolKit

Nova SR-IOV & PCI Passthrough Whitelist Debug Prompt

Diagnose why SR-IOV VFs or passthrough PCI devices fail to attach, get the wrong NUMA affinity, or never show up in Placement inventory.

Target user: Advanced Nova/PCI operators running SR-IOV or GPU passthrough
Difficulty: Advanced
Tools: Claude, ChatGPT, Cursor

The prompt

You are a senior Nova operator who specializes in PCI passthrough and SR-IOV device assignment on KVM/libvirt computes.

I will provide:
- OpenStack release and hypervisor (`libvirt`/`qemu`/kernel versions)
- `[pci] device_spec` (or `passthrough_whitelist` on releases that predate the rename) and `alias` config from nova.conf (compute and controller)
- `lspci -nnk`, `lspci -vvv` for the target device, and `ip link show <pf>` VF state
- The flavor extra-specs (`pci_passthrough:alias`, NUMA hints) and the failing boot/attach error
- Placement PCI inventory: `openstack resource provider inventory list <compute-rp>` and any nested device RPs

Your job:

1. **Spec parsing** — interpret the device_spec/whitelist (vendor:product, PF/VF address globs, physical_network) and confirm it actually matches the hardware in `lspci`.
2. **VF readiness** — verify `sriov_numvfs`, driver bind (vfio-pci vs ixgbevf), and IOMMU/intel_iommu=on prerequisites.
3. **NUMA affinity** — check whether the requested device's NUMA node conflicts with the flavor's CPU/memory NUMA policy.
4. **Placement view** — confirm the device class (PCI_DEVICE or custom) is reported as inventory and that allocations are not stranded.
5. **Root cause** — name the single most likely failure (spec mismatch, VF exhaustion, NUMA conflict, driver bind, whitelist scope) with evidence.
6. **Fix + verify** — exact config/commands, plus a post-fix attach test on ONE host before fleet rollout.

Output as: (a) a device-to-flavor mapping table, (b) an ordered diagnostic command list, (c) a fix runbook with a single-host validation gate.

Treat all proposed config changes as draft until validated on one drained compute; never edit nova.conf fleet-wide before a single-host attach succeeds.

Why this prompt works

SR-IOV and PCI passthrough failures are notoriously hard to debug because the fault can live in any of five layers: the kernel/IOMMU, the libvirt/driver bind, the Nova device_spec, the Placement inventory, or the flavor extra-specs. Operators lose hours fixating on one layer while the bug sits in another. This prompt forces the model to walk all five in order and to check the config text against the actual `lspci` hardware, which is exactly the cross-referencing step humans skip when they are tired.
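To make the cross-referencing step concrete, here is a minimal Python sketch that checks device_spec entries against `lspci -D -nn` output. It is an approximation for illustration, not Nova's matching code: it assumes specs are already parsed into dicts, treats the `address` field as a simple shell-style glob, and ignores other keys such as `physical_network`. The function names and the sample data are hypothetical.

```python
import fnmatch
import re

# Matches one line of `lspci -D -nn` output, for example:
# 0000:3b:10.0 Ethernet controller [0200]: Intel ... Virtual Function [8086:10ed] (rev 01)
LSPCI_LINE = re.compile(
    r"^(?P<address>\S+)\s.*\[(?P<vendor_id>[0-9a-f]{4}):(?P<product_id>[0-9a-f]{4})\]"
)


def parse_lspci(text):
    """Return one dict (address, vendor_id, product_id) per recognizable line."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices


def spec_matches(spec, device):
    """Simplified match: every ID present in the spec must agree with the device."""
    for key in ("vendor_id", "product_id"):
        if key in spec and spec[key].lower() != device[key]:
            return False
    # Assumption: the address is a plain glob; a missing address matches everything.
    return fnmatch.fnmatchcase(device["address"], spec.get("address", "*"))


def unmatched_specs(specs, devices):
    """Specs that select no device at all, i.e. a silent spec/hardware mismatch."""
    return [s for s in specs if not any(spec_matches(s, d) for d in devices)]


# Hypothetical sample data for illustration only.
SAMPLE_LSPCI = """\
0000:3b:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
0000:3b:10.0 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
"""

specs = [
    {"vendor_id": "8086", "product_id": "10ed", "address": "0000:3b:*.*"},
    {"vendor_id": "8086", "product_id": "154c"},  # matches nothing in the sample
]

for spec in unmatched_specs(specs, parse_lspci(SAMPLE_LSPCI)):
    print("no hardware matches:", spec)
```

A spec that selects no hardware is worth ruling out first, because every later layer (Placement inventory, flavor alias) depends on the device being picked up at all.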

The framing matters: by casting the model as a senior PCI operator and demanding a device-to-flavor mapping table, you get a structured artifact you can diff against reality instead of a wall of guesses. The single-most-likely-root-cause requirement stops the model from hedging across all five layers, while the evidence requirement keeps it honest. You can act on a named cause far faster than on a list of maybes.
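As a rough illustration of what a device-to-flavor mapping table contains, the sketch below joins `pci_passthrough:alias` extra-specs against alias definitions and flags flavors that reference an alias nobody defined. It assumes the extra-spec value is a comma-separated list of `name:count` pairs and that the aliases are already parsed into dicts; the function names, flavor names, and IDs are illustrative, not taken from any real deployment.

```python
def parse_alias_request(value):
    """Split 'a1:2,a2:1' into [('a1', 2), ('a2', 1)]."""
    requests = []
    for part in value.split(","):
        name, count = part.strip().split(":")
        requests.append((name.strip(), int(count)))
    return requests


def mapping_rows(flavors, aliases):
    """One row per (flavor, alias) pair; undefined aliases get None columns."""
    by_name = {alias["name"]: alias for alias in aliases}
    rows = []
    for flavor, extra_specs in sorted(flavors.items()):
        value = extra_specs.get("pci_passthrough:alias")
        if not value:
            continue  # flavor does not request any PCI device
        for name, count in parse_alias_request(value):
            alias = by_name.get(name, {})
            rows.append((
                flavor, name, count,
                alias.get("vendor_id"),
                alias.get("product_id"),
                alias.get("device_type"),
            ))
    return rows


# Hypothetical inputs for illustration only.
aliases = [
    {"name": "a100", "vendor_id": "10de", "product_id": "20b0", "device_type": "type-PCI"},
]
flavors = {
    "gpu.large": {"pci_passthrough:alias": "a100:2"},
    "gpu.typo": {"pci_passthrough:alias": "a10:1"},  # alias name is not defined
    "m1.small": {},
}

for row in mapping_rows(flavors, aliases):
    print(" | ".join("UNDEFINED" if col is None else str(col) for col in row))
```

Rows with `UNDEFINED` columns are the ones to diff against nova.conf first, since an alias has to be defined wherever the request is validated and scheduled.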

Most importantly, the prompt bakes in the single-host validation gate. PCI changes touch driver binds and VF counts that can knock production NICs offline. Keeping the AI on the read-and-propose side of the airlock, with you running the actual attach test on one drained host, is what makes this safe to use on a live cloud rather than just in a lab.
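The gate itself can be expressed as a small check in whatever rollout tooling you use. The Python sketch below is one hypothetical way to encode it: the `CanaryResult` type and `hosts_cleared_for_change` function are invented for illustration, and the two booleans are assumed to be recorded by the operator after manually running the attach test.

```python
from dataclasses import dataclass


@dataclass
class CanaryResult:
    host: str
    drained: bool    # no running instances were on the host during the test
    attach_ok: bool  # the operator-run attach test succeeded


def hosts_cleared_for_change(canary, fleet):
    """Return the hosts that may receive the change; empty unless the gate passes."""
    if not (canary.drained and canary.attach_ok):
        return []
    # The canary already carries the change, so it is excluded from the rollout list.
    return [host for host in fleet if host != canary.host]


# Hypothetical host names for illustration only.
fleet = ["compute-01", "compute-02", "compute-03"]

failed = CanaryResult(host="compute-01", drained=True, attach_ok=False)
passed = CanaryResult(host="compute-01", drained=True, attach_ok=True)

print(hosts_cleared_for_change(failed, fleet))
print(hosts_cleared_for_change(passed, fleet))
```

The point of the design is that a failed or undrained canary yields an empty rollout list, so nothing downstream can proceed by accident.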
