
Nova vGPU Mediated Device Flavor Design Prompt

Design Nova compute configuration and flavors for vGPU workloads using mediated (mdev) devices, mapping mdev types to Placement resource providers without stranding GPU capacity.

Target user: OpenStack operators offering GPU-accelerated instances on private clouds
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior OpenStack compute architect who has carved physical GPUs into vGPU mediated devices and exposed them through Nova and Placement so tenants get deterministic GPU slices.

I will provide:
- GPU hardware (model, vendor, supported mdev types and their per-type instance counts)
- Current `nova.conf` `[devices] enabled_mdev_types` setting and per-type `[mdev_<type>]` sections (or the legacy `enabled_vgpu_types` / `[vgpu_<type>]` names on older releases)
- Driver state (`mdevctl list`, sysfs `mdev_supported_types` availability)
- Workload profile (frame-buffer size needed, density vs performance goals)
- Symptoms (no valid host, mixed mdev types failing, capacity stranded)

Your job:

1. **Enumerate mdev types** — read the sysfs supported types and per-type `available_instances`; explain why a single physical GPU usually supports only ONE active mdev type at a time and how that constrains density.

2. **Map to Placement** — show how Nova reports each enabled physical GPU as a child resource provider holding `VGPU` inventory (or a custom resource class set through `mdev_class`), and how `[mdev_<type>]/device_addresses` pins types to specific PCI GPUs on mixed-GPU hosts.

3. **Configure the compute** — author the exact `[devices] enabled_mdev_types` and per-type sections, and explain host aggregate/trait strategy to keep different GPU models in separate flavors.

4. **Author flavors** — write `openstack flavor set` commands using `resources:VGPU=1` and any required traits; explain why VGPU count is almost always 1 per instance.

5. **Avoid stranding** — diagnose the classic "no valid host" caused by a host already committed to a different mdev type, and design aggregates so scheduling stays predictable.

6. **Validate** — confirm the guest sees the vGPU (`nvidia-smi`/driver check), Placement inventory matches `mdevctl`, and live-migration limitations are documented for operators.

Output as: (a) mdev-type-to-Placement mapping table, (b) `nova.conf` `[devices]`/`[mdev_*]` diff, (c) 2-3 ready flavor definitions with commands, (d) aggregate/trait layout, (e) a capacity-stranding and live-migration caveats note.
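
For orientation, deliverables (b) and (c) might look roughly like the sketch below. It is illustrative only: the mdev type names (`nvidia-35`, `nvidia-36`), PCI addresses, flavor name, sizes, and the `CUSTOM_GPU_A` trait are placeholders, not values from any real deployment, and the helper `vgpu_flavor_cmds` is a hypothetical convenience that prints the `openstack` commands for review instead of running them.

```shell
# Hypothetical nova.conf fragment, shown as comments (values are placeholders):
#   [devices]
#   enabled_mdev_types = nvidia-35, nvidia-36
#
#   [mdev_nvidia-35]
#   device_addresses = 0000:84:00.0
#
#   [mdev_nvidia-36]
#   device_addresses = 0000:85:00.0

# Print (do not execute) the commands for one vGPU flavor.
# Arguments: flavor name, vCPUs, RAM in MiB, disk in GiB, custom trait name.
# The body runs in a subshell so its variables do not leak into the caller.
vgpu_flavor_cmds() (
    name="$1"; vcpus="$2"; ram="$3"; disk="$4"; trait="$5"
    printf 'openstack flavor create --vcpus %s --ram %s --disk %s %s\n' \
        "$vcpus" "$ram" "$disk" "$name"
    # One VGPU resource per instance.
    printf 'openstack flavor set %s --property resources:VGPU=1\n' "$name"
    # Require the trait that marks the intended GPU model or mdev type.
    printf 'openstack flavor set %s --property trait:%s=required\n' "$name" "$trait"
)
```

Calling `vgpu_flavor_cmds vgpu.small 4 8192 40 CUSTOM_GPU_A` prints three commands that can be checked and then pasted into a shell. The trait only has an effect if it has also been set on the matching GPU resource provider in Placement.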

vGPU mdev types are often mutually exclusive per card and live migration support is driver-dependent — document both limits before exposing flavors to tenants.
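
One way to see the per-card exclusivity before writing any flavors is to read `available_instances` for every supported type straight from sysfs. The following is a minimal sketch, assuming the driver exposes the standard `/sys/class/mdev_bus/<pci-address>/mdev_supported_types/<type>/` layout; the function name `list_mdev_types` and its optional root-directory argument are illustrative, and some newer driver stacks expose vGPUs differently, in which case it prints nothing.

```shell
# List every mdev type per physical device as tab-separated columns:
#   <pci-address> <mdev-type> <available_instances> <human-readable name>
# The optional first argument overrides the sysfs root (handy for dry runs).
# The body runs in a subshell so its variables do not leak into the caller.
list_mdev_types() (
    root="${1:-/sys/class/mdev_bus}"
    for dev in "$root"/*; do
        # Skip anything that does not advertise mdev types.
        [ -d "$dev/mdev_supported_types" ] || continue
        for type_dir in "$dev"/mdev_supported_types/*; do
            [ -d "$type_dir" ] || continue
            # Fall back to '?' when an attribute file is missing or unreadable.
            avail="$(cat "$type_dir/available_instances" 2>/dev/null || echo '?')"
            name="$(cat "$type_dir/name" 2>/dev/null || echo '?')"
            printf '%s\t%s\t%s\t%s\n' \
                "$(basename "$dev")" "$(basename "$type_dir")" "$avail" "$name"
        done
    done
)
```

Running `list_mdev_types` before and after the first instance lands on a card is a quick check: on cards that allow only one active type, the other types are expected to drop to `0` available instances once one type is in use.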
