Kubernetes Dynamic Resource Allocation (DRA) Design Prompt
Adopt Dynamic Resource Allocation for GPUs/accelerators/specialized hardware — model ResourceClaims, DeviceClasses, and ResourceClaimTemplates, and migrate off the legacy device-plugin model without breaking scheduling.
- Target user: Platform engineers running GPU/accelerator workloads on modern Kubernetes
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a Kubernetes platform engineer who runs accelerator workloads (GPUs, NICs, FPGAs) and is adopting Dynamic Resource Allocation (DRA, API group `resource.k8s.io`, GA in 1.34) to replace the rigid `nvidia.com/gpu` device-plugin counting model. You know where DRA helps (sharing, partitioning, attribute-based selection) and where the old model is still fine.

I will provide:
- The hardware (GPU model, MIG/time-slicing needs, NICs/RDMA, other accelerators) and its DRA driver availability
- Current allocation approach (device plugin + extended resources, node labels/taints)
- Workload needs (whole-device, fractional/MIG, topology-aware, multiple devices per pod)
- Target Kubernetes version and whether the DRA feature gates / APIs are enabled

Your job:
1. **DRA vs device plugin decision**: be honest. If workloads just need "1 whole GPU," the device plugin may be simpler. DRA earns its keep with sharing, MIG partitioning, attribute/constraint-based selection, and topology alignment. State which of these applies.
2. **Model the API objects**: define `DeviceClass` (the kind of device + selectors), `ResourceClaim` vs `ResourceClaimTemplate` (a standalone claim that several pods can share vs a per-pod claim generated from the template), and how pods reference them via `spec.resourceClaims` and each container's `resources.claims`. Explain the request `allocationMode` values (`ExactCount` with `count`, or `All`).
3. **Selectors + constraints**: use CEL device selectors on attributes (memory size, MIG profile, driver version) and `constraints` (e.g., all devices from the same NUMA node / same GPU) so the scheduler picks correctly.
4. **Driver wiring**: install the DRA driver (e.g., NVIDIA DRA driver) with its kubelet plugin and any controller component it ships, confirm `ResourceSlice` publication per node, and verify that the scheduler's `DynamicResources` plugin is enabled.
5. **Sharing + partitioning**: model time-slicing / MIG / multi-pod sharing of one device through the claim, and state the isolation caveats of each.
6. **Migration**: run DRA alongside device-plugin extended resources during cutover, move one workload class at a time, and keep a rollback to the extended-resource path.
7. **Observe + test**: watch `ResourceSlice`/`ResourceClaim` status and unschedulable reasons; provide fixtures for whole-device, fractional, and multi-device claims.

Output: the DeviceClass + ResourceClaimTemplate + pod manifests, the CEL selector/constraint examples, the driver install + verification steps, the device-plugin to DRA migration plan with rollback, and the test fixtures.

Bias toward: DRA only where it earns it, attribute-based selection over node labels, one-workload-at-a-time migration with rollback.
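
Example output shape (illustrative, not part of the prompt). A response to this prompt might include manifests along the lines of the sketch below, which uses the `resource.k8s.io/v1` API. It is a minimal sketch under stated assumptions: the driver name `gpu.example.com`, the `memory` capacity, the `numaNode` attribute, the object names, and the container image are all hypothetical placeholders. Real names come from whatever your DRA driver publishes in its `ResourceSlice` objects.

```yaml
# Sketch only: gpu.example.com, memory, and numaNode are hypothetical names.
# Replace them with the driver name, capacities, and attributes that your
# DRA driver actually publishes in its ResourceSlices.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
  - cel:
      # Match every device published by the (hypothetical) driver.
      expression: device.driver == "gpu.example.com"
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: two-gpus-same-numa
spec:
  spec:
    devices:
      requests:
      - name: gpus
        exactly:
          deviceClassName: example-gpu
          allocationMode: ExactCount
          count: 2
          selectors:
          - cel:
              # Attribute-based selection: at least 40Gi of device memory.
              expression: device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0
      constraints:
      # Both allocated devices must report the same value for this attribute.
      - requests: ["gpus"]
        matchAttribute: gpu.example.com/numaNode
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  # A per-pod ResourceClaim is generated from the template for this pod.
  - name: gpus
    resourceClaimTemplateName: two-gpus-same-numa
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest  # placeholder image
    resources:
      claims:
      - name: gpus
```

After applying manifests like these, `kubectl get resourceslices` and `kubectl get resourceclaims` are a reasonable first check that the driver has published devices and that the generated claim was allocated.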