Kubernetes Kueue Gang Scheduling & Job Queueing Prompt
Design Kueue ClusterQueues, LocalQueues, and ResourceFlavors to queue batch/ML jobs with gang (all-or-nothing) scheduling on a shared cluster.
- Target user: Platform engineers running batch or ML workloads
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who runs Kueue on a shared multi-team cluster for batch and ML jobs. I want a queueing design that prevents partial scheduling and noisy-neighbor quota theft.

I will provide:
- The teams sharing the cluster and their fair-share expectations
- Workload types (training jobs needing N pods at once, single-pod batch, etc.)
- Available node pools (GPU, spot, on-demand) and their capacity

Your job:
1. **Map the hierarchy**: explain how ResourceFlavor (node characteristics) → ClusterQueue (quota + cohort) → LocalQueue (namespace-scoped entry point) fit together, and which object each team interacts with.
2. **Design ResourceFlavors** for each distinct node pool, including the `nodeLabels` and tolerations Kueue injects so admitted workloads land on the right nodes.
3. **Design ClusterQueues** with nominal quota per team, and group them into a `cohort` so unused quota can be borrowed; specify `borrowingLimit` to cap how much one team can take.
4. **Enforce gang scheduling**: explain how Kueue admits a Job's pods all-or-nothing via workload-level admission, so a 16-pod training job never gets 12 pods scheduled and 4 stuck Pending forever holding GPUs.
5. **Add preemption** rules: within-cohort reclaim and priority-based preemption, with the `withinClusterQueue` and `reclaimWithinCohort` policies spelled out.
6. **Show the LocalQueue** a team submits to and the one-line `kueue.x-k8s.io/queue-name` label a Job needs.
7. **Mark DESTRUCTIVE** any preemption policy that can evict running jobs, and warn where borrowing can starve the lending team.

Output format: an architecture diagram in text, then the ResourceFlavor / ClusterQueue / LocalQueue YAML, then a sample queued Job. Give me `kubectl` commands to inspect admission, but do not apply anything.

---

Teams & fair share: [DESCRIBE]
Workload types: [DESCRIBE]
Node pools: [DESCRIBE]
Why this prompt works
Shared clusters that run batch or ML work hit two problems the default scheduler does not solve on its own: partial scheduling (a multi-pod training job grabs half its GPUs, blocks waiting for the rest, and makes no progress while holding them) and quota theft (one team’s flood of jobs starves everyone else). Kueue exists for exactly this, but its object model (ResourceFlavor, ClusterQueue, cohort, LocalQueue) has enough layers that it is easy to wire up wrong and end up with workloads that silently bypass quota. This prompt makes the assistant walk the hierarchy top to bottom and tie each object to who actually touches it.
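As a rough illustration of that hierarchy, the sketch below wires one ResourceFlavor, one ClusterQueue, and one LocalQueue together. Every name in it (`gpu-on-demand`, `team-a-cq`, `shared-pool`, `team-a`, the `example.com/pool` label) and every quota number is hypothetical, and the `kueue.x-k8s.io/v1beta1` API version is an assumption; check which version your installed Kueue serves before reusing any of it.

```yaml
# ResourceFlavor: describes one node pool (cluster-scoped).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-on-demand            # hypothetical flavor name
spec:
  nodeLabels:
    example.com/pool: gpu-on-demand   # hypothetical label; added to admitted pods as a node selector
  tolerations:                   # added to admitted pods so they tolerate the pool's taint
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
---
# ClusterQueue: holds one team's quota and its cohort membership (cluster-scoped).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq                # hypothetical
spec:
  cohort: shared-pool            # ClusterQueues in the same cohort can lend unused quota
  namespaceSelector:             # only namespaces matching this selector may use the queue
    matchLabels:
      kubernetes.io/metadata.name: team-a
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: gpu-on-demand
      resources:
      - name: cpu
        nominalQuota: 64         # illustrative numbers only
      - name: memory
        nominalQuota: 256Gi
      - name: nvidia.com/gpu
        nominalQuota: 8
        borrowingLimit: 4        # at most 4 extra GPUs borrowed from the cohort
---
# LocalQueue: the namespaced entry point the team actually submits to.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a
  name: team-a-queue             # hypothetical
spec:
  clusterQueue: team-a-cq
```

In this layout the platform team owns the cluster-scoped ResourceFlavor and ClusterQueue, while the workload team only ever references the LocalQueue in its own namespace.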
The gang-scheduling requirement is the load-bearing part. Asking explicitly for all-or-nothing workload admission forces the design to account for the case where a big job can never fit, a common way for Kueue deployments to quietly wedge. Combined with the borrowing-and-preemption section, it pushes the model to spell out the trade-off between fair-share borrowing and the disruption of reclaiming that capacity later, which is what turns a “fair” cluster into a source of mysterious evictions.
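To make that trade-off concrete, here is a minimal sketch of a ClusterQueue with its preemption policy spelled out, plus a 16-pod Job that submits to it. The names (`team-b-cq`, `team-b-queue`, `shared-pool`, `gpu-on-demand`, the image) and the numbers are hypothetical, the `v1beta1` API version is an assumption, and the `team-b-queue` LocalQueue and `gpu-on-demand` ResourceFlavor are assumed to exist already. Kueue keeps the Job suspended until quota for all 16 pods is reserved at once. Note that quota admission alone does not guarantee the scheduler can place every pod; Kueue's `waitForPodsReady` configuration is the documented option to look at for that case.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-cq                # hypothetical
spec:
  cohort: shared-pool
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-b
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: gpu-on-demand        # assumed to exist as a ResourceFlavor
      resources:
      - name: cpu
        nominalQuota: 64         # illustrative numbers only
      - name: memory
        nominalQuota: 256Gi
      - name: nvidia.com/gpu
        nominalQuota: 16         # exactly enough for the 16-pod Job below
        borrowingLimit: 8
  preemption:
    # DESTRUCTIVE: both settings can evict running workloads.
    # reclaimWithinCohort evicts lower-priority workloads in other
    # ClusterQueues that are borrowing quota this queue wants back.
    reclaimWithinCohort: LowerPriority
    # withinClusterQueue evicts lower-priority workloads in this same queue.
    withinClusterQueue: LowerPriority
---
apiVersion: batch/v1
kind: Job
metadata:
  generateName: train-
  namespace: team-b
  labels:
    kueue.x-k8s.io/queue-name: team-b-queue   # routes the Job through Kueue
spec:
  parallelism: 16                # all 16 pods are admitted together or not at all
  completions: 16
  suspend: true                  # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # hypothetical image
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
```

Read-only checks such as `kubectl get workloads -n team-b` and `kubectl describe clusterqueue team-b-cq` show whether the workload was admitted or is still waiting on quota. Setting either preemption field to `Never` removes the eviction risk, at the cost of lent quota staying out until the borrower finishes.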
Because the prompt insists on inspection commands rather than applying anything, you get a reviewable design plus the labels and kubectl checks to verify admission yourself. For the autoscaling side of the same problem, see the event-driven autoscaling guides and the wider prompt library.
Related prompts
- Kubernetes GPU & Device Plugin Debug Prompt: Diagnose GPU scheduling problems, including the NVIDIA device plugin, MIG, image/driver mismatch, and pods stuck without a GPU.
- Kubernetes Job & CronJob Debug Prompt: Diagnose Jobs and CronJobs, including pods not completing, backoff limit, history limit, time zone confusion, concurrency policy, missed runs, and stuck cleanup.
- Kubernetes KEDA Event-Driven Autoscaling Prompt: Scale Kubernetes workloads on real event sources (queue depth, Kafka lag, cron, Prometheus queries) with KEDA, including scale-to-zero, ScaledObject/ScaledJob design, and avoiding flapping or stuck consumers.