Pod Scheduling Readiness Gates Prompt
Use Kubernetes scheduling gates (spec.schedulingGates) to keep pods from being scheduled until external preconditions are met (quota provisioning, capacity reservations, or dependency readiness), then release them cleanly.
- Target user: Platform engineers building custom admission and scheduling flows
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a Kubernetes platform engineer who uses scheduling gates to inject custom preconditions into the scheduling pipeline without forking the scheduler.

I will provide:
- The precondition I need before a pod schedules (e.g., a GPU capacity reservation, a license check, an external quota grant, a dependency being healthy)
- How pods are created (Deployment, Job, custom controller)
- Whether I run a custom controller or want a webhook-only approach
- The Kubernetes version (scheduling gates are stable as of 1.30)

Your job:
1. **What scheduling gates do**: explain that a pod with a non-empty `spec.schedulingGates` list gets a `PodScheduled=False` condition with reason `SchedulingGated` (shown as `SchedulingGated` in `kubectl get pods`), and that the scheduler will not try to place it until ALL gates are removed. Contrast this with init containers (too late: they run only after the pod is bound to a node), admission webhooks (they can only allow or reject creation, not hold a created pod), and node affinity (controls where a pod lands, not when it may be scheduled).
2. **Gate injection**: show a mutating admission webhook (or a gate set directly in the Deployment/Job pod template) that adds a gate such as `example.com/quota-check` to every matching pod at creation.
3. **Gate release controller**: design the controller that watches gated pods, evaluates the external precondition, and patches `spec.schedulingGates` to remove its own gate (gates can only be removed, never added after creation). Include the patch call and show how reconciliation stays idempotent.
4. **Version + immutability rules**: note that you cannot add gates after a pod is created, only remove them, and that gated pods have already passed admission, so they count against namespace `ResourceQuota` even though they are not running. State the minimum Kubernetes version where gates are stable.
5. **Observability**: surface gated pods (`kubectl get pods --field-selector status.phase=Pending`, then filter on the `PodScheduled` condition with reason `SchedulingGated`), emit a metric for time spent gated, and alert when a pod stays gated beyond an SLO.
6. **Failure handling**: explain what happens if the release controller dies (pods stay gated forever) and how to design a TTL / dead-man's switch, running independently of the controller, that either releases the pod (fail open) or fails it (fail closed).
7. **Worked example**: walk one real precondition end to end: the webhook adds the gate, the controller reconciles, the gate is removed, and the pod schedules.
8. **Anti-patterns**: using gates for long polls that belong in a queue, gating without a release path, and confusing gating with preemption.

Output as: (a) concept comparison, (b) injection webhook snippet, (c) release-controller reconcile design, (d) observability + alerting, (e) a stuck-gate dead-man's switch. Stress that a gate with no reliable remover is a silent outage.
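For item 2 (gate injection), here is a minimal sketch, not part of the prompt text, of what the webhook's decision logic in answer (b) might look like. It assumes the pod arrives in an `admission.k8s.io/v1` AdmissionReview and that the webhook answers with a base64-encoded JSON Patch. The HTTPS server, TLS, and the MutatingWebhookConfiguration are omitted; in that configuration, `namespaceSelector`/`objectSelector` would decide which pods "match". The gate name is the prompt's example, not a standard.

```python
import base64
import json

# Hypothetical gate name, taken from the prompt's example.
GATE = "example.com/quota-check"


def gate_patch(pod: dict, gate: str = GATE) -> list:
    """Return RFC 6902 JSON Patch ops that add `gate`, or [] if it is already there."""
    gates = pod.get("spec", {}).get("schedulingGates") or []
    if any(g.get("name") == gate for g in gates):
        # Idempotent: a retried or reinvoked webhook must not duplicate the gate.
        return []
    if not gates:
        # Field absent or empty: set the whole list in one op.
        return [{"op": "add", "path": "/spec/schedulingGates", "value": [{"name": gate}]}]
    # Other gates already present: append ours to the end of the list.
    return [{"op": "add", "path": "/spec/schedulingGates/-", "value": {"name": gate}}]


def mutate(review: dict) -> dict:
    """Answer an admission.k8s.io/v1 AdmissionReview for a pod CREATE request."""
    request = review["request"]
    response = {"uid": request["uid"], "allowed": True}
    ops = gate_patch(request["object"])
    if ops:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Register the webhook for `CREATE` only, because gates cannot be added to an existing pod and an UPDATE patch would be rejected anyway. Choose `failurePolicy` deliberately. `Ignore` lets pods through ungated while the webhook is down. `Fail` blocks pod creation instead.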
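For item 3 (gate release controller), here is a sketch of a single reconcile step. It assumes the controller owns exactly one gate (hypothetical name). `precondition_met` is a placeholder for whatever external check the user supplies, such as a quota API or reservation service. The sketch only builds the JSON Patch. Sending it with content type `application/json-patch+json`, the informer, and the work queue are omitted.

```python
from typing import Callable, Optional

# Hypothetical gate owned by this controller.
GATE = "example.com/quota-check"


def release_patch(pod: dict, gate: str = GATE) -> Optional[list]:
    """JSON Patch that removes only our gate, or None if there is nothing to do."""
    gates = pod.get("spec", {}).get("schedulingGates") or []
    for i, g in enumerate(gates):
        if g.get("name") == gate:
            return [
                # `test` makes the whole patch fail if the list changed since we
                # read it, so we never remove another controller's gate by index.
                {"op": "test", "path": f"/spec/schedulingGates/{i}/name", "value": gate},
                {"op": "remove", "path": f"/spec/schedulingGates/{i}"},
            ]
    return None  # already released, or never gated by us: idempotent no-op


def reconcile(pod: dict, precondition_met: Callable[[dict], bool]) -> tuple:
    """Decide one reconcile step: ("noop" | "wait" | "release", patch or None)."""
    patch = release_patch(pod)
    if patch is None:
        return ("noop", None)
    if not precondition_met(pod):
        return ("wait", None)  # requeue with backoff instead of busy-polling
    return ("release", patch)
```

The `test` op is the design choice worth keeping. JSON Patch is applied atomically, so if another controller removes a different gate between the read and the write and shifts the indexes, the whole patch is rejected and the pod is simply requeued, rather than the wrong gate being removed. Removing only your own gate also keeps multiple gate owners independent: the pod schedules once the last owner lets go.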
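For item 4 (version + immutability rules), here are two small preflight checks a controller could run before acting. The sketch assumes the server version is read from the `gitVersion` field of the API server's `/version` endpoint. `is_removal_only` encodes the rule as the prompt states it: the new gate list may only drop names from the old one. The API server's own validation is authoritative and may differ in details such as ordering.

```python
import re

# Scheduling gates are stable (GA) from Kubernetes 1.30.
MIN_STABLE = (1, 30)


def gates_are_stable(git_version: str) -> bool:
    """Compare a gitVersion such as "v1.30.2" or "v1.29.3-eks-abc" to MIN_STABLE."""
    m = re.match(r"v?(\d+)\.(\d+)", git_version)
    if m is None:
        raise ValueError(f"unrecognised version string: {git_version!r}")
    return (int(m.group(1)), int(m.group(2))) >= MIN_STABLE


def is_removal_only(old_gates: list, new_gates: list) -> bool:
    """True if new_gates only drops entries from old_gates (nothing added or renamed)."""
    old_names = {g["name"] for g in old_gates}
    return all(g["name"] in old_names for g in new_gates)
```

A pod created without gates fails this check for any non-empty new list, which matches the rule that gates cannot be added after creation.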
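For item 5 (observability), here is a sketch of the detection side. It identifies gated pods by the `PodScheduled=False` condition with reason `SchedulingGated`, which distinguishes them from pods the scheduler tried to place and found `Unschedulable`. It then reports the pods gated longer than an SLO. Gates can only be set at creation, so a still-gated pod's age since `creationTimestamp` equals its time spent gated. Pod dicts are assumed to mirror the API's JSON. Exporting the age as a metric and writing the alert rule are left to your metrics stack.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    # Kubernetes serialises timestamps as RFC 3339 in UTC, e.g. "2024-05-01T12:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_scheduling_gated(pod: dict) -> bool:
    """True if the pod's PodScheduled condition is False with reason SchedulingGated."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "PodScheduled":
            return cond.get("status") == "False" and cond.get("reason") == "SchedulingGated"
    return False


def stuck_gated(pods: list, now: datetime, slo: timedelta) -> list:
    """Names of gated pods whose time spent gated exceeds the SLO, oldest first."""
    stuck = []
    for pod in pods:
        if not is_scheduling_gated(pod):
            continue
        age = now - parse_ts(pod["metadata"]["creationTimestamp"])
        if age > slo:
            stuck.append((age, pod["metadata"]["name"]))
    return [name for _, name in sorted(stuck, reverse=True)]
```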
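For item 6 (failure handling), here is a sketch of the dead-man's switch decision. It is meant to run as a separate process, for example a CronJob, so it does not share the release controller's failure mode. The gate name and the per-pod TTL annotation `example.com/gate-ttl-seconds` are hypothetical. Acting on the result (removing the gate with a patch, or deleting the pod and recording an Event) is left out.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

GATE = "example.com/quota-check"                 # hypothetical gate name
TTL_ANNOTATION = "example.com/gate-ttl-seconds"  # hypothetical per-pod override


class Action(Enum):
    KEEP = "keep"
    RELEASE = "release"  # fail open: remove the gate, let the pod schedule unchecked
    DELETE = "delete"    # fail closed: delete the pod and record why


def _ttl(pod: dict, default_ttl: timedelta) -> timedelta:
    raw = (pod.get("metadata", {}).get("annotations") or {}).get(TTL_ANNOTATION)
    if raw is None:
        return default_ttl
    try:
        return timedelta(seconds=int(raw))
    except ValueError:
        return default_ttl  # malformed override: fall back rather than crash the sweep


def decide(pod: dict, now: datetime, default_ttl: timedelta, fail_open: bool) -> Action:
    """Decide what the switch should do with one pod that may carry our gate."""
    gates = pod.get("spec", {}).get("schedulingGates") or []
    if not any(g.get("name") == GATE for g in gates):
        return Action.KEEP  # not ours, or already released
    created = datetime.strptime(
        pod["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now - created <= _ttl(pod, default_ttl):
        return Action.KEEP
    return Action.RELEASE if fail_open else Action.DELETE
```

Fail-closed deletion behaves differently depending on the owner. With a Deployment, the ReplicaSet recreates the pod and the webhook gates it again, so deletion recycles the workload rather than failing it. Pair it with an alert so the stuck precondition stays visible.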