AI for GitLab CI/CD

GitLab Kubernetes Runner Affinity and Tolerations Prompt

Tune the GitLab Runner Kubernetes executor so CI job pods land on the right nodes. Use node selectors, affinity, and tolerations to schedule onto tainted CI node pools (spot, GPU, large-build) without starving other workloads.

Target user: Platform teams running the GitLab Runner Kubernetes executor
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior platform engineer who runs the GitLab Runner Kubernetes executor at scale and knows exactly how taints, tolerations, node selectors, and affinity interact in `config.toml`.

I will provide:
- My Runner `config.toml` `[runners.kubernetes]` section
- My node pool layout (taints/labels: spot, GPU, build-heavy, default)
- The scheduling problem (jobs land on the wrong pool, stay Pending, or evict critical pods)

Your job:

1. **Scheduling model**: explain how the Kubernetes executor turns `[runners.kubernetes.node_selector]`, `[runners.kubernetes.affinity]`, and `[runners.kubernetes.node_tolerations]` into the job pod spec, and how node taints gate scheduling.

2. **Diagnose Pending**: for a job stuck in Pending, walk through the elimination: a missing toleration for a node taint, an unsatisfiable node selector or affinity rule, or insufficient resources.

3. **config.toml tuning**: produce the corrected `[runners.kubernetes]` block that targets the intended pool with `node_selector` plus the matching `node_tolerations`, and uses a preferred rule under `[runners.kubernetes.affinity.node_affinity]` for soft preference.

4. **Isolation**: keep CI pods off control-plane and stateful pools, and use a low pod priority plus explicit resource requests and limits so CI pods cannot preempt or evict other workloads.

5. **Per-job overrides**: show how a heavy job can request the GPU pool through a pod-spec patch or the `KUBERNETES_NODE_SELECTOR_*` / `KUBERNETES_NODE_TOLERATIONS_*` overwrite variables (where the Runner config allows overwrites), without changing the global config.

6. **Cost angle**: bias interruption-tolerant jobs toward spot/preemptible pools while keeping release-critical jobs on stable nodes.

Output as: (a) the corrected `config.toml` kubernetes block, (b) a Pending-pod diagnosis tree, (c) a per-job override example, (d) an isolation/cost checklist.

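A toleration alone does not force placement: it only permits a pod to schedule onto nodes carrying the matching taint. Pair it with a node selector or affinity, or pods may still land on any untainted node with free capacity.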
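As a rough illustration of that pairing, the fragment below sketches a `[runners.kubernetes]` block that combines a node selector, a matching toleration, and a soft node affinity. The label `pool=ci-spot`, the taint `dedicated=ci-spot:NoSchedule`, the `disk-type` label, and the `gitlab-ci` namespace are hypothetical placeholders, not values from any real cluster. Check the key names against the documentation for your Runner version before relying on them.

```toml
[[runners]]
  executor = "kubernetes"

  [runners.kubernetes]
    # Hypothetical namespace for job pods.
    namespace = "gitlab-ci"

    # Hard requirement: job pods schedule only onto nodes with this label.
    # "pool" / "ci-spot" are assumed label names for the spot CI pool.
    [runners.kubernetes.node_selector]
      "pool" = "ci-spot"

    # Permission: lets job pods tolerate the pool's taint.
    # Format is "key=value" = "Effect"; the taint itself is an assumption.
    [runners.kubernetes.node_tolerations]
      "dedicated=ci-spot" = "NoSchedule"

    # Soft preference: among eligible nodes, favor ones labeled disk-type=nvme.
    # The scheduler may ignore this if no such node has capacity.
    [runners.kubernetes.affinity]
      [runners.kubernetes.affinity.node_affinity]
        [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
          weight = 50
          [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
            [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
              key = "disk-type"
              operator = "In"
              values = ["nvme"]
```

With only the `node_tolerations` table present, the same pods would be allowed on the tainted pool but could equally be scheduled elsewhere. The `node_selector` entry is what restricts them to the intended nodes.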
