
GitLab Kubernetes Runner Affinity and Tolerations Prompt

Tune the GitLab Runner Kubernetes executor so CI job pods land on the right nodes — using node selectors, affinity, and tolerations to schedule onto tainted CI node pools (spot, GPU, large-build) without starving other workloads.

Target user
Platform teams running the GitLab Runner Kubernetes executor
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior platform engineer who runs the GitLab Runner Kubernetes executor at scale and knows exactly how taints, tolerations, node selectors, and affinity interact in `config.toml`.

I will provide:
- My Runner `config.toml` `[runners.kubernetes]` section
- My node pool layout (taints/labels: spot, GPU, build-heavy, default)
- The scheduling problem (jobs land on the wrong pool, stay Pending, or evict critical pods)

Your job:

1. **Scheduling model** — explain how the Kubernetes executor turns `[runners.kubernetes.node_selector]`, `[runners.kubernetes.affinity]`, and `[runners.kubernetes.node_tolerations]` into the job pod spec, and how taints gate scheduling.

2. **Diagnose Pending** — for a job stuck Pending, walk through the elimination in order: missing toleration for a node taint vs. unsatisfiable node selector or required affinity vs. insufficient resources.

3. **config.toml tuning** — produce the corrected `[runners.kubernetes]` block that targets the intended pool with `node_selector` plus the matching `node_tolerations`, and a `preferred_during_scheduling_ignored_during_execution` rule under `[runners.kubernetes.affinity.node_affinity]` for soft preference.

4. **Isolation** — keep CI pods off control-plane and stateful pools, and use a low priority class plus explicit resource requests and limits so they cannot preempt or evict other workloads.

5. **Per-job overrides** — show how to use pod spec patches or overwrite variables so a heavy job can request the GPU pool without changing the global config.

6. **Cost angle** — bias interruption-tolerant jobs toward spot/preemptible pools while keeping release-critical jobs on stable nodes.

Output as: (a) the corrected `config.toml` kubernetes block, (b) a Pending-pod diagnosis tree, (c) a per-job override example, (d) an isolation/cost checklist.

A toleration alone does not force placement: it only permits scheduling onto nodes that carry the matching taint. Pair it with a node selector or required affinity, or pods may still land on any untainted node.
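
For orientation, the pairing of selector and toleration described above might look like the following sketch of a `config.toml` fragment. It is an illustration under stated assumptions, not a drop-in config: the runner name, the `gitlab-ci` namespace, the `pool=ci-spot` node label, the `ci-pool=spot:NoSchedule` taint, and the `disk=nvme` label are all hypothetical and must be replaced with the labels and taints your node pools actually carry.

```toml
# Sketch only: names, labels, and taints below are hypothetical placeholders.
[[runners]]
  name = "k8s-ci-spot"          # hypothetical runner name
  executor = "kubernetes"

  [runners.kubernetes]
    namespace = "gitlab-ci"     # hypothetical namespace

    # Hard requirement: job pods only schedule onto nodes labeled pool=ci-spot.
    [runners.kubernetes.node_selector]
      "pool" = "ci-spot"

    # Permission: tolerate the taint ci-pool=spot:NoSchedule on that pool.
    # Format is "key=value" = "effect". Without the selector above, this
    # toleration would allow, but not force, placement on the tainted nodes.
    [runners.kubernetes.node_tolerations]
      "ci-pool=spot" = "NoSchedule"

    # Soft preference: within the selected pool, favor nodes labeled disk=nvme.
    [runners.kubernetes.affinity]
      [runners.kubernetes.affinity.node_affinity]
        [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
          weight = 50
          [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
            [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
              key = "disk"
              operator = "In"
              values = ["nvme"]
```

The selector is the hard constraint, the toleration only lifts the taint, and the preferred affinity rule ranks nodes without ever blocking scheduling. For per-job overrides, recent GitLab Runner versions document a `node_selector_overwrite_allowed` setting together with `KUBERNETES_NODE_SELECTOR_*` job variables; check the Kubernetes executor documentation for your runner version before relying on them.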