
GitLab Runner Kubernetes Executor Pod Spec Tuning Prompt

Tune the Kubernetes executor's pod spec — resource requests/limits, node selectors, ephemeral storage, service-account, and pod-level overrides — so CI jobs schedule reliably without starving the cluster.

Target user: Platform engineers running GitLab Runner on Kubernetes
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an SRE who operates GitLab Runner on Kubernetes at scale and has debugged every flavor of "job pod stuck Pending" and "runner OOMKilled."

I will provide:
- My runner `config.toml` `[runners.kubernetes]` section (or Helm `values.yaml`)
- Cluster facts: node sizes, taints, available namespaces, storage classes
- Symptoms (pods stuck Pending, jobs slow to start, ephemeral-storage evictions, noisy-neighbor contention)

Your job:

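1. **Map the pod anatomy** — build container, helper container, and one container per `services:` entry. Explain that requests are set per container and the scheduler sums them for the pod, so a job with three services adds three service containers' requests on top of the build and helper requests.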

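2. **Right-size requests/limits** — recommend `cpu_request`/`cpu_limit` and `memory_request`/`memory_limit` for the build container, the helper equivalents (`helper_cpu_request`, `helper_memory_limit`, and so on), and the service equivalents (`service_cpu_request`, `service_memory_limit`, and so on) based on my job profiles. Explain what goes wrong when requests are missing: with only limits set, Kubernetes (absent a LimitRange default) sets each request equal to its limit and over-reserves the node; with neither set, pods are BestEffort, the scheduler places them without accounting for their usage, and they are the first to be OOM-killed or evicted under pressure.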

3. **Placement** — `node_selector`, `node_tolerations`, and `affinity` to pin CI to a dedicated node pool, plus per-job overrides through `KUBERNETES_NODE_SELECTOR_*` CI/CD variables, gated by the `node_selector_overwrite_allowed` regex in `[runners.kubernetes]`.

4. **Ephemeral storage** — set `ephemeral_storage_request/limit`, mount an `emptyDir` or PVC for large caches/artifacts, and avoid the "node-pressure eviction mid-build" trap.

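5. **Pod spec escape hatch** — use `[[runners.kubernetes.pod_spec]]` patches (enabled by the `FF_USE_ADVANCED_POD_SPEC_CONFIGURATION` feature flag) to set fields the executor does not expose directly, such as `topologySpreadConstraints`. Check first for a dedicated setting: `runtimeClassName` and `priorityClassName`, for example, already map to `runtime_class_name` and `priority_class_name`.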

6. **Security** — a dedicated `service_account`, `automount_service_account_token = false` unless jobs call the Kubernetes API, run as non-root, drop Linux capabilities, and isolate namespaces per project.

7. **Concurrency** — relate the global `concurrent`, each `[[runners]]` entry's `limit`, and `poll_timeout` (how long the runner waits for a job pod to start before failing the job) to how many job pods fit per node before scheduling stalls.
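A minimal `config.toml` sketch of steps 2 through 7 might look like the following. Every value is a placeholder to be replaced from your job profiles, and the names are hypothetical: the `gitlab-ci` namespace, the `pool=ci` node label, the `dedicated=ci:NoSchedule` taint, the `gitlab-ci-jobs` service account, the `app=gitlab-ci-job` pod label, and the `/scratch` mount. Key names follow the GitLab Runner Kubernetes executor settings as best understood here; check each one against the documentation for your runner version before relying on it.

```toml
# Sketch only: values, labels, taints, and names below are hypothetical placeholders.
concurrent = 20                        # global ceiling on jobs across every [[runners]] entry

[[runners]]
  name = "k8s-ci"
  url = "https://gitlab.example.com"   # placeholder
  token = "REDACTED"                   # placeholder
  executor = "kubernetes"
  limit = 20                           # ceiling for this runner entry alone

  [runners.feature_flags]
    FF_USE_ADVANCED_POD_SPEC_CONFIGURATION = true   # enables [[runners.kubernetes.pod_spec]]

  [runners.kubernetes]
    namespace = "gitlab-ci"
    poll_timeout = 600                 # seconds to wait for a Pending pod before failing (default 180)

    # Build container: requests always set; limits cap bursts.
    cpu_request = "500m"
    cpu_limit = "2"
    memory_request = "1Gi"
    memory_limit = "2Gi"
    ephemeral_storage_request = "2Gi"
    ephemeral_storage_limit = "10Gi"

    # Helper container (clone, cache, artifacts): small, but never zero.
    helper_cpu_request = "100m"
    helper_cpu_limit = "500m"
    helper_memory_request = "128Mi"
    helper_memory_limit = "256Mi"

    # Applied to EACH services: container, so three services reserve three times this.
    service_cpu_request = "250m"
    service_cpu_limit = "1"
    service_memory_request = "512Mi"
    service_memory_limit = "1Gi"

    # Least privilege: dedicated service account, no API token mounted.
    service_account = "gitlab-ci-jobs"
    automount_service_account_token = false

    # KUBERNETES_NODE_SELECTOR_* values from jobs must match this regex.
    node_selector_overwrite_allowed = "^pool=ci-.*$"

    [runners.kubernetes.node_selector]
      "pool" = "ci"                    # hypothetical label on the dedicated CI pool

    [runners.kubernetes.node_tolerations]
      "dedicated=ci" = "NoSchedule"    # tolerates a hypothetical dedicated=ci:NoSchedule taint

    [runners.kubernetes.pod_labels]
      "app" = "gitlab-ci-job"          # matched by the spread constraint in pod_spec

    [runners.kubernetes.build_container_security_context]
      run_as_non_root = true
      run_as_user = 1000               # the build image must work as this non-root UID
      [runners.kubernetes.build_container_security_context.capabilities]
        drop = ["ALL"]

    # Scratch space for large caches; on node disk it draws from node ephemeral storage.
    [[runners.kubernetes.volumes.empty_dir]]
      name = "build-scratch"
      mount_path = "/scratch"
      medium = ""                      # "" = node disk; "Memory" = tmpfs, counted toward memory

    # Escape hatch: strategic-merge patch for a field with no dedicated setting.
    [[runners.kubernetes.pod_spec]]
      name = "spread-ci-pods"
      patch_type = "strategic"
      patch = '''
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                app: gitlab-ci-job
      '''
```

The 600-second `poll_timeout` is a judgment call, longer than the 180-second default so a job waiting on a node-pool scale-up is not failed prematurely; whether it fits depends on how quickly your pool autoscales.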

Output: (a) a corrected `[runners.kubernetes]` block (or Helm values) fully commented, (b) a `pod_spec` override example for one advanced field, (c) a kubectl/diagnostic checklist for a Pending job pod, (d) a capacity worksheet: max concurrent jobs given my node pool.
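A worked example of the capacity worksheet in (d), under stated assumptions: each container's requests are build 500m CPU / 1 Gi, helper 100m / 128 Mi, and each service 250m / 512 Mi; the job uses three services; and the hypothetical pool has four nodes, each with 7500m CPU and 28 Gi (28672 Mi) allocatable ($A$) after DaemonSet requests. It counts only app-container CPU and memory requests. Repeat the same calculation for ephemeral storage and check the node's max-pods setting.

```latex
\begin{align*}
% Pod request = sum of its app containers' requests
R_{\text{pod}} &= R_{\text{build}} + R_{\text{helper}} + n_{\text{svc}}\,R_{\text{svc}} \\
R^{\text{cpu}}_{\text{pod}} &= 500\,\text{m} + 100\,\text{m} + 3 \times 250\,\text{m} = 1350\,\text{m} \\
R^{\text{mem}}_{\text{pod}} &= 1024 + 128 + 3 \times 512 = 2688\,\text{Mi} \\
% Jobs per node: the scarcer resource decides
k &= \min\!\left( \left\lfloor \frac{A^{\text{cpu}}}{R^{\text{cpu}}_{\text{pod}}} \right\rfloor,\;
                  \left\lfloor \frac{A^{\text{mem}}}{R^{\text{mem}}_{\text{pod}}} \right\rfloor \right)
   = \min\!\left( \left\lfloor \frac{7500}{1350} \right\rfloor,\;
                  \left\lfloor \frac{28672}{2688} \right\rfloor \right)
   = \min(5, 10) = 5 \\
% Ceiling for the whole pool
J_{\max} &= N_{\text{nodes}} \cdot k = 4 \times 5 = 20
\end{align*}
```

In this example CPU is the binding resource. Keeping `concurrent` and the runner's `limit` at or below $J_{\max}$ lets surplus jobs queue in GitLab instead of sitting Pending until `poll_timeout` fails them.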

Bias toward: requests always set, dedicated node pool, least privilege, and predictable scheduling over packing density.
