AI for GitLab CI/CD

GitLab Runner Kubernetes Executor Pod Spec Tuning Prompt

Tune the Kubernetes executor's pod spec — resource requests/limits, node selectors, ephemeral storage, service-account, and pod-level overrides — so CI jobs schedule reliably without starving the cluster.

Target user: Platform engineers running GitLab Runner on Kubernetes
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an SRE who operates GitLab Runner on Kubernetes at scale and has debugged every flavor of "job pod stuck Pending" and "runner OOMKilled."

I will provide:
- My runner `config.toml` `[runners.kubernetes]` section (or Helm `values.yaml`)
- Cluster facts: node sizes, taints, available namespaces, storage classes
- Symptoms (pods Pending, jobs slow to start, ephemeral-storage evictions, noisy-neighbor)

Your job:

1. **Map the pod anatomy**: build container, helper container, and `services:` containers. Explain that resource requests apply per container and the scheduler sums them across the pod, so a job with three services reserves far more than the build container alone.

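2. **Right-size requests/limits**: recommend `cpu_request`, `cpu_limit`, `memory_request`, `memory_limit`, the `helper_*` equivalents (`helper_cpu_request`, `helper_memory_limit`, and so on), and the `service_*` equivalents based on my job profiles. Explain why requests must always be set explicitly: with only a limit, Kubernetes copies the limit into the request and over-reserves the node; with neither, the scheduler places pods with no idea of their real footprint.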

3. **Placement**: `node_selector`, `node_tolerations`, and affinity to pin CI to a dedicated node pool, plus per-job overrides via `node_selector_overwrite_allowed` (a regex string inside `[runners.kubernetes]`, not a table) and `KUBERNETES_NODE_SELECTOR_*` variables.

4. **Ephemeral storage**: set `ephemeral_storage_request` and `ephemeral_storage_limit`, mount an `emptyDir` or PVC for large caches/artifacts, and avoid the "node-pressure eviction mid-build" trap.

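5. **Pod spec escape hatch**: use `[[runners.kubernetes.pod_spec]]` (requires the `FF_USE_ADVANCED_POD_SPEC_CONFIGURATION` feature flag) to set fields the executor does not expose directly (e.g. topology spread constraints). Check for a dedicated setting first: `runtime_class_name` and `priority_class_name` are already direct options and do not need a patch.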

6. **Security**: dedicated `service_account`, `automount_service_account_token = false` unless needed, run-as-non-root, drop capabilities, and namespace isolation per project.

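7. **Concurrency**: relate the global `concurrent` value and the per-runner `limit` (under `[[runners]]`) to how many job pods can land per node before scheduling stalls, and relate `poll_timeout` to how long a Pending pod may wait before the runner fails the job.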
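
As a rough sketch of the arithmetic behind step 7 and the capacity worksheet, assume every job pod carries the same per-container requests and the node pool is dedicated to CI. The symbols are illustrative assumptions, not GitLab Runner terms: $A^{r}$ is a node's allocatable amount of resource $r$ (CPU, memory, or ephemeral storage) after DaemonSet requests are subtracted, $n_{s}$ is the number of `services:` containers, and $N$ is the number of nodes in the pool.

```latex
\begin{align*}
R_{\text{pod}}^{r}  &= R_{\text{build}}^{r} + R_{\text{helper}}^{r} + n_{s}\, R_{\text{service}}^{r} \\
J_{\text{node}}     &= \min_{r} \left\lfloor \frac{A^{r}}{R_{\text{pod}}^{r}} \right\rfloor \\
J_{\text{pool}}     &= N \cdot J_{\text{node}}
\end{align*}
```

For example, with build requests of 1 CPU / 2048 Mi, helper requests of 0.1 CPU / 128 Mi, and two services at 0.5 CPU / 512 Mi each, a pod requests 2.1 CPU and 3200 Mi. On a node with 7.5 CPU and 30720 Mi left for CI, CPU allows $\lfloor 7.5 / 2.1 \rfloor = 3$ pods and memory allows $\lfloor 30720 / 3200 \rfloor = 9$, so $J_{\text{node}} = 3$. With $N = 5$ nodes, `concurrent` above 15 would leave pods Pending. The node's maximum pod count can cap this further.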

Output: (a) a corrected `[runners.kubernetes]` block (or Helm values) fully commented, (b) a `pod_spec` override example for one advanced field, (c) a kubectl/diagnostic checklist for a Pending job pod, (d) a capacity worksheet: max concurrent jobs given my node pool.

Bias toward: requests always set, dedicated node pool, least privilege, and predictable scheduling over packing density.
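
For orientation, here is a minimal sketch of the kind of `config.toml` this prompt works with. All values are illustrative assumptions rather than recommendations: the runner name, URL, namespace, service account, labels, taint key, sizes, and the overwrite regex are hypothetical and need to be replaced with facts from your own cluster. The setting names follow the GitLab Runner Kubernetes executor documentation as I understand it; verify them against the runner version you have deployed.

```toml
# Global cap on jobs across all runners in this process (assumed value).
concurrent = 10

[[runners]]
  name = "k8s-ci"                      # hypothetical runner name
  url = "https://gitlab.example.com"   # hypothetical GitLab URL
  executor = "kubernetes"
  limit = 10                           # per-runner cap; keep <= concurrent

  [runners.feature_flags]
    # Needed before any [[runners.kubernetes.pod_spec]] patch is applied.
    FF_USE_ADVANCED_POD_SPEC_CONFIGURATION = true

  [runners.kubernetes]
    namespace = "gitlab-ci"            # hypothetical namespace
    service_account = "ci-job"         # hypothetical least-privilege account
    automount_service_account_token = false
    poll_timeout = 180                 # seconds to wait for the pod to start

    # Build container: requests always set, limits above them.
    cpu_request = "1"
    cpu_limit = "2"
    memory_request = "2Gi"
    memory_limit = "4Gi"
    ephemeral_storage_request = "2Gi"
    ephemeral_storage_limit = "8Gi"

    # Helper container (clone, cache, artifacts).
    helper_cpu_request = "100m"
    helper_memory_request = "128Mi"
    helper_memory_limit = "256Mi"

    # Applied to each services: container.
    service_cpu_request = "500m"
    service_memory_request = "512Mi"
    service_memory_limit = "1Gi"

    # Regex that a job's KUBERNETES_NODE_SELECTOR_* value must match (assumed pattern).
    node_selector_overwrite_allowed = "workload=ci-.*"

    [runners.kubernetes.node_selector]
      "workload" = "ci"                # hypothetical node pool label

    [runners.kubernetes.node_tolerations]
      "dedicated=ci" = "NoSchedule"    # hypothetical taint on the CI pool

    [runners.kubernetes.pod_labels]
      "app" = "ci-job"                 # matched by the spread constraint below

    [runners.kubernetes.pod_security_context]
      run_as_non_root = true
      run_as_user = 1000               # job images must work as this UID

    [runners.kubernetes.build_container_security_context.capabilities]
      drop = ["ALL"]

    # Scratch space for large caches; counts toward node ephemeral storage.
    [[runners.kubernetes.volumes.empty_dir]]
      name = "build-cache"
      mount_path = "/cache"

    # Escape hatch: a field with no dedicated executor setting.
    [[runners.kubernetes.pod_spec]]
      name = "spread-across-nodes"
      patch_type = "merge"
      patch = '''
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                app: ci-job
      '''
```

In TOML, scalar keys must come before the sub-tables of the same section, which is why every `[runners.kubernetes]` setting is listed before `node_selector` and the other tables. Running as non-root with all capabilities dropped can break images that expect root, so test it against representative jobs before rolling it out.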