Tuning the GitLab Kubernetes Executor for Fast, Reliable Runners

The first time I moved a team off static shell runners onto the GitLab Kubernetes executor, the pitch was easy: elastic capacity, clean isolation, no more “works on runner-3 but not runner-7.” All true. What nobody warned me about was the month I’d spend afterward chasing slow pod startup, OOM-killed builds, and a cluster that quietly throttled every job because I’d left the resource requests at their defaults.

The Kubernetes executor is the right architecture for most teams. But the defaults are tuned for “it boots,” not “it’s fast and reliable under load.” Here’s how I tune it.

Understand what a job pod actually is

When the Kubernetes executor picks up a job, it doesn’t run one container. It schedules a pod with several containers: a build container (your image), a helper container (handles git clone, cache, artifacts), and one container per services: entry. Every one of those needs CPU and memory, and every one counts against your node’s allocatable resources.

This matters because the single most common mistake is tuning only the build container and ignoring the helper and services. A Postgres service container with no limit can happily eat the memory your build needed.

Set resource requests and limits explicitly

Out of the box the runner sets no requests or limits on any container (unless a namespace LimitRange supplies some), which means the scheduler packs pods too tightly and then the kernel OOM-kills your builds when they actually need memory. Set them per container in the runner config:

[runners.kubernetes]
  cpu_request = "1"
  cpu_limit = "2"
  memory_request = "2Gi"
  memory_limit = "4Gi"

  helper_cpu_request = "100m"
  helper_memory_request = "128Mi"
  helper_memory_limit = "256Mi"

  service_cpu_request = "200m"
  service_memory_request = "256Mi"
  service_memory_limit = "512Mi"

Requests drive scheduling and guarantee capacity; limits cap the blast radius of a runaway job. My rule of thumb: set the request to what a normal build needs and the limit to roughly 2x, so a heavy build can burst without one pathological job starving the node.
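It helps to add up what one job pod actually asks for, because the scheduler sees the sum of all containers, not just the build container. Here is a minimal sketch that totals the values from the config above, assuming a job with exactly one service container. The helper function names are mine, and the parsing only handles the units used in this post.

```python
# Sum the per-container requests and limits to get the footprint of one
# job pod (build + helper + one service). Assumption: one service per job.

def cpu_to_millicores(value: str) -> int:
    """Convert a Kubernetes CPU quantity such as "1" or "100m" to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def memory_to_mib(value: str) -> int:
    """Convert a Kubernetes memory quantity given in Mi or Gi to MiB."""
    if value.endswith("Gi"):
        return int(float(value[:-2]) * 1024)
    if value.endswith("Mi"):
        return int(value[:-2])
    raise ValueError(f"unsupported unit in {value!r}")


# Values copied from the runner config above.
containers = {
    "build": {"cpu_request": "1", "memory_request": "2Gi", "memory_limit": "4Gi"},
    "helper": {"cpu_request": "100m", "memory_request": "128Mi", "memory_limit": "256Mi"},
    "service": {"cpu_request": "200m", "memory_request": "256Mi", "memory_limit": "512Mi"},
}

pod_cpu_request = sum(cpu_to_millicores(c["cpu_request"]) for c in containers.values())
pod_memory_request = sum(memory_to_mib(c["memory_request"]) for c in containers.values())
pod_memory_limit = sum(memory_to_mib(c["memory_limit"]) for c in containers.values())

print(f"CPU request per pod:    {pod_cpu_request}m")
print(f"Memory request per pod: {pod_memory_request}Mi")
print(f"Memory limit per pod:   {pod_memory_limit}Mi")
```

With these numbers the pod requests 1300m of CPU and 2432Mi of memory, so the helper and service add about 30% CPU and 19% memory on top of the build container. Every extra services: entry adds another service-sized slice.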

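You can let projects override these with CI variables, which is the escape hatch for that one memory-hungry integration suite. The overrides only take effect if the runner config also sets memory_request_overwrite_max_allowed and memory_limit_overwrite_max_allowed, which cap how much a project may ask for: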

variables:
  KUBERNETES_MEMORY_REQUEST: "4Gi"
  KUBERNETES_MEMORY_LIMIT: "8Gi"

Kill cold-start latency with image pull policy and pre-pull

Cold starts are where Kubernetes executors feel sluggish compared to shell runners. The job pod can’t start until the node has pulled your build image. If that’s a 2 GB image being pulled fresh on every job, you’ve added a minute to every pipeline before a single line of your script runs.

Two fixes. First, set a sane pull policy so cached images on the node are reused:

[runners.kubernetes]
  pull_policy = ["if-not-present"]
  allowed_pull_policies = ["always", "if-not-present"]

Second, keep a warm pool of nodes and pre-pull your common base images with a DaemonSet, or just keep your CI images lean. A 300 MB image pulls in seconds; a 2 GB image is a tax you pay thousands of times a day.
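To put rough numbers on that tax, here is a back-of-the-envelope sketch. The 35 MB/s throughput, the 2,000 jobs a day, and the cold-pull fractions are all assumptions picked for illustration (35 MB/s roughly matches the "a minute for 2 GB" figure above). Plug in your own measurements.

```python
# Rough cost of image pulls per day. All constants are illustrative assumptions.

def pull_seconds(image_mb: float, throughput_mb_s: float) -> float:
    """Seconds to pull an image of the given size at a steady throughput."""
    return image_mb / throughput_mb_s


def daily_wait_hours(image_mb: float, throughput_mb_s: float,
                     jobs_per_day: int, cold_fraction: float) -> float:
    """Hours per day that jobs spend waiting on cold image pulls.

    cold_fraction is the share of jobs that land on a node without the image.
    """
    cold_pulls = jobs_per_day * cold_fraction
    return cold_pulls * pull_seconds(image_mb, throughput_mb_s) / 3600


THROUGHPUT_MB_S = 35.0  # assumed effective registry-to-node throughput
JOBS_PER_DAY = 2000     # assumed job volume

for image_mb in (300, 2000):
    per_pull = pull_seconds(image_mb, THROUGHPUT_MB_S)
    all_cold = daily_wait_hours(image_mb, THROUGHPUT_MB_S, JOBS_PER_DAY, 1.0)
    mostly_warm = daily_wait_hours(image_mb, THROUGHPUT_MB_S, JOBS_PER_DAY, 0.1)
    print(f"{image_mb} MB image: {per_pull:.0f}s per cold pull, "
          f"{all_cold:.1f}h/day if every job pulls, "
          f"{mostly_warm:.1f}h/day if 10% do")
```

The two levers show up as the two inputs: a leaner image shrinks the cost of each cold pull, and warm nodes with pre-pulled images shrink how often you pay it.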

Get concurrency right at three levels

Concurrency in this setup is confusing because there are three knobs and they interact:

concurrent (global, top of config.toml) — total jobs across all runners on this manager.
limit (per [[runners]]) — max jobs this specific runner will run.
Cluster capacity — how many pods your nodes can actually schedule.

The failure mode is setting concurrent = 50 on a cluster that can fit 12 job pods. The runner dutifully creates 50 pods, 38 of them sit Pending, and your pipeline looks like it’s running but is really stuck waiting for nodes. Size concurrent to what the cluster can hold, and let the cluster autoscaler add nodes for genuine bursts rather than overcommitting the manager.

concurrent = 12
check_interval = 3

[[runners]]
  limit = 12
  [runners.kubernetes]
    poll_timeout = 600

Bump poll_timeout (it’s in seconds) if your autoscaler needs time to bring up a node; otherwise jobs fail with “timed out waiting for pod to start” while the node is still booting.
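Where does a number like 12 come from? Here is a sketch of the arithmetic, assuming each job pod requests 1300m CPU and 2432Mi memory (build plus helper plus one service, using the request values from the resource section). The node count and allocatable figures are hypothetical; read the real ones from kubectl describe node.

```python
# Size `concurrent` from what the cluster can actually schedule.

def pods_per_node(node_cpu_m: int, node_mem_mib: int,
                  pod_cpu_m: int, pod_mem_mib: int) -> int:
    """Job pods that fit on one node, limited by whichever resource runs out first."""
    return min(node_cpu_m // pod_cpu_m, node_mem_mib // pod_mem_mib)


# Per-pod requests: build (1 CPU, 2Gi) + helper (100m, 128Mi) + service (200m, 256Mi).
POD_CPU_M = 1300
POD_MEM_MIB = 2432

# Hypothetical CI node pool: allocatable resources after system reservations.
NODE_COUNT = 3
NODE_ALLOCATABLE_CPU_M = 5500
NODE_ALLOCATABLE_MEM_MIB = 12 * 1024

per_node = pods_per_node(NODE_ALLOCATABLE_CPU_M, NODE_ALLOCATABLE_MEM_MIB,
                         POD_CPU_M, POD_MEM_MIB)
cluster_capacity = NODE_COUNT * per_node

configured_concurrent = 50
stuck_pending = max(0, configured_concurrent - cluster_capacity)

print(f"Pods per node:      {per_node}")
print(f"Cluster capacity:   {cluster_capacity}")
print(f"Pending at full load with concurrent={configured_concurrent}: {stuck_pending}")
```

In this example CPU is the binding resource: memory alone would allow five pods per node, but CPU requests only fit four. That is a hint about which request to revisit if you want denser packing.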

Use node selectors and tolerations to isolate CI

Don’t let CI pods land on the nodes running your production workloads. Carve out a dedicated node pool, label and taint it, and pin job pods to it:

[runners.kubernetes.node_selector]
  "workload" = "ci"

[runners.kubernetes.node_tolerations]
  "dedicated=ci" = "NoSchedule"

Now CI bursts scale a CI-only node pool, and a runaway build can’t compete with your API for memory. This one change has saved me more 3am pages than any amount of resource tuning.

Speed up the git clone

The helper container clones your repo on every job. For a big monorepo, that’s slow and wasteful. Tune the fetch:

variables:
  GIT_DEPTH: "10"
  GIT_STRATEGY: fetch
  GIT_CLEAN_FLAGS: -ffdx

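GIT_STRATEGY: fetch reuses an existing working copy instead of cloning fresh, but only if one survives between jobs. Job pods are ephemeral, so that means mounting a persistent volume at the builds directory; without one, every job starts from an empty directory and fetch falls back to a full clone. A shallow GIT_DEPTH helps either way, because it avoids dragging down years of history you don’t need to build a commit.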

Watch the right signals

Once it’s tuned, keep an eye on the metrics that tell you it’s drifting:

Pod pending time — climbing means you’re cluster-capacity bound; add nodes or lower concurrent.
OOMKilled count on CI pods — your limits are too low or one job needs an override.
Image pull duration — your images are too fat or your pull policy is wrong.
Job queue duration in GitLab — the user-facing symptom that something upstream is constrained.

I review these weekly. Tuning the Kubernetes executor isn’t a one-time event; it drifts as your image sizes, test suites, and team size change.
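The first two signals can be spot-checked without a full monitoring stack. Below is a minimal sketch that counts Pending pods and OOMKilled containers from a Kubernetes pod list in JSON form. The function name and the sample data are made up for illustration; the status.phase and terminated.reason fields are the standard pod status fields.

```python
import json


def summarize_ci_pods(pod_list: dict) -> dict:
    """Count Pending pods and OOMKilled containers in a Kubernetes PodList."""
    pending = 0
    oom_killed = 0
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            pending += 1
        for container in status.get("containerStatuses", []):
            # A killed container reports the reason under state.terminated,
            # or under lastState.terminated if it has since been restarted.
            for key in ("state", "lastState"):
                terminated = container.get(key, {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    oom_killed += 1
                    break
    return {"pending": pending, "oom_killed": oom_killed}


# Hand-written sample in the shape of a pod list; not real cluster output.
sample = json.loads("""
{
  "items": [
    {"status": {"phase": "Pending"}},
    {"status": {"phase": "Failed", "containerStatuses": [
      {"name": "build", "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "helper", "state": {"terminated": {"reason": "Error", "exitCode": 1}}}
    ]}},
    {"status": {"phase": "Running", "containerStatuses": [
      {"name": "build", "state": {"running": {}}}
    ]}}
  ]
}
""")

summary = summarize_ci_pods(sample)
print(summary)
```

In practice you would feed it the output of kubectl get pods -o json for whatever namespace your runner uses, and send the counts wherever you already track trends.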

Where to go from here

The Kubernetes executor rewards a little upfront tuning with genuinely elastic, isolated CI. Set explicit resources on every container, size concurrency to real cluster capacity, isolate CI onto its own node pool, and keep your images lean. Do that and the “slow Kubernetes runner” reputation never materializes.

If you want more on runner architecture and pipeline design, browse our GitLab CI/CD guides — and if you’re reviewing a teammate’s runner config or pipeline changes, our AI code review assistant is handy for catching resource and security footguns before they merge.

Runner tuning is workload-specific. Validate these settings against your own cluster capacity and job profiles before rolling them out broadly.
