Tuning the GitLab Kubernetes Executor for Fast, Reliable Runners
The Kubernetes executor is the right call for elastic CI, but the defaults will burn you. Here's how I tune resources, concurrency and pod overhead for speed.
- #gitlab
- #cicd
- #kubernetes
- #runners
- #performance
- #devops
The first time I moved a team off static shell runners onto the GitLab Kubernetes executor, the pitch was easy: elastic capacity, clean isolation, no more “works on runner-3 but not runner-7.” All true. What nobody warned me about was the month I’d spend afterward chasing slow pod startup, OOM-killed builds, and a cluster that quietly throttled every job because I’d left the resource requests at their defaults.
The Kubernetes executor is the right architecture for most teams. But the defaults are tuned for “it boots,” not “it’s fast and reliable under load.” Here’s how I tune it.
Understand what a job pod actually is
When the Kubernetes executor picks up a job, it doesn’t run one container. It schedules a pod with several containers: a build container (your image), a helper container (handles git clone, cache, artifacts), and one container per services: entry. Every one of those needs CPU and memory, and every one counts against your node’s allocatable resources.
This matters because the single most common mistake is tuning only the build container and ignoring the helper and services. A Postgres service container with no limit can happily eat the memory your build needed.
Set resource requests and limits explicitly
By default the runner sets no requests or limits on any container. Unless a namespace LimitRange fills them in, the scheduler has nothing to plan with, packs pods too tightly, and the kernel OOM-kills your builds when they actually need memory. Set them per container in the runner config:
[runners.kubernetes]
cpu_request = "1"
cpu_limit = "2"
memory_request = "2Gi"
memory_limit = "4Gi"
helper_cpu_request = "100m"
helper_memory_request = "128Mi"
helper_memory_limit = "256Mi"
service_cpu_request = "200m"
service_memory_request = "256Mi"
service_memory_limit = "512Mi"
Requests drive scheduling and guarantee capacity; limits cap the blast radius of a runaway job. My rule of thumb: set the request to what a normal build needs and the limit to roughly 2x, so a heavy build can burst without one pathological job starving the node.
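You can let projects override these with CI variables, which is the escape hatch for that one memory-hungry integration suite. The override only works if the runner config also sets the matching ceiling (for example memory_request_overwrite_max_allowed and memory_limit_overwrite_max_allowed); when those are empty, the variables are ignored: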
variables:
  KUBERNETES_MEMORY_REQUEST: "4Gi"
  KUBERNETES_MEMORY_LIMIT: "8Gi"
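Because every container in the job pod counts against the node, it helps to add up the whole pod’s requests rather than only the build container’s. The following is a minimal sketch that uses the request values from the runner config above, converted to millicores and MiB. The Requests class and job_pod_requests function are hypothetical helpers for illustration, not part of GitLab Runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requests:
    cpu_millicores: int
    memory_mib: int


def job_pod_requests(build: Requests, helper: Requests,
                     service: Requests, service_count: int) -> Requests:
    """Add up what the scheduler must reserve for one job pod."""
    return Requests(
        cpu_millicores=(build.cpu_millicores + helper.cpu_millicores
                        + service.cpu_millicores * service_count),
        memory_mib=(build.memory_mib + helper.memory_mib
                    + service.memory_mib * service_count),
    )


# Request values from the runner config above: "1" CPU / "2Gi" for the build,
# "100m" / "128Mi" for the helper, "200m" / "256Mi" per service.
BUILD = Requests(cpu_millicores=1000, memory_mib=2048)
HELPER = Requests(cpu_millicores=100, memory_mib=128)
SERVICE = Requests(cpu_millicores=200, memory_mib=256)

for count in (0, 1, 2):
    print(count, job_pod_requests(BUILD, HELPER, SERVICE, service_count=count))
```

With two services, the pod reserves 1500m CPU and 2688 MiB (about 2.6 GiB), noticeably more than the build container’s 1 CPU and 2 GiB. That total, not the build container alone, is the number to use when estimating how many jobs fit on a node.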
Kill cold-start latency with image pull policy and pre-pull
Cold starts are where Kubernetes executors feel sluggish compared to shell runners. The job pod can’t start until the node has pulled your build image. If that’s a 2 GB image being pulled fresh on every job, you’ve added a minute to every pipeline before a single line of your script runs.
Two fixes. First, set a sane pull policy so cached images on the node are reused:
[runners.kubernetes]
pull_policy = ["if-not-present"]
allowed_pull_policies = ["always", "if-not-present"]
Second, keep a warm pool of nodes and pre-pull your common base images with a DaemonSet, or just keep your CI images lean. A 300 MB image pulls in seconds; a 2 GB image is a tax you pay thousands of times a day.
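One way to do the pre-pull is a DaemonSet whose init containers reference your CI images, so every node pulls them once and then idles on a tiny pause container. The sketch below builds such a manifest as a Python dict and prints it as JSON. The function name, the DaemonSet name, the example image URL, and the workload=ci node selector are all assumptions for illustration, and the no-op command assumes each CI image ships a shell.

```python
import json

# Assumption: any tiny long-running image works as the idle main container.
PAUSE_IMAGE = "registry.k8s.io/pause:3.9"


def prepull_daemonset(name, images, node_selector=None):
    """Build a DaemonSet manifest that forces each node to pull `images`."""
    labels = {"app": name}
    init_containers = [
        {
            "name": f"pull-{index}",
            "image": image,
            # Exits immediately; the point is only that the kubelet pulls the image.
            "command": ["sh", "-c", "true"],
        }
        for index, image in enumerate(images)
    ]
    pod_spec = {
        "initContainers": init_containers,
        "containers": [{"name": "pause", "image": PAUSE_IMAGE}],
    }
    if node_selector:
        pod_spec["nodeSelector"] = dict(node_selector)
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


manifest = prepull_daemonset(
    "ci-image-prepull",
    ["registry.example.com/ci/build-base:latest"],  # hypothetical image
    node_selector={"workload": "ci"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to kubectl apply -f -. If the CI nodes are tainted, the pod spec would also need a matching toleration, which this sketch leaves out.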
Get concurrency right at three levels
Concurrency in this setup is confusing because there are three knobs and they interact:
- concurrent (global, top of config.toml): total jobs across all runners on this manager.
- limit (per [[runners]]): max jobs this specific runner will run.
- Cluster capacity: how many pods your nodes can actually schedule.
The failure mode is setting concurrent = 50 on a cluster that can fit 12 job pods. GitLab dutifully tries to schedule 50, 38 pods sit Pending, and your pipeline queue looks like it’s running but is really stuck waiting for nodes. Size concurrent to what the cluster can hold, and let the cluster autoscaler add nodes for genuine bursts rather than overcommitting the manager.
concurrent = 12
check_interval = 3
[[runners]]
limit = 12
[runners.kubernetes]
poll_timeout = 600
Bump poll_timeout if your autoscaler needs time to bring up a node; otherwise jobs fail with “timed out waiting for pod to run” while the node is still booting.
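Sizing concurrent to the cluster is simple arithmetic once you know a job pod’s total requests. Here is a rough sketch, assuming a hypothetical pool of 4 nodes with 3920m CPU and 14000 MiB allocatable each, and a job pod that requests 1300m CPU and 2432 MiB in total. It ignores DaemonSet overhead and per-node pod limits, so treat the result as an upper bound.

```python
def pods_per_node(node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib):
    """Job pods that fit on one node; the scarcer resource decides."""
    return min(node_cpu_m // pod_cpu_m, node_mem_mib // pod_mem_mib)


def cluster_job_capacity(node_count, node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib):
    """Upper bound on job pods the whole pool can schedule at once."""
    return node_count * pods_per_node(node_cpu_m, node_mem_mib, pod_cpu_m, pod_mem_mib)


# Hypothetical pool: 4 nodes, 3920m CPU and 14000 MiB allocatable per node.
# Hypothetical job pod: 1300m CPU and 2432 MiB requested in total.
capacity = cluster_job_capacity(
    node_count=4,
    node_cpu_m=3920,
    node_mem_mib=14000,
    pod_cpu_m=1300,
    pod_mem_mib=2432,
)
print(f"concurrent = {capacity}")
```

In this example CPU is the binding constraint: three pods fit per node by CPU and five by memory, so the pool holds 12 job pods. Raising concurrent beyond that only adds Pending pods until the autoscaler catches up.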
Use node selectors and tolerations to isolate CI
Don’t let CI pods land on the nodes running your production workloads. Carve out a dedicated node pool, label it workload=ci, taint it with dedicated=ci:NoSchedule, and pin job pods to it:
[runners.kubernetes.node_selector]
"workload" = "ci"
[runners.kubernetes.node_tolerations]
"dedicated=ci" = "NoSchedule"
Now CI bursts scale a CI-only node pool, and a runaway build can’t compete with your API for memory. This one change has saved me more 3am pages than any amount of resource tuning.
Speed up the git clone
The helper container clones your repo on every job. For a big monorepo, that’s slow and wasteful. Tune the fetch:
variables:
  GIT_DEPTH: "10"
  GIT_STRATEGY: fetch
  GIT_CLEAN_FLAGS: -ffdx
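GIT_STRATEGY: fetch reuses an existing working copy instead of cloning fresh, but only when there is one to reuse. Job pods start with an empty build directory by default, so on the Kubernetes executor fetch pays off only if you mount a persistent volume for the builds directory; without one it behaves like a clone. A shallow GIT_DEPTH helps either way, because it avoids dragging down years of history you don’t need to build a commit.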
Watch the right signals
Once it’s tuned, keep an eye on the metrics that tell you it’s drifting:
- Pod pending time: climbing means you’re cluster-capacity bound; add nodes or lower concurrent.
- OOMKilled count on CI pods: your limits are too low or one job needs an override.
- Image pull duration: your images are too fat or your pull policy is wrong.
- Job queue duration in GitLab: the user-facing symptom that something upstream is constrained.
I review these weekly. Tuning the Kubernetes executor isn’t a one-time event; it drifts as your image sizes, test suites, and team size change.
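For a quick spot check between reviews, the first two signals can be read straight from the pod list. This is a minimal sketch that summarizes the JSON shape returned by kubectl get pods -o json; the function name and the sample pods are hypothetical. Job pods are cleaned up after the job ends, so a point-in-time snapshot like this can miss kills, and a proper metrics pipeline is more reliable for trends.

```python
def summarize_ci_pods(pod_list):
    """Return names of Pending pods and (pod, container) pairs that were OOM-killed."""
    pending = []
    oom_killed = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            pending.append(name)
        for container in status.get("containerStatuses", []):
            # A kill shows up in `state` for a finished container,
            # or in `lastState` if the container was restarted.
            for key in ("state", "lastState"):
                terminated = container.get(key, {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    oom_killed.append((name, container["name"]))
                    break
    return {"pending": pending, "oom_killed": oom_killed}


# Hypothetical sample in the shape of `kubectl get pods -o json` output.
sample = {
    "items": [
        {"metadata": {"name": "runner-abc-1"}, "status": {"phase": "Pending"}},
        {
            "metadata": {"name": "runner-abc-2"},
            "status": {
                "phase": "Failed",
                "containerStatuses": [
                    {"name": "build",
                     "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
                    {"name": "helper",
                     "state": {"terminated": {"reason": "Completed", "exitCode": 0}}},
                ],
            },
        },
    ]
}

summary = summarize_ci_pods(sample)
print(summary)
```

To run it against a live cluster, load the kubectl output with json.load(sys.stdin) and pass the result to summarize_ci_pods.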
Where to go from here
The Kubernetes executor rewards a little upfront tuning with genuinely elastic, isolated CI. Set explicit resources on every container, size concurrency to real cluster capacity, isolate CI onto its own node pool, and keep your images lean. Do that and the “slow Kubernetes runner” reputation never materializes.
If you want more on runner architecture and pipeline design, browse our GitLab CI/CD guides — and if you’re reviewing a teammate’s runner config or pipeline changes, our AI code review assistant is handy for catching resource and security footguns before they merge.
Runner tuning is workload-specific. Validate these settings against your own cluster capacity and job profiles before rolling them out broadly.