GitLab Runners Explained: Autoscaling and the Kubernetes Executor

You can write the most elegant .gitlab-ci.yml in the world, but nothing happens until a runner picks up the job. Runners are the part of GitLab CI that people understand least and pay for most. Get them wrong and you either wait in a queue all day or get a cloud bill that makes finance schedule a meeting.

I’ve run runner fleets from a single shared VM up to autoscaling clusters serving thousands of jobs an hour. Here’s the mental model and the setup that’s held up.

What a runner actually is

A runner is an agent that polls GitLab, claims jobs, and runs them with an executor. The executor is the part that matters most:

shell — runs directly on the host. Simple, but jobs share state and can poison each other. Avoid except for trivial cases.
docker — each job runs in a fresh container. The default for most teams. Clean isolation, easy to reason about.
kubernetes — each job becomes a pod in a cluster. Scales elastically and reuses infrastructure you may already have.
docker-autoscaler — spins up cloud VMs on demand and tears them down when idle.

Most teams should start with the docker executor and move to kubernetes or docker-autoscaler when queue times or cost force the issue.

Tags are how you route work

Runners advertise tags. Jobs request them. This is your routing layer, and it’s worth designing deliberately:

build:
  tags:
    - linux
    - docker
  script: make build

gpu-train:
  tags:
    - gpu
  script: ./train.sh

Tag by capability, not by hostname. Tags such as gpu, arm64, high-memory, and production-network describe what a runner can do. When you tag by hostname, every infrastructure change becomes a pipeline change; capability tags decouple the two.
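The routing rule is small enough to model. The sketch below is a Python illustration, not GitLab’s code, assuming the documented behavior: a runner takes a job only if it carries every tag the job lists, and an untagged job goes only to runners with run_untagged enabled. The runner names and tag sets are hypothetical.

```python
# Illustrative model of GitLab tag routing (not GitLab's implementation).
# Rule: a runner can take a job only if it advertises every tag the job
# requests; untagged jobs go only to runners with run_untagged enabled.
from dataclasses import dataclass, field


@dataclass
class Runner:
    name: str
    tags: set[str] = field(default_factory=set)
    run_untagged: bool = False


def can_run(runner: Runner, job_tags: set[str]) -> bool:
    if not job_tags:
        return runner.run_untagged
    # Extra tags on the runner are fine; the job's tags must all be present.
    return job_tags <= runner.tags


def eligible(runners: list[Runner], job_tags: set[str]) -> list[str]:
    return [r.name for r in runners if can_run(r, job_tags)]


# Hypothetical fleet.
fleet = [
    Runner("linux-pool", {"linux", "docker"}),
    Runner("gpu-pool", {"linux", "docker", "gpu"}),
    Runner("catch-all", set(), run_untagged=True),
]

print(eligible(fleet, {"linux", "docker"}))  # → ['linux-pool', 'gpu-pool']
print(eligible(fleet, {"gpu"}))              # → ['gpu-pool']
print(eligible(fleet, set()))                # → ['catch-all']
```

Because the match is a subset test, the GPU runner also qualifies for the build job here. If you want GPU nodes reserved for GPU work, give them only tags that ordinary jobs do not request.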

The Kubernetes executor

If you already run Kubernetes, the Kubernetes executor is compelling: each job becomes a pod, the cluster autoscaler handles capacity, and when no jobs are running only the lightweight runner manager pod stays up. Install the GitLab Runner Helm chart and configure the executor in values.yaml:

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "gitlab-runner"
        cpu_request = "500m"
        memory_request = "1Gi"
        cpu_limit = "2"
        memory_limit = "4Gi"
        poll_timeout = 600
        [runners.kubernetes.pod_labels]
          "ci-job" = "true"

A few hard-won lessons:

Set requests and limits. Without them, one greedy job can starve a node and stall every other job scheduled on it. Requests drive scheduling and the cluster autoscaler; limits protect your neighbors.
Cache needs real storage. Pod filesystems are ephemeral. Use an S3-compatible cache backend so caches survive pod death, or your “cache” does nothing.
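Give pods time to schedule. poll_timeout defaults to 180 seconds, which can be shorter than it takes the cluster autoscaler to bring up a new node. Bump it (600 in the example above) so jobs don’t fail while they wait for capacity.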
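To see why requests matter for capacity, here is a small Python sketch of the packing arithmetic. It assumes a hypothetical node with 4 CPUs and 16Gi of allocatable memory and reuses the requests and limits from the values.yaml example; the helper names are illustrative, and the parsers handle only the quantity formats used here. Kubernetes itself rejects a pod whose request exceeds its limit, which check_bounds mirrors.

```python
# Illustrative arithmetic: requests decide how many job pods fit on a node.
# The node size is hypothetical; real allocatable capacity is smaller than
# the VM's raw size because the kubelet and system daemons reserve some.

def parse_cpu(q: str) -> int:
    """CPU quantity in millicores ("500m" -> 500, "2" -> 2000)."""
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)


MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_mem(q: str) -> int:
    """Memory quantity in bytes for binary suffixes ("1Gi" -> 1073741824)."""
    for suffix, factor in MEM_UNITS.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * factor)
    return int(q)  # plain bytes


def pods_per_node(alloc_cpu: str, alloc_mem: str, req_cpu: str, req_mem: str) -> int:
    # The scheduler needs both resources, so the scarcer one sets the limit.
    by_cpu = parse_cpu(alloc_cpu) // parse_cpu(req_cpu)
    by_mem = parse_mem(alloc_mem) // parse_mem(req_mem)
    return min(by_cpu, by_mem)


def check_bounds(req_cpu: str, lim_cpu: str, req_mem: str, lim_mem: str) -> list[str]:
    problems = []
    if parse_cpu(req_cpu) > parse_cpu(lim_cpu):
        problems.append("cpu_request exceeds cpu_limit")
    if parse_mem(req_mem) > parse_mem(lim_mem):
        problems.append("memory_request exceeds memory_limit")
    return problems


print(pods_per_node("4", "16Gi", "500m", "1Gi"))  # → 8 (CPU is the bottleneck)
print(check_bounds("500m", "2", "1Gi", "4Gi"))    # → []
```

The scheduler packs by requests, but each pod may burst up to its limit, so eight pods on this node could together try to use up to 16 CPUs. Raising requests trades packing density for fewer noisy-neighbor surprises.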

Autoscaling without overpaying

The whole point of autoscaling is paying for capacity only when you have work. The two failure modes are equally bad: scaling too slowly (queues) or too aggressively (idle VMs burning money).

For the docker-autoscaler, the key knobs are idle count and idle time:

[[runners.autoscaler.policy]]
  idle_count = 1
  idle_time = "20m0s"
  periods = ["* 8-18 * * mon-fri"]

Keep one warm VM during business hours so the first job of the morning doesn’t wait on a cold boot, and let the fleet scale to zero overnight. Match the policy to your team’s actual rhythm. I’ve cut runner spend by more than half just by aligning the idle policy with when people actually push code.
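A quick way to sanity-check an idle policy is to count how many hours a week it keeps capacity warm. This Python sketch is a simplified model rather than the runner’s cron evaluator: it hard-codes the hour range and weekdays from the period above, assumes they are evaluated in your team’s local time, and assumes no other policy asks for idle capacity outside that window.

```python
# Simplified model of the policy above, NOT a general cron parser: it only
# encodes "* 8-18 * * mon-fri" (an hour range plus weekdays).
from datetime import datetime, timedelta

WORK_HOURS = range(8, 19)  # cron "8-18" is inclusive: 08:00 through 18:59
WORK_DAYS = range(0, 5)    # Monday=0 ... Friday=4, as in datetime.weekday()


def warm_capacity(at: datetime, idle_count: int = 1) -> int:
    """Idle slots the policy asks for at a given moment (0 outside the period)."""
    in_period = at.weekday() in WORK_DAYS and at.hour in WORK_HOURS
    return idle_count if in_period else 0


def warm_hours_per_week(idle_count: int = 1) -> int:
    start = datetime(2024, 1, 1)  # a Monday
    return sum(
        warm_capacity(start + timedelta(hours=h), idle_count) for h in range(7 * 24)
    )


print(warm_hours_per_week())  # → 55
print(7 * 24)                 # → 168, an always-on idle VM
```

The warm slot runs 55 hours a week instead of 168, roughly a third of an always-on idle VM. The model ignores the idle_time tail, where capacity that just finished a job can linger for up to 20 minutes before it is removed.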

Spot instances for the win

CI jobs are usually safe to rerun: if a spot VM is reclaimed mid-job, the job can simply run again on fresh capacity. That makes CI a strong fit for spot/preemptible instances at a fraction of on-demand cost. GitLab does not retry failed jobs by default, though, so set a retry policy that re-runs an interrupted job cleanly:

default:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

The savings here are large and the risk is low, precisely because retries are cheap for jobs that are safe to rerun.
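The claim that savings are large and risk is low is easy to check with back-of-the-envelope numbers. This Python sketch assumes hypothetical inputs (spot at 30% of the on-demand price, a 5% chance that any given attempt is interrupted, interruptions independent of each other) and pessimistically bills every attempt as a full run. With retry max: 2, a job gets three attempts in total.

```python
# Back-of-the-envelope model with hypothetical inputs. Each attempt on a spot
# VM is interrupted independently with probability p; retry max: 2 allows up
# to three attempts in total.

def expected_attempts(p: float, max_retries: int = 2) -> float:
    # Attempt k+1 runs only if the first k attempts were all interrupted.
    return sum(p**k for k in range(max_retries + 1))


def failure_after_retries(p: float, max_retries: int = 2) -> float:
    # The job fails for good only if every attempt is interrupted.
    return p ** (max_retries + 1)


def relative_cost(spot_price_ratio: float, p: float, max_retries: int = 2) -> float:
    # Pessimistic: bills every attempt as a full run, even interrupted ones.
    return spot_price_ratio * expected_attempts(p, max_retries)


print(round(relative_cost(0.30, 0.05), 3))         # → 0.316
print(round(failure_after_retries(0.05), 6))       # → 0.000125
```

Even with every interrupted attempt billed in full, the job costs about 32% of on-demand, and only about 1 in 8,000 jobs exhausts all three attempts.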

Watch the right metrics

Runners expose Prometheus metrics. The three I actually watch:

Job queue duration — how long jobs wait before a runner claims them. This is the number your developers feel.
Concurrent jobs vs. capacity — are you saturated or idle?
Job failure rate by runner — a single bad node can quietly fail a slice of your jobs.

When queue duration creeps up, you need more capacity or faster scaling. When it’s near zero and you’re paying for idle runners, tighten the idle policy.
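The rule above can be written down as a small check. This Python sketch assumes you already export per-job queue waits (in seconds) and current utilization (running jobs divided by capacity) from your monitoring; the p95 summary and the 60-second, 5-second, and 30% thresholds are hypothetical starting points, not GitLab defaults.

```python
# Sketch of the tuning rule. Inputs come from your own monitoring;
# thresholds are hypothetical starting points.

def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def recommend(queue_waits: list[float], utilization: float,
              slow_wait_s: float = 60.0, fast_wait_s: float = 5.0,
              idle_utilization: float = 0.3) -> str:
    wait = p95(queue_waits)
    if wait > slow_wait_s:
        # Developers are feeling the queue.
        return "add capacity or scale faster"
    if wait < fast_wait_s and utilization < idle_utilization:
        # Jobs start instantly, but most paid capacity sits idle.
        return "tighten the idle policy"
    return "leave it"


# One slow outlier among 20 jobs does not move the p95.
print(recommend([2.0] * 19 + [240.0], utilization=0.9))  # → leave it
```

Using a percentile rather than the mean keeps a single stuck job from triggering a capacity change, while a sustained slowdown still shows up quickly.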

Where AI helps with runner config

Runner config files are dense and easy to get subtly wrong. Pasting a config.toml or Helm values.yaml into a model and asking “what happens to a job that needs 8GB when my limit is 4GB, and is my autoscaler policy going to leave VMs idle overnight?” surfaces problems before they cost you. I keep GitLab CI prompts for runner reviews, and run infra-config changes through our Code Review tool before applying them to a live fleet.

The short version

Start with the docker executor. Tag by capability. Move to Kubernetes or autoscaling when queues or cost demand it. Always set resource requests and limits, back your cache with real storage, lean on spot instances with retries, and watch queue duration as your north star.

Get the runners right and the pipeline you wrote so carefully actually runs — fast, isolated, and at a cost you can defend.

AI suggestions for runner configuration are assistive, not authoritative. Always validate executor and autoscaler changes in a staging fleet before rolling them out.
