AI-Assisted GitLab Runner Tag and Resource Tuning

I noticed our CI bill creeping up the same week developers started complaining that pipelines felt slow. When I actually dug in, the two problems were the same problem. Our heaviest job, a multi-arch container build, was landing on a tiny shared runner with two cores and getting OOM-killed on every third run. Meanwhile a fleet of beefy, expensive runners sat almost completely idle, racking up cost while doing nothing but waiting for jobs that never got tagged to reach them.

Both the queue times and the bill were bad, and they were bad for the same reason: nobody had ever sat down and matched job weight to runner capacity. That kind of correlation work — line up job durations against resource hints, propose a tag map — is exactly where AI shines as a fast junior engineer. It will read your pipeline and a CSV of job stats in seconds and hand you a tidy proposal. What it will not do is understand your blast radius, your budget, or your security boundary. So you review everything, and you never hand it a runner registration token or a CI secret. Treat it as a sharp pairing partner, not an operator.

Start with what runners you actually have

You can’t tune placement without an inventory. On GitLab.com SaaS runners, that means knowing the machine types and their tags. On self-managed, list your runners and the tags they advertise:

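# Self-managed: see registered runners and their executors.
# The output includes each runner's token, so redact it before sharing.
gitlab-runner list

# Or via the API (read_api-scoped token; /runners/all needs administrator access).
# Tags appear as tag_list on the per-runner endpoint, /api/v4/runners/<id>.
curl --header "PRIVATE-TOKEN: $READ_TOKEN" \
  "https://gitlab.example.com/api/v4/runners/all?per_page=100"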

Once you have runner names, tags, and executor types in one list, with any tokens redacted, you have a clean thing to feed an assistant. Paste that alongside your .gitlab-ci.yml and ask the model to flag two kinds of job: those with no tags at all, which land on whichever runner accepts untagged jobs, and those whose tags match no available runner, which sit pending instead of running.

Match jobs to runners with tags

Tags are the routing layer. A runner picks up a job only if it carries every tag in the job’s tags: block; a partial overlap is not enough, and the job will never schedule on that runner. The fix for our OOM problem started here: give heavy jobs a tag that only big runners carry.

build-image:
  stage: build
  tags:
    - kubernetes
    - high-memory      # only our 16GB runners advertise this
  script:
    - docker buildx build --platform linux/amd64,linux/arm64 -t $IMAGE .

lint:
  stage: test
  tags:
    - kubernetes
    - small            # cheap runners are fine for linting
  script:
    - npm run lint

The trap is over-tagging. If you tag a job with high-memory but only two runners carry that tag, that job now queues behind everything else competing for those two runners. AI is good at spotting this: feed it the tag list plus job frequency and it will tell you which tag is a bottleneck before you find out the hard way.
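The check I ask the model to do can also be sketched by hand. This is a minimal sketch, not GitLab's scheduler: the runner names, the tag lists, and the runs-per-day figures are all made up for illustration, and it ignores the per-runner "run untagged jobs" setting, so a job with no tags looks eligible everywhere.

```python
def eligible_runners(job_tags, runners):
    """Return names of runners that carry every tag the job asks for."""
    wanted = set(job_tags)
    # Subset test: a partial overlap does not make a runner eligible.
    return [name for name, tags in runners.items() if wanted <= set(tags)]


def tag_report(jobs, runners, runs_per_day):
    """Map each job to (eligible runner names, daily runs per eligible runner)."""
    report = {}
    for job, tags in jobs.items():
        names = eligible_runners(tags, runners)
        load = runs_per_day.get(job, 0) / len(names) if names else None
        report[job] = (names, load)
    return report


# Hypothetical inventory and job frequencies, for illustration only.
runners = {
    "big-1": ["kubernetes", "high-memory"],
    "big-2": ["kubernetes", "high-memory"],
    "small-1": ["kubernetes", "small"],
}
jobs = {
    "build-image": ["kubernetes", "high-memory"],
    "lint": ["kubernetes", "small"],
    "docs": ["kubernetes", "gpu"],  # no runner above carries "gpu"
}
runs_per_day = {"build-image": 40, "lint": 300, "docs": 5}

report = tag_report(jobs, runners, runs_per_day)
for job, (names, load) in report.items():
    if not names:
        print(f"{job}: no runner carries all of {jobs[job]}")
    else:
        print(f"{job}: {len(names)} eligible runner(s), ~{load:.0f} runs/day each")
```

A high runs-per-runner figure is only a hint that a tag may be a bottleneck. Job duration matters just as much, so treat the number as a prompt to look closer rather than a verdict.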

Set Kubernetes executor resource requests

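If you run the Kubernetes executor, tags get you to the right runner but not the right pod size. That comes from the KUBERNETES_* variables, which set the requests and limits on the build container Kubernetes schedules. The runner honors them only if its config.toml sets the matching *_overwrite_max_allowed options (for example memory_limit_overwrite_max_allowed); without those, the job keeps the runner’s configured defaults.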

build-image:
  stage: build
  tags: [kubernetes, high-memory]
  variables:
    KUBERNETES_CPU_REQUEST: "2"
    KUBERNETES_CPU_LIMIT: "4"
    KUBERNETES_MEMORY_REQUEST: "4Gi"
    KUBERNETES_MEMORY_LIMIT: "8Gi"
  script:
    - docker buildx build -t $IMAGE .

The request is what the scheduler reserves; the limit is the ceiling before the kernel throttles CPU or OOM-kills on memory. Setting a request too high wastes cluster capacity (the node reserves it whether you use it or not). Setting the memory limit too low gets you the exact OOM kills I was chasing. For CPU, the sweet spot is a request near your steady-state usage and a limit near your peak. Memory can’t be throttled, so leave it less slack between the two.

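Pro Tip: Match KUBERNETES_MEMORY_REQUEST to the job’s typical peak RSS, not its average. A build that sits at 1Gi for ten minutes then spikes to 6Gi during linking needs a request that covers the spike. Otherwise the pod runs well above what the scheduler reserved for it and is among the first to be evicted when the node runs short of memory, often right at the finish line.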
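One way to turn that tip into numbers, assuming you can collect the peak RSS of each recent run from your monitoring stack. The function name, the 20% headroom, and the sample values are my own illustration, not anything GitLab or Kubernetes provides.

```python
import math
from statistics import median


def suggest_memory(peak_rss_mib, headroom_pct=20):
    """Suggest (request, limit) strings from per-run peak RSS samples in MiB.

    request: the median of the per-run peaks (the "typical peak")
    limit:   the worst observed peak plus headroom_pct percent
    """
    if not peak_rss_mib:
        raise ValueError("need at least one sample")
    request = math.ceil(median(peak_rss_mib))
    limit = math.ceil(max(peak_rss_mib) * (100 + headroom_pct) / 100)
    return f"{request}Mi", f"{limit}Mi"


# Hypothetical peaks from five recent runs of one job, in MiB.
samples = [5800, 6000, 6100, 5900, 6400]
request, limit = suggest_memory(samples)
print(f'KUBERNETES_MEMORY_REQUEST: "{request}"')
print(f'KUBERNETES_MEMORY_LIMIT: "{limit}"')
```

Five samples is a thin basis for a limit. With so little history I would rather widen the headroom than trust the maximum.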

Override resources per job

You rarely want one resource profile for the whole pipeline. Set sane defaults at the top, then override only the jobs that need it.

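default:
  tags: [kubernetes, small]

# variables is not a valid key under default:, so pipeline-wide
# values go in the top-level variables block instead.
variables:
  KUBERNETES_CPU_REQUEST: "500m"
  KUBERNETES_MEMORY_REQUEST: "1Gi"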

unit-tests:
  stage: test
  script: [pytest -q]
  # inherits the small default

integration-tests:
  stage: test
  tags: [kubernetes, high-memory]
  variables:
    KUBERNETES_CPU_REQUEST: "2"
    KUBERNETES_MEMORY_REQUEST: "4Gi"
  services:
    - postgres:16
  script: [pytest tests/integration]

This is where a CSV of job durations earns its keep. The jobs that dominate your wall-clock time are the ones worth scaling up; everything else can stay on the cheap default. Right-sizing the long tail of small jobs saves almost nothing and just complicates the file.
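To pick the jobs worth scaling up, I rank them by total time consumed and stop once most of the wall-clock time is covered. A rough sketch, where the 80% cutoff and the per-job totals are assumptions chosen for illustration:

```python
def dominant_jobs(total_seconds, share=0.8):
    """Return the fewest jobs, largest first, that cover `share` of total time."""
    grand_total = sum(total_seconds.values())
    picked, running = [], 0.0
    ranked = sorted(total_seconds.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in ranked:
        if running >= share * grand_total:
            break  # the remaining jobs are the long tail
        picked.append(name)
        running += seconds
    return picked


# Hypothetical weekly totals (average duration x number of runs), in seconds.
totals = {
    "build-image": 24000,
    "integration-tests": 9000,
    "unit-tests": 3000,
    "lint": 1500,
    "docs": 500,
}
print(dominant_jobs(totals))
```

Everything outside the returned list stays on the cheap default.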

Pick the right SaaS runner on GitLab.com

If you’re on GitLab.com rather than self-managed, you don’t set Kubernetes requests — you pick a machine size with a tag. The hosted runners advertise sizes like saas-linux-small-amd64, saas-linux-medium-amd64, and saas-linux-large-amd64, and the larger sizes cost more compute minutes per minute of run time.

quick-checks:
  tags: [saas-linux-small-amd64]
  script: [make lint typecheck]

heavy-build:
  tags: [saas-linux-large-amd64]   # more cores, higher minute multiplier
  script: [make build]

The decision is genuinely a tradeoff: a large runner costs more per minute but may finish a CPU-bound build in a fraction of the time. It comes out cheaper only when the speedup exceeds the ratio of the two cost factors; a build that runs three times faster on a runner billed at three times the rate merely breaks even. A job that’s I/O- or network-bound won’t speed up on a bigger box at all, so you’re just paying the multiplier for nothing. This is the precise judgment call where the duration data helps and the AI’s lack of context hurts: it can show you the math, but only you know whether that build is CPU-bound. Verify before you commit. A quick session in the prompt workspace to talk through the math against your real numbers beats guessing.
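The arithmetic is small enough to write down. In this sketch the cost factors (1 for small, 3 for large) and the timings are placeholders I made up; check GitLab's current pricing page for the real multipliers and use your own measured durations.

```python
def compute_minutes(wall_minutes, cost_factor):
    """Billed compute minutes for one run: wall-clock time times the size multiplier."""
    return wall_minutes * cost_factor


def break_even_speedup(small_factor, large_factor):
    """Speedup the larger runner must exceed before it becomes the cheaper option."""
    return large_factor / small_factor


# Assumed cost factors, for illustration only.
SMALL, LARGE = 1, 3

small_cost = compute_minutes(30, SMALL)          # 30-minute build on the small runner
cpu_bound_large = compute_minutes(8, LARGE)      # same build, 3.75x faster on large
io_bound_large = compute_minutes(28, LARGE)      # I/O-bound build that barely speeds up

print("small runner:", small_cost)
print("large runner, CPU-bound build:", cpu_bound_large)
print("large runner, I/O-bound build:", io_bound_large)
print("break-even speedup:", break_even_speedup(SMALL, LARGE))
```

With these made-up numbers the CPU-bound build gets cheaper on the large runner, while the I/O-bound one costs almost three times as much for a two-minute saving.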

Use interruptible and resource_group to stop waste

Two cheap settings claw back a surprising amount of cost and contention. interruptible: true lets GitLab auto-cancel a redundant pipeline when you push again, so you stop paying for builds nobody will look at. resource_group serializes jobs that must not run concurrently — like a deploy — so you don’t pay for two runners colliding over the same environment.

build:
  interruptible: true        # killed if a newer commit supersedes this one
  script: [make build]

deploy-staging:
  resource_group: staging     # only one staging deploy runs at a time
  environment: staging
  script: [./deploy.sh staging]

Pair interruptible with the “auto-cancel redundant pipelines” project setting and your fast-moving branches stop stacking up dead jobs. These two are some of the highest-leverage one-liners in the whole file.

Let AI read the pipeline and the data

Here’s the workflow that actually moved our numbers. Export job stats from the GitLab API and hand them to the assistant alongside the .gitlab-ci.yml. The jobs endpoint returns one row per job run, so aggregate those rows per job first: name, average duration, p95 duration, failure rate, current tags.

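# Pull the 100 most recent jobs for one project as CSV input for the model.
# Columns: name, duration (s), runner, status, tags. Add &page=N for more history.
curl --fail --silent --show-error --header "PRIVATE-TOKEN: $READ_TOKEN" \
  "https://gitlab.example.com/api/v4/projects/$ID/jobs?per_page=100" \
  | jq -r '.[] | [.name, .duration, .runner.description, .status, (.tag_list // [] | join(" "))] | @csv'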
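That command emits one row per job run, so the averages, p95s, and failure rates still have to be computed. Here is a minimal sketch of that aggregation. It assumes the first, second, and fourth CSV columns are name, duration in seconds, and status, counts only finished runs (success or failed), and uses a nearest-rank p95. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def summarize(csv_text):
    """Aggregate per-run rows into per-job runs, avg, p95, and failure rate."""
    durations = defaultdict(list)
    outcomes = defaultdict(lambda: {"success": 0, "failed": 0})
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        name, duration, status = row[0], row[1], row[3]
        if status not in ("success", "failed"):
            continue  # ignore canceled, skipped, and still-running jobs
        outcomes[name][status] += 1
        if duration:
            durations[name].append(float(duration))
    summary = {}
    for name, counts in outcomes.items():
        runs = counts["success"] + counts["failed"]
        times = durations[name]
        summary[name] = {
            "runs": runs,
            "avg_s": sum(times) / len(times) if times else None,
            "p95_s": p95(times) if times else None,
            "failure_rate": counts["failed"] / runs,
        }
    return summary


# Invented sample rows in the shape the jq filter produces.
sample = '''
"build-image",610.5,"big-1","success"
"build-image",120.0,"small-1","failed"
"build-image",650.0,"big-1","success"
"lint",42.0,"small-1","success"
"lint",,"","canceled"
'''
summary = summarize(sample)
for name, stats in summary.items():
    print(name, stats)
```

A failed run that dies early drags the average down, which is one reason to hand the model the p95 and the failure rate alongside it.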

Then the prompt is roughly: “Given this pipeline and this job-duration CSV, propose a tag map and KUBERNETES_* resource values per job. Flag jobs whose tags match no runner, jobs whose memory limit looks too low given their failure pattern, and any job that should be interruptible.” The model came back with a per-job table that correctly fingered our build job’s OOM kills from its failure rate and proposed a high-memory tag plus an 8Gi limit.

That’s the model at its best: correlating durations and failures with resource hints faster than I could by hand. It’s still a junior engineer’s draft. It suggested bumping a flaky integration test to 8 cores when the real issue was a race condition, not CPU starvation. I caught that in review. You will catch things too, which is the point.

Pro Tip: Strip secrets out of any file before it goes near a model. Never paste a .runner_system_id, a registration token, or your CI_JOB_TOKEN into a chat. Resource tuning needs job names and durations, nothing privileged.

If you want a starting prompt, the prompts library and the ready-made bundles in prompt packs include CI analysis templates, and tools like Claude or Cursor handle the CSV-plus-YAML correlation well. The broader GitLab CI/CD guides cover the pipeline mechanics underneath all of this.

Wrapping up

Runner tuning isn’t glamorous, but matching job weight to runner capacity fixed both my queue times and my bill in a single afternoon. AI made that afternoon possible by doing the tedious correlation work — lining up durations, failures, and tags — and handing me a reviewable draft. I kept what was right, threw out the race-condition misdiagnosis, and never let a token anywhere near the chat window. Use it as the fast junior engineer it is: great at the first pass, never the last word, and never trusted with the keys.