DevOps AI ToolKit
AI for GitLab CI/CD · By James Joyner IV · 9 min read

GitLab CI Error Guide: 'command terminated with exit code 137' OOMKilled on the Kubernetes Executor

Fix GitLab CI exit code 137 (OOMKilled) on the Kubernetes executor: raise pod memory limits, make the JVM/Node cgroup-aware, split jobs, and stop SIGKILL build failures.

  • #gitlab-cicd
  • #troubleshooting
  • #errors
  • #kubernetes-executor

Exact Error Message

When a job container on the GitLab Kubernetes executor exceeds its memory limit, the kernel’s OOM killer terminates the process and the job fails with exit code 137:

Running with gitlab-runner 16.11.0 (...)
  on kubernetes-runner abc123, system ID: s_...
Using Kubernetes executor with image maven:3.9-eclipse-temurin-21 ...
$ mvn -B verify
[INFO] Building project ...
Killed
ERROR: Job failed: command terminated with exit code 137

The same failure can surface from the helper or a service container, or appear in the pod’s status as OOMKilled:

ERROR: Job failed (system failure): pod "runner-abc123-project-42-..." status is "Failed":
container build terminated with reason "OOMKilled" (exit code 137)

The exit code is the giveaway: 137 = 128 + 9, and signal 9 is SIGKILL. Something killed the process forcibly — and on Kubernetes, that “something” is almost always the OOM killer enforcing a memory limit.

What the Error Means

Exit code 137 is not a normal application failure. It means the process did not get to exit on its own terms: it was sent SIGKILL, which cannot be caught or ignored. By shell convention, a process terminated by signal N is reported with exit status 128 + N, so 128 + 9 = 137.
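The 128 + N arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of GitLab Runner; the helper name describe_exit_code is an assumption made for illustration, and signal names are resolved by the standard library on the machine where it runs (Linux or macOS assumed).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit status, decoding 128 + N as signal N."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform
            name = f"signal {signum}"
        return f"killed by {name} (128 + {signum})"
    return f"exited on its own with status {code}"


for code in (0, 1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

Running it shows 137 decoding to SIGKILL, while an ordinary test failure such as exit code 1 is reported as a normal exit.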

On the Kubernetes executor, GitLab Runner schedules each job as a pod with several containers: build (your script), helper (git/artifacts), and one container per services: entry (for example docker:dind). Each container has a memory limit derived from the runner’s config.toml. When a container’s resident memory crosses that limit, the kernel cgroup OOM killer terminates the offending process inside it, Kubernetes records OOMKilled as the container’s terminated reason (under state.terminated, or lastState.terminated if the container was restarted), and the runner reports exit code 137.

This is distinct from a 137 caused by node-level memory pressure (the kubelet evicting a pod) — same exit code, different culprit. The diagnostics below tell the two apart.

Common Causes

  • The build container exceeded its memory limit (classic OOMKilled). A compile, test run, or webpack build uses more RAM than memory_limit allows. This is the most common cause.
  • The runtime is not cgroup-aware. JVMs older than JDK 10 (or 8u191 on the Java 8 line) and some Node configurations read the node’s total memory, not the container limit, then size their heap far above the cgroup limit and get killed mid-run.
  • memory_limit/memory_request in config.toml is too low for the workload, or is unset and falling back to a small default (for example, one imposed by a namespace LimitRange).
  • A service container (often docker:dind) ran out of memory. Building large images inside dind or running a memory-hungry database service blows the service container’s own limit.
  • Node memory pressure / eviction. The node itself is under pressure and the kubelet evicts the pod, which also surfaces as a terminated container.
  • Large or parallel test suites. Loading a big fixture set, running browsers (headless Chrome), or -j parallelism multiplies peak memory.
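To see what a cgroup-aware runtime is supposed to look at, the sketch below reads the memory limit the kernel enforces on the current container. It assumes the standard cgroup mount at /sys/fs/cgroup (memory.max on cgroup v2, memory/memory.limit_in_bytes on cgroup v1); the function name container_memory_limit and the cgroup_root parameter are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


def container_memory_limit(cgroup_root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None if unlimited or unknown."""
    root = Path(cgroup_root)
    candidates = (
        root / "memory.max",                        # cgroup v2
        root / "memory" / "memory.limit_in_bytes",  # cgroup v1
    )
    for path in candidates:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next layout
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        value = int(raw)
        # cgroup v1 reports a huge sentinel (close to 2**63) when unlimited
        if value >= 1 << 60:
            return None
        return value
    return None


limit = container_memory_limit()
print("no cgroup limit found" if limit is None else f"limit: {limit / 2**20:.0f} MiB")
```

Run inside a job's build container, this should print the limit the runner applied; a runtime that reports far more available memory than this value is sizing itself from the node.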

How to Reproduce the Error

Set a deliberately low limit on the runner and run a memory-hungry job:

# config.toml on the GitLab Runner
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    memory_limit   = "256Mi"
    memory_request = "256Mi"

# .gitlab-ci.yml
oom-demo:
  image: python:3.12
  script:
    # allocate ~512Mi — well over the 256Mi limit
    - python -c "x = bytearray(512 * 1024 * 1024); print('allocated', len(x))"

The job log ends with Killed and ERROR: Job failed: command terminated with exit code 137, and the pod records OOMKilled.

Diagnostic Commands

Confirm it was an OOM kill (not a generic crash) by inspecting the pod. Job pods are short-lived, so watch in real time or use a runner with pod cleanup disabled for debugging:

# Find the job pod (runner pods live in the runner's namespace)
kubectl -n gitlab-runner get pods

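# The smoking gun: a terminated state with reason OOMKilled.
# Job pods run with restartPolicy Never, so it normally appears under "state";
# "lastState" is only populated if the container was restarted.
kubectl -n gitlab-runner get pod runner-abc123-project-42-... -o yaml \
  | grep -B1 -A6 "terminated:"

# Example output:
state:
  terminated:
    exitCode: 137
    reason: OOMKilled
    startedAt: "2026-06-25T09:12:01Z"

# Per-container limits/requests and recent events (look for "OOMKilling")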
kubectl -n gitlab-runner describe pod runner-abc123-project-42-...

# If you suspect node pressure rather than a container limit
kubectl describe node <node> | grep -A5 "MemoryPressure\|Allocated"
kubectl top pod -n gitlab-runner          # requires metrics-server
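The same distinction can be scripted. As a rough sketch, the function below classifies a pod from the JSON that kubectl get pod -o json prints, checking both state and lastState for an OOMKilled termination and the pod-level reason for an eviction. The name classify_pod and the sample pod dictionary are illustrative assumptions, not output captured from a real cluster.

```python
def classify_pod(pod: dict) -> str:
    """Tell a container-limit OOM kill apart from a kubelet eviction."""
    status = pod.get("status", {})

    # Kubelet evictions are recorded on the pod itself, not on a container
    if status.get("reason") == "Evicted":
        return "evicted: " + status.get("message", "no message")

    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    for cs in statuses:
        # "state" for the current run, "lastState" if the container restarted
        for key in ("state", "lastState"):
            term = (cs.get(key) or {}).get("terminated")
            if term and term.get("reason") == "OOMKilled":
                return f"container {cs['name']} OOMKilled (exit code {term.get('exitCode')})"
    return "no OOM kill or eviction recorded"


# Hypothetical pod status, trimmed to the fields the function reads
sample = {
    "status": {
        "containerStatuses": [
            {"name": "build", "lastState": {},
             "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
            {"name": "helper", "state": {"running": {}}},
        ]
    }
}
print(classify_pod(sample))
```

To use it against a live pod, one option is to pipe the kubectl JSON output into a small script that passes json.load(sys.stdin) to classify_pod.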

Check the runner config that sets the limits:

grep -E "memory_(limit|request|limit_overwrite)" /etc/gitlab-runner/config.toml

# On the node, kernel OOM messages name the killed cgroup/process:
dmesg -T | grep -i "killed process\|oom"
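If you collect those kernel messages, a small parser can pull out the process that was killed and whether a cgroup limit (rather than the whole node) triggered it. This is a sketch under assumptions: it expects the "Killed process PID (name)" wording used by recent Linux kernels, the exact fields vary between kernel versions, and the sample line is illustrative rather than copied from a real node. The name parse_oom_line is hypothetical.

```python
import re

KILL_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)")
RSS_RE = re.compile(r"anon-rss:(?P<kb>\d+)kB")


def parse_oom_line(line: str):
    """Extract pid, process name, and anonymous RSS from a kernel OOM-kill line."""
    match = KILL_RE.search(line)
    if not match:
        return None
    rss = RSS_RE.search(line)
    return {
        "pid": int(match.group("pid")),
        "process": match.group("comm"),
        # A "Memory cgroup" prefix points at a container limit, not node exhaustion
        "cgroup_limit_hit": "Memory cgroup out of memory" in line,
        "anon_rss_mib": int(rss.group("kb")) / 1024 if rss else None,
    }


# Illustrative line only; real kernels append further fields
sample_line = (
    "Memory cgroup out of memory: Killed process 4321 (java) "
    "total-vm:6291456kB, anon-rss:4194304kB, file-rss:0kB, shmem-rss:0kB"
)
print(parse_oom_line(sample_line))
```

The anon_rss_mib value is a useful cross-check: if it sits right at the container's memory limit, the limit itself is the culprit.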

Step-by-Step Resolution

  1. Raise the build container’s memory limit and request. Set both so the scheduler reserves the memory and the cgroup allows it:

    [runners.kubernetes]
      memory_request = "2Gi"
      memory_limit   = "4Gi"
      # helper and services have their own knobs:
      helper_memory_limit            = "512Mi"
      service_memory_request         = "1Gi"
      service_memory_limit           = "2Gi"

  2. Let jobs override limits when they need more, without editing global config. Allow overwrites in the runner (the limit and the request each have their own ceiling), then set them per job:

    # config.toml
    [runners.kubernetes]
      memory_limit_overwrite_max_allowed   = "8Gi"
      memory_request_overwrite_max_allowed = "8Gi"

    # .gitlab-ci.yml
    heavy-build:
      variables:
        KUBERNETES_MEMORY_LIMIT: "6Gi"
        KUBERNETES_MEMORY_REQUEST: "4Gi"
      script: ["./build.sh"]

  3. Make the runtime cgroup-aware so it sizes the heap below the limit. Cap the heap explicitly rather than letting it auto-detect the node:

    variables:
      # JVM: respect the container limit and cap heap below it
      JAVA_TOOL_OPTIONS: "-XX:MaxRAMPercentage=75.0"
      # Node: leave headroom under the limit
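      NODE_OPTIONS: "--max-old-space-size=3072"

  4. Give docker:dind and other services their own headroom. Image builds happen inside the dind container, so its limit, not the build container’s, is what matters. The per-job variable below only takes effect if the runner sets service_memory_limit_overwrite_max_allowed: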

    build-image:
      image: docker:27
      services:
        - name: docker:27-dind
      variables:
        KUBERNETES_SERVICE_MEMORY_LIMIT: "4Gi"
      script:
        - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .

  5. Split the job or reduce peak memory if you cannot grant more RAM. Shard tests, drop test parallelism (-j1), stream large files instead of loading them, and avoid building everything in one job.

  6. If it was node pressure, not a container limit, add memory_request so the scheduler stops over-committing the node, and consider larger or autoscaled node pools.

After changing config.toml, restart the runner (gitlab-runner restart or roll the runner pod) and re-run the job; confirm the pod no longer records OOMKilled.
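When picking the numbers for step 1, it helps to derive them from a measured peak rather than a guess. The sketch below converts a Kubernetes memory quantity (such as the value kubectl top pod reports) to bytes, adds headroom, and rounds up. The 25% headroom default, the 256Mi rounding step, and the function names are assumptions for illustration; the parser handles only the common binary and decimal suffixes.

```python
import math

_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '1536Mi' to bytes."""
    # Try two-character suffixes first so 'Mi' is not mistaken for 'M'
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # no suffix: plain bytes


def suggest_limit(observed_peak: str, headroom: float = 0.25, step_mib: int = 256) -> str:
    """Observed peak plus headroom, rounded up to a multiple of step_mib."""
    target_bytes = parse_quantity(observed_peak) * (1 + headroom)
    step_bytes = step_mib * 2**20
    return f"{math.ceil(target_bytes / step_bytes) * step_mib}Mi"


print(suggest_limit("3000Mi"))
print(suggest_limit("2Gi"))
```

A peak of 3000Mi plus 25% is 3750Mi, which rounds up to the next 256Mi boundary; the result can go straight into memory_limit or KUBERNETES_MEMORY_LIMIT.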

Prevention and Best Practices

  • Always set both memory_request and memory_limit — requests stop the scheduler from packing nodes too tightly; limits stop one job starving its neighbours.
  • Pin runtime heaps (MaxRAMPercentage, --max-old-space-size) so the JVM/Node never assume they own the whole node.
  • Size limits from observed peak usage (kubectl top pod, container metrics) plus ~25% headroom, not from guesswork.
  • Keep dind and database services: limits separate from the build container — they OOM independently.
  • Alert on containers terminated with reason: OOMKilled so a creeping memory leak is caught before it blocks pipelines.

Related Errors

  • ERROR: Job failed: command terminated with exit code 1 is an ordinary application/test failure, not a signal kill.
  • Job failed (system failure): prepare environment means the pod could not be scheduled (often Insufficient memory from over-large requests).
  • context deadline exceeded / timed out waiting for pod to start indicates scheduling/pull timeouts, distinct from OOM.
  • See the GitLab CI/CD guides for more Kubernetes executor failures.

Frequently Asked Questions

Why exit code 137 specifically? Because the process was killed by signal 9 (SIGKILL). Unix reports a signal-terminated process as 128 + signal_number, and 128 + 9 = 137. On Kubernetes that signal almost always comes from the cgroup OOM killer enforcing a memory limit.

How do I know whether it was a container limit or the whole node? Run kubectl get pod <pod> -o yaml and read the container’s terminated reason (state.terminated.reason, or lastState.terminated.reason if the container was restarted). OOMKilled means a container crossed its own memory limit. If instead the pod shows Evicted with a MemoryPressure event in describe node, the node ran out of memory and the kubelet evicted the pod.

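My limit looks high enough but the JVM still gets killed: why? Either the JVM is not cgroup-aware (releases before JDK 10 or 8u191) and sized its heap from the node’s total RAM, or the heap fits but non-heap memory (metaspace, thread stacks, direct buffers) pushes the container over the limit. Set -XX:MaxRAMPercentage (JDK 10+ and 8u191+) or an explicit -Xmx below the container limit so the heap plus non-heap overhead stays under it.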
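The arithmetic behind that answer is easy to sketch. The helpers below compare the heap a non-cgroup-aware JVM would pick by default (one quarter of the memory it sees) with the heap allowed by -XX:MaxRAMPercentage under a container limit, and derive a Node --max-old-space-size that leaves headroom. The function names, the 64 GiB node size, and the 25% reserve for Node are assumptions for illustration.

```python
def default_heap_mib(visible_memory_mib: int) -> int:
    """JVM default max heap: one quarter of the memory the JVM believes it has."""
    return visible_memory_mib // 4


def jvm_heap_budget(limit_mib: int, max_ram_percentage: float = 75.0) -> dict:
    """Heap size under MaxRAMPercentage and what is left for non-heap memory."""
    heap = int(limit_mib * max_ram_percentage / 100)
    return {"heap_mib": heap, "non_heap_headroom_mib": limit_mib - heap}


def node_old_space_mib(limit_mib: int, reserve_fraction: float = 0.25) -> int:
    """Value for --max-old-space-size that leaves reserve_fraction of the limit free."""
    return int(limit_mib * (1 - reserve_fraction))


limit_mib = 4096          # a 4Gi container limit
node_mib = 64 * 1024      # hypothetical 64 GiB node

print("heap sized from the node:", default_heap_mib(node_mib), "MiB")
print("heap sized from the limit:", jvm_heap_budget(limit_mib))
print("node --max-old-space-size:", node_old_space_mib(limit_mib))
```

With a 4Gi limit, a JVM that sees the whole hypothetical node would aim for a 16384 MiB heap, while MaxRAMPercentage=75 caps it at 3072 MiB and leaves 1024 MiB for everything else, the same 3072 used for NODE_OPTIONS in the resolution steps.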

Where do I set memory for a docker:dind service? In config.toml under service_memory_limit/service_memory_request, or per job via KUBERNETES_SERVICE_MEMORY_LIMIT (which requires service_memory_limit_overwrite_max_allowed on the runner). Image builds run inside the dind container, so raising only the build container’s limit will not help.

Can a single job override the runner’s memory limit? Yes, if the admin sets memory_limit_overwrite_max_allowed (and memory_request_overwrite_max_allowed for requests) in config.toml. Jobs can then set KUBERNETES_MEMORY_LIMIT/KUBERNETES_MEMORY_REQUEST up to those ceilings, which is useful for one heavy job without raising limits globally.
