GitLab CI Error Guide: 'waiting for pod running: timed out waiting for pod to start' (Kubernetes Executor)

Exact Error Message

A job using the Kubernetes executor fails in the prepare stage with a timeout:

ERROR: Job failed (system failure): prepare environment: waiting for pod running: timed out waiting for pod to start

Two closely related variants point at why the pod never started:

ERROR: Job failed (system failure): prepare environment: image pull failed: failed to pull image "registry.example.com/ci/build:latest": rpc error: code = Unknown desc = failed to pull and unpack image ... 401 Unauthorized

ERROR: Job failed (system failure): prepare environment: waiting for pod running: pod "runner-xyz-project-42-concurrent-0" status is "Pending"

All three mean the same thing at the GitLab level: the build pod never reached Running, so the job never started. The plain timeout tells you only that poll_timeout elapsed, while the Pending and image pull variants tell you which Kubernetes problem to chase.

What the Error Means

With the Kubernetes executor, GitLab Runner does not run your job on a fixed host. For each job it asks the cluster to create a build pod (build container + helper container + any services:), then polls the Kubernetes API until that pod reports Running. Only then does it stream your script: into the build container.

If the pod is still Pending or ContainerCreating when the runner’s poll_timeout expires, the runner gives up and reports timed out waiting for pod to start as a system failure. The job log is the GitLab side of the story; the real reason lives in the Kubernetes events for that pod — image pull errors, no schedulable node, a LimitRange rejection, or a quota denial. The runner only sees “still not Running,” so you must inspect the cluster to learn why.

This is fundamentally a scheduling and pull problem, not a script problem. Your .gitlab-ci.yml commands never ran.
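
The wait-then-give-up behaviour described above can be pictured as a simple polling loop. The following is a minimal sketch for illustration only, not GitLab Runner's actual implementation: the function name wait_for_running, its arguments, and the fixed polling interval are all hypothetical.

```shell
# Illustrative sketch of a "poll until Running or time out" loop.
# wait_for_running and its arguments are hypothetical names.
#   $1: name of a command or function that prints the pod phase, for example
#       a wrapper around:
#       kubectl get pod "$POD" -n gitlab-ci -o jsonpath='{.status.phase}'
#   $2: timeout in seconds (plays the role of poll_timeout)
#   $3: seconds to sleep between polls (must be at least 1)
wait_for_running() {
  get_phase=$1
  timeout=$2
  interval=$3
  elapsed=0
  phase=unknown
  while [ "$elapsed" -lt "$timeout" ]; do
    phase=$("$get_phase")
    if [ "$phase" = "Running" ]; then
      echo "pod is Running after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  # The loop only knows the pod is not Running yet; it cannot say why.
  echo "timed out waiting for pod to start (last phase: $phase)"
  return 1
}
```

Note that the loop only ever learns the phase, which is why the job log cannot explain the failure and the pod events are needed.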

Common Causes

Image cannot be pulled. Wrong image name/tag, a private registry with no imagePullSecrets, or a rate-limited public registry → pod stuck ImagePullBackOff/ErrImagePull.
The GitLab helper image fails to pull (air-gapped cluster without helper_image mirrored), so the pod never fully starts.
No schedulable node. Cluster is at capacity, or every node has a taint the pod does not tolerate → Pending, event 0/N nodes are available.
Resource requests exceed availability. cpu_request/memory_request in config.toml are larger than any node can satisfy → Insufficient cpu/memory.
Namespace ResourceQuota or LimitRange rejection. The pod is denied admission or forced to invalid limits.
CNI / networking not ready on a freshly added node, delaying ContainerCreating.
Cluster autoscaler lag. A new node is being provisioned but takes longer than poll_timeout to join and schedule.
poll_timeout too low for large images or slow pulls. The default of 180 seconds may be too short for a cold node.
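
As a rough aid, the causes above can be matched against the text of a pod event. The sketch below is a hypothetical helper (classify_event is not part of kubectl or GitLab Runner), and the matched strings are assumptions based on typical Kubernetes event wording, which varies between versions.

```shell
# Hypothetical helper: map one event message to a cause category.
# The patterns are examples of common wording, not an exhaustive list.
classify_event() {
  case "$1" in
    *ImagePullBackOff*|*ErrImagePull*|*Unauthorized*)
      echo "image-pull" ;;
    *"Insufficient cpu"*|*"Insufficient memory"*)
      echo "resources" ;;
    *"untolerated taint"*|*"had taint"*)
      echo "taints" ;;
    *"exceeded quota"*|*LimitRange*)
      echo "quota" ;;
    *"nodes are available"*)
      # Checked last because scheduler messages with a more specific
      # reason (cpu, memory, taints) also contain this phrase.
      echo "no-node" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example usage with a message copied from kubectl describe pod:
#   classify_event "0/3 nodes are available: 3 Insufficient cpu."
```

An "unknown" result simply means the message needs to be read by hand; CNI and autoscaler delays in particular rarely produce a single recognizable string.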

How to Reproduce the Error

Point a job at a private image without supplying pull credentials, on a runner using the Kubernetes executor:

# .gitlab-ci.yml
build:
  tags: [k8s]
  image: registry.example.com/private/build:latest   # no pull secret configured
  script:
    - make build

# config.toml — Kubernetes executor, no image_pull_secrets, short timeout
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    namespace = "gitlab-ci"
    poll_timeout = 180
    image = "alpine:3.20"

The pod is created but stays Pending/ImagePullBackOff; after 180s the job fails with timed out waiting for pod to start.

Diagnostic Commands

Watch the build pod and, crucially, its events — that is where Kubernetes records the real reason:

# Find the build pod (named runner-<id>-project-<n>-concurrent-<m>)
kubectl get pods -n gitlab-ci

# The single most useful command — events explain Pending / ImagePull / quota
kubectl describe pod -n gitlab-ci runner-xyz-project-42-concurrent-0

# Recent events in the CI namespace, sorted by time
kubectl get events -n gitlab-ci --sort-by=.lastTimestamp | tail -30

# Per-container status (which container is stuck)
kubectl get pod -n gitlab-ci runner-xyz-project-42-concurrent-0 \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state}{"\n"}{end}'

# Node schedulability, taints, and free capacity
kubectl get nodes
kubectl describe node <node> | grep -A5 -E 'Taints|Allocatable|Allocated resources'

# Namespace quota / limit range that may be rejecting the pod
kubectl get resourcequota,limitrange -n gitlab-ci -o yaml

# Runner-side config and timeout
sudo grep -A20 '\[runners.kubernetes\]' /etc/gitlab-runner/config.toml

For deeper GitLab-side tracing, set variables: { CI_DEBUG_TRACE: "true" } in the job and inspect gitlab-runner --debug run if you control the runner host. The pod name embeds CI_PROJECT_ID and the concurrent slot, so kubectl get pods correlates directly to the failing job.
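
Because the project ID is part of the pod name, a small filter can narrow a pod listing to one project. This is a sketch under the naming pattern shown above; pods_for_project is a hypothetical helper, and the pattern deliberately allows extra characters after the slot number in case your runner version appends a suffix.

```shell
# Hypothetical helper: read pod names on stdin and keep only build pods
# whose name contains -project-<id>-concurrent-<slot>.
pods_for_project() {
  project_id=$1
  grep -E -- "^runner-.*-project-${project_id}-concurrent-[0-9]+"
}

# Example usage (needs cluster access, so it is left as a comment):
#   kubectl get pods -n gitlab-ci -o name | sed 's|^pod/||' | pods_for_project 42
```

Matching on "-project-42-concurrent-" rather than just "42" avoids picking up project 420 or a runner ID that happens to contain the same digits.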

Step-by-Step Resolution

Run kubectl describe pod on the stuck pod and read the Events. The fix depends entirely on what they say.

Image pull (ImagePullBackOff/401): create a registry secret and wire it into the runner:

kubectl create secret docker-registry ci-pull \
  --docker-server=registry.example.com \
  --docker-username="$REG_USER" --docker-password="$REG_PASS" \
  -n gitlab-ci

[runners.kubernetes]
  image_pull_secrets = ["ci-pull"]

Helper image (air-gapped): mirror it and set helper_image explicitly in config.toml.
Pending / 0/N nodes available: add capacity, or add a toleration / fix the node selector:
```
[runners.kubernetes.node_tolerations]
  "ci-only=true" = "NoSchedule"
```
Insufficient cpu/memory: lower cpu_request/memory_request, or scale the node pool so requests fit a node.
Quota/LimitRange rejection: raise the namespace ResourceQuota, or set cpu_limit/memory_limit that comply with the LimitRange.
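
To check whether a CPU request can fit a node at all, it helps to compare both values in the same unit. Kubernetes writes CPU either in cores ("2", "0.5") or millicores ("500m"). The sketch below normalizes both forms; to_millicores and fits_node are hypothetical helper names, and the comparison is only an upper bound because it ignores what other pods on the node have already claimed.

```shell
# Hypothetical helpers for comparing Kubernetes CPU quantities.

# Convert "500m", "2" or "0.5" to an integer number of millicores.
to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}

# Succeed if the request ($1) is no larger than the node's allocatable CPU ($2).
fits_node() {
  [ "$(to_millicores "$1")" -le "$(to_millicores "$2")" ]
}

# Example usage, with the allocatable value read from kubectl describe node:
#   fits_node 500m 3920m && echo "request can fit this node"
```

If fits_node fails for every node in the pool, no amount of waiting helps, and either the request or the node size has to change.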

Autoscaler lag / large images: raise poll_timeout so a cold node has time to join and pull:

[runners.kubernetes]
  poll_timeout = 600   # seconds; the default is 180, often too short for cold pulls

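After editing config.toml, retry the job. GitLab Runner normally picks up config.toml changes on its own within a few seconds; run sudo gitlab-runner restart if the new settings do not seem to take effect.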
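
When raising poll_timeout for autoscaler lag or large images, one way to choose a value is to add up the worst-case delays you have measured and pad the result. The sketch below is plain arithmetic; suggest_poll_timeout is a hypothetical helper, and the 20 percent default margin is an assumption, not a GitLab recommendation.

```shell
# Hypothetical helper: suggest a poll_timeout in seconds.
#   $1: worst-case seconds for a new node to join and become schedulable
#   $2: worst-case seconds to pull the largest image onto a cold node
#   $3: safety margin in percent (assumed default: 20)
suggest_poll_timeout() {
  node_join=$1
  image_pull=$2
  margin_pct=${3:-20}
  total=$((node_join + image_pull))
  echo $((total + total * margin_pct / 100))
}

# Example usage: 300s for node join plus 200s for the image pull
#   suggest_poll_timeout 300 200
```

Measure the two inputs on your own cluster rather than guessing; a value padded far beyond them only delays the failure report when something else is wrong.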

Prevention and Best Practices

Pre-pull common images onto nodes (or run a registry pull-through cache) so cold pulls do not race poll_timeout.
Always configure image_pull_secrets for private registries, and authenticate to public registries to dodge anonymous rate limits.
Set realistic cpu_request/memory_request that fit your smallest node, and keep them under the namespace LimitRange ceiling.
Reserve a node pool for CI with a taint and matching tolerations, so build pods schedule predictably and do not starve workloads.
Tune poll_timeout to match your worst-case (autoscaler spin-up + largest image pull), not the happy path.
Mirror the helper image in air-gapped clusters and pin its version in config.toml.

Related Errors

GitLab CI Error: prepare environment exit status 1, the shell-executor prepare-stage failure.
GitLab CI Error: stuck runners tag mismatch, where the job is never assigned to a runner.
GitLab CI Error: Cannot connect to the Docker daemon (dind), covering Docker-in-Docker failures inside a running pod.

Frequently Asked Questions

Where is the real reason for the timeout? Not in the GitLab job log — that only says “timed out.” Run kubectl describe pod <build-pod> -n <ci-namespace> and read the Events section. It will show ImagePullBackOff, 0/N nodes are available, Insufficient cpu, or a quota rejection.

How do I find the build pod for a failing job? Its name is runner-<runner-id>-project-<CI_PROJECT_ID>-concurrent-<slot>. List pods in your CI namespace with kubectl get pods -n gitlab-ci and match the project ID from the job.

My image is private — what do I configure? Create a docker-registry secret in the CI namespace and reference it via image_pull_secrets under [runners.kubernetes] in config.toml. Without it the pod sits in ImagePullBackOff until poll_timeout fires.

Should I just raise poll_timeout to fix every timeout? Only when the cause is genuinely slow (cold autoscaled node, large image pull). If the pod is Pending due to taints, quotas, or a missing pull secret, a higher timeout just delays the same failure — fix the scheduling/pull issue instead.

Why does the helper container matter? The pod includes a GitLab helper container alongside your build container. If the helper image cannot be pulled — common in air-gapped clusters — the pod never reaches Running even when your build image is fine. Mirror and pin helper_image.
