DevOps AI ToolKit
AI for GitLab CI/CD · By James Joyner IV · 9 min read

GitLab CI Error Guide: 'Job failed: execution took longer than' CI Job Timeouts

Fix GitLab CI 'execution took longer than' job timeouts: project vs job vs runner maximum_timeout, hung commands, no-output stalls, and RUNNER_SCRIPT_TIMEOUT.

  • #gitlab-cicd
  • #troubleshooting
  • #errors
  • #timeout

Overview

ERROR: Job failed: execution took longer than ... seconds is not a bug in your build — it is GitLab Runner enforcing a wall-clock limit and killing a job that ran past it. The runner starts the script, watches a timer, and once the effective timeout elapses it terminates the process group, marks the job failed, and reports how long it allowed. The job may have been doing real work the whole time, or it may have been stuck waiting on something that was never going to return.

You will see it at the bottom of the job log, after everything else has scrolled past:

Cleaning up project directory and file based variables
ERROR: Job failed: execution took longer than 1h0m0s seconds

The effective limit comes from several settings that apply at once: the project's timeout (CI/CD Settings > General pipelines, default 1h), a per-job timeout: keyword in .gitlab-ci.yml, which replaces the project value for that job whether it is higher or lower, and the runner's maximum_timeout, which caps whichever of the two applies. Separately, RUNNER_SCRIPT_TIMEOUT and RUNNER_AFTER_SCRIPT_TIMEOUT bound the script and after_script stages. They measure elapsed time rather than log activity, so a silently stalled command can trip them while the overall job budget is still fine. Knowing which of these tripped is the whole battle.
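As a rough illustration, the precedence can be written as a small shell function. This is a sketch that assumes the behavior described in GitLab's documentation of the timeout keyword (a job-level value replaces the project value, and the runner's maximum timeout caps the result). The function name and the convention of passing an empty string or 0 for "not set" are illustrative, not part of any GitLab tool.

```shell
#!/usr/bin/env bash
# effective_timeout PROJECT_SECS [JOB_SECS] [RUNNER_MAX_SECS]
# All values are in seconds. An empty string or 0 means "not set"
# for the job-level and runner-level values (a convention of this sketch).
effective_timeout() {
  local project=$1 job=${2:-0} runner=${3:-0}
  local limit=$project

  # A job-level timeout: replaces the project value, higher or lower.
  if [ "$job" -gt 0 ]; then
    limit=$job
  fi

  # The runner's maximum timeout caps whatever was chosen above.
  if [ "$runner" -gt 0 ] && [ "$runner" -lt "$limit" ]; then
    limit=$runner
  fi

  printf '%s\n' "$limit"
}

# Example: project allows 1h, no job-level timeout, runner capped at 30m.
# effective_timeout 3600 "" 1800
```

Feeding it the three values you find in your own configuration gives the number of seconds you should expect to see, in h/m/s form, in the failure line.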

Symptoms

  • The job fails at the very end with execution took longer than ... and a duration that exactly matches a configured limit (e.g. 1h0m0s, 3600 seconds, 30m0s).
  • The job log stops scrolling for a long stretch, then jumps straight to the failure line.
  • Running after_script appears and then the job is killed — the timeout fired during cleanup.
  • The duration shown is suspiciously round (exactly 1 hour, exactly 30 minutes) — that points at a configured cap, not the work itself.
  • Re-running the job times out at the same point every time, or hangs forever locally when you run the failing command by hand.

# Inspect the failing job's reported duration and final lines
glab ci view --job 84213117
Status: failed   Duration: 1h0m0s
...
$ ./run-integration-tests.sh
WARNING: Timed out waiting for the build to finish
ERROR: Job failed: execution took longer than 1h0m0s seconds

# Confirm what limit the runner actually enforced for this job
journalctl -u gitlab-runner --since '20 min ago' | grep -i timeout
Job timeout: 3600s (project), runner maximum_timeout: 1800s -> effective 1800s
WARNING: Terminated job execution-timeout reached, killing process group pid=20194

Common Root Causes

1. The job genuinely exceeds the project timeout

The simplest case: a slow build or test suite that legitimately needs more time than the project allows. The default project pipeline timeout is 1 hour (CI/CD Settings > General pipelines > Timeout). A compile step or end-to-end suite that grew past that gets cut off mid-run.

# How long did the job actually run before the kill?
glab ci view --job 84213117 | grep -i duration
Status: failed   Duration: 1h0m0s

$ npm run test:e2e
  running 412 of 980 specs ...
ERROR: Job failed: execution took longer than 1h0m0s seconds

The duration is exactly the project limit and the suite was only ~40% through. Either raise the project timeout, split the suite across parallel: jobs, or cache more aggressively so the job finishes inside the budget.
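One way to split a suite across parallel: jobs is to let each node pick its own slice of the spec list. The sketch below relies on CI_NODE_INDEX (1-based) and CI_NODE_TOTAL, which GitLab sets on jobs that use parallel:. The function name and the example path in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# shard_lines: read a list of items (one per line) on stdin and print
# only the ones assigned to this parallel node, round-robin.
# Falls back to a single shard when the variables are not set,
# so the same script still runs outside a parallel: job.
shard_lines() {
  local index=${CI_NODE_INDEX:-1} total=${CI_NODE_TOTAL:-1}
  # Line N goes to node ((N - 1) mod total) + 1.
  awk -v i="$index" -v n="$total" '(NR - 1) % n == i - 1'
}

# Hypothetical usage inside a job with "parallel: 4":
#   find tests/e2e -name '*.spec.js' | sort | shard_lines
```

Sorting the list first keeps the assignment stable between nodes, so every spec runs on exactly one node as long as all nodes see the same files.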

2. A per-job timeout: shorter than the work

The timeout: keyword in .gitlab-ci.yml sets a per-job limit that replaces the project timeout for that job. It can be shorter or longer than the project value, but it cannot exceed the runner's maximum_timeout. A deploy or migration job with timeout: 10m will die at ten minutes even though the project allows an hour.

deploy:
  stage: deploy
  timeout: 10m          # <- caps THIS job at 10 minutes
  script:
    - ./deploy.sh --wait-for-healthy

$ ./deploy.sh --wait-for-healthy
waiting for rollout to become healthy (8m elapsed)...
ERROR: Job failed: execution took longer than 10m0s seconds

The work needs more than ten minutes but the job-level timeout: says otherwise. Raise the per-job timeout: to match reality, or fix the underlying slow rollout so it finishes inside it.

3. Runner maximum_timeout caps below the project timeout

A runner’s config.toml can set maximum_timeout, and GitLab uses the lower of the runner cap and the project/job timeout. If your project allows 1h but the runner was registered with a 30-minute cap, every job on that runner dies at 30 minutes regardless of project settings.

# Check the runner's configured cap
grep -i maximum_timeout /etc/gitlab-runner/config.toml
  maximum_timeout = 1800

$ make build
...
ERROR: Job failed: execution took longer than 30m0s seconds

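The job stopped at exactly 30m0s even though the project allows 1h: the runner cap won. Raise maximum_timeout in config.toml (or re-register with --maximum-timeout 3600) and restart the runner, or pin long jobs to a runner with a higher cap via tags:. If the grep finds nothing in config.toml, check the runner's Maximum job timeout field in the GitLab UI, where the cap is often set instead.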

4. A hung command waiting on input or a never-returning process

A foreground server, a missing -d/--detach, or a command that blocks on an interactive prompt will sit forever and burn the entire budget without ever finishing. The job log goes silent because the process is alive and waiting — it just never returns control to the script.

test:
  script:
    - docker compose up        # <- foreground, blocks forever, no -d
    - ./run-tests.sh

$ docker compose up
api-1     | listening on :8080
db-1      | database system is ready to accept connections
(no further output for 59 minutes)
ERROR: Job failed: execution took longer than 1h0m0s seconds

docker compose up never returns, so run-tests.sh never runs and the job times out at the full project limit. Use docker compose up -d, background long-running processes, and either pipe yes into anything that might prompt or pass its --non-interactive/--yes flag.

5. No-output / silent job killed by RUNNER_SCRIPT_TIMEOUT or a network stall

RUNNER_SCRIPT_TIMEOUT caps how long the script stage may run, and RUNNER_AFTER_SCRIPT_TIMEOUT does the same for after_script. Both count elapsed time, not log activity, but the jobs that hit them are usually silent ones: a stalled download, a frozen network mount, or a process that swallows its own output. When the script timeout is set lower than the job timeout, a hung transfer trips it well before the wall-clock limit.

variables:
  RUNNER_SCRIPT_TIMEOUT: 30m
build:
  script:
    - curl -fSL https://artifacts.internal/blob.tar.gz -o blob.tar.gz

$ curl -fSL https://artifacts.internal/blob.tar.gz -o blob.tar.gz
  % Total    % Received   Average Speed
 14  512M   14 73.2M       0     0
(stalled at 14% — no bytes, no output)
ERROR: Job failed: execution took longer than 30m0s seconds

The download stalled and the job produced no output until the script timeout fired. Add --max-time/--connect-timeout to curl, retry with backoff, or wrap the command in timeout 600 ... so a single stalled transfer fails fast instead of consuming the whole budget.
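A bounded download might look like the sketch below. The curl options are standard ones, but the specific numbers (10 s to connect, 600 s per attempt, abort below 1024 bytes/s for 30 s, 3 retries) are assumptions to tune for your artifacts, and fetch_bounded is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# fetch_bounded URL OUTPUT_FILE
# Download with explicit limits so a stalled transfer fails fast
# instead of sitting silent until the job or script timeout fires.
fetch_bounded() {
  local url=$1 out=$2
  # --connect-timeout: give up if the connection is not established in 10s
  # --max-time:        hard cap of 600s on each attempt
  # --speed-limit/--speed-time: abort when slower than 1024 B/s for 30s
  # --retry/--retry-delay: retry transient failures 3 times, 5s apart
  curl --fail --silent --show-error --location \
       --connect-timeout 10 \
       --max-time 600 \
       --speed-limit 1024 --speed-time 30 \
       --retry 3 --retry-delay 5 \
       --output "$out" "$url"
}

# Usage in a job script:
#   fetch_bounded https://artifacts.internal/blob.tar.gz blob.tar.gz
```

The speed limit is what catches the "stalled at 14%" case: the connection is open, so a connect timeout alone never fires, but the transfer rate has dropped to zero.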

6. Deadlock waiting on a resource or a service that never starts

Integration tests that wait for a database, a license server, a lock, or a services: container that never becomes ready will block until the timeout. This looks identical to a slow job, but the cause is a dependency that is down, misconfigured, or never came up.

test:
  services:
    - postgres:16
  script:
    - ./wait-for-it.sh postgres:5432 -t 0   # <- -t 0 means wait forever
    - pytest

$ ./wait-for-it.sh postgres:5432 -t 0
waiting for postgres:5432 without a timeout...
(blocked — postgres service failed to start, no port ever opens)
ERROR: Job failed: execution took longer than 1h0m0s seconds

The wait-for-it.sh -t 0 (no timeout) loops forever because the postgres service crashed and the port never opens. Give wait loops a finite timeout (-t 60), check service health, and fail loudly when a dependency does not come up instead of waiting silently.
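If you would rather not depend on wait-for-it.sh, a finite wait loop is only a few lines of shell. The sketch below is a generic helper, not part of GitLab or any particular tool. retry_until is a hypothetical name, and the probe in the usage comment assumes a netcat build that supports -z.

```shell
#!/usr/bin/env bash
# retry_until DEADLINE_SECS INTERVAL_SECS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or DEADLINE_SECS have elapsed.
# Returns 0 on success, 1 (with a message on stderr) when it gives up.
retry_until() {
  local deadline=$1 interval=$2
  shift 2
  local start
  start=$(date +%s)

  until "$@"; do
    if [ $(( $(date +%s) - start )) -ge "$deadline" ]; then
      echo "gave up after ${deadline}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Usage in a job script (probe command is an assumption):
#   retry_until 60 2 nc -z postgres 5432 || exit 1
```

Because the helper returns non-zero and names the command it was waiting for, the job fails within the deadline with a line that points at the dependency, rather than at the timeout.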

Diagnostic Workflow

Step 1: Read the exact duration in the failure line

glab ci view --job <JOB_ID> | grep -iE 'duration|execution took longer'

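A round number (1h0m0s, 30m0s, 3600 seconds) is the configured limit that tripped: the failure line reports the limit the runner enforced, not the time the work needed. Note the exact value, because the next step matches it against each setting. This single line tells you which piece of config to look at.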

Step 2: Identify which of the three timeouts applied

# Per-job timeout in the pipeline file
grep -n 'timeout:' .gitlab-ci.yml
# Project timeout: CI/CD Settings > General pipelines > Timeout (UI / API)
glab api projects/:id | jq .build_timeout   # seconds; the response is one JSON line, so grep would print all of it
# Runner cap
grep -i maximum_timeout /etc/gitlab-runner/config.toml

GitLab uses the per-job timeout: when one is set, otherwise the project timeout, and caps the result at the runner's maximum_timeout. Match the killed duration to whichever one equals it.
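The failure line prints the limit in h/m/s form (1h0m0s) while build_timeout and maximum_timeout are plain seconds, so matching them by eye is error-prone. A small converter helps. This is a sketch that handles only whole-number h/m/s durations like the ones shown in this guide, and duration_to_seconds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# duration_to_seconds DURATION
# Convert a duration such as 1h0m0s, 30m0s or 45s to seconds.
# Prints the number of seconds; returns non-zero on anything else.
duration_to_seconds() {
  printf '%s\n' "$1" | awk '
    /^([0-9]+h)?([0-9]+m)?([0-9]+s)?$/ && length($0) > 0 {
      total = 0
      rest = $0
      # Peel off one "<number><unit>" component at a time.
      while (match(rest, /^[0-9]+[hms]/)) {
        n = substr(rest, 1, RLENGTH - 1) + 0
        u = substr(rest, RLENGTH, 1)
        total += n * (u == "h" ? 3600 : (u == "m" ? 60 : 1))
        rest = substr(rest, RLENGTH + 1)
      }
      print total
      ok = 1
    }
    END { exit ok ? 0 : 1 }'
}

# Usage: compare the failure line against the runner cap.
#   duration_to_seconds 30m0s    # prints 1800, same as maximum_timeout = 1800
```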

Step 3: Look for the silent stretch in the log

glab ci view --job <JOB_ID> --log | tail -40

If the log goes quiet for a long time before the failure, suspect a hung command (cause 4), a stalled transfer (cause 5), or a dependency deadlock (cause 6) rather than slow-but-progressing work.

Step 4: Reproduce the failing command in isolation

# Run the exact script step locally with a hard wrapper
timeout 120 ./run-integration-tests.sh; echo "exit=$?"

If it hangs forever locally too, it is a hung/blocking command or a missing dependency — not a tuning problem. An exit=124 from timeout confirms the command never returns on its own.
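The same wrapper is useful inside the job itself, so a step that never returns fails with a clear message rather than silently consuming the budget. The sketch assumes GNU coreutils timeout, which exits with 124 when it had to kill the command. run_bounded is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# run_bounded LIMIT COMMAND [ARGS...]
# Run COMMAND under timeout(1). LIMIT is anything timeout accepts
# (for example 120 or 2m). The command's exit status is passed through;
# 124 means timeout had to kill it.
run_bounded() {
  local limit=$1 status
  shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit: $*" >&2
  fi
  return "$status"
}

# Usage in a job script:
#   run_bounded 600 ./run-integration-tests.sh || exit 1
```

Note that timeout runs an external program, so it cannot wrap a shell function or builtin directly. Put such logic in a script file first.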

Step 5: Check the runner side for what it enforced

journalctl -u gitlab-runner --since '30 min ago' | grep -iE 'timeout|killing|terminated'

The runner logs the effective timeout and the process-group kill. This confirms whether the runner's own cap (or RUNNER_SCRIPT_TIMEOUT) ended the job versus the project/job limit.

Example Root Cause Analysis

The integration job in a service repo started failing at exactly one hour, every run, after weeks of green pipelines.

glab ci view --job 84213117 | grep -iE 'duration|execution took'
Status: failed   Duration: 1h0m0s
ERROR: Job failed: execution took longer than 1h0m0s seconds

A clean 1h0m0s matched the project default, so the question was whether the work grew or something hung. The tail of the log showed a long silent stretch:

glab ci view --job 84213117 --log | tail -8
$ docker compose -f compose.test.yml up -d
$ ./wait-for-it.sh kafka:9092 -t 0
waiting for kafka:9092 without a timeout...
(no output for 58 minutes)
ERROR: Job failed: execution took longer than 1h0m0s seconds

The job was blocked in wait-for-it.sh ... -t 0, never reaching the tests. Checking why the port never opened:

glab ci view --job 84213117 --log | grep -i kafka | head -3
kafka-1  | ERROR Exiting Kafka due to fatal exception (kafka.Kafka$)
kafka-1  | java.lang.IllegalArgumentException: Missing required configuration "node.id"

The kafka service container crashed on a recent image bump (a required node.id was no longer defaulted), so kafka:9092 never opened. The -t 0 wait loop had no timeout, so it sat there until the project limit killed the whole job.

Fix: give the wait a finite timeout so a dead dependency fails fast, and pin/repair the service config:

test:
  script:
    - docker compose -f compose.test.yml up -d
    - ./wait-for-it.sh kafka:9092 -t 60   # fail in 60s, not 1 hour
    - pytest tests/integration

With a 60-second wait the job now fails in about a minute with a clear “kafka:9092 not reachable” message, the broken node.id config gets fixed, and the pipeline is green again — instead of burning an hour to report nothing useful.

Prevention Best Practices

  • Read the failure duration first: it is the limit that was enforced, so matching it to the project, job, or runner setting routes the entire investigation.
  • Set explicit per-job timeout: values so a deploy or migration fails in minutes, not at the hour-long project default — and so a runaway job is killed before it wastes runner minutes.
  • Keep the project timeout, per-job timeout:, and runner maximum_timeout consistent; remember that a per-job timeout: replaces the project value and the runner cap limits both, so a small runner cap silently overrides everything else.
  • Never run foreground/blocking commands in script: without backgrounding (-d, &) and never leave wait loops or downloads unbounded — wrap risky steps in timeout N ... and add --max-time/--connect-timeout to network calls.
  • Give services: and dependency waits finite timeouts and health checks so a service that never starts fails loudly instead of deadlocking the job. See more in the GitLab CI/CD guides.
  • Tune RUNNER_SCRIPT_TIMEOUT/RUNNER_AFTER_SCRIPT_TIMEOUT for legitimately long stages so the script-stage limit does not cut off a slow-but-working job.

Quick Command Reference

# The failure line and exact duration (round = configured cap)
glab ci view --job <JOB_ID> | grep -iE 'duration|execution took longer'

# Which of the three timeouts applied?
grep -n 'timeout:' .gitlab-ci.yml                      # per-job timeout:
glab api projects/:id | jq .build_timeout              # project timeout (seconds)
grep -i maximum_timeout /etc/gitlab-runner/config.toml # runner cap

# Find the silent stretch / hung step
glab ci view --job <JOB_ID> --log | tail -40

# Reproduce the failing command with a hard kill
timeout 120 ./failing-step.sh; echo "exit=$?"          # 124 = never returned

# What the runner actually enforced (and the kill)
journalctl -u gitlab-runner --since '30 min ago' \
  | grep -iE 'timeout|killing|terminated'

# Re-register a runner with a higher cap (seconds)
gitlab-runner register --maximum-timeout 3600

Conclusion

ERROR: Job failed: execution took longer than ... seconds means GitLab Runner enforced a wall-clock limit and killed the job. The usual root causes:

  1. The job genuinely exceeds the project timeout because the build or test suite outgrew it.
  2. A per-job timeout: in .gitlab-ci.yml is set shorter than the work the job actually does.
  3. The runner’s maximum_timeout in config.toml caps below the project timeout, and GitLab uses the lower of the two.
  4. A hung or blocking command (foreground server, missing -d/--detach, interactive prompt) never returns.
  5. A silent, stalled step such as a hung download runs until RUNNER_SCRIPT_TIMEOUT or the job timeout kills it.
  6. A deadlock waiting on a resource or a services: container that never starts blocks until the limit.

Start with the exact duration in the failure line and which of the three timeouts it matches — those two facts identify almost every timeout before you touch a config. For ad-hoc triage, the free incident assistant can turn a timed-out job log into the likely root cause.
