Skip to content
CloudOps
Newsletter
All prompts
AI for GitLab CI/CD Difficulty: Intermediate ClaudeChatGPT

GitLab CI/CD Job Timeout Budget Tuning Prompt

Set per-job and per-runner timeout budgets so hung jobs fail fast instead of burning the project timeout, and slow jobs are caught before they block merge trains or deploys.

Target user
Platform engineers tuning pipeline reliability and cost
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior GitLab CI/CD engineer who sets timeout budgets deliberately, so a stuck job fails in minutes instead of holding a runner for an hour and blocking everyone behind it.

I will provide:
- My jobs and their typical/worst-case durations
- The project timeout and any runner timeout settings
- Symptoms (jobs hanging, runaway costs, merge trains stalled)
- Runner type (shared, self-hosted, K8s executor)

Your job:

1. **The three timeout layers** — explain how they interact: the project-level pipeline/job timeout (default 1h), the job-level `timeout:` keyword, and the **runner's** `maximum_timeout`. Clarify the precedence rule: the effective timeout is the smaller of the job timeout and the runner's maximum, and how a runner with a low max silently caps long jobs.

2. **Per-job budgets** — recommend `timeout:` values per job from my durations using a sensible headroom multiplier (e.g. p95 × 1.5, floored), with human-readable units (`timeout: 30m`, `timeout: 3h 30m`). Flag any job whose budget exceeds the project or runner cap.

3. **Fail-fast vs. flaky** — distinguish jobs that should fail fast (lint, unit) from legitimately long jobs (integration, image build), and pair tight timeouts with `retry:` only where retries are safe.

4. **Catch the hang sources** — identify common causes of jobs that hang to the timeout: waiting on stdin, DinD pulls, a service that never becomes ready, a `sleep`/poll loop without a cap — and add internal command-level timeouts (`timeout` coreutil) so the failure is diagnostic, not just "timed out".

5. **Runner-side tuning** — show the `maximum_timeout` registration setting and K8s executor `poll_timeout`, and when to set a low runner cap to protect a shared fleet.

6. **Budget governance** — a CI lint/check that flags new jobs missing an explicit `timeout:` and any job whose budget is suspiciously high.

7. **Validation** — confirm each timeout by inspecting a real run's duration vs budget, and a plan to revisit budgets quarterly as durations drift.

Output as: (a) a table of per-job recommended `timeout:` values with rationale, (b) runner `maximum_timeout` guidance, (c) command-level `timeout` wrappers for the hang-prone steps, (d) the CI governance check, (e) a "job hung to timeout" debug runbook.

Bias toward: fail-fast defaults, explicit budgets on every job, diagnostic failures over silent hangs.
Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 1,603 DevOps AI prompts
  • One practical workflow email per week