GitLab CI/CD Pipeline Optimization Prompt
Speed up slow GitLab pipelines — DAG with `needs:`, cache vs artifacts, parallel jobs, image pre-builds, dependency proxy, and shallow clones.
- Target user: DevOps engineers wanting faster GitLab pipelines
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior DevOps engineer who has shaved hours off real GitLab pipelines in production. You know the difference between cache and artifacts, when DAG with `needs:` actually helps, and when an "optimization" is just complexity.

I will provide:
- The current pipeline timing: total duration, longest jobs, where the critical path lives (`Pipelines → <id>` view, or the pipeline graph)
- The full `.gitlab-ci.yml` (or relevant excerpts)
- The runner executor type (Docker, K8s, shell)
- Constraints: must run on every push? Specific compliance requirements? Single-runner cluster?
- Recent pipeline durations (e.g., median over last 30 runs from analytics)

Your job:
1. **Identify the critical path**: the longest chain of dependent jobs. Total pipeline time is dominated by this. Optimizations that don't shorten the critical path don't help wall-clock duration.
2. **Apply optimizations in priority order** (highest impact first):
   - **Convert stages → DAG with `needs:`**: stages enforce sequential gates between groups; DAG lets each job start as soon as its inputs are ready. Often shaves 30-50% off long pipelines.
   - **Parallelize naturally splittable jobs** with `parallel: <N>` or `parallel: matrix:` (tests, lints, builds across N versions).
   - **Cache dependencies properly**: `cache:key:files:` instead of a static key; cache `node_modules/`, the pip cache, the Gradle cache, and target/build directories per language. Cache paths must live inside the project directory, so relocate home-directory caches such as `~/.cache/pip` or `~/.gradle/caches/` first. Set `cache:policy: pull` for read-only consumers.
   - **Use artifacts ONLY for cross-job handoff**, not as a "cache." Artifacts upload AND download for every consumer.
   - **Pre-build CI base images** so jobs don't `apt-get install` on every run. Build a "ci-base" image with the toolchain baked in, push it to your registry, and use it as the job `image:`.
   - **Shallow clone**: `GIT_DEPTH: 50` (or smaller) for jobs that don't need full history. Default is `20` on GitLab.com; verify yours.
   - **Skip unchanged paths** with `rules:changes:`. Don't run frontend tests if only backend code changed.
   - **Dependency proxy / registry mirror**: avoid Docker Hub rate limits, faster pulls. Pull images through GitLab's Dependency Proxy (prefix the image with `$CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX`), or set the runner pull policy to `if-not-present`.
   - **`interruptible: true`** on jobs that should cancel when a new pipeline starts on the same MR. Saves CPU on outdated pipelines.
3. **Identify ANTI-optimizations** the user might be doing:
   - Excessive parallelization without enough runners → jobs queue instead of run
   - Caching too aggressively → cache restore time > rebuild time
   - Pre-building images for every commit → image build itself becomes the bottleneck
   - Long-lived branch-specific caches that grow unbounded
4. **Estimate the win** for each recommendation, qualitatively (small / medium / large) so the user can prioritize.
5. **Watch for the trade-offs**: pipeline speed vs determinism, cost (more runners), or maintenance complexity (DAG is harder to reason about than stages).
6. **Recommend monitoring**: GitLab's pipeline analytics, job duration trends, runner utilization. Optimization is iterative.

Mark any change that requires runner / cluster reconfiguration (e.g., enable the dependency proxy, install more runners) separately from `.gitlab-ci.yml`-only changes.

---

Current pipeline duration (median): [N minutes]
Critical path (longest jobs in order): [DESCRIBE]
Runner executor + count: [e.g., 4 shared K8s runners, 2 GPU specific]
Constraints: [must run X / cannot Y / regulatory]
Full `.gitlab-ci.yml`:
```yaml
[PASTE, or the relevant 70%]
```
Recent timing data: [PASTE: job names + durations]
Why this prompt works
Pipeline optimization is a domain where there are many techniques but only a few apply to any given pipeline. The first question is always “where’s the critical path?” — without that, every recommendation is a guess. This prompt forces the model to optimize the critical path specifically rather than scattering generic advice.
How to use it
- Get real timing data first. “Pipeline is slow” tells the model nothing. Median duration over 30 runs + the slowest jobs tells everything.
- Identify the critical path explicitly. If your slowest job is `test-integration` at 20 min and the total pipeline is 25 min, optimizing the 2-min lint doesn’t matter.
- One change at a time. Apply DAG or caching changes, not both at once; otherwise you won’t know which one helped or hurt.
- Measure after each change. Compare median durations before/after; one slow run can mislead.
Useful diagnostics
# View pipeline timing
# In GitLab UI: Pipelines → click pipeline → "Pipeline graph" tab
# Or: Analytics → CI/CD analytics → pipeline durations chart
# Per-job durations via API (per_page=100 because the API returns only 20 jobs per page by default)
curl -s --header "PRIVATE-TOKEN: <token>" \
  "https://gitlab.example.com/api/v4/projects/<id>/pipelines/<pipeline-id>/jobs?per_page=100" | \
  jq -r '.[] | "\(.duration)s \(.name) [\(.stage)]"' | sort -nr | head
# Top 5 slowest jobs with their queue time (jobs that never ran have a null duration, treated as 0)
curl -s --header "PRIVATE-TOKEN: <token>" \
  "https://gitlab.example.com/api/v4/projects/<id>/pipelines/<pipeline-id>/jobs?per_page=100" | \
  jq 'map({name, stage, duration, queued_duration}) | sort_by(.duration // 0) | reverse | .[0:5]'
# Runner utilization (admin)
# Admin → Monitoring → CI/CD → Runner utilization chart
High-impact patterns
Convert stages to DAG (needs:)
Before:
stages: [build, test, deploy]
build-app: { stage: build, script: ./build.sh }
build-frontend: { stage: build, script: ./build-fe.sh }
test-unit: { stage: test, script: pytest }
test-integration: { stage: test, script: ./integ.sh }
deploy: { stage: deploy, script: ./deploy.sh }
After (DAG):
stages: [build, test, deploy] # still useful for UI grouping
build-app: { stage: build, script: ./build.sh }
build-frontend: { stage: build, script: ./build-fe.sh }
test-unit: { stage: test, script: pytest, needs: ["build-app"] }
test-integration: { stage: test, script: ./integ.sh,
                    needs: ["build-app", "build-frontend"] }
deploy: { stage: deploy, script: ./deploy.sh,
          needs: ["test-unit", "test-integration"] }
Now test-unit starts as soon as build-app finishes — no wait for build-frontend.
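Two further `needs:` variations are worth knowing. The sketch below is an illustration, not part of the pipeline above: the `lint` job and `./lint.sh` are hypothetical, and `artifacts: false` only makes sense if `test-unit` does not actually consume files produced by `build-app`.

```yaml
lint:
  stage: test
  needs: []              # no dependencies: starts immediately, alongside the build jobs
  script: ./lint.sh      # hypothetical script

test-unit:
  stage: test
  needs:
    - job: build-app
      artifacts: false   # wait for build-app, but skip downloading its artifacts
  script: pytest
```

An empty `needs: []` is the cheapest win for jobs that only read the repository, since they no longer wait behind earlier stages.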
Smart caching (Node example)
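.node-cache:
  cache: &node-cache
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-push   # producer default

test-unit:
  cache:
    <<: *node-cache     # merge inside cache: so key and paths are kept
    policy: pull        # consumer (read-only, fast)
  script: npm test

The anchor sits on the `cache:` mapping itself because YAML merge keys are shallow: merging at the job level and then redefining `cache:` would discard `key` and `paths`. `cache:key:files:` invalidates the cache only when package-lock.json changes, so most pipelines reuse the existing cache.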
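The same pattern carries over to Python, with one catch: GitLab only caches paths inside the project directory, so the pip cache has to be relocated first. A minimal sketch, assuming a `requirements-dev.txt` at the repository root and a hypothetical `test-python` job:

```yaml
variables:
  # pip's default cache lives under the home directory, which GitLab cannot cache
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

test-python:               # hypothetical job name
  image: python:3.12
  cache:
    key:
      files:
        - requirements-dev.txt
    paths:
      - .cache/pip
  script:
    - pip install -r requirements-dev.txt
    - pytest
```

This caches downloaded wheels rather than the installed environment, so `pip install` still runs but skips most network fetches.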
rules:changes: to skip unchanged paths
test-frontend:
  rules:
    - changes:
        - frontend/**/*
        - package.json
  script: npm test --prefix frontend
Frontend tests skipped when only backend code changed.
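Skipping carries the risk of skipping when you shouldn't. One way to hedge is to skip only on merge requests and always run on the default branch. This is a sketch that assumes merge request pipelines are enabled for the project:

```yaml
test-frontend:
  rules:
    # Safety net: always run on the default branch, regardless of changed paths
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    # On merge requests, run only when frontend files changed
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - frontend/**/*
        - package.json
  script: npm test --prefix frontend
```

Rules are evaluated top to bottom and the first match wins, so the default-branch rule must come first.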
parallel: matrix: for matrix testing
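test:
  parallel:
    matrix:
      - PYTHON_VERSION: ["3.10", "3.11", "3.12"]
        VARIANT: ["slim", "alpine"]   # official python image variants; there is no "-ubuntu" tag
  image: python:$PYTHON_VERSION-$VARIANT
  script: pytest
GitLab generates 6 jobs, one per combination (3 versions × 2 variants), and runs them in parallel when enough runners are free.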
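For splitting a single test suite rather than testing across versions, plain `parallel: <N>` exposes `CI_NODE_INDEX` (1 to N) and `CI_NODE_TOTAL` to each job. The sketch below shards test files round-robin; the `tests/` layout and `test_*.py` naming are assumptions about the project.

```yaml
test-unit:
  parallel: 4
  script:
    - |
      # CI_NODE_INDEX runs from 1 to CI_NODE_TOTAL; keep every Nth test file for this shard
      files=$(find tests -name 'test_*.py' | sort |
        awk -v n="$CI_NODE_TOTAL" -v i="$CI_NODE_INDEX" 'NR % n == i % n')
      # Assumption: every shard gets at least one file; an empty list would make pytest run the whole suite
      pytest $files
```

Round-robin by file count ignores how long each file takes, so shards can be uneven; splitting by recorded durations is a refinement once this basic version pays off.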
Pre-built CI base image
Instead of:
test:
  image: python:3.12
  before_script:
    - apt-get update && apt-get install -y postgresql-client libpq-dev
    - pip install -r requirements-dev.txt   # 90 seconds every job
  script: pytest
Build once:
# In a separate ".images/Dockerfile.ci-python"
FROM python:3.12-slim
RUN apt-get update && apt-get install -y postgresql-client libpq-dev
COPY requirements-dev.txt /tmp/
RUN pip install -r /tmp/requirements-dev.txt
Then in pipelines:
test:
  image: registry.example.com/team/ci-python:1.2
  script: pytest   # no before_script!
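To keep the image build from becoming its own bottleneck, rebuild it only when its inputs change. The job below is a sketch under several assumptions: the runner allows privileged Docker-in-Docker, the image is pushed to the project's GitLab container registry rather than `registry.example.com`, and the job name and tag are placeholders.

```yaml
build-ci-image:                      # hypothetical job name
  image: docker:24.0.5
  services:
    - docker:24.0.5-dind             # requires a privileged runner
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  rules:
    # Rebuild only on the default branch, and only when the image inputs changed
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      changes:
        - .images/Dockerfile.ci-python
        - requirements-dev.txt
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -f .images/Dockerfile.ci-python -t "$CI_REGISTRY_IMAGE/ci-python:1.2" .
    - docker push "$CI_REGISTRY_IMAGE/ci-python:1.2"
```

Bump the tag whenever the Dockerfile or requirements change so that consuming jobs pick up the new image deliberately rather than by accident.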
interruptible: true for MR pipelines
default:
  interruptible: true    # cancel outdated MR pipelines automatically
deploy:
  interruptible: false   # never cancel deploys
  script: ./deploy.sh
Dependency proxy (avoid Docker Hub rate limits)
# The CI_DEPENDENCY_PROXY_* variables are predefined by GitLab; there is no need to set them yourself
image: $CI_DEPENDENCY_PROXY_DIRECT_GROUP_IMAGE_PREFIX/python:3.12
GitLab caches the image in the group’s Dependency Proxy; subsequent jobs pull from that cache instead of Docker Hub.
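Shallow clone (GIT_DEPTH)
The prompt and the wins table also list shallow clones, which have no example here. A minimal sketch: keep the clone shallow by default and opt individual jobs back into full history. The `release-notes` job is hypothetical, and the depth of 20 is only a starting point.

```yaml
variables:
  GIT_DEPTH: "20"          # shallow clone for every job by default

release-notes:             # hypothetical job that needs tags and history
  variables:
    GIT_DEPTH: "0"         # 0 disables shallow cloning for this job only
  script: git describe --tags
```

Jobs that run `git describe`, generate changelogs, or diff against older commits are the usual ones that break with a shallow clone.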
Common pitfalls this catches
- Caching everything: cache restore at 30s + 200MB pull > recomputing in 10s. Profile.
- DAG with hidden ordering deps: the deploy job runs before tests because the user forgot to list the test jobs in `needs:`. `needs:` takes exact job names, not wildcards such as `test-*`. Validate visually in the pipeline graph.
- Artifacts used as cache: every job uploads + downloads. Use `cache:` for build outputs that don’t need to flow between jobs.
- Excessive `parallel:` against few runners: jobs queue; no real speedup.
- `GIT_STRATEGY: clone` instead of `fetch`: clones from scratch every job; `fetch` reuses the existing working copy.
- `when: always` on a cleanup job after a flaky deploy: cleanup runs when deploy fails, may delete state needed for diagnosis.
Estimating wins (qualitative)
| Change | Typical win | Cost |
|---|---|---|
| Stages → DAG (`needs:`) | 30-50% on long pipelines | Pipeline-graph complexity |
| Effective dependency cache | 1-3 min per job | Cache invalidation risk |
| Pre-built CI base image | 1-2 min per job | Image maintenance |
| `interruptible: true` | Frees runners for active MRs | None |
| `parallel: matrix:` | 2-10× wall-clock on testable | More runners needed |
| `rules:changes:` to skip | 100% of skipped jobs | Risk: skipping when shouldn’t |
| Dependency proxy | 5-30s per pull | Setup once |
| `GIT_DEPTH: 50` | 5-30s on big repos | Tools needing history break |
When to escalate
- Slow runner provisioning at scale — engage runner team to provision more / faster nodes.
- Cache backend (S3, MinIO) saturated → larger cache backend, or smaller caches.
- Pipeline doesn’t fit in 1 hour even after optimization — consider parent/child pipelines or scheduled jobs.
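For the parent/child option, a rough sketch of a parent pipeline that triggers per-component child pipelines only when their paths change. The child file locations are hypothetical, and newer GitLab versions may offer alternatives to `strategy: depend`.

```yaml
# Parent .gitlab-ci.yml
frontend:
  trigger:
    include: frontend/.gitlab-ci.yml   # hypothetical child pipeline file
    strategy: depend                   # parent job waits for the child and reflects its status
  rules:
    - changes:
        - frontend/**/*

backend:
  trigger:
    include: backend/.gitlab-ci.yml    # hypothetical child pipeline file
    strategy: depend
  rules:
    - changes:
        - backend/**/*
```

Each child pipeline has its own stages and critical path, which keeps the parent graph small and lets unrelated components stop blocking each other.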
Related prompts
- GitLab CI/CD Cache vs Artifacts Design Prompt
  Choose between cache and artifacts in GitLab CI/CD: design cache keys that invalidate correctly, set artifact expiry, and avoid the common 'cache as artifact' mistake.
- GitLab CI/CD Debugging Prompt
  Diagnose failing GitLab CI/CD pipelines from job logs, .gitlab-ci.yml, and runner configuration.
- GitLab CI/CD `needs:` DAG Optimization Prompt
  Convert stage-based GitLab pipelines to DAG (`needs:`), find hidden ordering bugs, design clean fan-out/fan-in patterns, and avoid `needs:` traps.
- GitLab Runner Troubleshooting Prompt
  Diagnose GitLab Runner failures: runner offline, executor errors, Docker-in-Docker issues, autoscaler problems, slow job pickup, and resource exhaustion.