
GitLab CI/CD `needs:` DAG Optimization Prompt

Convert stage-based GitLab pipelines to DAG (`needs:`), find hidden ordering bugs, design clean fan-out/fan-in patterns, and avoid `needs:` traps.

Target user: DevOps engineers shortening pipeline wall-clock time
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior DevOps engineer who has converted dozens of stage-based GitLab pipelines into DAG (`needs:`) pipelines. You can spot hidden ordering assumptions that break when stages disappear, and you know when DAG hurts more than helps.

I will provide:
- The current pipeline shape (stages, jobs, what each does)
- Pipeline timing data (median duration per job, total pipeline duration)
- The data flow (which jobs produce artifacts/files needed by which downstream)
- Constraints: deploy gates, manual approvals, environment safety

Your job:

1. **Map the actual dependency graph**:
   - For each job, what does it NEED to start? (Real dependency: an artifact, a setup, an approval)
   - What jobs CURRENTLY block it (via stages) that aren't real dependencies?
   - Surface ordering assumptions that are NOT explicit (e.g., "deploy assumes test passed; test assumes lint passed because both are in earlier stages")
2. **Convert to DAG safely**:
   - Add `needs:` to each job listing its real prerequisites
   - Keep `stages:` for UI grouping (still useful for readers)
   - For artifact-consuming jobs, `needs:` with `artifacts: true` (default for jobs in `needs:`) ensures the artifacts are downloaded
   - For manual gates (deploys), keep them as gating jobs that downstream `needs:` references
3. **Identify wins**:
   - Critical-path reduction: which jobs that ran sequentially can now run in parallel?
   - Faster failures: a 30-min integration test no longer blocked by a 5-min lint
   - Better runner utilization: more parallelism
4. **Identify risks**:
   - Hidden assumption: "deploy assumed dbmigrate ran" — make explicit via `needs:`
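   - `needs:` count limit: a single job can list at most 50 jobs in `needs:` by default (the limit is configurable on self-managed instances); large monorepos hit it
   - Job not in any downstream's `needs:` → it still runs, but nothing waits for it, so its failure no longer gates jobs that use `needs:` the way its stage used to
   - Cycles: `A needs: B; B needs: A` → pipeline lint catches; design issue
   - `needs:` on a job in a LATER stage → rejected at pipeline validation; `needs:` may only reference jobs in earlier stages (or the same stage in recent GitLab versions)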
5. **Patterns to recommend**:
   - **Fan-out**: one prep job, many parallel jobs depending on it
   - **Fan-in**: many parallel jobs converging into one
   - **Diamond**: A → (B, C in parallel) → D
   - **Optional dependency**: `needs: [{job: build, optional: true}]` (newer GitLab): proceed even if the upstream job is not in the pipeline
   - **Cross-stage `needs:`**: explicitly cross stage boundaries when stages are kept for UI only
6. **`needs:` ergonomics**:
   - Short form: `needs: ["build", "test"]`
   - Long form for artifact control: `needs: [{job: "build", artifacts: true}]`
   - To avoid artifact downloads: `artifacts: false` (saves time when you don't need the files)
   - `needs:project:` for multi-project pulls (see [parent-child design](/prompts/gitlab-parent-child-pipelines-design/))
7. **Common errors to surface**:
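   - `needs: [".hidden-job"]` where the job starts with `.` → hidden jobs are templates and never run, so the reference points at a job that isn't in the pipeline and validation fails; reference a real job or rename
   - `needs:` to a job that's excluded by `rules:` → the needed job is missing from the pipeline, which typically fails pipeline creation; use `optional: true` when the upstream is conditional
   - `needs:` count > 50 → pipeline rejected; consider parent/child split
   - Omitting an artifact producer from `needs:` → once a job has `needs:`, it downloads artifacts only from the jobs listed there, so it fails with a mysterious "file not found"
8. **For monorepos**: large `needs:` graphs hit the limit of 50 `needs:` entries per job. Switch to parent/child pipelines for that scale.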

---

Current pipeline shape: [DESCRIBE — stages + job count]
Median duration per job (top 10):
```
[PASTE]
```
Current `.gitlab-ci.yml`:
```yaml
[PASTE]
```
Real data dependencies (artifacts, files, side effects):
```
[DESCRIBE]
```
Constraints (gates, approvals, environment safety):
[DESCRIBE]

Why this prompt works

Converting stages → DAG is the highest-impact single optimization for many GitLab pipelines (often 30-50% wall-clock reduction). But it can also expose ordering bugs that were silently masked by stages. This prompt forces explicit dependency mapping before flipping the switch.

How to use it

  1. Inventory artifacts first. For each job: what files/artifacts does it READ, what does it PRODUCE. That’s your dependency graph.
  2. Sketch the graph visually before editing YAML. A diagram beats a guess.
  3. Convert one chain at a time. Don’t flip the whole pipeline to DAG in one MR.
  4. Watch the pipeline graph in the UI after — GitLab visualizes the DAG.

Useful diagnostics

# Per-job durations (via API)
# per_page=100 because the API returns only 20 jobs per page by default
curl --header "PRIVATE-TOKEN: <t>" \
  "https://gitlab.example.com/api/v4/projects/<id>/pipelines/<pid>/jobs?per_page=100" | \
  jq -r '.[] | "\(.duration)s \(.queued_duration)s \(.name) [\(.stage)]"' | \
  sort -nr | head -20

# Sum of all job durations vs. pipeline wall-clock — measures parallelism gap
TOTAL=$(curl ... | jq '[.[].duration] | add')
WALL=$(curl ... | jq '.duration')
echo "Sum: $TOTAL  Wall: $WALL  Parallelism: $(echo "$TOTAL/$WALL" | bc -l)"

A high sum-to-wall ratio means you’re already parallel; a low ratio (close to 1.0) means jobs are sequential and DAG can help.

Patterns

Stage-based (before)

stages: [setup, build, test, deploy]

install-deps:    { stage: setup, script: npm ci }
build-app:       { stage: build, script: npm run build }
build-frontend:  { stage: build, script: npm run build-fe }
test-unit:       { stage: test, script: npm test }
test-integration: { stage: test, script: ./integ.sh }
lint:            { stage: test, script: npm run lint }
deploy:          { stage: deploy, script: ./deploy.sh }

Wall-clock = max(setup) + max(build) + max(test) + max(deploy)

DAG (after)

stages: [setup, build, test, deploy]   # kept for UI

install-deps:
  stage: setup
  script: npm ci
  artifacts:
    paths: [node_modules/]
    expire_in: 1 hour

build-app:
  stage: build
  needs: [install-deps]
  script: npm run build
  artifacts: { paths: [dist/app/] }

build-frontend:
  stage: build
  needs: [install-deps]
  script: npm run build-fe
  artifacts: { paths: [dist/fe/] }

# Lint doesn't need build outputs — runs in parallel with build
lint:
  stage: test
  needs: [install-deps]
  script: npm run lint

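test-unit:
  stage: test
  # install-deps is listed so node_modules/ is downloaded; with needs:,
  # artifacts come only from the listed jobs, not transitively
  needs: [install-deps, build-app]
  script: npm test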

test-integration:
  stage: test
  needs: [build-app, build-frontend]
  script: ./integ.sh

deploy:
  stage: deploy
  needs: [test-unit, test-integration, lint]
  script: ./deploy.sh
  environment: production

Wins:

  • lint starts as soon as install-deps finishes (parallel with both builds)
  • test-unit starts as soon as build-app finishes (doesn’t wait for frontend)
  • Deploy still waits for all three test jobs (explicit needs:)
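
To make the saving concrete, here is the arithmetic under hypothetical median durations (assumed for illustration, not measured): install-deps 2 min, build-app 2, build-frontend 6, lint 1, test-unit 10, test-integration 4, deploy 2. Queue time and runner availability are ignored.

```latex
\begin{align*}
T_{\text{stages}} &= 2 + \max(2, 6) + \max(10, 4, 1) + 2 = 20 \text{ min} \\
\text{end}(\text{test-unit}) &= 2 + 2 + 10 = 14 \\
\text{end}(\text{test-integration}) &= 2 + \max(2, 6) + 4 = 12 \\
\text{end}(\text{lint}) &= 2 + 1 = 3 \\
T_{\text{DAG}} &= \max(14, 12, 3) + 2 = 16 \text{ min}
\end{align*}
```

With these numbers the DAG saves 4 minutes because the slowest build (build-frontend) and the slowest test (test-unit) are not on the same chain. If the slowest job of every stage sat on one chain, the critical path would equal the stage-based total and the conversion would save nothing.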

Optional dependency (newer GitLab)

e2e-screenshots:
  needs:
    - job: e2e-test
      optional: true              # don't fail if e2e-test was skipped
    - job: build-app
  script: ./capture-screenshots.sh
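
The situation this guards against can be sketched as follows. It assumes, hypothetically, that e2e-test only runs on the default branch; the rules condition and the ./e2e.sh script name are illustrative, not taken from a real pipeline.

```yaml
# Sketch only: the rules: condition and script names are assumptions.
e2e-test:
  stage: test
  script: ./e2e.sh
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH   # job absent on other branches

e2e-screenshots:
  stage: test
  needs:
    - job: e2e-test
      optional: true    # without this, branches lacking e2e-test fail validation
    - job: build-app    # always present, so no optional flag
  script: ./capture-screenshots.sh
```

On branches where e2e-test is not added to the pipeline, e2e-screenshots starts once build-app finishes, so the script should tolerate missing e2e output. Both jobs sit in the same stage here, which relies on same-stage needs: support in recent GitLab versions.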

No-artifact needs: (faster startup)

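notify-slack:
  stage: deploy                   # default stage is test; needs: can't point at a later stage
  needs:
    - job: deploy
      artifacts: false            # don't download deploy's artifacts
  # assumes DEPLOY_VERSION is a CI/CD variable; artifacts: false also skips dotenv report variables
  script: ./slack-notify.sh "$DEPLOY_VERSION"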
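
Manual gate before deploy

One way to keep an approval step in a DAG pipeline is a separate blocking manual job that deploy lists in needs:. This is a minimal sketch: the approve-production job name and its echo command are placeholders, and it assumes the test jobs from the DAG example above.

```yaml
# Sketch only: approve-production is a hypothetical job name.
approve-production:
  stage: deploy
  needs: [test-unit, test-integration, lint]
  script: echo "Approved by $GITLAB_USER_LOGIN"
  when: manual
  allow_failure: false      # makes the manual job blocking instead of optional

deploy:
  stage: deploy
  needs:
    - job: approve-production
      artifacts: false      # ordering only; the gate produces no files
  script: ./deploy.sh
  environment: production
```

Putting the gate and deploy in the same stage depends on same-stage needs: support; on older GitLab versions a dedicated approval stage placed before deploy would be the safer layout.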

Common findings this catches

  • Deploy needs: [test-unit, test-integration] but assumes dbmigrate ran → the earlier stage used to guarantee that. Make explicit: add dbmigrate to deploy's needs: list.
  • needs: [".install-deps"] (typo: hidden job) → hidden jobs never run, so the needed job doesn't exist in the pipeline and validation fails. Rename or fix.
  • needs: cycle: A→B→A → GitLab CI lint rejects; restructure.
  • A job whose own needs: list has 30+ entries is approaching the 50 limit → consider parent/child split.
  • Lint blocked behind heavy integration tests (in pre-DAG pipeline) → move lint to needs: [install-deps] only.
  • needs: on a job whose artifacts are never read, left at artifacts: true (default) → wastes time downloading. Set artifacts: false.
  • Deploy with needs: [test] but test is excluded by rules: on this branch → the needed job is missing and the pipeline typically fails to create. Use optional: true or restructure.
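
The first finding above could be fixed along these lines. This is a sketch under assumptions: the dbmigrate job's stage and its ./migrate.sh script are hypothetical, and the other job names come from the DAG example.

```yaml
# Sketch only: dbmigrate's stage and script are assumptions.
dbmigrate:
  stage: build
  needs: []                 # no prerequisites, so it starts at pipeline start
  script: ./migrate.sh

deploy:
  stage: deploy
  needs:
    - test-unit
    - test-integration
    - lint
    - job: dbmigrate
      artifacts: false      # ordering dependency only, nothing to download
  script: ./deploy.sh
  environment: production
```

The long form is used only for the entry that needs artifacts: false; the short string form and the long form can be mixed in one needs: list.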

When DAG hurts more than helps

  • Very short pipelines (< 5 min total): the complexity isn’t worth the few seconds saved.
  • Sequential-by-nature workflows: lint → build → deploy → notify — there’s no parallelism to exploit.
  • Inexperienced team: DAG is harder to reason about. Stage-based is more obvious.

When to escalate

  • Approaching the 50-needs: limit → architectural choice; consider parent/child pipelines.
  • Pipeline restructure spans many MR-author teams → coordinate; documentation matters more than the YAML change.
  • DAG-converted pipeline now flaky → re-audit for hidden ordering dependencies that stages had masked.
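
For the first escalation case, a parent/child split might look like the sketch below. The directory layout, file paths, and job names are hypothetical. Each child pipeline has its own needs: graph, so the per-job limit applies inside each child rather than across the whole monorepo.

```yaml
# Parent .gitlab-ci.yml, sketch only: paths and job names are assumptions.
stages: [triggers]

frontend-pipeline:
  stage: triggers
  trigger:
    include: frontend/.gitlab-ci.yml
    strategy: depend        # parent job waits for and reflects the child's result
  rules:
    - changes:
        - frontend/**/*

backend-pipeline:
  stage: triggers
  trigger:
    include: backend/.gitlab-ci.yml
    strategy: depend
  rules:
    - changes:
        - backend/**/*
```

The rules:changes filters are optional; they are included here on the assumption that each component should only build when its own files change.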
