
GitLab CI/CD Cache vs Artifacts Design Prompt

Choose between cache and artifacts in GitLab CI/CD — design cache keys that invalidate correctly, set artifact expiry, and avoid the common 'cache as artifact' mistake.

Target user: DevOps engineers designing pipeline data flow
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior DevOps engineer with deep experience designing GitLab CI/CD caching and artifact strategy. You know that cache and artifacts solve different problems, and that confusing them produces slow, expensive pipelines.

I will provide:
- The current `.gitlab-ci.yml` cache/artifacts setup
- The runner type (Docker, K8s, shell) and cache backend (local-disk, S3, MinIO)
- The data being cached/artifacted: dependency dirs, build outputs, test reports, generated docs
- The pipeline's overall shape: how many jobs, which jobs need which data
- Symptom: slow restore, cache always missing, artifact upload timeouts, ballooning storage

Your job:

1. **Clarify the mental model**:
   - **Cache** = persisted between pipelines to speed up recomputation. Stored on the runner (or in S3/MinIO), keyed by `cache:key:`. **Not** a guaranteed job-to-job handoff: a cache miss must never break the job. Best for `node_modules/`, a pip download cache, and similar directories you could rebuild from a lockfile.
   - **Artifacts** = produced by one job, consumed by another (or saved for download). Uploaded to GitLab server, downloaded by downstream jobs. Best for build outputs that feed other jobs (compiled binary, test reports), or for keeping deploy artifacts.
   - **Rule of thumb**: cache = "I could regenerate this." Artifact = "I need to pass this to another job or keep it for the user."
2. **Cache key strategy** — assess the current `cache:key:`:
   - **Static key** (e.g., `key: deps`) → never invalidates; great until dependencies change. Risky.
   - **`cache:key:files:`** → invalidates when listed files change. Use for `package-lock.json`, `requirements.txt`, `Gemfile.lock`, etc. Preferred for most.
   - **`cache:key:prefix:` + `files:`** → adds a manual versioning prefix for forced invalidation.
   - **`$CI_COMMIT_REF_SLUG`** in key → per-branch cache. Useful for feature-branch isolation but multiplies storage.
   - **`$CI_JOB_NAME`** in key → per-job cache. Avoids cross-contamination but redundant if jobs share deps.
3. **Cache scope policies**:
   - `policy: pull-push` (default) → reads and writes
   - `policy: pull` → read-only; downstream jobs that don't modify deps. Faster, less write contention.
   - `policy: push` → write-only; rare; for "build cache" jobs that initialize.
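4. **Cache paths**:
   - List EXACTLY the directories that hold cached state. Don't cache the project working dir; that's the git checkout.
   - Paths must sit inside the project directory (`$CI_PROJECT_DIR`). Home-directory locations such as `~/.cache/pip`, `~/.gradle/caches`, or `~/.m2/repository` are not picked up, so point the tool at a project-relative directory instead (e.g., `PIP_CACHE_DIR`, `GRADLE_USER_HOME`, `-Dmaven.repo.local`).
   - Common per-language: `node_modules/`, `.cache/pip/`, `.gradle/caches/`, `.m2/repository/`, `target/`, `.venv/`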
5. **Artifacts strategy**:
   - **`artifacts:paths:`** — files/dirs to upload. Avoid huge dirs (test fixtures, build caches).
   - **`artifacts:reports:`** — typed artifacts (`junit`, `coverage_report`, `dotenv`, `codequality`, `dast`, `sast`). Get UI integration.
   - **`artifacts:expire_in:`** — default depends on project setting; set explicitly for clarity. Use short for ephemeral (1 hour), long for releases (never).
   - **`artifacts:when:`** — `on_success` (default), `on_failure`, `always`. For failed-job logs/screenshots, use `on_failure`.
   - **`artifacts:exclude:`** — strip noise (e.g., exclude `**/node_modules`).
6. **Common anti-patterns** to flag:
   - **Using artifacts as cache**: every job uploads + downloads N MB. Massive overhead vs `cache:`.
   - **Caching build outputs** instead of artifacting them: works on the same runner sometimes, fails when job goes to a different runner.
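   - **Static cache key** (no `key:files:`): with no key set, every job shares the single key `default`; a hand-picked static key such as `key: dependencies` is never invalidated either, so jobs keep restoring a stale, ever-growing cache for months.
   - **Caching the project directory**: the runner already clones or fetches the repository for every job, so caching `./` re-archives the whole checkout and can restore stale files over the fresh one.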
   - **Artifact size > 100 MB on every job**: server storage + upload time. Trim or use a real artifact registry (Package Registry).
   - **No `artifacts:expire_in`**: relies on project's default; admin may have set it generously.
7. **For multi-runner / autoscaler setups**:
   - Local-disk cache is per-runner — different runners miss each other's caches. Use distributed cache (S3/MinIO).
   - Configure `[runners.cache]` in `config.toml` for shared cache.

Provide concrete YAML diffs for each finding.

---

Runner type: [Docker / K8s / shell]
Cache backend: [local / S3 / MinIO]
Symptom: [DESCRIBE]
Current cache + artifacts config (from `.gitlab-ci.yml`):
```yaml
[PASTE]
```
Data sizes (rough): [node_modules: 500 MB, build/: 200 MB, etc.]
Pipeline shape: [DESCRIBE jobs and their data dependencies]

Why this prompt works

The single biggest GitLab CI/CD design mistake is using artifacts where cache belongs — a job uploads 500 MB of node_modules as artifacts and 10 downstream jobs each download it. The right answer is “cache the deps, artifact the build output.” This prompt forces explicit reasoning per piece of data.

How to use it

  1. Inventory the data flowing through your pipeline. For each directory: who produces it, who consumes it, can it be regenerated?
  2. Apply the rule of thumb: can regenerate → cache. Job-to-job handoff → artifact. Both criteria fail → maybe you don’t need to persist it at all.
  3. Set explicit expire_in on every artifact block. Don’t rely on defaults.
  4. For shared runner pools, ensure cache backend is distributed (S3/MinIO).

Decision matrix

| Data | Solution |
| --- | --- |
| `node_modules/`, pip download cache (dependency dirs) | Cache with `key:files: [lockfile]` |
| Compiled binary used by deploy job | Artifact with short `expire_in` |
| JUnit test report | Artifact `reports:junit` (UI integration) |
| Build cache (incremental compile state) | Cache with `policy: pull-push` |
| Generated documentation site for deploy | Artifact (passed to `pages` job) |
| Code coverage report | Artifact `reports:coverage_report` |
| Logs / screenshots from failed tests | Artifact `when: on_failure` |
| Release binary for download | Package Registry or long-lived artifact |
| Intermediate `target/` between two compile jobs | Artifact if the jobs can land on different runners; cache only if the second job can rebuild on a miss |

Cache key patterns

Single lockfile (Node.js)

.node-cache: &node-cache
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
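
The same lockfile-keyed pattern carries over to Python, with one extra step: cache paths have to live inside the project directory, so pip's cache is redirected there first. The sketch below is illustrative only; the `requirements.txt` filename, the `python:3.12` image, and the `test-python` job name are assumptions, not part of the original setup. It also shows how a hidden job's anchor is pulled into a real job.

```yaml
variables:
  # Assumption: redirect pip's cache into the project dir so GitLab can archive it
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

.pip-cache: &pip-cache
  cache:
    key:
      files:
        - requirements.txt   # key changes whenever the pinned deps change
    paths:
      - .cache/pip/

test-python:                 # hypothetical job name
  <<: *pip-cache             # merge the cache definition from the anchor
  image: python:3.12
  script:
    - pip install -r requirements.txt
    - pytest
```

Note that this caches downloaded packages, not the installed environment, so `pip install` still runs but avoids most network fetches.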

Multiple lockfiles (monorepo)

.cache: &cache
  cache:
    key:
      files:
        - package-lock.json
        - subproject/package-lock.json
    paths:
      - node_modules/
      - subproject/node_modules/

Manually invalidatable (with prefix)

.cache: &cache
  cache:
    key:
      prefix: v3-                # bump to invalidate
      files: [package-lock.json]
    paths:
      - node_modules/

Per-branch (use sparingly)

.cache: &cache
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths: [node_modules/]
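
One way to soften the storage and cold-start cost of per-branch keys is a fallback key, so a new branch starts from the default branch's cache instead of from nothing. This is a minimal sketch that assumes a GitLab release supporting `cache:fallback_keys` (16.0 onward, according to the keyword reference); the `npm-` prefix and the `install` job name are hypothetical. It caches npm's download directory rather than `node_modules/`, because `npm ci` removes `node_modules/` before installing.

```yaml
install:                       # hypothetical job name
  cache:
    key: "npm-$CI_COMMIT_REF_SLUG"
    fallback_keys:
      - "npm-$CI_DEFAULT_BRANCH"   # used only when the branch key misses
    paths:
      - .npm/
  script:
    # --cache points npm at the project-relative directory listed above
    - npm ci --cache .npm --prefer-offline
```

If the project keeps separate caches for protected and unprotected branches, a feature branch may still miss the default branch's cache; that setting is worth checking before relying on the fallback.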

Producer / consumer with policy: pull

install-deps:
  stage: setup
  cache:
    key:
      files: [package-lock.json]
    paths: [node_modules/]
    policy: pull-push       # producer; reads existing, updates
  script:
    - npm ci

test:
  stage: test
  cache:
    key:
      files: [package-lock.json]
    paths: [node_modules/]
    policy: pull            # consumer; read-only, fast
  script:
    - npm test
  needs: [install-deps]

Artifact patterns

Build → deploy handoff

build:
  stage: build
  script:
    - go build -o bin/app ./cmd/app
  artifacts:
    paths:
      - bin/app
    expire_in: 1 day

deploy:
  stage: deploy
  needs: [build]
  script:
    - ./deploy.sh bin/app
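
The same handoff shape applies to a generated documentation site that a `pages` job publishes. The sketch below assumes MkDocs as the generator and a `build-docs` job name, both chosen only for illustration; any tool that writes a static site into `public/` fits the same pattern.

```yaml
build-docs:                    # hypothetical job name
  stage: build
  script:
    - mkdocs build --site-dir public   # assumption: MkDocs is the site generator
  artifacts:
    paths:
      - public/
    expire_in: 1 day           # short-lived: only needed until pages runs

pages:
  stage: deploy
  needs: [build-docs]          # downloads public/ from build-docs
  script:
    - echo "Publishing public/ from build-docs"
  artifacts:
    paths:
      - public/                # GitLab Pages serves this directory
```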

Test reports (UI integration)

test:
  script:
    - pytest --junitxml=report.xml --cov --cov-report=xml:coverage.xml   # --cov enables pytest-cov; without it no coverage.xml is written
  artifacts:
    when: always
    paths:
      - report.xml
      - coverage.xml
    reports:
      junit: report.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
    expire_in: 1 week
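
Typed reports can also carry small values between jobs. With `reports:dotenv`, a job writes `KEY=value` lines to a file and later jobs receive them as CI/CD variables. A minimal sketch, assuming a hypothetical `APP_VERSION` variable and `build.env` filename:

```yaml
build:
  stage: build
  script:
    # Assumption: the version string comes from git; any command output works
    - echo "APP_VERSION=$(git describe --tags --always)" > build.env
  artifacts:
    reports:
      dotenv: build.env        # parsed by GitLab, exposed as variables downstream

deploy:
  stage: deploy
  needs: [build]               # inherits the dotenv variables from build
  script:
    - echo "Deploying version $APP_VERSION"
```

This keeps a few bytes of metadata out of both the cache and the file artifacts.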

Failed-job diagnostics

e2e:
  script:
    - npx playwright test
  artifacts:
    when: on_failure
    paths:
      - test-results/
      - playwright-report/
    expire_in: 3 days

Don’t artifact this

# DON'T — using artifacts as a poor man's cache
build:
  script:
    - npm ci                # always pulls 500 MB
    - npm run build
  artifacts:
    paths:
      - node_modules/      # WRONG — 500 MB upload per pipeline
      - dist/

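# DO
build:
  cache:
    key: { files: [package-lock.json] }
    paths: [.npm/]          # npm's download cache; npm ci deletes node_modules/ before installing
  script:
    - npm ci --cache .npm --prefer-offline   # served from the cache on most runs
    - npm run build
  artifacts:
    paths:
      - dist/              # only the build output (10s of MB)
    expire_in: 1 day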

Common findings this catches

  • node_modules in artifacts:paths: → switch to cache:. Big win.
  • cache:key: static-key without invalidation → switch to key:files:[lockfile].
  • No cache:policy: on consumer jobs → they’re pushing the cache too. Set policy: pull.
  • Artifacts > 100 MB on every job → look for **/*.log or accidentally-included caches. Use artifacts:exclude:.
  • artifacts:expire_in: never → audit; usually unnecessary.
  • Local-disk runner cache + multi-runner cluster → caches miss every cross-runner job. Configure S3 distributed cache.
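
For the oversized-artifact finding, `artifacts:exclude:` trims noise without restructuring the build output. The sketch below is illustrative; the excluded patterns (source maps and logs under `dist/`) are assumptions about what a typical front-end build leaves behind.

```yaml
build:
  script:
    - npm run build
  artifacts:
    paths:
      - dist/
    exclude:
      - dist/**/*.map          # assumption: source maps are not needed downstream
      - dist/**/*.log
    expire_in: 1 day
```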

When to escalate

  • GitLab server storage usage spiking → audit artifact expiry settings org-wide; clean old pipelines.
  • S3 cache backend showing high latency → consider regional placement; bandwidth between runner and S3.
  • Pipeline reliability dropping due to cache flakiness → audit policy: settings; consider per-job key fragmentation.
