GitLab CI/CD Cache vs Artifacts Design Prompt

Choose between cache and artifacts in GitLab CI/CD: design cache keys that invalidate correctly, set artifact expiry, and avoid the common 'artifacts as cache' mistake.

Target user: DevOps engineers designing pipeline data flow
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior DevOps engineer with deep experience designing GitLab CI/CD caching and artifact strategy. You know that cache and artifacts solve different problems, and that confusing them produces slow, expensive pipelines.

I will provide:
- The current `.gitlab-ci.yml` cache/artifacts setup
- The runner type (Docker, K8s, shell) and cache backend (local-disk, S3, MinIO)
- The data being cached/artifacted: dependency dirs, build outputs, test reports, generated docs
- The pipeline's overall shape: how many jobs, which jobs need which data
- Symptom: slow restore, cache always missing, artifact upload timeouts, ballooning storage

Your job:

1. **Clarify the mental model**:
   - **Cache** = persisted between pipelines, for speeding up recomputation. Stored on the runner (or in a distributed backend such as S3), keyed by `cache:key:`. Best-effort: a job must still succeed on a cache miss. **Not** a job-to-job handoff. Best for `node_modules/`, pip's download cache, etc.: things you'd recompute from a lockfile.
   - **Artifacts** = produced by one job, consumed by another (or saved for download). Uploaded to GitLab server, downloaded by downstream jobs. Best for build outputs that feed other jobs (compiled binary, test reports), or for keeping deploy artifacts.
   - **Rule of thumb**: cache = "I could regenerate this." Artifact = "I need to pass this to another job or keep it for the user."
2. **Cache key strategy** — assess the current `cache:key:`:
   - **Static key** (e.g., `key: deps`) → never invalidates; great until dependencies change. Risky.
   - **`cache:key:files:`** → invalidates when listed files change (at most two files; the key is a SHA of the most recent commits that changed them). Use for `package-lock.json`, `requirements.txt`, `Gemfile.lock`, etc. Preferred for most.
   - **`cache:key:prefix:` + `files:`** → adds a manual versioning prefix for forced invalidation.
   - **`$CI_COMMIT_REF_SLUG`** in key → per-branch cache. Useful for feature-branch isolation but multiplies storage.
   - **`$CI_JOB_NAME`** in key → per-job cache. Avoids cross-contamination but redundant if jobs share deps.
3. **Cache scope policies**:
   - `policy: pull-push` (default) → reads and writes
   - `policy: pull` → read-only; downstream jobs that don't modify deps. Faster, less write contention.
   - `policy: push` → write-only; rare; for "build cache" jobs that initialize.
4. **Cache paths**:
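   - List EXACTLY the directories that hold cached state. Paths must be inside the project directory (`$CI_PROJECT_DIR`); GitLab cannot cache `~/...` directly, so redirect tool caches into the checkout first. Don't cache the project directory itself: that's the git checkout.
   - Common per-language: `node_modules/`, `.cache/pip/` (via `PIP_CACHE_DIR`), `.gradle/caches/` (via `GRADLE_USER_HOME`), `.m2/repository/` (via `-Dmaven.repo.local`), `target/`, `.venv/`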
5. **Artifacts strategy**:
   - **`artifacts:paths:`**: files/dirs to upload. Avoid huge dirs (test fixtures, build caches).
   - **`artifacts:reports:`**: typed artifacts (`junit`, `coverage_report`, `dotenv`, `codequality`, `dast`, `sast`) that get UI integration.
   - **`artifacts:expire_in:`**: the default comes from an instance-wide setting (30 days unless an administrator changed it); set it explicitly for clarity. Use short for ephemeral (e.g. `1 hour`), long for releases (`never`).
   - **`artifacts:when:`**: `on_success` (default), `on_failure`, `always`. For failed-job logs/screenshots, use `on_failure`.
   - **`artifacts:exclude:`**: strip noise. Exclude patterns are not recursive, so match the files themselves (e.g. `**/node_modules/**/*`, not `**/node_modules`).
6. **Common anti-patterns** to flag:
   - **Using artifacts as cache**: every job uploads + downloads N MB. Massive overhead vs `cache:`.
   - **Caching build outputs** instead of artifacting them: may work while jobs happen to land on the same runner, then fails when a job is scheduled on a different one.
   - **Static key with no invalidation**: neither the implicit `default` key nor something like `key: dependencies` ever changes, so jobs restore a stale cache for months while download caches keep accumulating old packages.
   - **Caching the project directory**: the runner already fetches the repository; caching `./` re-uploads the whole checkout and can restore stale files over the fresh one.
   - **Artifact size > 100 MB on every job**: server storage + upload time. Trim or use a real artifact registry (Package Registry).
   - **No `artifacts:expire_in`**: relies on the instance default; an admin may have set it generously.
7. **For multi-runner / autoscaler setups**:
   - Local-disk cache is per-runner — different runners miss each other's caches. Use distributed cache (S3/MinIO).
   - Configure `[runners.cache]` in `config.toml` for shared cache.

Provide concrete YAML diffs for each finding.

---

Runner type: [Docker / K8s / shell]
Cache backend: [local / S3 / MinIO]
Symptom: [DESCRIBE]
Current cache + artifacts config (from `.gitlab-ci.yml`):
```yaml
[PASTE]
```
Data sizes (rough): [node_modules: 500 MB, build/: 200 MB, etc.]
Pipeline shape: [DESCRIBE jobs and their data dependencies]

Safety notes

**Setting `artifacts:expire_in: never`** (or relying on a high default) on every job can exhaust GitLab server storage. Audit periodically.
Cache keyed only on `$CI_COMMIT_REF_SLUG` creates a new cache for every branch, and caches for merged or deleted branches pile up. Add a scheduled cleanup (for example an expiration lifecycle rule on the S3/MinIO bucket) or use a stable key with `files:` invalidation.
Cache uploads happen at job end. If the cache is large (> 1 GB), upload time can dominate job duration. Trim paths.
Artifacts uploaded by one job and downloaded by another use the GitLab server's bandwidth. Large artifacts in fan-out patterns (one job, 10 consumers) multiply bandwidth.
`artifacts:paths:` with `**/*.log` recursively can capture an unexpected amount of data, including from `node_modules` if not excluded.
Switching from local cache to S3 cache requires a runner config change, and it is not transparent: existing local caches are not migrated, so expect misses on the first runs after the switch.
Distributed cache (S3) saves cross-runner hits but costs you S3 bandwidth and egress. For tightly-scoped runner pools, local cache may be cheaper.

Why this prompt works

One of the most common GitLab CI/CD design mistakes is using artifacts where cache belongs: a job uploads 500 MB of node_modules as artifacts and 10 downstream jobs each download it. The right answer is “cache the deps, artifact the build output.” This prompt forces explicit reasoning per piece of data.

How to use it

Inventory the data flowing through your pipeline. For each directory: who produces it, who consumes it, can it be regenerated?
Apply the rule of thumb: can regenerate → cache. Job-to-job handoff → artifact. Both criteria fail → maybe you don’t need to persist it at all.
Set an explicit `expire_in` on every `artifacts:` block. Don’t rely on the instance default.
For shared runner pools, ensure cache backend is distributed (S3/MinIO).

Decision matrix

| Data | Solution |
|---|---|
| `node_modules/`, pip's download cache in `.cache/pip/` (dependency dirs) | Cache with `files: [lockfile]` key |
| Compiled binary used by deploy job | Artifact with short `expire_in` |
| JUnit test report | Artifact `reports:junit` (UI integration) |
| Build cache (incremental compile state) | Cache with `policy: pull-push` |
| Generated documentation site for deploy | Artifact (passed to pages job) |
| Code coverage report | Artifact `reports:coverage_report` |
| Logs / screenshots from failed tests | Artifact `when: on_failure` |
| Release binary for download | Package Registry or long-lived artifact |
| Intermediate `target/` between two compile jobs | Artifact; a cache is best-effort, so use one only if the second job can rebuild on a miss |
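For the release row, one way to stop `expire_in: never` from applying to every pipeline is to scope the long-lived artifact to tag pipelines with `rules:`. A sketch, assuming releases are cut from Git tags and using a Go build as the example (the `release-binary` job name is illustrative):

```yaml
release-binary:
  stage: build
  rules:
    - if: $CI_COMMIT_TAG      # only tag pipelines create this job
  script:
    - go build -o bin/app ./cmd/app
  artifacts:
    paths:
      - bin/app
    expire_in: never          # kept until someone deletes it
```

Branch and merge request pipelines keep using short-lived artifacts, so only tagged builds accumulate on the server.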

Cache key patterns

Single lockfile (Node.js)

```yaml
.node-cache: &node-cache
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
```
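GitLab only caches paths inside the project directory (`$CI_PROJECT_DIR`), so a home-directory cache such as `~/.cache/pip` has to be redirected into the checkout first. A minimal sketch for pip, assuming `requirements.txt` is the lockfile; the `.pip-cache` name is illustrative:

```yaml
.pip-cache: &pip-cache
  variables:
    # pip reads PIP_CACHE_DIR; point it inside the project dir
    # because cache:paths cannot reach ~/.cache/pip.
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  cache:
    key:
      files:
        - requirements.txt
    paths:
      - .cache/pip/
```

Jobs pull it in with `<<: *pip-cache`, like the Node.js template. The same redirection applies to Gradle (`GRADLE_USER_HOME`) and Maven (`-Dmaven.repo.local`).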

Multiple lockfiles (monorepo)

```yaml
.cache: &cache
  cache:
    key:
      files:
        - package-lock.json
        - subproject/package-lock.json
    paths:
      - node_modules/
      - subproject/node_modules/
```
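`cache:key:files` accepts at most two files, so this pattern stops at two lockfiles. With more lockfiles, or when subprojects change independently, a job can instead declare a list of caches (GitLab allows up to four per job), each with its own key. A sketch; the `frontend/` and `backend/` layout and the `make build` command are hypothetical, and the prefixes keep the two keys distinct:

```yaml
build-all:
  cache:
    # One cache per subproject: a change to one lockfile only
    # invalidates that subproject's cache.
    - key:
        prefix: frontend
        files: [frontend/package-lock.json]
      paths: [frontend/node_modules/]
    - key:
        prefix: backend
        files: [backend/requirements.txt]
      paths: [backend/.venv/]
  script:
    - make build            # hypothetical build entry point
```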

Manually invalidatable (with prefix)

```yaml
.cache: &cache
  cache:
    key:
      prefix: v3                 # key becomes v3-<sha>; bump to invalidate
      files: [package-lock.json]
    paths:
      - node_modules/
```

Per-branch (use sparingly)

```yaml
.cache: &cache
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths: [node_modules/]
```
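Per-branch keys start cold on every new branch. GitLab 16.0 and later add `cache:fallback_keys`, which the runner tries in order when the main key has no cache. A sketch, assuming the default branch's cache is keyed by its ref slug and that its name is already slug-safe (such as `main`). By default GitLab also keeps separate caches for protected and unprotected branches, so a fallback from an unprotected feature branch to a protected default branch can still miss; check the job log before relying on it:

```yaml
.cache: &cache
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths: [node_modules/]
    # Tried in order when the branch key has no cache yet (GitLab 16.0+)
    fallback_keys:
      - "$CI_DEFAULT_BRANCH"
```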

Producer / consumer with `policy: pull`

```yaml
install-deps:
  stage: setup
  cache:
    key:
      files: [package-lock.json]
    paths: [node_modules/]
    policy: pull-push       # producer: npm ci rebuilds node_modules/, job end uploads it
  script:
    - npm ci

test:
  stage: test
  cache:
    key:
      files: [package-lock.json]
    paths: [node_modules/]
    policy: pull            # consumer; read-only, fast
  script:
    - npm test
  needs: [install-deps]
```
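The `push` policy fits the rarer job that only seeds the cache and never reads it. A sketch, assuming a scheduled pipeline exists to refresh dependencies; the `warm-cache` job name is illustrative and the `setup` stage matches the producer above:

```yaml
warm-cache:
  stage: setup
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"   # scheduled pipelines only
  cache:
    key:
      files: [package-lock.json]
    paths: [node_modules/]
    policy: push            # skip the download; upload at job end
  script:
    - npm ci
```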

Artifact patterns

Build → deploy handoff

```yaml
build:
  stage: build
  script:
    - go build -o bin/app ./cmd/app
  artifacts:
    paths:
      - bin/app
    expire_in: 1 day

deploy:
  stage: deploy
  needs: [build]
  script:
    - ./deploy.sh bin/app
```
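By default a job downloads every artifact from the jobs listed in its `needs:`. In fan-out pipelines, a job that only needs the ordering can opt out with `needs:artifacts: false`, which saves server bandwidth. A sketch; the `notify` job is illustrative:

```yaml
notify:
  stage: deploy
  needs:
    - job: build
      artifacts: false      # wait for build, but don't fetch bin/app
  script:
    - echo "build finished for $CI_COMMIT_SHORT_SHA"
```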

Test reports (UI integration)

```yaml
test:
  script:
    # --cov turns on pytest-cov measurement; --cov-report alone does not
    - pytest --junitxml=report.xml --cov --cov-report=xml:coverage.xml
  artifacts:
    when: always
    paths:
      - report.xml
      - coverage.xml
    reports:
      junit: report.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage.xml
    expire_in: 1 week
```
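Of the typed reports, `dotenv` is the one that passes data instead of feeding the UI: `KEY=value` lines in the file become environment variables in jobs that depend on the producing job. A sketch; the job names and the `BUILD_VERSION` variable are illustrative:

```yaml
version:
  stage: build
  script:
    # One KEY=value pair per line
    - echo "BUILD_VERSION=$CI_COMMIT_SHORT_SHA" >> build.env
  artifacts:
    reports:
      dotenv: build.env
    expire_in: 1 hour

release-notes:
  stage: deploy
  needs: [version]
  script:
    - echo "Releasing $BUILD_VERSION"
```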

Failed-job diagnostics

```yaml
e2e:
  script:
    - npx playwright test
  artifacts:
    when: on_failure
    paths:
      - test-results/
      - playwright-report/
    expire_in: 3 days
```
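`artifacts:exclude` patterns are not recursive the way `paths:` entries are, so excluding a directory means matching the files inside it. A sketch that collects logs from the whole tree while skipping anything under any `node_modules/` (the `npm test` script is a placeholder):

```yaml
test:
  script:
    - npm test
  artifacts:
    when: always
    paths:
      - "**/*.log"
    exclude:
      # Match the files inside node_modules, not just the directory name
      - "**/node_modules/**/*"
    expire_in: 1 day
```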

Don’t artifact this

```yaml
# DON'T: using artifacts as a poor man's cache
build:
  script:
    - npm ci                # always pulls 500 MB
    - npm run build
  artifacts:
    paths:
      - node_modules/      # WRONG: 500 MB upload per pipeline
      - dist/
```

```yaml
# DO
build:
  cache:
    key: { files: [package-lock.json] }
    # npm ci deletes node_modules/ before installing, so cache npm's
    # package store instead and install from it.
    paths: [.npm/]
  script:
    - npm ci --cache .npm --prefer-offline   # served from .npm/ on most runs
    - npm run build
  artifacts:
    paths:
      - dist/              # only the build output (10s of MB)
    expire_in: 1 day
```

Common findings this catches

`node_modules` in `artifacts:paths:` → switch to `cache:`. Big win.
`cache:key:` static key without invalidation → switch to `key:files: [lockfile]`.
No `cache:policy:` on consumer jobs → they’re pushing the cache too. Set `policy: pull`.
Artifacts > 100 MB on every job → look for `**/*.log` or accidentally included caches. Use `artifacts:exclude:`.
`artifacts:expire_in: never` → audit; usually unnecessary.
Local-disk runner cache + multi-runner cluster → caches miss on every cross-runner job. Configure an S3 distributed cache.
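The missing-`policy:` finding can also be fixed once for the whole file: put the shared cache under `default:` with `policy: pull`, and let only the installer override it. A sketch, assuming GitLab's YAML anchor and merge-key (`<<:`) support; the `.node-cache-def` name is hypothetical. A job-level `cache:` replaces the default one entirely rather than merging with it, which is why the installer repeats the key and paths through the anchor:

```yaml
.node-cache-def: &node-cache-def
  key:
    files: [package-lock.json]
  paths: [node_modules/]

default:
  cache:
    <<: *node-cache-def
    policy: pull            # every job only reads the cache by default

install-deps:
  stage: .pre
  cache:
    <<: *node-cache-def
    policy: pull-push       # the single job that updates the cache
  script:
    - npm ci
```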

When to escalate

GitLab server storage usage spiking → audit artifact expiry settings org-wide; clean old pipelines.
S3 cache backend showing high latency → place the bucket in the same region as the runners and check the bandwidth between them.
Pipeline reliability dropping due to cache flakiness → audit `policy:` settings and check whether per-job or per-branch keys are fragmenting the cache.

Related prompts

GitLab CI/CD Pipeline Optimization Prompt

GitLab CI/CD `needs:` DAG Optimization Prompt

GitLab Runner Troubleshooting Prompt

GitLab CI/CD Git LFS Large-File Pipeline Prompt
