Using AI to Debug GitLab CI Cache Misses That Waste Your Runner Minutes

I lost an entire afternoon to a pipeline that swore it was caching. Every single run, the install job dutifully re-downloaded the whole node_modules tree from the registry, chewed through three minutes of runner time, and then printed Created cache like it had done me a favor. The next job hit Cache not found for key, shrugged, and reinstalled everything again. Twelve developers, dozens of pushes a day, and a cache that never once produced a hit. That is real money in runner minutes evaporating into the void.

The maddening part is that GitLab CI caching is not complicated. It is just unforgiving. A cache key that changes when it should be stable, a path that points at the wrong directory, or a policy that uploads when it should only download — any one of these silently turns your cache into expensive dead weight. The logs rarely tell you why the key was wrong, only that it missed.

This is exactly the kind of grunt work I now hand to an AI assistant. Not to trust blindly — more on that below — but because reading YAML, spotting a mismatched key, and proposing a corrected config is something a model does in seconds that would take me a careful re-read and a coffee. Think of it as a fast junior engineer: quick, tireless, occasionally confidently wrong. You still review every line before it merges.

Why cache keys are where it all goes wrong

The cache key is the single most common failure point. GitLab restores a cache only when the key on the current run matches the key from a previous run. If your key embeds something volatile — a commit SHA, a timestamp, the pipeline ID — it will never match a prior run, so you write a fresh cache every time and read it never.

Here is a broken config I have seen more than once:

install:
  stage: build
  cache:
    key: "$CI_COMMIT_SHA"
    paths:
      - node_modules/
  script:
    - npm install

$CI_COMMIT_SHA is unique per commit, so the cache is saved under a key that no pipeline for a later commit will ever request. I pasted this into Claude with a one-line prompt (“why does this GitLab cache never hit?”) and it immediately flagged the SHA key and suggested keying off the lockfile instead:

install:
  stage: build
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  script:
    - npm ci

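With cache:key:files, GitLab derives the key from the listed files: per the GitLab docs, it is a SHA computed from the most recent commits that changed each file. The key therefore changes only when package-lock.json changes, which is exactly when you want fresh dependencies. Between dependency bumps, every run resolves to the same key and restores the cache. One caveat: npm ci deletes any existing node_modules before installing, so a restored node_modules/ speeds up later jobs that skip the install step, not npm ci itself.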

Pro Tip: cache:key:files accepts up to two files. List your lockfile plus the manifest (e.g. package-lock.json and package.json) so a manual dependency edit that hasn’t regenerated the lock still busts the cache.
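A minimal sketch of that two-file key, reusing the install job from above. Nothing here is new apart from the second entry under files:.

```yaml
install:
  stage: build
  cache:
    key:
      files:
        # Two entries is the documented maximum for cache:key:files
        - package-lock.json
        - package.json
    paths:
      - node_modules/
  script:
    - npm ci
```

Keep in mind that npm ci exits with an error when package.json and package-lock.json disagree, so in that situation the job fails rather than installing from a stale cache.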

Cache paths that quietly point at nothing

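The second classic miss: the key matches, the cache restores, and your job still rebuilds everything because the cached path was never the directory your tool actually reads from or writes to. Caching node_modules/ when your tooling keeps its download cache in .npm/ or ~/.cache saves and faithfully restores a folder the install step never uses. Note also that cache paths are resolved relative to the project directory, so a location like ~/.cache has to be redirected into the checkout before GitLab can cache it.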

test:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
  script:
    - npm ci --cache .npm --prefer-offline
    - npm test

That one is actually correct if you pass --cache .npm. The bug is usually a mismatch between the path you cache and the path your tool writes to. An AI is good here because you can paste both the cache block and the install command and ask it to confirm they agree. It reads the --cache flag, checks it against paths:, and tells you whether they line up. A human skims past that mismatch constantly; the model treats both lines with equal attention. For a deeper pass on config like this, I sometimes run it through the code review dashboard before it goes near main.
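For contrast, here is a sketch of the mismatched version, the kind of thing worth pasting alongside the question. It is the same job with only the paths: entry changed, so treat it as an illustration, not a config from a real project.

```yaml
# Mismatch: paths: and the install command point at different directories.
test:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/   # archived and restored, but npm ci wipes it before installing
  script:
    - npm ci --cache .npm --prefer-offline   # download cache lands in .npm/, which is never saved
    - npm test
```

The job log still reports a created and restored cache, which is why this one is easy to miss. Pointing paths: back at .npm/ makes the two lines agree again.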

Pull vs push: stop uploading caches you only read

cache:policy controls whether a job downloads the cache, uploads it, or both. The default is pull-push — every job downloads at the start and re-uploads at the end. For jobs that only consume dependencies (your test and lint stages), that upload is pure waste: it spends time and runner minutes re-archiving a cache that hasn’t changed.

Broken — every downstream job needlessly re-uploads:

default:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/

install:
  stage: build
  script:
    - npm ci

test:
  stage: test
  script:
    - npm test

lint:
  stage: test
  script:
    - npm run lint

The AI-corrected version splits responsibilities: one job builds and pushes the cache, the rest only pull it.

install:
  stage: build
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-push
  script:
    - npm ci

test:
  stage: test
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull
  script:
    - npm test

lint:
  stage: test
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull
  script:
    - npm run lint

Now test and lint download once and skip the upload entirely. On a wide pipeline with five parallel test jobs, that is five archive-and-upload cycles you stop paying for on every run.
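The corrected config repeats the same cache block three times, which is its own source of drift. One possible way to tighten it, sketched here with a hidden job and extends, is to define the cache once and override only the policy. The name .node-cache is hypothetical; extends merges the cache hash key by key, so the override leaves key and paths intact.

```yaml
# Hidden job (leading dot) that holds the shared cache definition
.node-cache:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull   # consumers only download

install:
  extends: .node-cache
  stage: build
  cache:
    policy: pull-push   # the one job that uploads
  script:
    - npm ci

test:
  extends: .node-cache
  stage: test
  script:
    - npm test

lint:
  extends: .node-cache
  stage: test
  script:
    - npm run lint
```

Defaulting to pull means a newly added job cannot start uploading by accident; pushing becomes an explicit opt-in.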

Fallback keys for the first run after a dependency bump

There is a frustrating cliff with lockfile-based keys: the moment package-lock.json changes, the new key has no cache yet, so you get a full cold install. fallback_keys softens that landing: when the exact key misses, GitLab tries each listed fallback key in order and restores the first one that has a cache.

install:
  stage: build
  cache:
    key:
      files:
        - package-lock.json
    fallback_keys:
      - npm-default
    paths:
      - .npm/
    policy: pull-push
  script:
    - npm ci --cache .npm --prefer-offline

When the lockfile changes, GitLab can’t find the new hashed key, so it falls back to npm-default and restores a recent-ish copy of npm’s download cache. The job caches .npm/ rather than node_modules/ here because npm ci deletes any existing node_modules before installing; with a warm .npm/ and --prefer-offline, npm ci fetches only the packages that are missing instead of the whole world. You still want a stable job somewhere that writes the npm-default key so the fallback has something to land on. The AI flagged that detail for me, and I would have missed it, because the config looks complete without it.
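A rough sketch of such a writer job follows. The job name seed-fallback-cache and the default-branch rule are assumptions, and the cached path should mirror whatever the install job caches (.npm/ is assumed here).

```yaml
seed-fallback-cache:   # hypothetical job name
  stage: build
  rules:
    # Assumption: only mainline pipelines refresh the fallback
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  cache:
    key: npm-default   # literal key, matches the fallback_keys entry
    paths:
      - .npm/
    policy: push       # write-only, nothing to restore first
  script:
    - npm ci --cache .npm --prefer-offline
```

One thing to verify in your own project: GitLab can keep separate caches for protected and unprotected branches, so if the default branch is protected, feature-branch pipelines may not see this cache unless that project setting is changed or the job also runs on unprotected branches.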

Per-branch keys and the prefix pattern

If every branch shares one cache key, a feature branch with experimental dependencies can poison the cache that main reads. The fix is a key prefix that combines a stable hash with the branch name, so branches get isolated caches without losing lockfile-based invalidation.

install:
  cache:
    key:
      files:
        - package-lock.json
      prefix: "$CI_COMMIT_REF_SLUG"
    paths:
      - node_modules/
  script:
    - npm ci

The resulting key is something like feature-x-3f9a.... Each branch maintains its own cache, keyed off both the branch slug and the lockfile hash. This is a genuinely nice pattern, but it has a cost trap: lots of short-lived branches means lots of orphaned caches piling up in storage. When I asked the AI about the trade-off, it correctly noted that per-branch caches multiply storage and suggested pairing the prefix with fallback_keys pointing at the main cache, so new branches start warm:

install:
  cache:
    key:
      files:
        - package-lock.json
      prefix: "$CI_COMMIT_REF_SLUG"
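    fallback_keys:
      # Nothing above writes a cache under the literal key "main": with the
      # prefix, main's own cache is saved as "main-<hash>". This fallback only
      # helps if some job on main also saves a cache keyed exactly "main".
      # GitLab may also separate protected and unprotected branch caches,
      # so check that setting if feature branches never restore from it.
      - "main"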
    paths:
      - node_modules/
  script:
    - npm ci

How to actually drive the AI without handing it the keys

Here is the part I will not stop repeating: the AI is reading your .gitlab-ci.yml, not your runtime. That distinction is your safety boundary. Paste the structure — keys, paths, policies, stages — and ask sharp questions:

“This cache key uses $CI_COMMIT_SHA. Will it ever produce a hit across pipelines?”
“Does my paths: block match where npm ci actually writes?”
“Which of these jobs can safely use policy: pull?”

What you do not paste is the rest of the file: your $CI_REGISTRY_PASSWORD, deploy tokens, $AWS_SECRET_ACCESS_KEY, or the value of any masked variable. None of that is relevant to a caching diagnosis, and a CI config is the last document you want leaking into a chat history. Scrub the variables: and secrets: blocks before they go anywhere near a model. The AI does not need them to tell you your cache key is volatile.

Pro Tip: Keep a small, sanitized snippet — just the cache: and script: blocks of the failing job — as your standard paste. It gives the model everything it needs to reason about misses and nothing it shouldn’t see.
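As an illustration, a sanitized paste built from the broken job at the top of this article might look like the sketch below: only the job name, the cache: block, and the script: block survive.

```yaml
# Sanitized paste: just the failing job's cache and script blocks.
# Removed before pasting: variables:, secrets:, tokens, registry credentials.
install:
  cache:
    key: "$CI_COMMIT_SHA"
    paths:
      - node_modules/
  script:
    - npm install
```

Variable names such as $CI_COMMIT_SHA are safe to leave in, since they are references, not values. If you add log lines, include only the cache-related ones and check them for tokens first.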

Treat the response like a junior engineer’s pull request: useful, fast, and not yet trusted. The model will occasionally invent a key syntax that GitLab doesn’t support, or confidently recommend policy: pull for a job that genuinely needs to push. Cross-check anything structural against the GitLab cache docs, and validate the YAML in a throwaway pipeline before you let it touch your default branch. The workflow that has served me well: AI proposes, I read every line, a real pipeline run confirms the cache actually hits. If you want this same loop available to your team mid-review, the same models plug into editors like Cursor and GitHub Copilot, so the config reasoning happens right where the YAML lives.
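To make the “a real pipeline run confirms” step explicit, one option is a throwaway job that fails when nothing was restored. This is a minimal sketch, assuming the node_modules/ cache from the earlier examples; the job name verify-cache is hypothetical.

```yaml
verify-cache:   # hypothetical throwaway job, delete once the cache is proven
  stage: test
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull   # read-only, so the check cannot mask a miss by uploading
  script:
    - |
      if [ -d node_modules ]; then
        echo "cache restored"
        du -sh node_modules
      else
        echo "cache miss, node_modules/ was not restored"
        exit 1
      fi
```

It will fail on the first pipeline after a key change, which is expected. What matters is that it passes on the second run with an unchanged lockfile.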

For caching specifically, I keep a few reusable prompts on hand so I am not re-typing the same diagnostic questions — you can browse the prompt library or grab a curated prompt pack if you want a head start. And if you are wrangling more GitLab pipeline problems than just caching, the GitLab CI/CD guides cover the adjacent failure modes.

Conclusion

Cache misses are almost never a GitLab bug — they are a config that looks right and quietly isn’t. A volatile key, a mismatched path, an unnecessary push policy: small mistakes with a recurring bill measured in runner minutes. AI is genuinely excellent at catching these, because reading config and spotting inconsistencies is precisely what it is fast at. Just remember what it is: a quick junior engineer who reads your YAML, proposes a fix, and should never see your secrets. You verify, a real pipeline confirms, and then you merge. Do that, and the next time a job prints Created cache, it will actually mean something.
