GitLab CI Error Guide: 'WARNING: Failed to extract cache'

Exact Error Message

These warnings appear in the cache section near the top of the job log:

Checking cache for default-protected...
WARNING: Failed to extract cache: archive/tar: invalid tar header
Failed to extract cache

Other common variants from the same subsystem:

WARNING: file does not exist                       (cache)

Checking cache for node-modules-main...
WARNING: No URL provided, cache will not be downloaded from shared cache server.
Successfully extracted cache

Note that these are usually logged as WARNING, not ERROR: a failed cache restore does not fail the job by itself. The job continues with a cold cache, which is why the real symptom is often “my build got slow” or “a dependency that should be cached is being reinstalled every run.”

What the Error Means

GitLab caches are compressed archives (zip by default) that the runner saves after a job (the push step) and restores before the next one (the pull step), keyed by cache:key. On each run the runner looks for an archive matching the key, fetches it (from local disk or a distributed object store), and extracts it into the paths listed under cache:paths.

Failed to extract cache / archive/tar: invalid tar header — the archive was found but couldn’t be unpacked. It’s corrupted, truncated (a previous job died mid-upload), or was written by an incompatible archiver/zip version.
file does not exist / cache will not be downloaded — no archive matched the key, or there’s no place to fetch it from. This is normal on a first run, but persistent occurrences mean the key changes every time or the cache lives on a runner this job didn’t land on.
No URL provided ... shared cache server — distributed caching isn’t configured, so the cache is local to a single runner and invisible to others.
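
On a runner that keeps caches on local disk, these cases can be told apart by inspecting the archive directly. The sketch below is a minimal illustration, not part of GitLab Runner: the function name classify_cache_archive and its return labels are invented for this example, and it assumes the archive is a plain zip or tar file that you have already located yourself.

```python
import os
import tarfile
import zipfile
import zlib


def classify_cache_archive(path):
    """Return 'missing', 'ok-zip', 'ok-tar', or 'corrupt' for a cache archive.

    Hypothetical helper: the labels are illustrative, not runner output.
    """
    if not os.path.exists(path):
        return "missing"  # comparable to a cache miss: nothing to extract
    try:
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first member with a bad CRC, or None
                return "ok-zip" if zf.testzip() is None else "corrupt"
        if tarfile.is_tarfile(path):
            with tarfile.open(path) as tf:
                # Reads the member headers only; this is a shallow check
                # and may not notice every kind of truncation.
                tf.getmembers()
            return "ok-tar"
    except (zipfile.BadZipFile, tarfile.TarError, EOFError, OSError, zlib.error):
        return "corrupt"
    return "corrupt"  # found, but neither a readable zip nor a readable tar
```

A result of "corrupt" corresponds to the Failed to extract cache case, and "missing" to the file does not exist case.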

Common Causes

Corrupted or truncated archive from a job that was cancelled or OOM-killed while uploading the cache.
Cache key mismatch across branches, runners, or commits. A key that includes $CI_COMMIT_SHA changes with every commit, and one that includes $CI_JOB_ID changes with every job, so the archive is almost never reused.
Local cache on a different runner. With multiple runners and no shared cache, job 2 lands on a runner that never saw job 1’s cache.
Distributed cache (S3/MinIO/GCS) not configured or wrong credentials — so caches are never uploaded to the shared store (No URL provided).
cache:policy mismatch — a pull-only job runs before any push job has populated the cache.
Archiver/zip version mismatch between runner versions writing and reading the archive.
Cache too large, causing slow or partial uploads that truncate.

How to Reproduce the Error

A per-commit key guarantees a miss on every new commit, since the key changes each time:

build:
  cache:
    key: "$CI_COMMIT_SHA"     # unique per commit → never reused
    paths:
      - node_modules/
  script:
    - npm ci

A pull-only consumer with no producer reproduces the empty-cache path:

test:
  cache:
    key: deps
    paths: [node_modules/]
    policy: pull              # nothing ever pushed this key
  script:
    - npm test

WARNING: file does not exist                       (cache)
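
A small simulation shows why a per-commit key never produces a hit. This is a toy model under stated assumptions, not the runner's logic: the cache store is a plain set, each job restores before it saves, and the commit SHAs and the count_cache_hits name are invented for illustration.

```python
def count_cache_hits(keys):
    """Simulate jobs run in order, each restoring and then saving its cache key."""
    store = set()  # keys that currently have an archive
    hits = 0
    for key in keys:
        if key in store:
            hits += 1      # an archive for this key already exists
        store.add(key)     # the job saves its cache when it finishes
    return hits


sha_keys = ["a1b2c3", "d4e5f6", "0718ab"]   # hypothetical commit SHAs, one per run
branch_keys = ["main", "main", "main"]      # the same branch-level key on every run

print(count_cache_hits(sha_keys))     # 0
print(count_cache_hits(branch_keys))  # 2
```

With a stable key, every run after the first finds an archive. With a per-commit key, none do.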

Diagnostic Commands

1. Read the cache section of the job log. Every job prints Checking cache for <key>... near the top and Creating cache <key>... near the bottom. Compare the key strings between the producing and consuming jobs — if they differ, that’s your miss.
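
For long logs the comparison can be scripted. The sketch below assumes you have saved the raw job logs as text and that the lines use the wording quoted in this guide. The names cache_keys and keys_match are hypothetical, and logs containing ANSI color codes may need those stripped first.

```python
import re

# Patterns follow the log lines quoted in this guide; adjust them if your
# runner's wording differs (assumption: one key per line, ending in "...").
RESTORE_RE = re.compile(r"^Checking cache for (.+?)\.\.\.\s*$", re.MULTILINE)
SAVE_RE = re.compile(r"^Creating cache (.+?)\.\.\.\s*$", re.MULTILINE)


def cache_keys(job_log):
    """Return (restored_keys, saved_keys) found in a job log's text."""
    return RESTORE_RE.findall(job_log), SAVE_RE.findall(job_log)


def keys_match(producer_log, consumer_log):
    """True if any key the producer saved is one the consumer tried to restore."""
    _, saved = cache_keys(producer_log)
    restored, _ = cache_keys(consumer_log)
    return bool(set(saved) & set(restored))
```

If keys_match returns False for a producer and consumer pair, the two jobs are resolving different keys and the consumer will always start cold.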

2. Inspect the resolved key by echoing it in the job:

build:
  variables:
    CACHE_KEY: "deps-$CI_COMMIT_REF_SLUG"
  cache:
    key: "$CACHE_KEY"
    paths: [node_modules/]
  script:
    - echo "cache key is: $CACHE_KEY"     # confirm it's stable across runs
    - npm ci

3. Turn on full trace to see the exact cache URL, archiver, and any S3 errors:

variables:
  CI_DEBUG_TRACE: "true"

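With CI_DEBUG_TRACE: "true" the log shows the signed cache URL (or its absence), the tar/zip invocation, and credential/bucket errors from the object store. Debug traces also print every variable available to the job, secrets included, so remove the setting once you have what you need.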

4. Check the runner config for distributed cache. On the runner host, inspect config.toml:

sudo cat /etc/gitlab-runner/config.toml

[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "minio.example.com"
    BucketName = "gitlab-runner-cache"
    AccessKey = "..."
    SecretKey = "..."

If [runners.cache] is missing or Type is empty, caches are local-only — the cause of No URL provided and cross-runner misses.

Step-by-Step Resolution

1. Use a stable, meaningful cache:key. Key on something that should share a cache (a branch, or a lockfile hash), never on a per-run value:

build:
  cache:
    key:
      files:
        - package-lock.json     # key changes only when deps change
    paths:
      - node_modules/

Or key per branch with a fallback to the default branch’s cache:

cache:
  key: "$CI_COMMIT_REF_SLUG"
  fallback_keys:
    - "main"                    # warm start from main's cache on new branches
  paths:
    - node_modules/
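
The lookup order that fallback keys imply can be pictured as follows. This is a simplified model, not the runner's implementation: it assumes the primary key is tried first and the fallbacks in the order listed. The resolve_cache_key name and the branch names are made up for the example.

```python
def resolve_cache_key(available, key, fallback_keys=()):
    """Return the first key with a stored archive, or None on a full miss."""
    for candidate in (key, *fallback_keys):
        if candidate in available:
            return candidate
    return None


stored = {"main"}  # hypothetical: only main has pushed a cache so far

print(resolve_cache_key(stored, "feature-login", ["main"]))  # main
print(resolve_cache_key(stored, "feature-login"))            # None
```

A new branch with a fallback starts from main's archive. Without one it starts cold.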

2. Configure a distributed cache so all runners share one object store. In config.toml:

[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"
    BucketName = "my-org-ci-cache"
    BucketLocation = "us-east-1"
    AuthenticationType = "iam"   # or "access-key" together with AccessKey/SecretKey

Restart the runner (sudo gitlab-runner restart). This fixes both No URL provided and cross-runner misses in autoscaling/multi-runner fleets.

3. Set cache:policy to match the job’s role. Producers push, consumers pull:

install-deps:
  cache:
    key: deps
    paths: [node_modules/]
    policy: pull-push          # populate the cache
  script: [npm ci]

run-tests:
  cache:
    key: deps
    paths: [node_modules/]
    policy: pull               # reuse only; faster, no re-upload
  script: [npm test]

Make sure a pull-push (or push) job runs before any pull-only consumer.
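
The ordering requirement is easier to see in a toy model. The sketch below treats the cache store as a set of keys and assumes pull means restore only, push means save only, and pull-push means both. The run_job function is hypothetical and ignores paths, runners, and upload failures.

```python
def run_job(store, key, policy):
    """Simulate one job; return True if the cache was restored.

    store is a set of keys that currently have an archive.
    """
    if policy not in {"pull", "push", "pull-push"}:
        raise ValueError(f"unknown cache policy: {policy}")
    steps = policy.split("-")
    restored = "pull" in steps and key in store
    if "push" in steps:
        store.add(key)  # the job uploads an archive when it finishes
    return restored


store = set()
print(run_job(store, "deps", "pull"))       # False: nothing pushed yet
print(run_job(store, "deps", "pull-push"))  # False: first run populates the cache
print(run_job(store, "deps", "pull"))       # True: the consumer now gets a hit
```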

4. Clear a corrupted cache. In the UI: Pipelines > Clear runner caches (this bumps an index so old archives are ignored). Or change the cache:key to force a fresh archive. For object-store caches, you can also delete the offending object from the bucket.

5. Keep caches small. Cache dependency directories (node_modules/, .m2/, vendor/), not build outputs — large archives slow uploads and risk truncation. Use artifacts: for build outputs that must pass between stages.
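
To see which cached paths dominate, their on-disk size can be measured before the archive is created. This is a minimal sketch with hypothetical helper names (dir_size_bytes, report). It measures uncompressed size, so the uploaded archive will usually be smaller.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def report(paths):
    """Print each existing cache path with its size in MiB, largest first."""
    sizes = {p: dir_size_bytes(p) for p in paths if os.path.isdir(p)}
    for p, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 2**20:8.1f} MiB  {p}")
    return sizes
```

Running report(["node_modules", ".m2", "vendor"]) from the project directory at the end of a job's script gives a quick view of what each cache path contributes.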

Prevention and Best Practices

Key caches on a lockfile or branch, never on $CI_COMMIT_SHA/$CI_JOB_ID: those change on every commit or job, so the cache is effectively never reused.
Set up distributed S3/MinIO caching for any multi-runner or autoscaling setup; local caches don’t survive across runners.
Use fallback_keys so a new branch warms its cache from main instead of starting cold.
Split producer/consumer roles with cache:policy: pull-push and pull to avoid redundant uploads.
Cache dependencies, not artifacts. Keep archives small to avoid truncation; use artifacts: for things that must reliably pass between jobs.
Clear runner caches after a corruption warning rather than chasing a phantom build break.

Related Errors

GitLab CI Error Guide: artifacts too large / cache upload failed (a distinct issue: this guide covers failures restoring an existing cache, whereas that one covers oversized artifacts and upload-size limits).
GitLab CI Error Guide: ‘ERROR: Job failed: exit code 1’ — when a cold cache causes a downstream command to fail.
GitLab CI Error Guide: ‘Invalid CI config’ — a malformed cache: block can reject the whole config.

Frequently Asked Questions

Does Failed to extract cache fail my job? No — it’s a WARNING, not an ERROR. The job continues with an empty cache, so the visible effect is a slower job (dependencies reinstalled from scratch) rather than a red pipeline. It only becomes a failure if a later command relies on cached files that aren’t there.

Why do I see No URL provided, cache will not be downloaded? Distributed caching isn’t configured, so the cache only exists on the local runner. Configure [runners.cache] with an S3/MinIO/GCS backend in the runner’s config.toml and set Shared = true. Without it, caches can’t be shared across runners and won’t survive on ephemeral/autoscaled ones.

My cache works on one runner but not others. How do I fix it? You have local caches and multiple runners. A job that lands on a different runner won’t find a cache written by another. Set up a distributed object-store cache (S3/MinIO) so every runner reads and writes the same shared bucket.

How do I clear a corrupted cache? Go to Pipelines > Clear runner caches in the project — it bumps the cache index so stale archives are ignored. Alternatively, change the cache:key to force a new archive, or delete the bad object directly from your S3/MinIO bucket.

What’s the difference between cache and artifacts here? Cache is a best-effort speed optimization (dependencies) keyed by cache:key and may be missing; a failed restore is a warning. Artifacts are a reliable mechanism to pass files between jobs/stages via artifacts:/needs:, and a missing required artifact does fail the job. Use cache for node_modules/, artifacts for build outputs.
