Top 25 GitLab CI/CD Pipeline Mistakes (and How to Avoid Them)

The 25 most common GitLab CI/CD pipeline mistakes fall into five buckets: security (leaked secrets, over-privileged job tokens, no scanning), performance and cost (no caching, floating image tags, no interruptible), reliability (no timeouts, blind retries, no rollback path), maintainability (only/except sprawl, monolithic jobs, unreviewed pipeline code), and workflow (no rules:, deploying from feature branches, ignoring merge-request pipelines). Almost every painful .gitlab-ci.yml I have inherited makes a handful of these at once, and each one is cheap to fix once you can name it. Below are all 25, grouped by theme, with the exact YAML I use to fix them.

I have spent years cleaning up other people’s pipelines, and the failure modes rhyme. None of these require a platform migration or a new tool — they are config changes you can ship in an afternoon. Work through them in order of blast radius: secrets and privilege first, then cost and reliability, then the slow-burn maintainability problems.

Security mistakes

Security mistakes in CI are the worst kind because the pipeline runs with credentials a developer would never get directly. Fix these first.

1. Hardcoding secrets in `.gitlab-ci.yml` or CI variables

Putting a token, password, or kubeconfig directly in your YAML commits it to git history forever, where anyone with read access to the repository can find it. Storing it in a plain (unmasked, unprotected) CI/CD variable keeps it out of git but still lets it print in any job log and reach pipelines on every branch. Use masked, protected variables for static secrets, and prefer short-lived credentials via OIDC so nothing long-lived is stored at all.

# Bad: secret baked into the file
deploy:
  script:
    - 'curl -H "Authorization: Bearer glpat-xxxxxxxxxxxx" https://api.internal/deploy'

# Good: injected from a masked + protected CI/CD variable.
# The whole command is single-quoted because it contains ": ", which YAML
# would otherwise parse as a key/value separator.
deploy:
  script:
    - 'curl -H "Authorization: Bearer $DEPLOY_TOKEN" https://api.internal/deploy'

Mark the variable Masked and Protected in Settings → CI/CD → Variables so it never prints and never reaches unprotected branches. For cloud credentials, see GitLab CI secrets management with OIDC — it removes the static key entirely.
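As a sketch of the OIDC approach, assuming AWS as the cloud and an IAM role that already trusts your GitLab instance as an OIDC identity provider: the job requests an ID token with the id_tokens keyword and hands it to the AWS CLI through the CLI's web-identity environment variables. The role ARN, audience, image tag, and region are placeholders, not values from this article.

```yaml
deploy:
  image:
    name: amazon/aws-cli:2.15.0    # assumption: pin whichever CLI release you have verified
    entrypoint: [""]               # the image's entrypoint is `aws`; clear it so the runner's shell runs the script
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.example.com   # placeholder: must match the audience on the IAM OIDC provider
  variables:
    AWS_ROLE_ARN: arn:aws:iam::123456789012:role/gitlab-deploy   # placeholder role
    AWS_WEB_IDENTITY_TOKEN_FILE: /tmp/web-identity-token
    AWS_DEFAULT_REGION: us-east-1                                 # placeholder region
  script:
    # printf is a shell builtin: the token goes straight to a file and is never printed.
    - printf '%s' "$GITLAB_OIDC_TOKEN" > "$AWS_WEB_IDENTITY_TOKEN_FILE"
    - export AWS_ROLE_SESSION_NAME="gitlab-job-$CI_JOB_ID"
    # The CLI exchanges the token for short-lived STS credentials on first use.
    - aws sts get-caller-identity
```

The credentials expire on their own, so there is no static key to rotate, mask, or leak.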

2. Over-privileged `CI_JOB_TOKEN`

The job token is convenient, but unless its allowlist is locked down it can reach more projects and APIs than a single job needs. Restrict the allowlist so a compromised job in one repo cannot pull source or trigger pipelines across your whole group.

In Settings → CI/CD → Token Access, set inbound access to only the projects that legitimately call this one, and disable the broad “All groups and projects” option. Treat CI_JOB_TOKEN like any other credential with a least-privilege scope.

3. Running every job in a `privileged: true` runner

A privileged container gets nearly every kernel capability plus access to host devices, so a job running in one can trivially escape to the host. People enable privileged mode once for Docker-in-Docker and then leave it on for the whole fleet. Run only your image-build jobs on a tagged privileged runner and keep everything else on an unprivileged one.

build-image:
  tags: [dind-privileged]   # only this job lands on the privileged runner
  image: docker:24
  services: [docker:24-dind]
  variables:
    DOCKER_TLS_CERTDIR: "/certs"   # TLS between the job and the dind service
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .

unit-tests:
  tags: [shared-unprivileged]
  script:
    - make test

Better yet, replace DinD entirely with a rootless builder like Kaniko or BuildKit so you never need privilege at all.
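A rootless build might look like the following sketch, which uses the rootless BuildKit image and its daemonless helper script. It assumes your runner host allows the user namespaces rootless BuildKit relies on; depending on the host's seccomp and AppArmor profiles you may still need runner-level security options, so check the BuildKit rootless documentation before relying on it. The image tag is an assumption, and the shared-unprivileged runner tag is reused from the example above.

```yaml
build-image:
  tags: [shared-unprivileged]              # no privileged runner needed
  image:
    name: moby/buildkit:v0.13.0-rootless   # assumption: pin a release you have verified
    entrypoint: [""]                       # run the job script instead of the image's default daemon
  variables:
    BUILDKITD_FLAGS: "--oci-worker-no-process-sandbox"   # read by buildctl-daemonless.sh
  before_script:
    # Build a Docker-style auth file for the GitLab registry. It is written to disk,
    # never printed, so the base64 value cannot leak into the job log.
    - mkdir -p ~/.docker
    - |
      AUTH=$(printf '%s:%s' "$CI_REGISTRY_USER" "$CI_REGISTRY_PASSWORD" | base64 | tr -d '\n')
      printf '{"auths":{"%s":{"auth":"%s"}}}' "$CI_REGISTRY" "$AUTH" > ~/.docker/config.json
  script:
    # Starts a throwaway buildkitd, builds ./Dockerfile, and pushes the result.
    - |
      buildctl-daemonless.sh build \
        --frontend dockerfile.v0 \
        --local context=. \
        --local dockerfile=. \
        --output "type=image,name=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA,push=true"
```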

4. No SAST or dependency scanning

If your pipeline does not run static analysis and dependency scanning, you are shipping vulnerable code and dependencies with known CVEs into production blind. GitLab ships templates that wire this in with one include:.

include:
  - template: Jobs/SAST.gitlab-ci.yml
  - template: Jobs/Dependency-Scanning.gitlab-ci.yml
  - template: Jobs/Secret-Detection.gitlab-ci.yml

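On GitLab Ultimate the findings surface directly in the merge request; on lower tiers, SAST and Secret Detection still publish their reports as job artifacts, while Dependency Scanning itself is an Ultimate feature. For a tuning walkthrough see using AI to harden GitLab CI security scanning.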
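If you define your own stages, one detail trips people up: the scanning templates put their jobs in a stage named test, so that stage has to exist. Template behavior is tuned through CI/CD variables rather than by editing the template. A sketch, where SAST_EXCLUDED_PATHS is one such variable and the path list is illustrative:

```yaml
stages: [build, test, deploy]   # the scanning jobs run in "test", so keep that stage

include:
  - template: Jobs/SAST.gitlab-ci.yml
  - template: Jobs/Dependency-Scanning.gitlab-ci.yml
  - template: Jobs/Secret-Detection.gitlab-ci.yml

variables:
  # Skip fixtures and vendored code so findings stay relevant (illustrative paths).
  SAST_EXCLUDED_PATHS: "spec, test, tests, tmp, vendor"
```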

5. Echoing secrets into job logs

Even a masked variable leaks if you transform it. Masking only redacts the exact stored string, so a base64-encoded, URL-encoded, or partial copy prints in the clear. Never echo a secret, never run set -x in a block that touches one, and avoid passing secrets as CLI flags, where they show up in process listings.

# Bad: derived value defeats masking and prints in the log
script:
  - echo "Bearer $(echo -n $TOKEN | base64)"

# Good: keep it in the environment, never print it
script:
  - apply --token-env DEPLOY_TOKEN   # illustrative CLI that reads the token from the named env var
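Real tools that insist on a credential usually accept it on stdin instead of as a flag. A sketch using docker login, whose --password-stdin option exists for exactly this reason; it assumes a runner already set up for Docker-in-Docker.

```yaml
push-image:
  image: docker:24
  services: [docker:24-dind]
  variables:
    DOCKER_TLS_CERTDIR: "/certs"   # TLS between the job and the dind service
  script:
    # printf is a shell builtin, so the password never appears in a process listing,
    # and docker reads it from stdin instead of from a -p flag.
    - printf '%s' "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```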

6. Shell injection via unquoted variables

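CI variables can contain attacker-controlled values (branch names, MR titles, webhook payloads). Left unquoted, a value is word-split and glob-expanded, so it can smuggle extra arguments or flags into a command; splice it into eval, sh -c, or a generated script and it runs as code. Quote every expansion and never eval user-influenced input.

# Bad: an MR title like "hotfix --force" is split into two arguments,
# so --force reaches the command as a real flag
script:
  - deploy --note=$CI_MERGE_REQUEST_TITLE

# Good: quoted, the whole title stays a single argument
script:
  - deploy --note="$CI_MERGE_REQUEST_TITLE"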
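Quoting protects direct arguments, but sh -c and eval re-parse strings. When a value has to cross into a nested shell, pass it as a positional parameter instead of pasting it into the command string. A sketch with a hypothetical announce job:

```yaml
announce:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    # Bad (left commented out): the outer shell pastes the title into the string
    # that sh -c then parses, so a title containing ; or $(...) would execute.
    #   - sh -c "echo Deploying $CI_MERGE_REQUEST_TITLE"
    # Good: single quotes keep the outer shell from expanding anything. The title
    # arrives as positional parameter $1, and sh only ever treats it as data.
    - sh -c 'printf "Deploying %s\n" "$1"' _ "$CI_MERGE_REQUEST_TITLE"
```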

Performance and cost mistakes

Slow pipelines burn runner minutes and developer patience. These five are where most of the waste hides.

7. Floating image tags like `image: latest`

image: node:latest makes builds non-reproducible — the same commit produces different results next week, and a breaking upstream change lands in your pipeline without a single line of your code changing. Pin to a specific tag, and ideally a digest, so builds are deterministic.

# Bad
image: node:latest

# Good
image: node:20.11.1-bookworm
# Best: immutable digest
image: node:20.11.1-bookworm@sha256:abc123...

8. No caching (or a cache key that never hits)

Reinstalling dependencies from scratch on every job adds minutes and external bandwidth. The classic failures are a cache key so generic that unrelated jobs overwrite each other's cache, or one so specific (per commit, say) that it is never reused. Key the cache on your lockfile and scope the paths tightly.

test:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
  script:
    - npm ci --cache .npm --prefer-offline
    - npm test

There is a lot of nuance here — fallback keys, policy: pull for read-only jobs, cache-vs-artifacts. The full treatment is in GitLab CI caching strategies: a deep dive.
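One way to apply the policy: pull idea, sketched with a YAML anchor so both jobs share the same key and paths. Only the install job uploads the cache; the read-only job downloads it and skips the upload step. The job names and the lint script are illustrative.

```yaml
.npm-cache: &npm-cache     # hidden key: holds shared settings, never runs as a job
  key:
    files:
      - package-lock.json
  paths:
    - .npm/

install:
  stage: build
  cache:
    <<: *npm-cache
    policy: pull-push      # the one job that refreshes the cache
  script:
    - npm ci --cache .npm --prefer-offline

lint:
  stage: test
  cache:
    <<: *npm-cache
    policy: pull           # download only; skips the upload at the end of the job
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run lint         # assumes a "lint" script in package.json
```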

9. No `interruptible`, so superseded pipelines keep running

When you push three commits in a row, the first two pipelines are already obsolete but keep consuming runners. Marking jobs interruptible: true, with the project's Auto-cancel redundant pipelines setting enabled, lets GitLab cancel those stale jobs automatically when a newer pipeline starts on the same branch.

default:
  interruptible: true   # applies to all jobs

deploy:
  interruptible: false  # opt deploys back out so they finish
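One subtlety: under the default auto-cancel behavior, once a job with interruptible: false has started, the pipeline stops being auto-cancelled at all. Newer GitLab versions let you change that with workflow:auto_cancel. A sketch, assuming your instance supports the on_new_commit option:

```yaml
workflow:
  auto_cancel:
    on_new_commit: interruptible   # cancel only interruptible jobs when a newer commit lands

default:
  interruptible: true

deploy:
  interruptible: false             # never cancelled mid-run
  script: [./deploy.sh]
```

With this mode, GitLab cancels the stale test jobs but lets a running deploy finish.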

10. Pulling images without a registry mirror

Every job that pulls from Docker Hub competes for a shared rate limit and pays a latency tax. Point your runners at a pull-through cache or your GitLab Dependency Proxy so images come from a warm, local mirror.

build:
  image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:20.11.1
  script:
    - npm ci

This alone can cut cold-start time noticeably, and because repeat pulls are served from the cache instead of Docker Hub, it largely insulates you from Docker Hub rate limits.

11. No artifact expiry

Artifacts are kept for 30 days by default on GitLab.com (self-managed instances configure their own default), which is usually far longer than they are useful, and they count against your storage quota. Set expire_in on every artifact and keep only what a human or downstream job actually needs.

build:
  artifacts:
    paths: [dist/]
    expire_in: 1 week
    when: on_success

Reserve long retention for release builds; everything else can expire in hours or days. Pair this with a container registry cleanup policy so old images do not pile up either.
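One way to split retention, sketched with illustrative job names: everyday builds expire quickly, while a job that runs only for tags keeps its artifacts indefinitely. Be aware that GitLab may also keep the latest successful job's artifacts regardless of expire_in, controlled by the Keep artifacts from most recent successful jobs setting.

```yaml
build:
  script: [make build]
  artifacts:
    paths: [dist/]
    expire_in: 3 days          # everyday builds: short-lived

release-build:
  rules:
    - if: $CI_COMMIT_TAG       # tag pipelines only
  script: [make build]
  artifacts:
    paths: [dist/]
    expire_in: never           # keep release artifacts indefinitely
```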

Reliability mistakes

A pipeline that lies about its own health is worse than no pipeline. These mistakes erode trust in green checkmarks.

12. No `timeout` on jobs

A hung job — a flaky network call, a deadlocked test — will sit consuming a runner until the project-wide default kills it, often an hour later. Set an aggressive per-job timeout so failures fail fast.

integration-tests:
  timeout: 15 minutes
  script:
    - make integration
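Two further layers are worth knowing, sketched below: default: sets a baseline timeout for every job, and the ordinary timeout command caps a single slow step inside a job. A runner's own maximum job timeout, if lower, still wins. The wait-for-db.sh script is hypothetical.

```yaml
default:
  timeout: 20 minutes            # ceiling for every job that does not set its own

integration-tests:
  timeout: 15 minutes
  script:
    # The timeout command caps one step, so a hung wait fails after 120 seconds
    # instead of eating the rest of the job's budget.
    - timeout 120 ./wait-for-db.sh
    - make integration
```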

13. `allow_failure: true` hiding real failures

allow_failure is meant for genuinely optional jobs, but it gets sprinkled onto flaky tests to make pipelines “go green.” Now real regressions pass silently. Reserve it for advisory jobs and fix the flake instead.

# Acceptable: a non-blocking lint advisory
spell-check:
  allow_failure: true
  script: [make spellcheck]

# Not acceptable: hiding a broken test suite
unit-tests:
  allow_failure: true   # delete this and fix the tests
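When a job really is advisory, you can narrow what counts as acceptable. A sketch using allow_failure:exit_codes, assuming a hypothetical audit.sh that exits with code 3 when it finds advisories and with other codes when it actually breaks. Only code 3 is tolerated, so a crashed script still fails the pipeline.

```yaml
dependency-audit:
  script:
    - ./audit.sh            # hypothetical: exit 3 = advisories found, anything else = real error
  allow_failure:
    exit_codes: [3]         # only this exit code is treated as a warning
```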

14. Blind `retry:` masking flaky tests

Setting retry: 2 on a job without scoping the failure type re-runs everything — including legitimate failures — and hides flakiness that should be fixed. Scope retries to infrastructure errors only.

e2e:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

That retries genuine infra hiccups but lets a real assertion failure fail immediately.

15. No environment or rollback path

Deploying with a bare kubectl apply and no environment: block means GitLab has no record of what is running where, and no one-click rollback. Declare environments so deploys are tracked and reversible.

deploy-prod:
  stage: deploy
  environment:
    name: production
    url: https://app.example.com
  script:
    - ./deploy.sh

With this, GitLab’s environment page shows the deploy history and offers a re-deploy of any prior commit as your rollback.

16. No `resource_group` on deploys

Without a resource group, two deploy jobs to the same environment can run concurrently and clobber each other’s state — especially painful with Terraform. A resource group serializes them.

deploy-prod:
  resource_group: production   # only one prod deploy runs at a time
  environment: production
  script: [./deploy.sh]

I cover this alongside interruptible in taming GitLab pipeline concurrency.

17. Giant monolithic jobs

A single 400-line build-test-deploy job is impossible to debug, cache, or parallelize — one failure throws away all the work before it. Split work into focused jobs per stage so failures are isolated and re-runnable.

stages: [build, test, deploy]

build:  { stage: build,  script: [make build] }
lint:   { stage: test,   script: [make lint] }
test:   { stage: test,   script: [make test] }
deploy: { stage: deploy, script: [make deploy] }
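Focused jobs also make parallelism cheap. As a sketch, parallel: runs N copies of one job, and GitLab sets CI_NODE_INDEX (starting at 1) and CI_NODE_TOTAL in each copy. The SHARD and TOTAL Makefile variables are hypothetical and assume your test runner can select its slice.

```yaml
test:
  stage: test
  parallel: 4
  script:
    # Each of the 4 copies runs one slice of the suite.
    - make test SHARD="$CI_NODE_INDEX" TOTAL="$CI_NODE_TOTAL"
```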

Maintainability mistakes

These do not break anything today. They make your pipeline impossible to change six months from now.

18. Using `only/except` instead of `rules`

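only/except is legacy: it is no longer actively developed, cannot be mixed with rules: in the same job, and cannot attach a different when: to each condition. rules: evaluates its clauses in order, can combine if, changes, and exists in a single clause, and is what every new GitLab feature targets.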

# Bad: legacy, hard to compose
deploy:
  only: [main]

# Good: explicit, extensible
deploy:
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: on_success
    - when: never

See mastering rules:changes for path-scoped pipelines for the powerful changes: patterns.
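As a sketch of a typical migration, here is a job that used except: [tags] rewritten with rules:. The job name and make target are illustrative.

```yaml
# Before (legacy):
#   test:
#     except: [tags]
#     script: [make test]

# After: the same behavior with rules
test:
  rules:
    - if: $CI_COMMIT_BRANCH   # branch pipelines only: not tags, not MR pipelines
  script: [make test]
```

A bare "- when: on_success" fallback would also add the job to merge request pipelines, which only/except jobs never joined by default; that difference is a common source of duplicate pipelines.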

19. No `needs:` / DAG, so everything runs stage-by-stage

Stage-based execution forces every job in a stage to finish before the next stage starts, even when there is no real dependency. A needs: DAG lets independent chains run as soon as their inputs are ready.

unit-tests:
  stage: test
  needs: [build]        # starts the instant build finishes

deploy:
  stage: deploy
  needs: [unit-tests, integration-tests]

This often shaves whole minutes off wall-clock time. More in optimizing GitLab pipeline DAGs with needs.
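needs: has two more forms worth knowing, sketched here: an empty list lets a job start immediately, and the long form controls whether a dependency's artifacts are downloaded. The package job and make package target are hypothetical, and the example assumes a build job like the one in mistake 17.

```yaml
lint:
  stage: test
  needs: []              # no upstream jobs: starts as soon as the pipeline is created
  script: [make lint]

package:
  stage: deploy
  needs:
    - job: build
      artifacts: true    # download build's artifacts
    - job: lint
      artifacts: false   # wait for lint to pass, but skip its artifacts
  script: [make package]
```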

20. No `.gitlab-ci.yml` validation or lint in the workflow

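Pushing a YAML typo means waiting for the pipeline to fail before you find out. Validate before you push and again in a pre-merge job. GitLab exposes a CI Lint API, and the glab CLI wraps it, so you can check a file from your terminal (it still needs network access to your GitLab instance).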

glab ci lint                       # validate the current file
# or hit the API:
# POST /projects/:id/ci/lint

Add a fast validate-ci job that runs on every MR so a broken config never reaches main.

21. Copy-pasted job definitions instead of `extends`/`!reference`

When the same five lines appear in eight jobs, every change is an eight-way edit and they drift. Factor shared config into a hidden template job (its name starts with a dot) and extend it.

.docker-job:
  image: docker:24
  services: [docker:24-dind]
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  before_script:
    - printf '%s' "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"

build:
  extends: .docker-job
  script: [docker build -t "$CI_REGISTRY_IMAGE" .]

For cross-project reuse, publish reusable CI/CD catalog components.
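extends merges keys, but lists such as before_script are replaced wholesale, so a job that sets its own before_script loses the template's. !reference pulls one section from another job so you can combine it with job-specific lines. A sketch with hypothetical .registry-login and .docker-setup templates:

```yaml
.registry-login:
  before_script:
    - printf '%s' "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"

.docker-setup:
  image: docker:24
  services: [docker:24-dind]
  variables:
    DOCKER_TLS_CERTDIR: "/certs"

publish:
  extends: .docker-setup
  before_script:
    - !reference [.registry-login, before_script]   # reuse just the login step
    - docker info                                    # plus a job-specific step
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```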

22. No review process for pipeline-as-code

.gitlab-ci.yml is production infrastructure, but teams let it merge without the scrutiny they apply to application code. Protect the file with a CODEOWNERS rule so a platform engineer reviews every pipeline change.

# CODEOWNERS
/.gitlab-ci.yml      @platform-team
/ci/                 @platform-team

Combined with required code owner approval on your protected branches, no one ships a privilege escalation or a leaked-secret pattern unreviewed.

Workflow mistakes

The last bucket is about how pipelines fit into how your team actually ships.

23. Running the full pipeline on every commit (no `rules:` scoping)

Building the whole monorepo and running every test on a one-line docs change wastes minutes and money. Scope jobs to the paths they care about with rules:changes.

backend-tests:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes: [backend/**/*]
  script: [make -C backend test]

For monorepos, dynamic child pipelines take this further — see GitLab monorepo pipelines with child pipelines and rules.
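Before going fully dynamic, a static parent-child setup already scopes whole sub-pipelines by path. A sketch, assuming each top-level directory carries its own pipeline file (the backend/.gitlab-ci.yml path is hypothetical):

```yaml
backend:
  trigger:
    include: backend/.gitlab-ci.yml   # assumption: backend keeps its own pipeline file
    strategy: depend                  # parent status mirrors the child pipeline's result
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes: [backend/**/*]
```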

24. Deploying to production from feature branches without a manual gate

If any branch can trigger a production deploy, one mis-scoped rules: entry is an outage. Gate production behind a protected environment plus a when: manual job so a human deliberately promotes.

deploy-prod:
  stage: deploy
  environment: production
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual            # explicit click to ship
  script: [./deploy.sh]

Combine this with deployment approval gates and protected environments so only authorized users can press the button.

25. Ignoring merge-request pipelines (and double pipelines)

Running only branch pipelines means your tests never see the merge result (that requires merge-request pipelines with merged results pipelines enabled), and a config that allows both branch and MR pipelines fires two redundant pipelines per push once an MR is open. Use workflow:rules to run MR pipelines and suppress duplicates.

workflow:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    - if: '$CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS'
      when: never            # no duplicate branch pipeline
    - if: '$CI_COMMIT_BRANCH'

Now each push produces one pipeline, not two, and it is an MR pipeline. Enable merged results pipelines (a Premium feature) and that pipeline tests the merged code reviewers will actually approve.

How AI helps catch these

The honest problem with a list of 25 mistakes is that you will not remember all 25 while reviewing a 300-line YAML file at 5pm on a Friday. That is exactly the kind of pattern-matching an AI assistant is good at. Paste your .gitlab-ci.yml into the code review dashboard and ask it to audit against this list — it will flag the floating tags, the missing interruptible, the unquoted variables, and the allow_failure that is hiding a broken test, with the specific line and a suggested fix.

I keep a set of reusable GitLab CI prompts for exactly this: “review this pipeline for security mistakes,” “convert this only/except to rules:,” “add a needs: DAG to this stage-based pipeline.” If you want a curated, ready-to-run bundle, the prompt packs include a GitLab CI hardening set. Any capable model works — I run these through both Claude and ChatGPT depending on the task, and the output is consistently good enough to catch the obvious offenders before a human ever opens the MR.

The workflow that sticks: AI does the first pass against the checklist, a human reviews the diff, and CODEOWNERS makes sure that human is on the platform team. You get breadth from the model and judgment from the person.

FAQ

What is the single most damaging GitLab CI mistake? Hardcoded long-lived secrets. They leak into git history and logs, survive forever, and grant standing access to anyone who reads them. Move to masked, protected variables immediately and to OIDC short-lived credentials as soon as you can.

Should I use rules: or only/except? Always rules: for new work. It is the actively developed mechanism, supports combining if, changes, and exists, and integrates with workflow:rules. only/except is legacy and will not receive new capabilities — migrate existing jobs as you touch them.

How do I stop redundant pipelines from running? Add a workflow:rules block that runs merge-request pipelines and explicitly sets when: never for branch pipelines when an open MR exists. That kills the classic double pipeline, and with merged results pipelines enabled, your tests also run against the merged result.

Is image: latest really that bad? Yes. It makes builds non-reproducible and lets upstream changes break your pipeline with no commit of your own. Pin to a specific version tag, and use a digest for builds you need to reproduce exactly months later.

How can AI realistically help with pipeline review? Treat it as a tireless first-pass reviewer. Paste the YAML, ask it to audit against a known checklist of security, cost, and reliability mistakes, and let it flag specific lines. It is excellent at catching the mechanical issues — floating tags, missing timeouts, unquoted variables — so your human reviewers can focus on intent and architecture.

Conclusion

None of these 25 mistakes require a re-platform. They are config changes you can land incrementally, and the payoff compounds: faster pipelines, lower runner bills, fewer 2am rollbacks, and a .gitlab-ci.yml your team can still understand next year. Start with the security bucket, then knock out caching and interruptible for the quick cost wins, and wire an AI review pass into your MR flow so you stop re-introducing the ones you just fixed. For more depth on any single topic, the GitLab CI/CD category has a focused deep dive on each.
