AI for GitLab CI/CD · By James Joyner IV · 9 min read

Taming GitLab Pipeline Concurrency: Resource Groups and Interruptible Jobs

Two deploys racing to prod, stale pipelines burning runner minutes: concurrency bugs are silent. Here is how resource_group and interruptible fix them.

  • #gitlab
  • #ci-cd
  • #concurrency
  • #performance

The bug that took me longest to diagnose wasn’t a broken job — it was two deploy jobs from two different pipelines running at the exact same time, both applying Terraform to the same environment, leaving the state file corrupted. There was no error in the YAML. The pipelines were perfect in isolation. The problem was concurrency, and GitLab has two underused features that solve it cleanly: resource_group and interruptible. They’re a handful of lines each and they’ll save you from a whole category of silent, intermittent disasters.

The two concurrency problems

There are really two distinct issues people conflate:

  1. Mutual exclusion. Two pipelines must not run a given job at the same time — deploys to one environment, terraform apply against one state, migrations against one database. This is resource_group.
  2. Wasted work. You push three commits in a minute; the first two pipelines are now obsolete but keep churning through runner minutes. This is interruptible.

Solve them with different tools. Mixing them up is where people get stuck.

resource_group: a mutex for jobs

Add resource_group to a job and GitLab guarantees that only one job in that group runs at a time, across all pipelines in the project. The others queue.

deploy-production:
  stage: deploy
  resource_group: production
  script:
    - terraform apply -auto-approve
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

Now if two pipelines both reach deploy-production, the second waits for the first to finish. No more racing applies, no more corrupted state. The group name is just a string you choose — anything sharing a name shares the lock.

Pro Tip: Name the resource group after the thing being protected, not the job. Use production or terraform-state-prod, not deploy-production. Then a separate migrate-production job can join the same group and serialize against the deploy too, which is usually what you want.
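A minimal sketch of that tip, assuming a migration job sits alongside the deploy. The job name migrate-production comes from the tip above; the ./migrate.sh script is a hypothetical placeholder. Because both jobs name the same group, GitLab never runs them at the same moment:

```yaml
# Both jobs share the "production" lock, so they never overlap,
# even when they belong to different pipelines.
migrate-production:
  stage: deploy
  resource_group: production
  script:
    - ./migrate.sh            # hypothetical migration script

deploy-production:
  stage: deploy
  resource_group: production
  script:
    - terraform apply -auto-approve
```

The shared group only provides mutual exclusion. If the migration has to finish before the deploy within one pipeline, that ordering still needs separate stages or a needs relationship.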

Process modes: unordered, oldest first, or newest first

By default a resource group uses the unordered process mode: whichever waiting job is ready first runs next, with no ordering guarantee. For deploys you usually want the newest to win. If three deploys queue up, deploying the oldest commit last is backwards. The process mode (unordered, oldest_first, or newest_first) is not a .gitlab-ci.yml keyword; you change it through the resource group API. More practically, design around it: use interruptible to cancel the stale pipelines so they never reach the queue. The two features complement each other.
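If you do want to change the process mode, one option is a small manual job that calls the resource group API. This is a sketch under several assumptions: RESOURCE_GROUP_API_TOKEN is a hypothetical masked CI/CD variable holding a token with enough permission to edit resource groups, the job image provides curl, and the production group already exists because a job using it has run at least once.

```yaml
set-deploy-order:
  rules:
    - when: manual
  allow_failure: true          # keep the manual job from blocking the pipeline
  script:
    # PUT /projects/:id/resource_groups/:key with a process_mode attribute
    - >
      curl --fail --request PUT
      --header "PRIVATE-TOKEN: $RESOURCE_GROUP_API_TOKEN"
      --data "process_mode=newest_first"
      "$CI_API_V4_URL/projects/$CI_PROJECT_ID/resource_groups/production"
```

The setting persists on the resource group, so this is something you run once, not on every pipeline.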

interruptible: kill stale work automatically

interruptible: true marks a job as safe to cancel if a newer pipeline starts on the same ref. Combined with the project setting “Auto-cancel redundant pipelines”, GitLab cancels the obsolete runs:

.interruptible-defaults:
  interruptible: true

build:
  extends: .interruptible-defaults
  stage: build
  script:
    - make build

test:
  extends: .interruptible-defaults
  stage: test
  script:
    - make test

Push five commits fast, and only the latest pipeline survives — the rest get canceled the moment a newer one starts. On a busy repo this reclaims an enormous amount of runner time.
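Recent GitLab versions also let you state the auto-cancel policy in the pipeline file itself through workflow:auto_cancel. The sketch below assumes your instance supports that keyword; check your version before relying on it.

```yaml
workflow:
  auto_cancel:
    # conservative (the default): cancel the whole pipeline, but only while
    #   no non-interruptible job has started
    # interruptible: cancel only jobs marked interruptible: true
    # none: never auto-cancel
    on_new_commit: interruptible
```

With on_new_commit: interruptible, stale build and test jobs can still be canceled after a non-interruptible job has started, and the non-interruptible job itself keeps running.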

The critical safety rule for interruptible

Here is the line that matters: a job is only interruptible if it’s safe to cancel mid-run. Build and test? Almost always safe. A terraform apply or a database migration? Absolutely not — canceling those mid-flight leaves the world half-changed.

So the pattern is: interruptible on everything up to and including the last idempotent, abortable stage, and not interruptible on deploys and migrations.

deploy-production:
  stage: deploy
  interruptible: false   # never cancel a deploy mid-flight
  resource_group: production
  script:
    - ./deploy.sh

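GitLab also stops auto-canceling a pipeline once a non-interruptible job has started, which protects you. Jobs are non-interruptible by default, so that protection disappears only when a deploy ends up marked interruptible: true, whether directly or through an inherited default or template. Writing interruptible: false explicitly on the deploy guards against that. Get this wrong and you can cancel a half-finished production change. AI gets this exactly backwards with alarming frequency, so this is a line I always hand-check.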
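The inherited-default case is the one that bites, so here is a sketch of it. It assumes you set interruptible once through the default keyword instead of the extends template used earlier; the job names and scripts mirror the ones already shown in this guide.

```yaml
default:
  interruptible: true          # every job inherits this unless it overrides it

build:
  stage: build
  script:
    - make build               # inherits interruptible: true

deploy-production:
  stage: deploy
  interruptible: false         # explicit override; without it the deploy would be cancelable
  resource_group: production
  script:
    - ./deploy.sh
```

A global default is convenient, but it inverts the safe default for every job in the file, so each deploy and migration needs its own explicit override.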

Combining both for a safe, lean pipeline

The full pattern on a real deploy pipeline:

stages: [build, test, deploy]

build:
  stage: build
  interruptible: true
  script: ["make build"]

test:
  stage: test
  interruptible: true
  script: ["make test"]

deploy-production:
  stage: deploy
  interruptible: false
  resource_group: production
  script: ["./deploy.sh"]
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual

Build and test get auto-canceled when superseded, saving minutes. The deploy is mutually exclusive (resource group) and never canceled mid-run (not interruptible). That combination is the sweet spot.

Concurrency limits at the runner and group level

resource_group and interruptible operate inside a project, but concurrency also has a fleet dimension. A runner’s config.toml has a global concurrent setting that caps how many jobs that runner process executes at once, across every runner entry in the file and every project it serves. Set it too high and jobs thrash the host; set it too low and pipelines queue needlessly. That tuning lives in config.toml, not your .gitlab-ci.yml, and it’s worth coordinating with whoever owns the runners.

Within a job you can also bound parallelism deliberately. A parallel: matrix that fans out to fifty jobs will happily saturate your runner fleet and starve every other team’s pipeline. When a job is greedy, I cap it and let it queue:

load-test:
  stage: test
  parallel: 4
  resource_group: load-test-cluster
  script:
    - ./run-load-shard.sh "$CI_NODE_INDEX" "$CI_NODE_TOTAL"

Here the job fans out into four shards, and the shared resource_group keeps them from colliding with another pipeline’s load test against the same cluster. There is a cost: a resource group admits one job at a time, so the four shards also queue behind each other instead of running side by side. If the shards must run concurrently, hold the lock in a single job that triggers them as a child pipeline. The interplay is the point: parallel: controls fan-out within a job, resource_group controls exclusivity across jobs and pipelines, and the runner’s concurrent setting is the global ceiling. Reason about all three together, or you’ll tune one and get surprised by another. AI can explain each in isolation but rarely reasons about their interaction correctly, so I sketch the desired behavior myself and use it only to draft the YAML.
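One way to get concurrent shards and cross-pipeline exclusivity together is to put the lock on a trigger job and run the shards in a child pipeline. This is a sketch, not the only layout: the child file name load-test.gitlab-ci.yml and the child job name run-shards are hypothetical. The parent pipeline holds the lock:

```yaml
# .gitlab-ci.yml (parent pipeline)
load-test:
  stage: test
  resource_group: load-test-cluster     # one trigger job holds the lock
  trigger:
    include: load-test.gitlab-ci.yml    # hypothetical child pipeline file
    strategy: depend                    # wait for the child, so the lock is held until it finishes
```

The child pipeline does the fan-out without a resource group of its own:

```yaml
# load-test.gitlab-ci.yml (child pipeline)
run-shards:
  parallel: 4
  script:
    - ./run-load-shard.sh "$CI_NODE_INDEX" "$CI_NODE_TOTAL"
```

Without strategy: depend the trigger job completes as soon as the child pipeline is created, which would release the lock while the shards are still running.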

How AI helps here

This is a great fast-junior-engineer task with one sharp caveat. AI is excellent at:

  • Adding interruptible: true across your build/test jobs via a shared extends template.
  • Spotting jobs that share a protected resource and suggesting a common resource_group.
  • Explaining the queueing behavior.

It is dangerously unreliable at the one thing that matters most: it will happily mark a terraform apply or migration as interruptible: true, which is precisely the mistake that corrupts state. So every AI suggestion here gets read with one question in mind — is this job actually safe to cancel mid-run? If the answer is no, I override it to false. Never delegate that judgment.

And the standing rule on any deploy pipeline: don’t hand AI your CI secrets. Share the YAML structure and behavior, never the kubeconfig, state-backend credentials, or tokens. For a careful second read of a concurrency-sensitive diff, the code review dashboard is worth the pass before merge.

My reusable prompt: “Add GitLab CI concurrency controls to this pipeline. Mark build/test jobs interruptible, give the production deploy a resource_group, and set the deploy interruptible: false. Explicitly flag any job you’re unsure is safe to cancel mid-run.” That last clause forces the model to surface the risky calls. More variants live in my prompt library and the platform prompt packs.

Conclusion

Concurrency bugs are silent until they corrupt something. resource_group is a mutex that serializes jobs touching a shared resource; interruptible reclaims runner minutes by canceling stale work — but only on jobs that are genuinely safe to cancel. Let AI add the boilerplate, then personally decide which jobs are abortable and keep your secrets out of the chat. More guides in the GitLab CI/CD category.
