Taming GitLab Pipeline Concurrency: Resource Groups and Interruptible Jobs
Two deploys racing to prod, stale pipelines burning runner minutes: concurrency bugs are silent. Here is how resource_group and interruptible fix them.
- #gitlab
- #ci-cd
- #concurrency
- #performance
The bug that took me longest to diagnose wasn’t a broken job — it was two deploy jobs from two different pipelines running at the exact same time, both applying Terraform to the same environment, leaving the state file corrupted. There was no error in the YAML. The pipelines were perfect in isolation. The problem was concurrency, and GitLab has two underused features that solve it cleanly: resource_group and interruptible. They’re a handful of lines each and they’ll save you from a whole category of silent, intermittent disasters.
The two concurrency problems
There are really two distinct issues people conflate:
- Mutual exclusion. Two pipelines must not run a given job at the same time: deploys to one environment, terraform apply against one state, migrations against one database. This is resource_group.
- Wasted work. You push three commits in a minute; the first two pipelines are now obsolete but keep churning through runner minutes. This is interruptible.
Solve them with different tools. Mixing them up is where people get stuck.
resource_group: a mutex for jobs
Add resource_group to a job and GitLab guarantees that only one job in that group runs at a time, across all pipelines in the project. Others queue.
deploy-production:
  stage: deploy
  resource_group: production
  script:
    - terraform apply -auto-approve
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
Now if two pipelines both reach deploy-production, the second waits for the first to finish. No more racing applies, no more corrupted state. The group name is just a string you choose — anything sharing a name shares the lock.
Pro Tip: Name the resource group after the thing being protected, not the job. Use production or terraform-state-prod, not deploy-production. Then a separate migrate-production job can join the same group and serialize against the deploy too, which is usually what you want.
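As a rough sketch of that tip, the two jobs below share one lock simply by using the same group name. The migrate-production job, the migrate stage, and the ./migrate.sh script are hypothetical names used only for illustration.

```yaml
# Sketch only: migrate-production and ./migrate.sh are hypothetical.
stages: [migrate, deploy]

migrate-production:
  stage: migrate
  resource_group: production   # same name as the deploy, so same lock
  script:
    - ./migrate.sh
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

deploy-production:
  stage: deploy
  resource_group: production
  script:
    - terraform apply -auto-approve
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

Keep in mind that the lock is held per job, not per pipeline: it is released between the migrate and the deploy, so a job from another pipeline can run in between.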
Process modes: unordered, oldest first, or newest first
By default a resource_group uses the unordered process mode: waiting jobs run whenever they become ready, with no guaranteed order. For deploys you usually want the newest to win. If three deploys queue up, deploying the oldest commit last is backwards. GitLab also offers oldest_first and newest_first modes (recent versions add newest_ready_first), but the mode is set through the resource groups API, not in .gitlab-ci.yml. More practically, design around it: use interruptible to cancel the stale pipelines so they never reach the queue. The two features complement each other.
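If you do want to change the process mode, one option is a one-off manual job that calls the resource groups API. This is a sketch under assumptions: RESOURCE_GROUP_API_TOKEN is a hypothetical masked CI/CD variable holding a token with enough permissions, the container image is an arbitrary choice, and newest_first is used only as an example value.

```yaml
# Sketch only: RESOURCE_GROUP_API_TOKEN and the image are assumptions.
set-process-mode:
  image: curlimages/curl:latest
  rules:
    - if: $CI_PIPELINE_SOURCE == "web"
      when: manual
  allow_failure: true   # keep this optional job from blocking the pipeline
  script:
    # PUT /projects/:id/resource_groups/:key with a process_mode attribute
    - >
      curl --fail --request PUT
      --header "PRIVATE-TOKEN: $RESOURCE_GROUP_API_TOKEN"
      --data "process_mode=newest_first"
      "$CI_API_V4_URL/projects/$CI_PROJECT_ID/resource_groups/production"
```

The resource group has to exist before it can be edited, which in practice means a job using resource_group: production has run at least once. Running the same curl command by hand works just as well for a setting you change rarely.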
interruptible: kill stale work automatically
interruptible: true marks a job as safe to cancel if a newer pipeline starts on the same ref. Combined with the project setting “Auto-cancel redundant pipelines”, GitLab cancels the obsolete runs:
.interruptible-defaults:
  interruptible: true

build:
  extends: .interruptible-defaults
  stage: build
  script:
    - make build

test:
  extends: .interruptible-defaults
  stage: test
  script:
    - make test
Push five commits fast, and only the latest pipeline survives — the rest get canceled the moment a newer one starts. On a busy repo this reclaims an enormous amount of runner time.
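On recent GitLab versions the cancel behavior can also be stated in the YAML itself. The following is a minimal sketch, assuming your instance supports the workflow:auto_cancel keyword; check your version's documentation for how it interacts with the project setting.

```yaml
# Sketch only: assumes a GitLab version that supports workflow:auto_cancel.
workflow:
  auto_cancel:
    on_new_commit: interruptible   # cancel only jobs marked interruptible: true

build:
  stage: build
  interruptible: true
  script:
    - make build

test:
  stage: test
  interruptible: true
  script:
    - make test
```

With this value, a new commit cancels the interruptible jobs of the older pipeline and leaves any non-interruptible job running to completion.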
The critical safety rule for interruptible
Here is the line that matters: a job is only interruptible if it’s safe to cancel mid-run. Build and test? Almost always safe. A terraform apply or a database migration? Absolutely not — canceling those mid-flight leaves the world half-changed.
So the pattern is: interruptible on everything up to and including the last idempotent, abortable stage, and not interruptible on deploys and migrations.
deploy-production:
  stage: deploy
  interruptible: false # never cancel a deploy mid-flight
  resource_group: production
  script:
    - ./deploy.sh
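GitLab also stops auto-canceling a pipeline once a non-interruptible job has started, which protects you, but only if the deploy really is non-interruptible. interruptible defaults to false, so the usual way to get this wrong is to inherit true from a default: block or a shared template. Setting interruptible: false explicitly on the deploy keeps the intent visible and survives refactoring. Get this wrong and you can cancel a half-finished production change. AI gets this exactly backwards with alarming frequency, so this is a line I always hand-check.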
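One place the value can flip without anyone noticing is a pipeline-wide default. A minimal sketch of that situation, reusing the job names from this article:

```yaml
# Sketch only: interruptible by default, with an explicit opt-out on the deploy.
default:
  interruptible: true

build:
  stage: build
  script:
    - make build

deploy-production:
  stage: deploy
  interruptible: false   # overrides the default; never cancel mid-flight
  resource_group: production
  script:
    - ./deploy.sh
```

Without the explicit false, deploy-production would inherit interruptible: true from the default block, which is exactly the failure mode to check for in review.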
Combining both for a safe, lean pipeline
The full pattern on a real deploy pipeline:
stages: [build, test, deploy]

build:
  stage: build
  interruptible: true
  script: ["make build"]

test:
  stage: test
  interruptible: true
  script: ["make test"]

deploy-production:
  stage: deploy
  interruptible: false
  resource_group: production
  script: ["./deploy.sh"]
  environment:
    name: production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual
Build and test get auto-canceled when superseded, saving minutes. The deploy is mutually exclusive (resource group) and never canceled mid-run (not interruptible). That combination is the sweet spot.
Concurrency limits at the runner and group level
resource_group and interruptible operate inside a project, but concurrency also has a fleet dimension. GitLab Runner has a global concurrent setting that caps how many jobs run at once across every runner registered in the same config.toml, regardless of project; set it too high and jobs thrash the host, too low and pipelines queue needlessly. That tuning lives in the runner’s config.toml, not your .gitlab-ci.yml, and it’s worth coordinating with whoever owns the runners.
Within a job you can also bound parallelism deliberately. A parallel: matrix that fans out to fifty jobs will happily saturate your runner fleet and starve every other team’s pipeline. When a job is greedy, I cap it and let it queue:
load-test:
  stage: test
  parallel: 4
  resource_group: load-test-cluster
  script:
    - ./run-load-shard.sh "$CI_NODE_INDEX" "$CI_NODE_TOTAL"
Here the job fans out to four shards, and the shared resource_group ensures they don’t collide with another pipeline’s load test against the same cluster. Note that the lock is per group, not per pipeline, so the four shards also take turns with each other rather than running side by side. That is fine when the goal is to protect the cluster; it is not a way to run four shards concurrently. The interplay is the point: parallel: controls fan-out within a job, resource_group controls exclusivity across every job that shares the name, and the runner’s concurrent setting is the global ceiling. Reason about all three together, or you’ll tune one and get surprised by another. AI can explain each in isolation but rarely reasons about their interaction correctly, so I sketch the desired behavior myself and use it only to draft the YAML.
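Because a resource_group serializes every job that shares its name, including jobs inside the same pipeline, shards that carry the group directly run one at a time. If the shards need to run concurrently while whole pipelines still take turns, one option is to hold the lock on a trigger job that wraps a child pipeline. A sketch, where load-test.gitlab-ci.yml is a hypothetical child pipeline file containing the parallel: 4 shard job without its own resource_group:

```yaml
# Sketch only: load-test.gitlab-ci.yml is a hypothetical child pipeline file.
load-test:
  stage: test
  resource_group: load-test-cluster
  trigger:
    include: load-test.gitlab-ci.yml
    strategy: depend   # hold the lock until the child pipeline finishes
```

The trigger job keeps the lock for as long as the child pipeline runs, so the shards inside it can fan out freely while a second pipeline's load test waits its turn.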
How AI helps here
This is a great fast-junior-engineer task with one sharp caveat. AI is excellent at:
- Adding interruptible: true across your build/test jobs via an extends template.
- Spotting jobs that share a protected resource and suggesting a common resource_group.
- Explaining the queueing behavior.
It is dangerously unreliable at the one thing that matters most: it will happily mark a terraform apply or migration as interruptible: true, which is precisely the mistake that corrupts state. So every AI suggestion here gets read with one question in mind — is this job actually safe to cancel mid-run? If the answer is no, I override it to false. Never delegate that judgment.
And the standing rule on any deploy pipeline: don’t hand AI your CI secrets. Share the YAML structure and behavior, never the kubeconfig, state-backend credentials, or tokens. For a careful second read of a concurrency-sensitive diff, the code review dashboard is worth the pass before merge.
My reusable prompt: “Add GitLab CI concurrency controls to this pipeline. Mark build/test jobs interruptible, give the production deploy a resource_group, and set the deploy interruptible: false. Explicitly flag any job you’re unsure is safe to cancel mid-run.” That last clause forces the model to surface the risky calls. More variants live in my prompt library and the platform prompt packs.
Conclusion
Concurrency bugs are silent until they corrupt something. resource_group is a mutex that serializes jobs touching a shared resource; interruptible reclaims runner minutes by canceling stale work — but only on jobs that are genuinely safe to cancel. Let AI add the boilerplate, then personally decide which jobs are abortable and keep your secrets out of the chat. More guides in the GitLab CI/CD category.