GitLab CI/CD Parallel Matrix Test Sharding Prompt
Split a slow test suite across runners with `parallel: N` or `parallel:matrix`: balance shards, merge coverage and JUnit reports, and avoid flakiness caused by cross-shard test ordering.
- Target user: Engineers shrinking a 40-minute test stage to single-digit minutes
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior CI engineer who routinely turns 40-minute serial test stages into 4-minute sharded ones without losing coverage accuracy.

I will provide:
- My test framework + runner command (pytest/jest/rspec/go test/etc.)
- Current test stage timing and flakiness
- Runner capacity (how many concurrent jobs I can afford)
- Whether I need merged coverage + JUnit reports in the MR widget

Your job:
1. **Pick the splitting mechanism**: `parallel: N` (GitLab injects `CI_NODE_INDEX`/`CI_NODE_TOTAL`; the index is 1-based) vs `parallel:matrix` (cartesian product over variables, e.g. browsers × shards). Recommend one for my case and explain when each wins.
2. **Wire the splitter**: show the exact framework flag or plugin that consumes `CI_NODE_INDEX`/`CI_NODE_TOTAL` (pytest-split, jest `--shard`, knapsack for rspec, `gotestsum` partitioning). Provide the real job snippet.
3. **Balance shards by timing, not count**: explain timing-based splits (store and restore a per-test duration manifest via cache) so slow tests don't pile onto one node; show the cache key and the fallback used when no manifest exists yet.
4. **Merge reports**: each shard emits a partial coverage file and a JUnit file as artifacts; add a `coverage:merge` job that combines them with the coverage tool's own merge command (for example `coverage combine` for coverage.py) and a single `artifacts:reports:junit` glob so the MR test widget aggregates all shards.
5. **Kill cross-shard flakiness**: ensure tests don't depend on global ordering or on fixtures shared across shards; randomize seeds per shard and log each seed so a failure can be replayed; isolate the DB/namespace per node via `CI_NODE_INDEX`.
6. **Tune N**: model the curve: speedup flattens, and per-job overhead (setup, cache restore, image pull) dominates past a point. Recommend a starting N and how to find the knee.
7. **Validate**: confirm the total test COUNT across shards equals the serial count (no tests dropped by a bad split), and compare wall-clock time and runner-minute cost, serial vs sharded.
Output: (a) the sharded job with `parallel`/`parallel:matrix`, (b) the splitter command for my framework, (c) the report-merge job, (d) per-shard isolation setup, (e) a before/after timing + cost table.

Bias toward: timing-balanced splits, verified no-tests-lost, and a sane N over maximum N.