
GitLab CI/CD Parallel Matrix Test Sharding Prompt

Split a slow test suite across runners with `parallel:matrix` and `parallel: N` — balance shards, merge coverage and JUnit reports, and avoid flaky cross-shard ordering.

Target user: Engineers shrinking a 40-minute test stage to single digits
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior CI engineer who routinely turns 40-minute serial test stages into 4-minute sharded ones without losing coverage accuracy.

I will provide:
- My test framework + runner command (pytest/jest/rspec/go test/etc.)
- Current test stage timing and flakiness
- Runner capacity (how many concurrent jobs I can afford)
- Whether I need merged coverage + JUnit reports in the MR widget

Your job:

1. **Pick the splitting mechanism** — `parallel: N` (GitLab injects `CI_NODE_INDEX`/`CI_NODE_TOTAL`) vs `parallel:matrix` (cartesian over variables, e.g. browsers × shards). Recommend one for my case and explain when each wins.

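2. **Wire the splitter** — show the exact framework flag/plugin that consumes `CI_NODE_INDEX`/`CI_NODE_TOTAL` (pytest-split `--splits`/`--group`, jest `--shard`, rspec via knapsack, `gotestsum` partitioning). Keep in mind that `CI_NODE_INDEX` is 1-based, and that an ordering flag such as rspec `--seed` randomizes order but does not split the suite. Provide the real job snippet.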

3. **Balance shards by timing, not count** — explain timing-based splits (store/restore a per-test duration manifest via cache) so slow tests don't pile onto one node; show the cache key + fallback.

4. **Merge reports** — each shard emits a partial coverage file and a JUnit file as artifacts; add a later-stage `coverage:merge` job that collects the partial coverage files and combines them with the coverage tool's own merge command, and declare one `artifacts:reports:junit` glob on the sharded job so the MR test widget aggregates results from all shards.

5. **Kill cross-shard flakiness** — ensure tests don't depend on global ordering or on fixtures shared across shards; randomize the test-order seed per shard and log it so a failure can be replayed; isolate the DB/namespace per node via `CI_NODE_INDEX`.

6. **Tune N** — model the curve: speedup flattens and per-job overhead (setup, cache restore, image pull) dominates past a point. Recommend a starting N and how to find the knee.

7. **Validate** — confirm total test COUNT across shards equals the serial count (no tests dropped by a bad split), and compare wall-clock + runner-minute cost serial vs sharded.

Output: (a) the sharded job with `parallel`/`parallel:matrix`, (b) the splitter command for my framework, (c) the report-merge job, (d) per-shard isolation setup, (e) a before/after timing + cost table.
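
For the timing and cost table in (e), a simple model is enough as a starting point. The sketch below assumes perfectly balanced shards and a fixed per-job overhead, which real pipelines only approximate; the function names and the half-minute threshold are hypothetical choices for illustration.

```python
def wall_clock_minutes(test_minutes: float, overhead_minutes: float, n: int) -> float:
    """Wall-clock time of the stage with n equally loaded shards."""
    return overhead_minutes + test_minutes / n


def runner_minutes(test_minutes: float, overhead_minutes: float, n: int) -> float:
    """Total billed runner time: every shard pays the overhead once."""
    return n * wall_clock_minutes(test_minutes, overhead_minutes, n)


def find_knee(
    test_minutes: float,
    overhead_minutes: float,
    max_n: int,
    min_gain_minutes: float = 0.5,
) -> int:
    """Smallest n at which adding one more shard saves less than min_gain_minutes.

    Returns max_n if every extra shard up to max_n is still worth it.
    """
    for n in range(1, max_n):
        gain = wall_clock_minutes(test_minutes, overhead_minutes, n) - wall_clock_minutes(
            test_minutes, overhead_minutes, n + 1
        )
        if gain < min_gain_minutes:
            return n
    return max_n
```

As an illustrative case, with 38 minutes of tests and 2 minutes of overhead per job, `find_knee(38, 2, 50)` returns 9: a tenth shard would save under half a minute of wall-clock while adding another job's overhead to the runner-minute bill.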

Bias toward: timing-balanced splits, verified no-tests-lost, and a sane N over maximum N.
