API Fuzz and Coverage-Guided Testing in GitLab CI

Every test I write encodes an input I already thought of. The bug that takes down production is, almost by definition, the input I didn’t think of — the empty array, the negative length, the 10MB header, the Unicode that breaks the parser. Fuzz testing exists to generate those inputs automatically and throw them at your code until something breaks. GitLab has built-in support for both API fuzzing and coverage-guided fuzzing, and wiring it into a pipeline is more approachable than its reputation suggests. Here’s how I set it up, and where AI is a genuine accelerator versus a liability.

Two kinds of fuzzing, two different jobs

Don’t conflate them:

API fuzzing drives your running HTTP API with malformed and unexpected requests, derived from an OpenAPI spec, a HAR file, or a Postman collection. It tests the deployed surface.
Coverage-guided fuzzing runs inside your code with an instrumented harness, mutating inputs and using code-coverage feedback to reach new branches. It tests functions directly.

API fuzzing finds the request that 500s your endpoint. Coverage fuzzing finds the parser that panics on byte sequence 0xDEAD. You want both, eventually; start with whichever matches your biggest risk.

API fuzzing from an OpenAPI spec

The GitLab template does the heavy lifting; you point it at your spec and your running target:

stages:
  - fuzz   # add to your existing stages; the template's apifuzzer_fuzz job runs here

include:
  - template: "Security/API-Fuzzing.gitlab-ci.yml"

variables:
  FUZZAPI_OPENAPI: "openapi.json"
  FUZZAPI_TARGET_URL: "https://api-under-test.internal.example.com"

The included apifuzzer_fuzz job reads openapi.json, generates a barrage of off-nominal requests for every endpoint, and reports anything that returns a server error, hangs, or otherwise misbehaves as a vulnerability on the MR. The results render in the security widget just like SAST or DAST.

The prerequisite is a running target. In CI that usually means standing your app up first — often as a service or in a prior job — and pointing FUZZAPI_TARGET_URL at it. A throwaway test instance, never production.
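To make the idea concrete, here is a toy Go sketch of what an API fuzzer does at its core: send off-nominal values to an endpoint and flag every 5xx response. It is not how GitLab's analyzer works internally. The /items handler, its limit parameter, and its bug are hypothetical, and httptest.NewRecorder keeps everything in-process so no network is needed.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
)

// itemsHandler is a hypothetical endpoint with a deliberate bug: it answers
// unparseable or negative limits with a 500 instead of a 400.
func itemsHandler(w http.ResponseWriter, r *http.Request) {
	limit, err := strconv.Atoi(r.URL.Query().Get("limit"))
	if err != nil || limit < 0 {
		http.Error(w, "internal error", http.StatusInternalServerError) // bug: should be 400
		return
	}
	fmt.Fprintf(w, "returning %d items\n", limit)
}

// fuzzQueryParam sends each candidate as ?param=value and returns the values
// that produced a server error.
func fuzzQueryParam(h http.HandlerFunc, param string, candidates []string) []string {
	var failures []string
	for _, v := range candidates {
		req := httptest.NewRequest(http.MethodGet, "/items?"+param+"="+url.QueryEscape(v), nil)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		if rec.Code >= 500 {
			failures = append(failures, v)
		}
	}
	return failures
}

func main() {
	// Off-nominal values an API fuzzer might try for an integer parameter.
	candidates := []string{"10", "", "-1", "abc", "99999999999999999999", "1e3", "0"}
	for _, v := range fuzzQueryParam(itemsHandler, "limit", candidates) {
		fmt.Printf("5xx for limit=%q\n", v)
	}
}
```

The real analyzer derives its candidates from the spec and covers headers, bodies, and paths as well, but the verdict is the same: a 5xx on bad input is a finding.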

Pro Tip: Keep your OpenAPI spec accurate and in the repo. API fuzzing is only as thorough as the spec it reads — endpoints you forgot to document are endpoints it won’t fuzz. This is a great reason to generate the spec from code rather than hand-maintaining it.

Coverage-guided fuzzing with a harness

Coverage fuzzing needs a small harness that feeds fuzzer-generated bytes into the function under test. GitLab supports several engines, including libFuzzer for C/C++, go-fuzz for Go (built in libFuzzer mode, as below), cargo-fuzz for Rust, and Jazzer for Java. A Go example:

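stages:
  - fuzz   # .fuzz_base jobs run in the fuzz stage

include:
  - template: "Security/Coverage-Fuzzing.gitlab-ci.yml"

fuzz:
  extends: .fuzz_base
  image: golang:1.23
  script:
    # The golang image ships without clang, which the libFuzzer link step needs
    - apt-get update -qq && apt-get install -y -qq clang
    - go install github.com/dvyukov/go-fuzz/go-fuzz-build@latest
    - go get github.com/dvyukov/go-fuzz/go-fuzz-dep   # go-fuzz-build expects this in go.mod
    - go-fuzz-build -libfuzzer -o fuzz.a ./parser
    - clang -fsanitize=fuzzer fuzz.a -o fuzzer
    - ./gitlab-cov-fuzz run --regression="$REGRESSION" -- ./fuzzer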
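The go-fuzz-build step compiles a package that exports go-fuzz's entry point, func Fuzz(data []byte) int. A minimal sketch follows. Parse and Record are hypothetical stand-ins with a planted bug, and the harness sits in package main only so it runs on its own; in the real project it would live in the parser package behind a //go:build gofuzz constraint (go-fuzz-build sets that tag). By go-fuzz convention, return 1 for inputs worth prioritizing, 0 otherwise, and -1 to keep an input out of the corpus.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical parse result.
type Record struct {
	Kind byte
	Body []byte
}

// Parse stands in for parser.Parse. Planted bug: on the 0xDE branch it trusts
// the length byte without a bounds check, so {0xDE, 0xAD} slices past the end.
func Parse(data []byte) (*Record, error) {
	if len(data) < 2 {
		return nil, errors.New("too short")
	}
	n := int(data[1])
	if data[0] == 0xDE {
		return &Record{Kind: data[0], Body: data[2 : 2+n]}, nil // bug: may panic
	}
	if 2+n > len(data) {
		return nil, errors.New("length exceeds input")
	}
	return &Record{Kind: data[0], Body: data[2 : 2+n]}, nil
}

// Fuzz is the go-fuzz entry point. It deliberately does not recover from
// panics: a panic is exactly the signal the engine needs to report a crash.
func Fuzz(data []byte) int {
	if _, err := Parse(data); err != nil {
		return 0 // rejected input: expected behavior, not a crash
	}
	return 1 // parsed cleanly: worth prioritizing in the corpus
}

func main() {
	fmt.Println(Fuzz([]byte{0x01, 0x02, 'h', 'i'})) // well-formed input
	fmt.Println(Fuzz([]byte{0x01}))                 // too short, rejected
}
```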

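The --regression flag is the trick that makes fuzzing CI-friendly. In regression mode, the job replays the saved corpus, including any previously discovered crashing inputs, and exits instead of fuzzing for hours. The template sets REGRESSION for you: false on the branch named by COVFUZZ_BRANCH (the default branch unless you change it), true everywhere else. So you fuzz long-form on a schedule and gate MRs on regression, which is fast and still protective.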

Time-boxing so fuzzing fits a pipeline

Fuzzing is unbounded by nature; a pipeline is not. Time-box it:

fuzz-nightly:
  extends: fuzz            # reuse the build-and-run script from the fuzz job
  variables:
    COVFUZZ_ADDITIONAL_ARGS: "-max_total_time=600"   # libFuzzer: stop after 600 seconds
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

A 10-minute (600 s) nightly run discovers new inputs, both crashing and merely interesting, and grows the corpus; the per-MR job then just regression-checks that corpus in seconds. Point the schedule at the default branch (or whatever COVFUZZ_BRANCH names), because on any other branch the template switches REGRESSION on. This split, long fuzzing on a schedule and fast regression on MRs, is the pattern that makes fuzzing sustainable instead of a pipeline that takes an hour.

Triaging what fuzzing finds

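Fuzzers are prolific and produce noisy findings: some are real crashes, some are intended errors the fuzzer counts as failures, some are duplicates of one root cause. Keep the corpus and the crashing inputs as job artifacts so you can reproduce each one locally. when: always matters because a crash fails the job, and the crashes path assumes you have pointed libFuzzer's -artifact_prefix flag at that directory: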

  artifacts:
    when: always
    paths:
      - corpus
      - crashes
    expire_in: 2 weeks

Download a crashing input, replay it against the harness locally (a libFuzzer binary such as ./fuzzer runs a single input when you pass it the file path), and you have a deterministic reproduction, which is gold for fixing the bug.
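Deduplication is usually the first triage step, since one root cause can surface as dozens of crashing inputs. A common heuristic is to bucket crashes by the top few frames of their stack trace. The sketch below assumes a hypothetical Crash type that already holds each input's filename and its frames as strings; real traces need parsing first, and the frame count n is a tuning knob.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Crash is a hypothetical record of one crashing input and its stack frames,
// innermost frame first.
type Crash struct {
	Input  string
	Frames []string
}

// bucketByStack groups crashes whose top n frames match, so each bucket is
// likely a single root cause. Keys are the joined frames.
func bucketByStack(crashes []Crash, n int) map[string][]string {
	buckets := make(map[string][]string)
	for _, c := range crashes {
		top := c.Frames
		if len(top) > n {
			top = top[:n]
		}
		key := strings.Join(top, " <- ")
		buckets[key] = append(buckets[key], c.Input)
	}
	return buckets
}

func main() {
	crashes := []Crash{
		{"crash-a1", []string{"parser.readLen", "parser.Parse", "main.Fuzz"}},
		{"crash-b7", []string{"parser.readLen", "parser.Parse", "main.Fuzz"}},
		{"crash-c3", []string{"parser.decodeUTF8", "parser.Parse", "main.Fuzz"}},
	}
	buckets := bucketByStack(crashes, 2)
	keys := make([]string, 0, len(buckets))
	for k := range buckets {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%d input(s) at %s: %v\n", len(buckets[k]), k, buckets[k])
	}
}
```

Too few frames merges unrelated bugs that share a helper; too many splits one bug across callers. Start small and widen when a bucket looks mixed.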

Seeding the corpus and growing coverage

A fuzzer starting from nothing wastes time rediscovering basic structure before it reaches the interesting branches. Seed it. The corpus directory is just a folder of example inputs, and feeding it real, valid samples — a few well-formed requests, a couple of valid files — lets coverage-guided fuzzing start from a known-good baseline and mutate toward the edges far faster:

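fuzz:
  extends: .fuzz_base
  script:
    # (build ./fuzzer first, as in the earlier fuzz job)
    - mkdir -p corpus
    - cp testdata/valid-samples/* corpus/      # seed with real examples
    - ./gitlab-cov-fuzz run --regression="$REGRESSION" -- ./fuzzer corpus
  cache:
    key: fuzz-corpus           # carries the corpus into later pipelines
    paths: ["corpus"]
  artifacts:
    when: always
    paths: ["corpus"]
    expire_in: 30 days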

Artifacts on their own are not fed back into the next pipeline, so persist the corpus with cache: (or download it in a later job) and it accumulates across runs. Every input that reaches new coverage gets added, so the corpus grows richer over time and your regression check gets stronger automatically. That compounding is the quiet superpower of coverage-guided fuzzing: month three reaches branches month one couldn’t, with no extra effort from you.

For API fuzzing, the equivalent of seeding is a good spec and realistic example values. If your OpenAPI spec declares an enum or a format, the fuzzer uses it to generate smarter off-nominal inputs around the boundaries. Garbage-in still applies — a vague spec produces shallow fuzzing — so the spec quality and the test depth rise together.
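As an illustration of why declared constraints sharpen the attack, the Go sketch below derives off-nominal candidates from a simplified schema: values on and just past minimum/maximum, plus near-misses for an enum. The Schema struct is a hypothetical stand-in for a parsed OpenAPI parameter, and this is not GitLab's generator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Schema is a hypothetical, heavily simplified OpenAPI parameter schema.
type Schema struct {
	Type    string   // "integer" or "string"
	Minimum *int64   // optional lower bound
	Maximum *int64   // optional upper bound
	Enum    []string // optional allowed values
}

// offNominal returns candidates on or just past the declared constraints,
// where validation bugs tend to live.
func offNominal(s Schema) []string {
	var out []string
	if s.Type == "integer" {
		if s.Minimum != nil {
			out = append(out, strconv.FormatInt(*s.Minimum, 10), strconv.FormatInt(*s.Minimum-1, 10))
		}
		if s.Maximum != nil {
			out = append(out, strconv.FormatInt(*s.Maximum, 10), strconv.FormatInt(*s.Maximum+1, 10))
		}
		// Overflow past int64, wrong numeric type, and an empty value.
		out = append(out, "9223372036854775808", "1.5", "")
	}
	for _, v := range s.Enum {
		out = append(out, strings.ToUpper(v), v+" ") // case and whitespace near-misses
	}
	if len(s.Enum) > 0 {
		out = append(out, "not-a-member")
	}
	return out
}

func main() {
	lo, hi := int64(1), int64(100)
	fmt.Println(offNominal(Schema{Type: "integer", Minimum: &lo, Maximum: &hi}))
	fmt.Println(offNominal(Schema{Type: "string", Enum: []string{"asc", "desc"}}))
}
```

A spec that says only "type: integer" gives the generator nothing but the generic overflow and type-confusion cases; the bounds are what produce 0 and 101.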

Exactly where AI earns its place

This is where AI genuinely helps. It works like a strong, fast junior engineer for:

Writing the harness. Translating “fuzz my JSON parser” into a libFuzzer harness that decodes the bytes and calls your function is boilerplate AI handles well.
Triaging crash inputs. Paste a crashing input and the stack trace, and AI is good at explaining the likely root cause and suggesting a fix.
Drafting the pipeline YAML for the regression/nightly split.

Where it’s a liability:

It will sometimes write a harness that catches and swallows the very panic you’re trying to surface, making the fuzzer report clean while the bug remains. Read the harness.
It can misjudge whether a finding is a real vulnerability or expected input rejection. The fuzzer found something; you decide if it matters.
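Here is what that first failure mode looks like in a go-fuzz style harness. Both functions call a hypothetical parse with a planted out-of-range bug. The first recovers the panic and returns 0, so the engine never sees a crash; the second lets the panic escape, which is what you want.

```go
package main

import (
	"errors"
	"fmt"
)

// parse is a hypothetical target with a planted bug: it reads data[3]
// without checking the length.
func parse(data []byte) error {
	if len(data) == 0 {
		return errors.New("empty input")
	}
	if data[0] == '{' && data[3] == '}' { // bug: panics when len(data) < 4
		return nil
	}
	return errors.New("not an object")
}

// fuzzSwallowed is the anti-pattern: recover() turns every panic into a quiet
// 0, so the fuzzer reports a clean run while the bug remains.
func fuzzSwallowed(data []byte) (ret int) {
	defer func() {
		if recover() != nil {
			ret = 0
		}
	}()
	if parse(data) != nil {
		return 0
	}
	return 1
}

// fuzzCorrect lets panics escape so the fuzzing engine records a crash.
func fuzzCorrect(data []byte) int {
	if parse(data) != nil {
		return 0
	}
	return 1
}

func main() {
	fmt.Println("swallowed harness returned:", fuzzSwallowed([]byte("{"))) // 0: bug hidden
	defer func() { fmt.Println("correct harness crashed:", recover()) }()
	fuzzCorrect([]byte("{"))
}
```

When reviewing an AI-written harness, a recover() anywhere in the call path is the first thing to look for.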

So every harness and every triage call gets human review before merge. And the rule that never bends: do not hand AI your secrets. When you paste a crash input or a stack trace for help, scrub any real tokens, internal hostnames, or credentials from it first — fuzzer inputs and traces can carry surprising data. Share the harness and the sanitized trace, never the CI secrets that let the fuzzed service run. For reviewing fuzzing-driven fixes, the code review dashboard gives the diff a careful pass, and when a fuzzer surfaces a live security issue, the incident response dashboard keeps the response disciplined.
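A small scrubbing pass before anything leaves your machine makes that rule easier to keep. The Go sketch below redacts a few common shapes. The patterns (GitLab glpat- personal access tokens, Authorization: Bearer headers, password= pairs, and hosts under the internal.example.com domain used earlier) are illustrative assumptions, not an exhaustive secret scanner.

```go
package main

import (
	"fmt"
	"regexp"
)

// redactions pairs each illustrative secret pattern with its replacement.
var redactions = []struct {
	re   *regexp.Regexp
	repl string
}{
	{regexp.MustCompile(`glpat-[0-9A-Za-z_-]{20,}`), "glpat-[REDACTED]"},
	{regexp.MustCompile(`(?i)(authorization:\s*bearer\s+)\S+`), "${1}[REDACTED]"},
	{regexp.MustCompile(`(?i)(password=)[^&\s]+`), "${1}[REDACTED]"},
	{regexp.MustCompile(`\b[a-z0-9-]+(\.[a-z0-9-]+)*\.internal\.example\.com\b`), "[INTERNAL-HOST]"},
}

// scrub applies every redaction in order and returns the sanitized text.
func scrub(s string) string {
	for _, r := range redactions {
		s = r.re.ReplaceAllString(s, r.repl)
	}
	return s
}

func main() {
	trace := "GET https://api-under-test.internal.example.com/items?password=hunter2\n" +
		"Authorization: Bearer eyJhbGciOi.abc.def\n" +
		"token=glpat-abcdefghijklmnopqrstu"
	fmt.Println(scrub(trace))
}
```

Treat a scrubber like this as a safety net, not a guarantee: still read the trace before you paste it.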

My reusable prompt: “Write a libFuzzer harness in Go that feeds fuzz bytes into parser.Parse([]byte) without catching or suppressing panics, then draft a GitLab CI job using the Coverage-Fuzzing template that runs full fuzzing on a schedule and regression-only on MRs.” The “without suppressing panics” clause heads off the most common harness bug. More variants are in my prompt library, and the security-testing prompt packs include a fuzzing starter set.

Conclusion

Fuzzing throws the inputs your imagination missed, and GitLab gives you both API fuzzing (against the running surface) and coverage-guided fuzzing (against your functions) as pipeline-native jobs. Split long fuzzing onto a schedule and fast regression onto MRs, save crashing inputs for reproduction, and let AI draft harnesses and triage crashes, then review every harness so it isn’t secretly swallowing the bug, and keep secrets out of every paste.