Tuning Gitleaks for Precision: A Secret Scanner Developers Won't Bypass

A secret scanner does not fail by missing secrets. It fails by crying wolf. The first time Gitleaks blocks a commit or pull request over a test fixture, a UUID, or an example API key in the docs, a developer files a ticket. The fifth time, they learn git commit --no-verify. By the tenth, someone has added a blanket allowlist that silences the noise, and the real leak that comes through next quarter sails right past it. The entire value of secret scanning lives in its precision, because a gate developers respect is the only gate that works. This is a guide about tuning Gitleaks so it earns that respect.

Recall Is Easy; Precision Is the Job

Gitleaks out of the box has good recall — it will flag a lot of things that look like secrets. The problem is that “looks like a secret” catches a long tail of things that are not: high-entropy hashes, content digests, base64-encoded test data, generated IDs, and the example credentials that every README contains. Each false positive costs developer trust, and trust does not come back. Once a team believes the scanner is mostly noise, they stop reading its output, and a noisy scanner is functionally equivalent to no scanner.

So the goal of tuning is not to catch more. It is to catch the same real secrets while drastically cutting the false positives that drive bypass behavior. That means understanding which patterns in your repos generate noise, and suppressing exactly those — narrowly, with a reason — rather than loosening the scanner globally.

Custom Rules for the Secrets You Actually Handle

Gitleaks’ default rules cover common providers, but they miss the bespoke credential formats most organizations carry: an internal service token with a custom prefix, a signing key in a particular shape, a legacy format from a system you still run. Adding a targeted rule for those raises real coverage:

[[rules]]
id = "internal-svc-token"
description = "Internal service token (svc_ prefix, 40 hex chars)"
regex = '''svc_[0-9a-f]{40}'''
keywords = ["svc_"]
entropy = 3.5

[rules.allowlist]
description = "Documented placeholder token in examples"
regexes = ['''svc_0{40}''']

The keywords field is a cheap performance and precision win: Gitleaks does a quick string check first and only evaluates the more expensive regex against content that contains a keyword. The rule-scoped allowlist for the documented all-zeros placeholder is precision applied surgically: it silences exactly the known-fake value without weakening detection of real svc_ tokens.
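Before shipping a custom rule, it helps to run its patterns against a few sample lines. The sketch below does that with the two regexes from the rule above. It makes some assumptions: Python's re module stands in for the Go regex engine Gitleaks uses (the two agree on patterns this simple), the keyword and entropy checks are left out, and the sample token values are made up for illustration.

```python
import re

# Patterns copied from the rule above.
RULE_REGEX = re.compile(r"svc_[0-9a-f]{40}")
PLACEHOLDER_REGEX = re.compile(r"svc_0{40}")


def would_report(line: str) -> bool:
    """Approximate the rule: find a token, then drop the allowlisted placeholder."""
    match = RULE_REGEX.search(line)
    if match is None:
        return False
    return PLACEHOLDER_REGEX.search(match.group(0)) is None


# Hypothetical sample lines; none of these values are real credentials.
real_shaped = 'token = "svc_' + "3fa9c1d27be48056" * 2 + '1a2b3c4d"'
placeholder = 'token = "svc_' + "0" * 40 + '"'
unrelated = 'id = "550e8400-e29b-41d4-a716-446655440000"'

for sample in (real_shaped, placeholder, unrelated):
    print(would_report(sample), sample)
```

Only the first sample should be reported. A handful of cases like these, kept next to the config, makes it harder for a later edit to quietly widen the allowlist.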

Allowlists That Don’t Hide Real Leaks

The dangerous instinct, when fighting false positives, is the broad allowlist — a path glob or a loose regex that makes the noise stop. It also makes real secrets in those paths stop showing up, which is the opposite of what you want. Every allowlist entry should be as narrow as the false-positive pattern it targets, and it should carry a justification:

[allowlist]
description = "Global allowlist — each entry justified"
paths = [
  '''(.*?)test/fixtures/.*\.json$''',   # synthetic test data, never real creds
]
regexes = [
  '''EXAMPLE_[A-Z_]+_KEY''',            # doc placeholders, uppercase EXAMPLE_ convention
]

The difference between a path-scoped allowlist for a test/fixtures directory that by policy never contains real credentials, and a blanket *.json allowlist, is the difference between a tuned gate and a hole. The first suppresses a known-clean location; the second blinds the scanner to an entire file type. When in doubt, scope tighter and justify louder.
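The gap between the two scopes is easy to see by running both patterns over a few file paths. This is a rough sketch: the paths are hypothetical, and Python's re module is assumed to behave like the Gitleaks regex engine for these patterns.

```python
import re

NARROW = re.compile(r"(.*?)test/fixtures/.*\.json$")  # path regex from the config above
BLANKET = re.compile(r".*\.json$")                    # the "make the noise stop" version

# Hypothetical repository paths.
paths = [
    "services/api/test/fixtures/users.json",
    "config/production.json",
    "deploy/credentials.json",
    "test/fixtures/README.md",
]


def suppressed(pattern, candidates):
    """Return the paths an allowlist pattern would hide from the scanner."""
    return [p for p in candidates if pattern.search(p)]


print("narrow :", suppressed(NARROW, paths))
print("blanket:", suppressed(BLANKET, paths))
```

The narrow pattern hides only the fixture file, while the blanket one also hides config/production.json and deploy/credentials.json. Even the narrow pattern is not anchored to a path separator, so a directory such as latest/fixtures/ would also match; starting it with (^|/) is one way to scope it tighter still.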

Entropy and Stopwords for High-Entropy Noise

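Some repos are dominated by high-entropy strings that are not secrets: content-addressable hashes, lockfile integrity digests, generated IDs. Raising the entropy threshold reduces those false positives, but it is a genuine precision/recall trade: raise it too far and you start missing real secrets that happen to score low, such as short or repetitive tokens. The more targeted lever is stopwords, which let you keep a sensitive entropy threshold while excluding specific non-secret patterns. Gitleaks checks stopwords against the extracted secret rather than the surrounding line, so a stopword only suppresses a finding when the marker is part of the matched string itself (a sha256- prefix, for example). A marker that sits elsewhere on the line needs an allowlist regex with regexTarget set to "line" instead. A stopword-based rule looks like this: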

[[rules]]
id = "generic-high-entropy"
regex = '''[0-9a-zA-Z\-_]{32,}'''
entropy = 4.2
[rules.allowlist]
stopwords = ["sha256", "integrity", "checksum", "uuid"]

State the trade explicitly whenever you touch entropy. “We raised the threshold to silence lockfile digests, accepting that a short or repetitive token scoring below it might now slip” is a decision a reviewer can weigh. A silent bump is a coverage cut nobody agreed to.
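To make the trade concrete, it helps to score a few strings yourself. The sketch below assumes Gitleaks measures Shannon entropy in bits per character over the matched secret and reports only when the score exceeds the threshold; treat that as an approximation of its behavior, not a reimplementation. The tokens are invented for illustration.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits per character, computed from the character frequencies in `text`."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


THRESHOLD = 4.2  # same value as the rule above

# A hex digest draws on 16 symbols, so it cannot score above log2(16) = 4.0.
hex_digest = hashlib.sha256(b"lockfile contents").hexdigest()

# Hypothetical tokens, made up for illustration.
long_mixed_token = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN"  # 40 distinct characters
short_token = "Zx9Qm2Lr8Tk4Vb7N"                               # 16 distinct characters

samples = [("hex digest", hex_digest), ("long token", long_mixed_token), ("short token", short_token)]
for label, value in samples:
    score = shannon_entropy(value)
    print(f"{label}: {score:.2f} bits/char, reported={score > THRESHOLD}")
```

Two things fall out of the arithmetic. A threshold above 4.0 already excludes every pure-hex digest, with no stopword needed. And because a string of length n can score at most log2(n), any token shorter than 19 characters can never clear 4.2, however random it is. That is the recall cost to write down when you raise the number.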

Where to Run It and What to Block

Placement shapes behavior. A fast pre-commit hook gives developers feedback before the secret ever leaves their machine: advisory, low-friction, and the best place to catch most leaks. The CI scan is the blocking backstop; it should scan the diff on pull requests for speed and the full history on a schedule to catch what slipped in before scanning existed. Both should run the same tuned config so that a pull-request scan never surfaces a finding that pre-commit would not have flagged.

Pro Tip: When CI flags a secret that is genuinely live, resist the urge to “fix” it by scrubbing history first. The credential was compromised the moment it was committed and pushed — anyone with clone access has it. Rotate the secret immediately, then scrub history as cleanup. History surgery on an unrotated secret is theater.

Using AI to Triage the Findings Backlog

If you already have a wall of Gitleaks findings, an LLM is a fast way to bucket them and propose tuning — as a draft you review, never as an auto-apply:

Prompt: “Here are 40 Gitleaks findings, some labeled. Bucket each as true positive, false positive, or needs-verification, and name the pattern behind each false positive. Then propose Gitleaks TOML allowlist entries scoped as narrowly as possible, one justification per entry. Do not allowlist anything you can’t confirm is non-live, and flag any finding that looks like a real, live secret for immediate rotation.”

Output (excerpt): “Findings 1–12: false positives, all from test/fixtures/*.json synthetic data. Propose a path-scoped allowlist for that directory only. Finding 19: NEEDS VERIFICATION. aws_access_key_id = AKIA... in a config sample; the value isn’t an obvious placeholder. Do NOT allowlist. If this is live, rotate the IAM key now, then remove from history. Finding 27: real-looking 40-char hex with a svc_ prefix in app code. Flag for rotation, not suppression.”

The model is fast at pattern-bucketing and good at drafting scoped allowlists, but the judgment call — is this value live? — stays with a human, and the rotation decision is never the AI’s to make. Pair this triage with a pre-commit secret-scanning gate so the tuned config catches leaks at the earliest possible point.

The Metric That Keeps It Honest

Tuning is not a one-time event; repos change and new false-positive patterns appear. Track the false-positive rate over time (the fraction of findings that get allowlisted or dismissed) and treat a rising trend as a signal to tune again before developers start ignoring the gate. A scanner that stays precise stays trusted, and a trusted scanner is the only kind that actually stops secrets from shipping. If you want a structured pass over your current config and findings, the Gitleaks tuning prompt walks through triage, custom rules, and scoped allowlists with the rotation-first discipline built in.
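The false-positive rate needs nothing more than a triage log. Here is a minimal sketch, assuming you record each finding's month and outcome somewhere; the record shape, the outcome labels, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Outcomes that count as noise in this sketch (hypothetical labels).
NOISE_OUTCOMES = {"allowlisted", "dismissed"}

# Hypothetical triage log: (month, outcome) per finding.
findings = [
    ("2024-01", "true_positive"),
    ("2024-01", "dismissed"),
    ("2024-01", "true_positive"),
    ("2024-01", "true_positive"),
    ("2024-02", "allowlisted"),
    ("2024-02", "dismissed"),
    ("2024-02", "true_positive"),
    ("2024-02", "true_positive"),
]


def false_positive_rate_by_month(records):
    """Fraction of findings per month that were allowlisted or dismissed."""
    totals = defaultdict(int)
    noise = defaultdict(int)
    for month, outcome in records:
        totals[month] += 1
        if outcome in NOISE_OUTCOMES:
            noise[month] += 1
    return {month: noise[month] / totals[month] for month in sorted(totals)}


for month, rate in false_positive_rate_by_month(findings).items():
    print(f"{month}: {rate:.0%} of findings were noise")
```

In this made-up data the rate doubles from 25% to 50% between months, which is the kind of trend that should trigger another tuning pass.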
