Testing Your Policies: Why Your Conftest Rules Need Unit Tests Too

There’s a category of policy-as-code bug that almost nobody catches until it’s blocking a deploy on a Friday afternoon: the policy that’s too broad. Someone writes a Rego rule, runs Conftest against one obviously-bad config, watches it get denied, and ships. What they never checked is whether the rule allows a good config — or whether it fires for the right reason. The first signal that the rule rejects everything, or accepts something it shouldn’t, arrives in production when a legitimate change can’t ship.

The fix is boring and well-established: unit-test your policies. OPA ships a test framework precisely for this, and yet most teams treat Rego as configuration rather than code, so it never gets tested. This guide shows how to write tests that prove a policy both blocks the bad and allows the good, using fixtures, and how to wire them into CI so an untested rule can’t merge.

A policy is only as good as its negative test

Consider a policy meant to block Kubernetes containers running as root:

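package main

# `some ... in` needs this import on OPA releases that default to Rego v0.
import future.keywords.in

# Deny any Pod container that does not explicitly set runAsNonRoot: true.
# `not` also succeeds when securityContext or runAsNonRoot is missing.
deny[msg] {
    input.kind == "Pod"
    some container in input.spec.containers
    not container.securityContext.runAsNonRoot
    msg := sprintf("container %q must set runAsNonRoot: true", [container.name])
}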

It looks fine. But does it correctly allow a compliant pod? Does it fire both when runAsNonRoot is explicitly false and when it is merely absent? Without tests you’re guessing. Here’s the test file that turns the guess into a fact:

package main

test_denies_root_container {
    deny[_] with input as {
        "kind": "Pod",
        "spec": {"containers": [{"name": "app", "securityContext": {"runAsNonRoot": false}}]},
    }
}

test_allows_nonroot_container {
    count(deny) == 0 with input as {
        "kind": "Pod",
        "spec": {"containers": [{"name": "app", "securityContext": {"runAsNonRoot": true}}]},
    }
}

test_denies_missing_security_context {
    deny[_] with input as {
        "kind": "Pod",
        "spec": {"containers": [{"name": "app"}]},
    }
}

Run it with opa test . -v. The third test is the one that matters most — it proves the rule treats absent securityContext as a violation, which is the case real manifests hit constantly. Without it, you’d be relying on the policy’s behavior for an input you never actually checked.
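Two more boundary cases are worth pinning down next to these. The sketch below assumes the deny rule shown earlier is loaded from the same directory under package main; the test names and the Service fixture are illustrative and not part of the original policy.

```rego
package main

# Assumes the deny rule shown earlier lives in the same package.

# A pod with one compliant and one non-compliant container should produce
# a denial for the offender only.
test_denies_only_offending_container_in_mixed_pod {
    count(deny) == 1 with input as {
        "kind": "Pod",
        "spec": {"containers": [
            {"name": "app", "securityContext": {"runAsNonRoot": true}},
            {"name": "sidecar"},
        ]},
    }
}

# A resource that is not a Pod is out of scope for this rule, so the rule
# must not deny it.
test_ignores_non_pod_resources {
    count(deny) == 0 with input as {
        "kind": "Service",
        "spec": {"ports": [{"port": 80}]},
    }
}
```

Both checks count every deny rule in the package, so they are only as specific as the package is small.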

Assert on messages, not just counts

A test that only checks deny[_] or count(deny) == 1 has a subtle weakness: it passes even if the wrong rule fired. When a file has several policies, you want to know that the root-user rule denied the root-user pod, not that some unrelated rule happened to trip. Pin the message:

test_root_container_returns_correct_message {
    msgs := deny with input as {
        "kind": "Pod",
        "spec": {"containers": [{"name": "api", "securityContext": {"runAsNonRoot": false}}]},
    }
    msgs[_] == "container \"api\" must set runAsNonRoot: true"
}

Now a refactor that accidentally makes a different rule own this case fails the test instead of sliding through. The message assertions double as documentation: a new engineer reading the test suite learns exactly which rule produces which denial.
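A stricter variant compares the entire deny set instead of looking for one member. This is a sketch, assuming the deny rule shown earlier is in the same package; the test name is illustrative.

```rego
package main

# Assumes the deny rule shown earlier lives in the same package.

# Comparing the whole set fails if the message changes, if the expected
# rule stops firing, or if any other deny rule fires for this input.
test_root_container_returns_exactly_one_message {
    expected := {"container \"api\" must set runAsNonRoot: true"}
    deny == expected with input as {
        "kind": "Pod",
        "spec": {"containers": [{"name": "api", "securityContext": {"runAsNonRoot": false}}]},
    }
}
```

The trade-off is brittleness: adding any new rule that legitimately fires on this fixture means updating the expected set, which is usually the prompt you want during review.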

Letting AI draft fixtures, then sanity-checking them

Writing fixtures by hand is tedious, and tedium is where AI helps — with one critical caveat. A prompt like this gets you started fast:

You are a policy-as-code engineer. Here is a Rego policy used with Conftest. Write opa test cases covering: a violating input, a compliant input, and a boundary case. Assert on the denial message, not just the count. Flag any rule that has no test proving it ALLOWS valid input.

The model returned solid fixtures, and then added the caveat that earns its place in the loop:

I wrote a passing fixture for the compliant case, but I want to flag that your deny rule iterates input.spec.containers and ignores initContainers. My “compliant” fixture only has regular containers, so the suite would pass even though a root initContainer slips through. You likely want to extend the rule before I write a fixture asserting that.

That’s the human-verifies half of the loop doing real work. The model wrote correct tests for the rule as written and noticed the rule itself had a gap — but extending the policy to cover initContainers is a security decision a person signs off on. The danger to guard against is the opposite instinct: an LLM asked to “make the failing test pass” will happily weaken the policy. Never let test maintenance relax a control as a side effect.
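If the team does sign off on covering initContainers, the change could look roughly like this. It is a sketch and not the original author's policy: the helper name pod_containers and the container name migrate are made up for illustration, the syntax follows the older Rego style used in this article, and the rule is meant to replace the earlier deny rule, not sit beside it.

```rego
package main

# Hypothetical helper: every container the rule should inspect.
# Each body yields nothing when its field is absent from the input.
pod_containers[c] {
    c := input.spec.containers[_]
}

pod_containers[c] {
    c := input.spec.initContainers[_]
}

deny[msg] {
    input.kind == "Pod"
    pod_containers[container]
    not container.securityContext.runAsNonRoot
    msg := sprintf("container %q must set runAsNonRoot: true", [container.name])
}

# The fixture that only makes sense once the rule has been extended:
# the regular container is compliant, the init container is not.
test_denies_root_init_container {
    expected := {"container \"migrate\" must set runAsNonRoot: true"}
    deny == expected with input as {
        "kind": "Pod",
        "spec": {
            "initContainers": [{"name": "migrate"}],
            "containers": [{"name": "app", "securityContext": {"runAsNonRoot": true}}],
        },
    }
}
```

Note the direction of the change: the test was added because the policy got stricter, never the other way around.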

Wiring it into CI

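Tests that don’t run are decoration. Add two stages to your pipeline: first run the policy unit tests, then check the real configs against the policies:

# Fail the build if any policy unit test fails
opa test policy/ -v

# Run the policy unit tests through Conftest as well
conftest verify --policy policy/

# Fail the build if any config violates policy
conftest test manifests/ --policy policy/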

Run opa test first so a broken policy is caught before it’s ever applied to real configs. Treat a policy change with no accompanying test change as a reviewable smell — the same way you’d question a code change with no test diff. The goal is that the only way a rule reaches the deny gate is through a test proving it blocks the bad and allows the good.

The discipline pays for itself

Policy-as-code earns trust by being predictable. The moment a gate blocks a legitimate deploy because a rule was too broad, teams start routing around it: adding skip annotations, disabling checks “temporarily,” eroding the whole control. Tested policies avoid that spiral, because the negative fixtures (the ones where nothing should be denied) prove the rule allows valid configs before it ever runs against real ones.

For generating the tests themselves, see our Conftest policy unit testing prompt, and pair it with the OPA and Conftest authoring prompt for writing the rules in the first place. The wider Infrastructure as Code category covers the rest of the policy-as-code toolchain. Test your guardrails like you test your application code — they’re load-bearing in exactly the same way.
