Making Admission Webhooks Cheaper With CEL matchConditions

I once profiled an apiserver that felt sluggish on every write and traced 30-something milliseconds of it to a validating webhook that was being invoked for absolutely everything — including its own controller’s writes, every request from kube-system, and every dry-run. The webhook allowed all of those instantly, but “instantly” still meant a network round-trip out to the webhook pod and back, on the critical path of every create and update in the cluster. The webhook was doing useful work for maybe five percent of the traffic it saw.

The fix isn’t a faster webhook server. It’s not calling the webhook in the first place for requests it’s going to allow anyway. CEL matchConditions let the apiserver make that decision in-process, before any network hop, and most teams don’t know the feature exists.

How filtering layers stack up

An admission webhook config has three filtering stages, and they run in order:

rules match coarsely by group/version/resource and verb.
namespaceSelector / objectSelector match by labels on the namespace or object.
matchConditions are CEL expressions evaluated by the apiserver on the request content, after the selectors pass but before the webhook is called.

A request has to satisfy all three to reach your webhook server. The first two are blunt — they can’t say “skip this if it’s a dry-run” or “skip requests from my own service account.” matchConditions can, because they see the actual request.
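To make the gap concrete, here is a sketch of what the two blunt layers can express for a pod-validating webhook. It assumes the kubernetes.io/metadata.name label that the control plane sets on every namespace; the app.kubernetes.io/managed-by label and its value are illustrative choices, not something any particular setup is known to use.

```yaml
# Fragment of one entry under webhooks: in a ValidatingWebhookConfiguration.
# Layer 1: rules select by group/version/resource and operation.
rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE", "UPDATE"]
# Layer 2: selectors match only on labels.
namespaceSelector:
  matchExpressions:
    - key: kubernetes.io/metadata.name    # set automatically on each namespace
      operator: NotIn
      values: ["kube-system"]
objectSelector:
  matchExpressions:
    - key: app.kubernetes.io/managed-by   # hypothetical label choice
      operator: NotIn                     # also matches objects without the label
      values: ["image-policy"]
# Neither layer can see request.dryRun or request.userInfo;
# that is the gap matchConditions fill.
```

Selectors are still worth using where a label does the job, since a request they exclude never reaches CEL evaluation at all.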

What belongs in a matchCondition

The rule of thumb: any check your webhook performs only to immediately return “allowed” should move up into a matchCondition. The usual suspects are system namespaces, the webhook’s own service account, and dry-run requests:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: image-policy
webhooks:
  - name: validate.image.example.com
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE", "UPDATE"]
    # All conditions must evaluate to true for the webhook to be called.
    matchConditions:
      - name: exclude-kube-system
        expression: "request.namespace != 'kube-system'"
      - name: skip-dry-run
        expression: "!has(request.dryRun) || !request.dryRun"
      - name: not-own-sa
        expression: "request.userInfo.username != 'system:serviceaccount:platform:image-policy'"
    failurePolicy: Fail
    # Both fields below are required in admissionregistration.k8s.io/v1.
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: image-policy
        namespace: platform
        path: /validate
      # caBundle omitted; supply or inject it in a real deployment.

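Every request must match all matchConditions to be sent to the webhook; if any one evaluates to false, the apiserver skips the webhook for that request. The available CEL variables are object, oldObject, request, and authorizer (plus authorizer.requestResource, a shortcut for authorization checks against the resource being requested). That is enough to filter on content, the requesting user, and the operation without ever leaving the apiserver. Keep in mind that object is null on DELETE and oldObject is null on CREATE.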
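As a sketch of what the other variables make possible, the conditions below follow patterns from the Kubernetes documentation: a group check on the requesting user, a resource check on the request, and a break-glass bypass through the authorizer. The "breakglass" verb and the choice of what to exclude are illustrative assumptions, not part of the image-policy setup above.

```yaml
# Additional entries for a webhook's matchConditions: list (illustrative).
matchConditions:
  # Skip requests made by kubelets (members of the system:nodes group).
  - name: exclude-kubelet-requests
    expression: '!("system:nodes" in request.userInfo.groups)'
  # Skip a noisy resource type; only relevant if rules match it at all.
  - name: exclude-leases
    expression: '!(request.resource.group == "coordination.k8s.io" && request.resource.resource == "leases")'
  # Skip callers who are authorized for a custom "breakglass" verb on this
  # webhook configuration (the verb and object name are assumptions).
  - name: breakglass
    expression: '!authorizer.group("admissionregistration.k8s.io").resource("validatingwebhookconfigurations").name("image-policy").check("breakglass").allowed()'
```

Because every condition has to be true for the webhook to run, each entry is phrased as a reason to call it: the skip case is what gets negated.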

The failurePolicy interaction is the dangerous part

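This is where you can hurt yourself. If a matchCondition expression errors (not evaluates to false, but fails to evaluate at all), the webhook is never called and the apiserver falls back on failurePolicy. With failurePolicy: Fail, the request is rejected outright; with Ignore, it is admitted without the webhook ever seeing it. The one exception is that if any other condition on the same webhook evaluated to false, the webhook is simply skipped and the error doesn’t matter. A CEL expression that assumes a field exists, on an object where it doesn’t, is exactly that kind of bug: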

# Risky: errors if request.dryRun is unset on some request shapes
- expression: "request.dryRun == false"

# Safe: total, never errors
- expression: "!has(request.dryRun) || !request.dryRun"

Keep matchCondition expressions total and side-effect-free, and test them against representative objects before they go anywhere near a Fail webhook. A buggy condition on a Fail policy is one of the few ways to make the cluster refuse writes for a whole category of resource.
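One way to act on that advice, offered as a sketch and not a prescribed procedure: guard every optional field with has() or an in check, and trial new conditions on a staging copy of the webhook with failurePolicy: Ignore before moving them to the Fail one. The annotation key image-policy.example.com/skip and the canary webhook name are hypothetical.

```yaml
# Fragment: a staging entry under webhooks: used to trial new conditions.
# rules, clientConfig, sideEffects and admissionReviewVersions are the same
# as on the real webhook and are left out here.
- name: validate-canary.image.example.com   # hypothetical staging name
  failurePolicy: Ignore   # an evaluation error admits the request instead of rejecting it
  matchConditions:
    # Unsafe form:
    #   object.metadata.annotations['image-policy.example.com/skip'] != 'true'
    # errors when the object has no annotations, or lacks that key.
    # Total form: every lookup is guarded, so the result is always a bool.
    - name: not-opted-out
      expression: >-
        !has(object.metadata.annotations) ||
        !('image-policy.example.com/skip' in object.metadata.annotations) ||
        object.metadata.annotations['image-policy.example.com/skip'] != 'true'
```

While the staging entry is live, watch the apiserver for evaluation errors. The versions I’m aware of expose an apiserver_admission_match_condition_evaluation_errors_total counter, but confirm the name against your own /metrics output before relying on it.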

Prompt: Here is a ValidatingWebhookConfiguration and the webhook server’s allow-fast-path logic. Move every check that exists only to immediately allow a request into CEL matchConditions, keep the expressions total so they never error, and flag any that would be risky under failurePolicy: Fail. Output the annotated config and the server lines that become dead code — no apply commands.

Output (excerpt): Moved three fast-allows to matchConditions: kube-system exclusion, dry-run skip, own-service-account skip. The dry-run skip uses a has() guard; the other two only read request.namespace and request.userInfo.username, which are set on every pod request, so none of them can error. Server lines 40-58 (the early return allowed block) are now unreachable and can be deleted or kept as defense-in-depth. None of the three risk a Fail-policy lockout because all are total.

This is well-suited to an AI assistant: it knows the CEL variable set and the failurePolicy semantics, and it produces a reviewable diff plus the dead server code the change creates. I keep it advisory — it never edits a live webhook config, because the blast radius of a wrong Fail webhook is the whole cluster’s write path. It drafts; I test against sample objects and apply.

When to skip the webhook entirely

If your validation logic can be expressed purely in CEL, you may not need a webhook server at all. A ValidatingAdmissionPolicy runs the same kind of CEL inside the apiserver with no external pod to deploy, secure, or keep available. matchConditions are the right tool when you still need a server for the hard cases but want to stop paying for the easy ones; an admission policy is the right tool when the whole check fits in CEL.
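For a sense of what that looks like, here is a minimal sketch of a registry check written as an admission policy. It assumes a cluster that serves ValidatingAdmissionPolicy under admissionregistration.k8s.io/v1 (stable as of Kubernetes 1.30); the policy names and the registry.example.com host are hypothetical, and the check deliberately covers only spec.containers.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: image-registry-policy            # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  validations:
    # Does not look at initContainers or ephemeralContainers.
    - expression: "object.spec.containers.all(c, c.image.startsWith('registry.example.com/'))"
      message: "all container images must come from registry.example.com"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: image-registry-policy-binding    # hypothetical name
spec:
  policyName: image-registry-policy
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]
```

The policy does nothing until a binding references it, which is why the two objects travel together.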

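# Back on the webhook path: confirm the webhook is being called less after adding conditions.
# Compare this webhook's counter before and after the change.
kubectl get --raw /metrics | grep apiserver_admission_webhook_request_total | grep 'name="validate.image.example.com"'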

Wrapping up

An admission webhook sits on every write in the cluster, so the cheapest webhook call is the one the apiserver never makes. CEL matchConditions push your fast-allow checks (system namespaces, dry-runs, your own service account) up into the apiserver where they cost a CEL evaluation instead of a network round-trip, shrinking both latency and the webhook’s blast radius. Just keep the expressions total so a Fail policy can’t turn a buggy condition into a cluster-wide write outage, and lean on an AI assistant to do the extraction while you test and apply.