AI-Assisted Kubernetes RBAC Least-Privilege Audits

Every cluster I’ve ever inherited has the same RBAC story: it started tight, then someone hit a Forbidden error at 2 a.m., bound a ServiceAccount to cluster-admin “just to unblock the deploy,” and never came back to fix it. Multiply that by three years and you have a cluster where half the workloads can delete every secret in every namespace.

Auditing RBAC by hand is miserable. The relationships sprawl across Roles, ClusterRoles, RoleBindings, and ClusterRoleBindings, and the verbs that matter (escalate, bind, impersonate) hide among the boring ones. This is exactly the kind of tedious, pattern-heavy work where an AI copilot earns its keep — as long as you remember it’s a fast junior engineer reasoning over text you gave it, not an authority that gets to touch the cluster.

Export the graph, don’t describe it

Don’t tell the model “we have some RBAC.” Give it the real objects. I dump everything to two files, one for roles and one for bindings:

kubectl get clusterroles,roles \
  --all-namespaces -o yaml > rbac-roles.yaml
kubectl get clusterrolebindings,rolebindings \
  --all-namespaces -o yaml > rbac-bindings.yaml

Then I hand both files over with a precise question:

Here are all Roles, ClusterRoles, and their bindings. Build a list of every subject (user, group, ServiceAccount) and the union of permissions it actually has. Flag any subject that can read Secrets cluster-wide, any with * verbs, and any bound to cluster-admin.

The model is good at the graph traversal that humans get lost in: following a ServiceAccount through a RoleBinding to a ClusterRole and unioning the verbs.
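
To spot-check the model’s traversal, the same join can be done mechanically. The following is a minimal sketch, not the author’s tooling: it assumes the dumps were re-exported with -o json (the Python standard library has no YAML parser) and that you pass in the lists found under "items". The function names are hypothetical.

```python
from collections import defaultdict


def subject_key(subject, binding_namespace):
    """Return a stable identifier such as 'ServiceAccount:ci/ci-runner'."""
    kind = subject["kind"]
    name = subject["name"]
    if kind == "ServiceAccount":
        # ServiceAccount subjects carry a namespace; fall back to the binding's.
        ns = subject.get("namespace") or binding_namespace or ""
        return f"{kind}:{ns}/{name}"
    return f"{kind}:{name}"


def effective_permissions(roles, bindings):
    """Map each subject to a list of (scope, rule) pairs.

    scope is "*" for a ClusterRoleBinding, otherwise the RoleBinding's namespace.
    """
    # Index rules by (kind, namespace, name); ClusterRoles have no namespace.
    index = {}
    for role in roles:
        meta = role["metadata"]
        key = (role["kind"], meta.get("namespace", ""), meta["name"])
        index[key] = role.get("rules") or []

    result = defaultdict(list)
    for binding in bindings:
        binding_ns = binding["metadata"].get("namespace", "")
        ref = binding["roleRef"]
        # A RoleBinding may reference a ClusterRole; a Role lives in the
        # binding's own namespace.
        role_ns = binding_ns if ref["kind"] == "Role" else ""
        rules = index.get((ref["kind"], role_ns, ref["name"]), [])
        scope = "*" if binding["kind"] == "ClusterRoleBinding" else binding_ns
        for subject in binding.get("subjects") or []:
            for rule in rules:
                result[subject_key(subject, binding_ns)].append((scope, rule))
    return dict(result)
```

A subject whose list contains a pair with scope "*" and a rule covering secrets is one that can read Secrets cluster-wide, which is the first thing the prompt above asks the model to flag.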

Hunt the dangerous verbs first

Not all permissions are equal. A handful are privilege-escalation primitives, and they’re the ones worth finding first:

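escalate on roles/clusterroles: lets a subject grant itself more than it has
bind on roles/clusterroles: lets a subject create bindings to roles whose permissions it doesn’t hold
impersonate on users/groups/serviceaccounts: act as another identity and borrow its permissions
create on pods/exec or pods/attach: shell into any pod in scope
update/patch on validatingwebhookconfigurations: disable admission control
* on secrets: read or modify every Secret in scope, ServiceAccount tokens included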

I ask the model directly:

Scan these Roles for the verbs escalate, bind, impersonate, and for create on pods/exec. List every subject that ends up with any of them, and explain the escalation path in one sentence each.

The “explain the path” part matters. A finding like “ServiceAccount ci-runner can create pods/exec in kube-system, so it can open a shell in any pod there and borrow that pod’s ServiceAccount token” is something I can act on. A bare list of verbs is not.
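
A deterministic scan is a useful cross-check on whatever the model reports. This is a rough sketch under stated assumptions: rules are dicts as they appear in -o json output, only the bare "*" wildcard is handled, apiGroups are ignored, and the DANGEROUS table and function names are mine rather than any standard list. It checks bind against roles and clusterroles, the resources RBAC evaluates that verb on.

```python
# (verb, resource) pairs treated as escalation primitives in this sketch.
DANGEROUS = [
    ("escalate", "roles"), ("escalate", "clusterroles"),
    ("bind", "roles"), ("bind", "clusterroles"),
    ("impersonate", "users"), ("impersonate", "groups"),
    ("impersonate", "serviceaccounts"),
    ("create", "pods/exec"), ("create", "pods/attach"),
    ("update", "validatingwebhookconfigurations"),
    ("patch", "validatingwebhookconfigurations"),
]


def matches(values, wanted):
    """True if an RBAC list contains the wanted value or the '*' wildcard."""
    return "*" in values or wanted in values


def escalation_findings(rule):
    """Return the dangerous 'verb on resource' strings a single rule grants."""
    verbs = rule.get("verbs", [])
    resources = rule.get("resources", [])
    found = [
        f"{verb} on {resource}"
        for verb, resource in DANGEROUS
        if matches(verbs, verb) and matches(resources, resource)
    ]
    # Wildcard verbs on secrets are called out separately.
    if "*" in verbs and matches(resources, "secrets"):
        found.append("* on secrets")
    return found
```

It produces the bare list of verbs, not the escalation path. Explaining the path is still the part worth asking the model for.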

Pro Tip: The most dangerous binding is usually one that accidentally grants something to the system:authenticated or system:unauthenticated group (or to the system:anonymous user). Ask the model to specifically check for ClusterRoleBindings and RoleBindings whose subject is one of those. It’s a common copy-paste disaster.
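
This check is simple enough to script as well. The sketch below assumes bindings parsed from -o json output and also covers the system:unauthenticated group. The EXPECTED_ROLES allowlist reflects default bindings found on many clusters and is an assumption to verify against your own; the names are otherwise hypothetical.

```python
# Subjects that effectively mean "everyone" or "anyone without credentials".
BROAD_SUBJECTS = {
    ("Group", "system:authenticated"),
    ("Group", "system:unauthenticated"),
    ("User", "system:anonymous"),
}

# Assumed allowlist of roles commonly bound to these subjects by default.
# Check it against your own cluster before trusting it.
EXPECTED_ROLES = {
    "system:basic-user",
    "system:discovery",
    "system:public-info-viewer",
}


def broad_bindings(bindings, expected_roles=EXPECTED_ROLES):
    """Return (binding kind, binding name, subject, role) for risky bindings."""
    findings = []
    for binding in bindings:
        role = binding["roleRef"]["name"]
        if role in expected_roles:
            continue
        for subject in binding.get("subjects") or []:
            if (subject["kind"], subject["name"]) in BROAD_SUBJECTS:
                findings.append(
                    (binding["kind"], binding["metadata"]["name"],
                     subject["name"], role)
                )
    return findings
```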

Diff intended against actual

The audit that produces real fixes compares what a workload needs against what it has. I pull the ServiceAccount a Deployment uses, then ask:

This Deployment runs a webhook that only reads ConfigMaps in its own namespace. Here is the ClusterRole its ServiceAccount is bound to. List every permission it has that it does not need for that job.

That over-grant list is your remediation backlog. Nine times out of ten it’s a service that was handed a broad ClusterRole when it needed a five-line namespaced Role.
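
The comparison itself is a set difference, which makes it easy to verify the model’s over-grant list. A minimal sketch, assuming both the granted rules and the rules you believe the workload needs are written as RBAC rule dicts; wildcards are compared literally, so a "*" entry always shows up as an over-grant, and nonResourceURLs rules are ignored. Function names are hypothetical.

```python
def expand(rules):
    """Flatten RBAC rules into a set of (apiGroup, resource, verb) triples."""
    triples = set()
    for rule in rules:
        for group in rule.get("apiGroups", [""]):
            for resource in rule.get("resources", []):
                for verb in rule.get("verbs", []):
                    triples.add((group, resource, verb))
    return triples


def over_grants(granted_rules, needed_rules):
    """Return triples that are granted but not needed, sorted for stable output."""
    return sorted(expand(granted_rules) - expand(needed_rules))
```

Swapping the arguments gives the opposite list: permissions the workload needs but does not have, which is what surfaces as Forbidden errors later.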

Have the model draft the tighter Role — then you verify

This is where the human-in-the-loop line is bright. The AI can draft a least-privilege Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: "configmap-reader"
  namespace: "webhooks"
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]

But the model doesn’t know your runtime. It can’t see that the controller also needs to update a Lease for leader election, because that wasn’t in the prompt. So I never apply a generated Role blind. I deploy it to staging, watch for Forbidden events, and let reality fill the gaps:

kubectl logs -n webhooks deploy/my-webhook | grep -i forbidden

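The AI proposes; the cluster disposes. That grep only helps if the workload logs its API errors; if audit logging is enabled on the API server, denied requests show up there regardless. You can also use kubectl auth can-i --list -n webhooks --as=system:serviceaccount:webhooks:my-sa to confirm the effective permissions in that namespace after you apply, and the output is a good thing to ask the model to interpret.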

Watch for stale bindings to deleted subjects

A subtle RBAC smell is a binding that points at a ServiceAccount or user that no longer exists. Such bindings are harmless until someone recreates an SA with the same name and silently inherits old power. I ask:

Cross-reference these bindings against the list of ServiceAccounts that currently exist. Flag any RoleBinding or ClusterRoleBinding whose subject ServiceAccount is missing.

kubectl get serviceaccounts --all-namespaces -o yaml > sas.yaml

It’s the kind of join across two YAML dumps that’s tedious by hand and trivial for a model.
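
The join is also small enough to script, which gives you something to compare the model’s answer against. A sketch under the assumption that both dumps were taken with -o json and that you pass in their "items" lists; only ServiceAccount subjects are checked, since users and groups are not API objects. The function name is hypothetical.

```python
def stale_serviceaccount_bindings(bindings, serviceaccounts):
    """Return bindings whose ServiceAccount subject no longer exists."""
    existing = {
        (sa["metadata"]["namespace"], sa["metadata"]["name"])
        for sa in serviceaccounts
    }
    stale = []
    for binding in bindings:
        meta = binding["metadata"]
        for subject in binding.get("subjects") or []:
            if subject["kind"] != "ServiceAccount":
                continue
            # Fall back to the binding's namespace if the subject omits one.
            ns = subject.get("namespace") or meta.get("namespace", "")
            if (ns, subject["name"]) not in existing:
                stale.append(
                    (binding["kind"], meta.get("namespace", ""),
                     meta["name"], f"{ns}/{subject['name']}")
                )
    return stale
```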

Keep credentials out of the prompt

RBAC YAML is reasonably safe to share with a model: it describes permissions, not secrets, though it does reveal namespace, user, and service names, so redact those if they’re sensitive. But the moment your audit involves tokens, stop. Never paste a ServiceAccount token, never give the model a kubeconfig, and never let it run kubectl against prod. Everything in this workflow is kubectl get -o yaml on one side and kubectl apply (by a human, in staging first) on the other. The model lives entirely in the middle, reading text.

If you want a structured second opinion on the change before it lands, the code review dashboard handles the AI-flags-human-approves handoff for RBAC manifests too.

Aggregated ClusterRoles hide real power

One of the sneakiest RBAC features is ClusterRole aggregation. A ClusterRole with an aggregationRule automatically absorbs the rules of every other ClusterRole carrying a matching label, so its actual permissions aren’t what its own manifest says. They’re assembled by the control plane from labels scattered across the cluster. You can read an aggregated role’s manifest in Git, see an empty rules:, and completely miss that it grants delete on everything. On the live object the controller fills rules in with the union, but nothing in that dump tells you which labeled ClusterRole contributed which rule.

I make the model account for this explicitly:

Some of these ClusterRoles use aggregationRule. For each one, find every ClusterRole whose labels match the selector, and compute the union of rules the aggregated role actually receives, noting which source role contributes each rule. Don’t trust a rules block written in a manifest; the control plane overwrites it.

kubectl get clusterrole view -o jsonpath='{.aggregationRule}'
kubectl get clusterroles -l rbac.authorization.k8s.io/aggregate-to-view=true

This is the kind of multi-object join that’s miserable by hand — chase a label selector across forty ClusterRoles and union their rules — and exactly where the model’s patience beats mine. The built-in admin, edit, and view roles are aggregated, so anyone who adds an aggregation label to a custom role silently widens them. Auditing that drift is a recurring task worth a saved prompt.
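
For a mechanical cross-check, the label join can be sketched in a few lines. This assumes ClusterRoles parsed from -o json output and handles only matchLabels selectors (matchExpressions and empty selectors are skipped), so treat it as a partial illustration; the function names are hypothetical.

```python
def aggregation_sources(aggregated, cluster_roles):
    """Return names of ClusterRoles whose labels match any matchLabels selector."""
    rule = aggregated.get("aggregationRule") or {}
    selectors = rule.get("clusterRoleSelectors") or []
    sources = []
    for role in cluster_roles:
        if role["metadata"]["name"] == aggregated["metadata"]["name"]:
            continue
        labels = role["metadata"].get("labels") or {}
        for selector in selectors:
            wanted = selector.get("matchLabels") or {}
            # Selectors are ORed; the labels inside one selector are ANDed.
            if wanted and all(labels.get(k) == v for k, v in wanted.items()):
                sources.append(role["metadata"]["name"])
                break
    return sorted(sources)


def rules_by_source(aggregated, cluster_roles):
    """Map each contributing ClusterRole name to the rules it adds."""
    by_name = {r["metadata"]["name"]: r.get("rules") or [] for r in cluster_roles}
    return {
        name: by_name[name]
        for name in aggregation_sources(aggregated, cluster_roles)
    }
```

Saving the rules_by_source output for view, edit, and admin each month makes a newly labeled custom role stand out as a new key.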

Re-run the audit on a schedule, not once

RBAC isn’t a one-time cleanup; it rots continuously as people add bindings to unblock deploys. The audit is only useful if it’s repeatable, so I keep the kubectl get dumps and the standing prompts in a runbook and re-run the whole pass monthly. Comparing this month’s over-grant list against last month’s tells me whether we’re tightening or sprawling — and a sudden new cluster-admin binding shows up as a diff instead of hiding for a year. The model is happy to do that month-over-month comparison too: hand it both audit outputs and ask what permissions appeared since last time.
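
The month-over-month comparison can also be done without the model. A minimal sketch, assuming each audit run is saved as a mapping from subject to a list of permission strings (for example, loaded with json.load from a file per month); that storage format and the function name are my assumptions, not part of the workflow above.

```python
def new_permissions(previous, current):
    """Return permissions present in the current audit but not the previous one.

    Both arguments map a subject (e.g. 'ServiceAccount:ci/ci-runner') to a
    list of permission strings (e.g. 'get secrets in *').
    """
    added = {}
    for subject, perms in current.items():
        extra = set(perms) - set(previous.get(subject, []))
        if extra:
            added[subject] = sorted(extra)
    return added
```

A subject that did not exist last month appears with its full permission list, so a brand-new cluster-admin binding is hard to miss.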

Conclusion

RBAC rots toward over-privilege because tightening it is tedious and risky. AI removes the tedium: it traverses the binding graph, surfaces escalation verbs, and drafts least-privilege Roles in seconds. What it can’t do is know your runtime or authorize a change — so you verify every generated Role in staging and keep all credentials far away from the model. Done that way, you can actually claw a sprawling cluster back toward least privilege.

For deeper RBAC fundamentals, Kubernetes RBAC without the headaches is a good companion, and the broader Kubernetes and Helm guides cover the surrounding hardening work.