kube-apiserver Audit Policy: Knowing Exactly What Happened in Your Cluster
When something changes in your cluster and nobody admits to it, the audit log has the answer. Learn to write a kube-apiserver audit policy that captures what matters without drowning in noise.
- #kubernetes
- #audit
- #security
- #api-server
- #compliance
Every cluster I’ve operated has eventually produced the same question: “who deleted that?” A Deployment vanishes, a Secret gets rotated at a suspicious hour, a namespace’s quota mysteriously triples. Without audit logging, the honest answer is “we have no idea,” and that’s a deeply uncomfortable place to be during an incident review. The kube-apiserver audit log is the cluster’s flight recorder — it captures every request that reaches the API server — and a good audit policy is what turns that firehose into something you can actually search.
This guide covers how audit logging works, how to write a policy that captures the important things without logging every health check, and where an AI assistant speeds up the tedious parts.
How audit logging is structured
Every request to the kube-apiserver can be recorded at one of four audit levels, and your policy decides which level applies to which requests:
- None — don’t log this request at all.
- Metadata — log who, what, when, and the response status, but not the request or response bodies.
- Request — log metadata plus the request body (the object being sent).
- RequestResponse — log everything, including what the API server returned.
The art is matching the level to the resource. You want RequestResponse on a Secret deletion and None on the kubelet’s endless get nodes heartbeat. A policy is an ordered list of rules; the first rule that matches a request wins, so ordering is everything.
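To make the first-match rule concrete, here is a minimal Python sketch of how an ordered rule list resolves to a level. It is a simplified model, not the API server's implementation: the `Rule` class and `audit_level` function are hypothetical names, and the matching only looks at verbs and resources (the real policy also matches users, groups, namespaces, and more).

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Simplified stand-in for one audit policy rule (not the real API type)."""
    level: str
    verbs: list = field(default_factory=list)      # empty list = any verb
    resources: list = field(default_factory=list)  # empty list = any resource

    def matches(self, verb, resource):
        verb_ok = not self.verbs or verb in self.verbs
        resource_ok = not self.resources or resource in self.resources
        return verb_ok and resource_ok


def audit_level(rules, verb, resource):
    """Return the level of the first matching rule; "None" if nothing matches."""
    for rule in rules:
        if rule.matches(verb, resource):
            return rule.level
    return "None"


specific_first = [
    Rule("RequestResponse", resources=["secrets"]),
    Rule("Metadata", verbs=["create", "update", "patch", "delete"]),
    Rule("None"),
]
# Same rules, but with the catch-all moved to the top.
catch_all_first = [Rule("None")] + specific_first

print(audit_level(specific_first, "delete", "secrets"))   # RequestResponse
print(audit_level(catch_all_first, "delete", "secrets"))  # None
```

The second call shows the failure mode worth checking for in any policy review: a broad rule placed too early silently swallows everything below it.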
A starter policy that isn’t noise
Here’s the shape of a policy that captures the security-relevant events without filling your disk:
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"
rules:
  # Never log the noisy, read-only system traffic
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
  # Capture full detail on secrets and configmaps
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Capture changes to RBAC — who granted whom what
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Metadata for everything else that mutates state
  - level: Metadata
    verbs: ["create", "update", "patch", "delete"]
  # Drop the rest (reads, watches)
  - level: None
omitStages: ["RequestReceived"] roughly halves your log volume for ordinary requests by skipping the event written before a request is processed. In most investigations the ResponseComplete event, which carries the response status, is the one you need. The final catch-all level: None is deliberate: anything not explicitly matched (mostly reads) is dropped.
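If you already have an audit log written without omitStages, you can measure what the RequestReceived stage costs you before changing anything. The sketch below counts events per stage using the `stage` field present on each audit event; the `count_stages` helper and the four sample lines are made up for illustration.

```python
import json
from collections import Counter


def count_stages(lines):
    """Count audit events per stage from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line).get("stage", "unknown")] += 1
    return counts


# Invented sample: two ordinary requests, each logged at both stages.
sample = [
    '{"stage": "RequestReceived", "verb": "get"}',
    '{"stage": "ResponseComplete", "verb": "get"}',
    '{"stage": "RequestReceived", "verb": "delete"}',
    '{"stage": "ResponseComplete", "verb": "delete"}',
]
counts = count_stages(sample)
share = counts["RequestReceived"] / sum(counts.values())
print(counts["RequestReceived"], counts["ResponseComplete"], share)  # 2 2 0.5
```

On a real file, pass the open file object instead of the sample list. Long-running requests such as watches also emit a ResponseStarted event, so the share on a real log will not be exactly one half.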
Pro Tip: Logging Secret bodies at RequestResponse writes the secret values into your audit log. That’s often what compliance wants, but it means your audit log is now as sensitive as the secrets themselves. Either drop secrets to Metadata level, or make absolutely sure the audit log backend is encrypted and access-controlled to the same standard. I’ve seen the “we logged everything” approach quietly become the biggest secret-leak in the environment.
Wiring the policy into the API server
The policy is passed to the kube-apiserver as flags. On a kubeadm cluster you edit the static pod manifest at /etc/kubernetes/manifests/kube-apiserver.yaml:
- --audit-policy-file=/etc/kubernetes/audit/policy.yaml
- --audit-log-path=/var/log/kubernetes/audit/audit.log
- --audit-log-maxage=30
- --audit-log-maxbackup=10
- --audit-log-maxsize=200
You also need hostPath volume mounts so the policy file and log directory are visible inside the static pod. The moment you save the manifest, the kubelet restarts the API server — which means a mistake here can take your control plane offline. On a managed cluster (EKS, GKE, AKS) you don’t touch the manifest at all; you enable audit logging through the provider’s control-plane settings, and logs flow to the provider’s logging backend instead of a local file.
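A forgotten mount is the classic way to break the API server here, so a quick pre-flight check before saving the manifest is cheap. This is a minimal sketch under assumptions: `unmounted_audit_paths` is a hypothetical helper, and you supply the container args and the volumeMounts mountPath values yourself (for example, copied out of the manifest). It only checks that each audit path sits under some mount path, nothing more.

```python
from pathlib import PurePosixPath

AUDIT_PATH_FLAGS = ("--audit-policy-file", "--audit-log-path")


def unmounted_audit_paths(args, mount_paths):
    """Return {flag: path} for audit flags whose path is under no mountPath."""
    mounts = [PurePosixPath(m) for m in mount_paths]
    missing = {}
    for arg in args:
        flag, sep, value = arg.partition("=")
        if not sep or flag not in AUDIT_PATH_FLAGS:
            continue
        if value == "-":  # "-" sends the audit log to stdout, no mount needed
            continue
        path = PurePosixPath(value)
        covered = any(m == path or m in path.parents for m in mounts)
        if not covered:
            missing[flag] = value
    return missing


args = [
    "--audit-policy-file=/etc/kubernetes/audit/policy.yaml",
    "--audit-log-path=/var/log/kubernetes/audit/audit.log",
    "--audit-log-maxage=30",
]
# Hypothetical mountPaths: only the policy directory has been mounted so far.
print(unmounted_audit_paths(args, ["/etc/kubernetes/audit"]))
# {'--audit-log-path': '/var/log/kubernetes/audit/audit.log'}
```

An empty result does not prove the manifest is correct (the hostPath itself could still be wrong), but a non-empty one is a reason to stop before saving.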
Reading the log without going cross-eyed
Audit entries are JSON, one per line, and they’re verbose. jq is your friend:
# Every delete in the last log file, with who and what
jq 'select(.verb=="delete") |
  {who: .user.username, what: .objectRef.resource,
   name: .objectRef.name, when: .requestReceivedTimestamp}' \
  /var/log/kubernetes/audit/audit.log

# Who has been touching secrets?
jq 'select(.objectRef.resource=="secrets" and .verb!="get") |
  {user: .user.username, verb, name: .objectRef.name}' audit.log
That second query is the one that answers the “who rotated the Secret?” question in about ten seconds, assuming you wrote the policy to capture it.
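When a one-liner stops being enough (say you want deletes grouped per user for an incident timeline), the same filter is easy to express in Python. This is a minimal sketch: `deletes_by_user` is a hypothetical helper, and the sample events are invented and trimmed to the fields the jq queries above already use.

```python
import json
from collections import defaultdict


def deletes_by_user(lines):
    """Map username -> list of (resource, name) for every delete event."""
    result = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("verb") != "delete":
            continue
        ref = event.get("objectRef") or {}
        user = (event.get("user") or {}).get("username", "unknown")
        result[user].append((ref.get("resource"), ref.get("name")))
    return dict(result)


# Invented sample events, one JSON object per line as in audit.log.
sample = [
    '{"verb": "delete", "user": {"username": "alice"},'
    ' "objectRef": {"resource": "deployments", "name": "web"}}',
    '{"verb": "get", "user": {"username": "bob"},'
    ' "objectRef": {"resource": "pods", "name": "web-1"}}',
    '{"verb": "delete", "user": {"username": "alice"},'
    ' "objectRef": {"resource": "secrets", "name": "db-creds"}}',
]
print(deletes_by_user(sample))
# {'alice': [('deployments', 'web'), ('secrets', 'db-creds')]}
```

To run it against a real log, pass an open file object in place of the sample list.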
Where AI fits — drafting and triage, not access
Two parts of this work are perfect for an AI assistant. First, writing the policy itself: describing in plain language what you want logged (“full detail on RBAC and secrets, metadata on all other writes, nothing on system reads”) and having the model produce the ordered rules is far faster than hand-crafting the YAML and getting the ordering right. Second, triaging the log: pasting a few hundred sanitized audit lines and asking the model to summarize unusual activity, group events by user, or spot a deletion spike.
I treat the assistant as a fast junior engineer here. It drafts the policy and reads the logs; it does not touch the cluster. The audit policy file controls control-plane behavior, and a bad edit can crash the API server, so it is firmly human-in-the-loop. I review every rule and apply it myself, and I never hand the model a kubeconfig or production credentials. When I want it to triage logs, I redact secret values and replace usernames with consistent placeholders first, then paste text, never a live connection. For incident-time log triage there’s a guided flow in the incident response dashboard, reusable prompts in the prompt library, and packaged sets in the prompt packs.
A triage prompt I keep handy:
Here are 300 redacted kube-apiserver audit log lines (JSON). Summarize:
unusual delete activity by user, any RBAC changes, any access to secrets
outside business hours. Group by user. Output a table only — no commands,
no kubeconfig, no live cluster actions.
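One way to handle the redaction step before pasting, while keeping "Group by user" meaningful, is to drop the object bodies and swap each username for a stable placeholder. The sketch below is a starting point, not a complete scrubber: `pseudonym` and `sanitize_line` are hypothetical helpers, the salt is something you choose and keep private, and fields such as sourceIPs, groups, or object names may also need attention in your environment.

```python
import hashlib
import json

# Audit event fields that can carry full object bodies (including secret data).
BODY_FIELDS = ("requestObject", "responseObject")


def pseudonym(username, salt):
    """Stable placeholder so events can still be grouped by user."""
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return "user-" + digest[:8]


def sanitize_line(line, salt):
    """Drop object bodies and replace the username in one audit JSON line."""
    event = json.loads(line)
    for key in BODY_FIELDS:
        event.pop(key, None)
    user = event.get("user")
    if isinstance(user, dict) and "username" in user:
        user["username"] = pseudonym(user["username"], salt)
    return json.dumps(event, sort_keys=True)


# Invented sample line with a secret body attached.
raw = (
    '{"verb": "delete", "user": {"username": "alice"},'
    ' "objectRef": {"resource": "secrets", "name": "db"},'
    ' "requestObject": {"data": {"password": "c2VjcmV0"}}}'
)
print(sanitize_line(raw, salt="change-me"))
```

Because the same salt and username always give the same placeholder, the model can still group activity per user, and you can map placeholders back to real names locally if something looks suspicious.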
Shipping logs somewhere durable
A local audit.log on the control-plane node is fine for forensics until that node dies. For anything resembling compliance, ship audit events off-box. The API server supports a webhook backend (--audit-webhook-config-file) that POSTs events to a collector, or you run a log shipper that tails the file and forwards to your SIEM. Either way, retention should outlive your incident-investigation window — 30 to 90 days is typical, longer if regulation demands it.
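For a feel of what the receiving end of the webhook backend looks like, here is a minimal collector sketch using only the Python standard library. It assumes the API server POSTs batches as a JSON EventList with the events under `items`; the handler class, output path, and port are hypothetical, and there is no TLS or authentication here, which a real collector needs.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def events_from_payload(body):
    """Return the list of events in a webhook payload (assumed EventList JSON)."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("kind") != "EventList":
        raise ValueError("expected a JSON object with kind EventList")
    return payload.get("items") or []


class AuditHandler(BaseHTTPRequestHandler):
    """Appends each received event to a local file as one JSON line."""

    out_path = "audit-events.jsonl"  # hypothetical destination

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            events = events_from_payload(self.rfile.read(length))
        except ValueError:  # also covers malformed JSON
            self.send_response(400)
            self.end_headers()
            return
        with open(self.out_path, "a", encoding="utf-8") as out:
            for event in events:
                out.write(json.dumps(event) + "\n")
        self.send_response(200)
        self.end_headers()


def serve(port=8443):
    """Run the collector; plain HTTP only, so terminate TLS in front of it."""
    HTTPServer(("0.0.0.0", port), AuditHandler).serve_forever()
```

Calling `serve()` starts the listener. In practice most teams point the webhook at an existing log pipeline instead of writing their own, but the sketch shows how little the contract demands: accept a POST, unpack the batch, store each event.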
Wrapping up
An audit policy is cheap insurance you only appreciate during an incident. Match log levels to resource sensitivity, drop the system noise, mind that Secret bodies make the log sensitive, and ship events somewhere durable. Let an AI assistant draft the policy and triage the output while you keep your hands on the control-plane config and never expose live credentials. The first time someone asks “who deleted that?” and you answer in ten seconds, the setup pays for itself. For more on locking down clusters, see the Kubernetes & Helm guides.