Generating Kubernetes Network Policies From Observed Traffic With AI
Stop guessing at NetworkPolicy rules. Capture real flow data, hand it to AI, and review a least-privilege policy you can actually trust before applying it.
- #kubernetes
- #network-policy
- #security
- #ai
Writing a NetworkPolicy from first principles is a guessing game. You think you know which services talk to which, you write a default-deny policy, you apply it, and half an hour later something you forgot about — a metrics scraper, a sidecar, a cron job — starts failing because you blocked a flow you didn’t know existed. The trick that finally made network policies tractable for me was to stop guessing and start observing: capture the traffic that’s actually happening, then let an AI model turn that observed data into a least-privilege policy I review line by line.
Why default-deny is the goal and why it’s scary
By default, Kubernetes lets every pod talk to every other pod. A single NetworkPolicy selecting a pod flips it to default-deny for the directions you specify, and then you allow back only what’s needed. That’s the right security posture — but the failure mode is that you under-allow and break legitimate traffic. The whole problem is knowing what’s legitimate. Observed traffic answers that question with data instead of memory.
Capture the real flows first
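You need a source of flow data. If you run Cilium, Hubble gives you exactly this. I capture a representative window of traffic for the namespace I’m locking down. Without a time or count flag, hubble observe returns only a small number of recent flows, so I pass --since to cover the window I care about (1h here is just an example, and the result is still bounded by what Hubble’s flow buffer has retained):
hubble observe --namespace payments --since 1h --output jsonpb > /tmp/flows.json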
If you’re on Calico, flow logs serve the same purpose. Either way, the output is a list of source/destination pods, ports, and protocols. That flow data is metadata about connections — safe to share with a model after I scrub anything sensitive like external IPs I’d rather not disclose. Capturing it is read-only observation; I run the command, not the model, and the model never sees the cluster itself.
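Before sharing anything, I find it useful to collapse the raw capture into unique flows and drop the IPs entirely. The sketch below is one way to do that, assuming each line of the jsonpb output is a JSON object wrapping a flow with source.labels, destination.labels, l4.TCP/UDP.destination_port and is_reply fields, and that labels look like "k8s:app=payments". Those field names are assumptions to check against your own capture, and peer_name and summarize_flows are hypothetical helpers, not part of Hubble.

```python
import json


def peer_name(endpoint):
    """Name a flow endpoint as "namespace/app", never by IP address."""
    for label in endpoint.get("labels", []):
        # Assumed label format: "k8s:app=payments"
        key, _, value = label.partition("=")
        if key in ("k8s:app", "app"):
            return f"{endpoint.get('namespace', '?')}/{value}"
    return "no-app-label"


def summarize_flows(lines):
    """Collapse raw flow records into sorted unique (src, dst, protocol, port) tuples."""
    unique = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        flow = record.get("flow", record)
        if flow.get("is_reply"):
            continue  # keep only the request direction of each connection
        l4 = flow.get("l4", {})
        proto = next((p for p in ("TCP", "UDP") if p in l4), None)
        if proto is None:
            continue  # ignore ICMP and other non-TCP/UDP traffic
        port = l4[proto].get("destination_port")
        if port is None:
            continue
        src = peer_name(flow.get("source", {}))
        dst = peer_name(flow.get("destination", {}))
        unique.add((src, dst, proto, port))
    return sorted(unique)
```

Run over the capture with something like summarize_flows(open("/tmp/flows.json")). Because the output carries only namespaces, app labels, protocols and ports, the scrubbing step is a side effect of the summary rather than a separate pass.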
Hand the flows to the model
The prompt is straightforward because the data does the heavy lifting:
Here is Hubble flow data for the payments namespace as JSON. Generate a default-deny NetworkPolicy plus the minimal set of ingress and egress allow rules that cover exactly these observed flows. Use pod selectors based on the app label, not IP blocks, wherever the traffic is in-cluster. Add a comment above each rule naming the flow it permits.
Asking for label selectors over IP blocks matters — pod IPs churn, labels are stable. And asking for a comment per rule means the reviewer (me) can trace every allow back to a real observed flow.
What the generated policy looks like
For a service that takes ingress from an API gateway and makes egress to a database and to DNS, the draft came back like this:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-default-deny
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments
  policyTypes: ["Ingress", "Egress"]
  ingress:
    # observed: api-gateway -> payments:8080
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # observed: payments -> postgres:5432
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # observed: payments -> kube-dns:53
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
The DNS egress rule is the one everyone forgets when writing policies by hand — block port 53 and every name lookup in the pod fails. The model included it because the flow data showed it. That’s the payoff of observation over guesswork.
Pro Tip: The single most common self-inflicted outage from NetworkPolicy is forgetting egress to kube-dns. If your observed flows don’t show DNS traffic, your capture window was too short — extend it and re-capture before trusting the generated policy.
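That DNS sanity check is easy to automate before the flows ever reach the model. A minimal sketch, assuming the capture has already been reduced to a list of dicts with protocol and port keys (a hypothetical shape chosen for illustration, not Hubble's own format):

```python
def has_dns_flow(flows):
    """True if any observed flow targets port 53 over UDP or TCP."""
    return any(
        flow["port"] == 53 and flow["protocol"] in ("UDP", "TCP")
        for flow in flows
    )


observed = [
    {"protocol": "TCP", "port": 5432},
    {"protocol": "UDP", "port": 53},
]
if not has_dns_flow(observed):
    print("No DNS traffic in the capture; extend the window and re-capture.")
```

It only proves that some DNS flow was seen, not that every pod's lookups were captured, so it is a guard against an obviously short window rather than a completeness check.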
Verify against the data, not your assumptions
Generated policy is config, so verification is offline and cheap. First validate the YAML, then cross-check every rule against the flow data:
kubectl apply --dry-run=client -f payments-netpol.yaml
Then the human step that actually matters: I go rule by rule and confirm each one maps to an observed flow, and — critically — I scan the flow data for anything the policy doesn’t cover. The risk with generated policies isn’t usually a wrong rule, it’s a missing one because the capture window missed a periodic flow like a nightly batch job.
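The coverage scan can be partly mechanised so the eyes-on review starts from a list of gaps. The sketch below is a deliberately simplified matcher, not a reimplementation of Kubernetes semantics: it assumes the policy has already been loaded into a dict (for example with a YAML parser), that each observed flow is a dict with hypothetical src_labels, dst_labels, protocol and port keys, and it only understands podSelector.matchLabels and numeric ports. Namespace selectors, ipBlock, matchExpressions and named ports are ignored, and peers without a podSelector are treated as non-matching, so it errs towards reporting a flow as uncovered.

```python
def selector_matches(selector, labels):
    """matchLabels-only check; an empty selector matches everything."""
    wanted = (selector or {}).get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


def rule_allows(rule, peer_key, peer_labels, protocol, port):
    """True if one ingress or egress rule admits this peer on this port."""
    peers = rule.get(peer_key)
    ports = rule.get("ports")
    # A rule with no peers or no ports places no restriction on that dimension.
    peer_ok = not peers or any(
        "podSelector" in peer and selector_matches(peer["podSelector"], peer_labels)
        for peer in peers
    )
    port_ok = not ports or any(
        entry.get("protocol", "TCP") == protocol and entry.get("port") in (None, port)
        for entry in ports
    )
    return peer_ok and port_ok


def uncovered_flows(policy, flows):
    """Return (direction, flow) pairs that no rule in the policy allows."""
    spec = policy["spec"]
    target = spec.get("podSelector", {})
    types = spec.get("policyTypes", ["Ingress"])  # simplified default
    checks = (
        ("Ingress", "ingress", "from", "dst_labels", "src_labels"),
        ("Egress", "egress", "to", "src_labels", "dst_labels"),
    )
    missing = []
    for flow in flows:
        for policy_type, rules_key, peer_key, local, remote in checks:
            if policy_type not in types:
                continue
            if not selector_matches(target, flow[local]):
                continue  # the policy does not select this end of the flow
            rules = spec.get(rules_key) or []
            if not any(
                rule_allows(rule, peer_key, flow[remote], flow["protocol"], flow["port"])
                for rule in rules
            ):
                missing.append((rules_key, flow))
    return missing
```

An empty result means every observed flow touching the selected pods has a matching rule under these simplified assumptions. It does not replace the rule-by-rule read, and it says nothing about flows the capture never saw.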
Roll out to staging first, not straight to production
This is where the human-in-the-loop discipline is non-negotiable, because applying a NetworkPolicy mutates traffic enforcement and can cause an outage. I never let the model apply it. I apply it myself, to a non-prod namespace first, and I watch for dropped flows before going anywhere near production:
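# Apply to staging, then watch for drops.
# The manifest's metadata.namespace must be payments-staging (or omitted) here;
# kubectl rejects a manifest whose namespace differs from --namespace.
kubectl apply -f payments-netpol.yaml --namespace payments-staging
hubble observe --namespace payments-staging --verdict DROPPED --follow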
If DROPPED shows legitimate traffic, the policy is too tight — I add the missing flow and repeat. Only once staging is clean for a full business cycle (including any nightly jobs) does the policy graduate to production, applied by a human.
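Over a full business cycle the drop stream gets long, so I prefer to tally it rather than read it raw. A minimal sketch, assuming the drops were saved with --output jsonpb and that each line wraps a flow carrying verdict, source.pod_name, destination.pod_name and l4.TCP/UDP.destination_port (field names to verify against your own output; count_drops is a hypothetical helper):

```python
import json
from collections import Counter


def count_drops(lines):
    """Count DROPPED flows per (source pod, destination pod, protocol, port)."""
    drops = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        flow = record.get("flow", record)
        if flow.get("verdict") != "DROPPED":
            continue
        l4 = flow.get("l4", {})
        proto = next((p for p in ("TCP", "UDP") if p in l4), None)
        port = l4[proto].get("destination_port") if proto else None
        src = flow.get("source", {}).get("pod_name", "unknown")
        dst = flow.get("destination", {}).get("pod_name", "unknown")
        drops[(src, dst, proto, port)] += 1
    return drops
```

Calling count_drops(...).most_common() puts the noisiest blocked flow first, which is usually the missing rule to look at before anything else.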
The mistakes to watch for
Reviewing AI-generated network policies, these come up repeatedly:
- IP blocks instead of selectors. If the flow data had raw IPs, the model sometimes hard-codes ipBlock for in-cluster traffic. Pod IPs rotate; rewrite to label selectors.
- Missing namespace selectors. A least-privilege cross-namespace rule needs a namespaceSelector and a podSelector in the same from/to entry; the model occasionally drops the namespace half, and a bare podSelector only matches pods in the policy’s own namespace.
- Over-broad empty selectors. An empty podSelector: {} selects every pod in the namespace, the opposite of least privilege. Catch it on review.
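These three checks are mechanical enough to run as a lint pass ahead of the human review. The sketch below assumes the policy has already been loaded into a dict (for example with a YAML parser) and that you supply the label pairs you know live in other namespaces. Both lint_policy and its cross_namespace_labels parameter are hypothetical names for illustration, and the pass only flags candidates for a reviewer to judge.

```python
def iter_peers(spec):
    """Yield (direction, peer) for every from/to entry in the policy spec."""
    for direction, key in (("ingress", "from"), ("egress", "to")):
        for rule in spec.get(direction) or []:
            for peer in rule.get(key) or []:
                yield direction, peer


def lint_policy(policy, cross_namespace_labels=frozenset()):
    """Return review warnings for ipBlock peers and risky selectors."""
    warnings = []
    for direction, peer in iter_peers(policy["spec"]):
        if "ipBlock" in peer:
            cidr = peer["ipBlock"].get("cidr")
            warnings.append(
                f"{direction}: ipBlock {cidr}; confirm the peer is not an in-cluster pod"
            )
        selector = peer.get("podSelector")
        if selector == {}:
            warnings.append(f"{direction}: empty podSelector matches every pod in scope")
        elif selector is not None and "namespaceSelector" not in peer:
            labels = set(selector.get("matchLabels", {}).items())
            # Flag selectors for workloads the caller says live elsewhere.
            if labels & set(cross_namespace_labels):
                warnings.append(
                    f"{direction}: podSelector {sorted(labels)} has no namespaceSelector"
                )
    return warnings
```

For the payments policy I would call it with cross_namespace_labels={("k8s-app", "kube-dns")}, since DNS lives in kube-system. An empty list means none of the three patterns were spotted, which is a reason to read the policy with fresh eyes rather than a reason to skip the read.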
Wrap up
Network policy is hard because it punishes incomplete knowledge of your own traffic. Observed flow data flips that: capture what’s real, let AI translate it into a least-privilege draft, verify every rule against the data, and roll out in stages with a human applying each change. The model reads flows and writes YAML — it never enforces, never holds prod credentials, and never reaches the cluster.