By James Joyner IV · 11 min read

Reviewing Kubernetes NetworkPolicy for Default-Deny With AI

A flat cluster network is one compromised pod away from full lateral movement. Here's how I use AI to audit NetworkPolicies toward default-deny without breaking traffic.

  • #security
  • #hardening
  • #kubernetes
  • #networkpolicy
  • #ai

The default Kubernetes network is flat. Every pod can talk to every other pod, in every namespace, on every port, unless you say otherwise. Most clusters I inherit have never said otherwise. That means the day one pod gets popped, through a vulnerable dependency or a leaked token, the attacker can reach the database, the internal admin service, and the metadata endpoint without crossing a single network boundary. Lateral movement is free.

Fixing this means moving to a default-deny posture and then explicitly allowing the traffic that’s actually needed. The hard part isn’t writing a deny-all policy; that’s a nine-line manifest. The hard part is figuring out what to allow without taking down production, and that’s a mapping exercise across dozens of services. I use AI as a fast junior network engineer to read the policies, model the effective reachability, and draft allow rules, while I verify every change against real traffic before applying it. Defensive segmentation only, and no live cluster secrets in the prompt.

Establish the default-deny baseline per namespace

The foundation is a policy that denies all ingress (and ideally egress) in a namespace, so that only explicit allow rules permit traffic. I have the AI draft it and explain the semantics, because NetworkPolicy logic is allow-list based, with no explicit deny rule, and easy to get backwards:

Write a NetworkPolicy that selects all pods in a namespace and denies all ingress. Explain exactly what traffic this blocks and what still works. Then explain what I need to add an egress default-deny safely.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress

The empty podSelector: {} matches every pod in the namespace; with no ingress rules listed, nothing is allowed in. I make the AI confirm the semantics: policies are additive, a pod’s ingress is restricted only once at least one policy with the Ingress type selects it, and what the pod then accepts is the union of every allow rule that applies to it. Getting that mental model right before applying anything is the whole game.

Pro Tip: roll out default-deny egress more cautiously than ingress. The instant you deny egress, pods can’t reach kube-dns and DNS resolution breaks for every pod in that namespace. Always pair an egress default-deny with an explicit allow to the DNS service on UDP and TCP 53, and verify it in staging first.
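The pairing described in the Pro Tip can be sketched as two policies. This is a minimal sketch, not a drop-in: it assumes cluster DNS runs in kube-system with the conventional k8s-app: kube-dns pod label, and that namespaces carry the automatic kubernetes.io/metadata.name label. Clusters with a node-local DNS cache or a different DNS deployment will need a different selector, so check the labels on your own DNS pods first.

```yaml
# Deny all egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
---
# Allow DNS lookups from every pod in the namespace.
# ASSUMPTION: DNS pods live in kube-system and are labeled k8s-app: kube-dns.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # Both selectors sit in ONE list element, so they are ANDed:
        # DNS pods, and only in kube-system.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Because policies combine as a union of allows, the second policy opens DNS without weakening the deny for anything else.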

Map what actually talks to what

You can’t write correct allow rules without knowing the real traffic graph. I gather observed flows, then have the AI help me turn them into policies. If you have a CNI with flow logs, or even just service dependency data, that’s the input:

Here is a sanitized list of observed pod-to-pod flows: source label, destination label, port. Group these into the minimal set of NetworkPolicy allow rules per destination. Flag any flow that looks unexpected and shouldn’t exist.

That last instruction matters. The AI sometimes flags a flow that has no business existing, like a frontend pod talking directly to the database when it’s supposed to go through an API. Those flags are findings, not just inputs, and I investigate each one. I never paste real internal hostnames or secrets; labels and ports are enough to reason about structure.
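For reference, a sanitized flow list of the kind described above might look like the sketch below. The structure and every label in it are made up for illustration; this is not the export format of any particular CNI or flow-log tool, only the shape of information that is safe to paste.

```yaml
# Hypothetical sanitized flow list: labels, ports, and protocols only.
# No hostnames, IPs, or secrets.
flows:
  - source: app=payments-api
    destination: app=payments-db
    port: 5432
    protocol: TCP
  - source: app=checkout-web
    destination: app=payments-api
    port: 8443
    protocol: TCP
  # The kind of flow worth flagging: a frontend reaching the database
  # directly instead of going through the API.
  - source: app=checkout-web
    destination: app=payments-db
    port: 5432
    protocol: TCP
```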

Generate least-privilege allow rules and read them carefully

Once the graph is clear, the model drafts the allow policies. A typical one scopes ingress to a specific source by label and port:

spec:
  podSelector:
    matchLabels:
      app: payments-db
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: payments-api
      ports:
        - protocol: TCP
          port: 5432

I read every selector. The subtle bug is namespace scope: a bare podSelector only matches pods in the same namespace, so allowing cross-namespace traffic requires a namespaceSelector. The AI gets this right most of the time, but “most of the time” is exactly why a human verifies. A misplaced selector either opens too much or breaks the app.
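The selector placement is worth seeing side by side. The two fragments below are a sketch of the same spec.ingress section written two ways; the team: payments namespace label is a hypothetical label chosen for illustration. The only difference is one dash, and it changes the meaning from AND to OR.

```yaml
# Variant A: ONE element in "from" holding both selectors (AND).
# Allows only pods labeled app=payments-api that live in a namespace
# labeled team=payments.
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            team: payments        # hypothetical namespace label
        podSelector:
          matchLabels:
            app: payments-api
---
# Variant B: TWO elements in "from" (OR).
# Allows every pod in any namespace labeled team=payments, and also
# app=payments-api pods in the policy's own namespace.
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            team: payments        # hypothetical namespace label
      - podSelector:
          matchLabels:
            app: payments-api
```

Variant B is valid YAML and applies cleanly, which is why it tends to slip through review unless someone reads the indentation.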

Test in a dry run before you enforce

Applying NetworkPolicy blind is how you cause an outage. I stage policies in a non-production namespace that mirrors prod, generate the same traffic, and confirm nothing breaks. I also use kubectl to sanity-check what’s selected:

kubectl get networkpolicy -n payments
kubectl describe networkpolicy default-deny-ingress -n payments

For the connectivity proof itself, I exec into a pod and confirm allowed paths still work and denied paths now fail, comparing against what the AI predicted. When its prediction and reality diverge, the policy is wrong and I fix it, rather than trusting the model’s model of the cluster over the cluster itself.

Watch for the policies that don’t do what they claim

Two failure modes the AI is good at catching on review. First, a policy that selects pods but has no effect because the CNI in use doesn’t enforce NetworkPolicy at all, which silently means zero segmentation. I have it confirm the CNI supports enforcement before anyone trusts a policy. Second, an “allow” policy accidentally widening access because a namespaceSelector with empty matchLabels matches all namespaces. I ask the model to flag any empty selector and explain its scope, because empty selectors mean “everything” and that’s rarely the intent.
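The empty-selector case looks like this in practice. Both fragments are sketches of a spec.ingress section; the payments-frontend namespace and checkout-web label are hypothetical names used only to show the narrowed form.

```yaml
# Too wide: an empty namespaceSelector matches every namespace, and with
# no podSelector beside it, every pod in each of them.
ingress:
  - from:
      - namespaceSelector: {}
---
# Narrowed: one named namespace and one workload in it.
# ASSUMPTION: the namespace carries the automatic
# kubernetes.io/metadata.name label.
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: payments-frontend   # hypothetical
        podSelector:
          matchLabels:
            app: checkout-web                                # hypothetical
```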

Make segmentation a reviewed, ongoing practice

Default-deny isn’t a one-time project; new services arrive and need allow rules, and old ones get decommissioned and leave stale openings. I treat NetworkPolicy changes as code review. The diff goes through the code review dashboard so a human approves the segmentation change with the AI’s reasoning attached, rather than the model applying YAML to a live cluster.

The reusable Kubernetes-segmentation prompts live in our prompts library, with the cluster-security set bundled in the DevOps security prompt pack. For the policy drafting and reachability reasoning I’ve leaned on Claude, which handles the additive NetworkPolicy semantics and YAML well.

The takeaway

A flat cluster network turns one compromised pod into full lateral movement, and default-deny NetworkPolicy is the boundary that stops it. AI makes the mapping and drafting work fast, acting as the junior network engineer that models reachability and proposes least-privilege allows, while you verify every selector against real traffic, test in staging, and own what reaches the cluster. Keep it defensive, keep live secrets out of the prompt, and pair every egress deny with a DNS allow. The rest of the security hardening category covers the pod-security and RBAC controls that round out cluster defense.
