Kubernetes NetworkPolicy Debug Prompt
Diagnose why pod-to-pod, pod-to-service, or pod-to-external traffic is being dropped by NetworkPolicy — Calico, Cilium, Weave, or upstream defaults.
- Target user: Kubernetes platform engineers and SREs debugging cluster networking
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes platform engineer with deep experience writing and debugging NetworkPolicy across Calico, Cilium, and Weave CNIs in production.

I will provide:
- The symptom (pod A can't reach pod B, can't reach a service, can't reach external, intermittent drops after a policy was added)
- The CNI in use and version (`kubectl -n kube-system get pods -l k8s-app=calico-node`, etc.)
- Output from: `kubectl get networkpolicy -A`, plus the specific NPs that match the source/target pod (`kubectl get netpol <name> -o yaml`)
- The pods involved: `kubectl get pod <pod> -o wide --show-labels` for both ends
- The Service definition if applicable
- For Cilium: `cilium connectivity test` summary if available, or `hubble observe --since 5m --to-pod ...`
- For Calico: `calicoctl get globalnetworkpolicies` and `calicoctl get networkpolicies -A`

Your job:
1. **Find every NetworkPolicy that matches the source pod**: `podSelector` matches AND the pod's namespace is the policy's namespace.
2. **Find every NetworkPolicy that matches the target pod**: same logic for the target side.
3. **Determine the policy mode**:
   - In Kubernetes NetworkPolicy, **the moment any policy selects a pod, the default flips from "allow all" to "deny all" for the direction(s) specified in `policyTypes`.**
   - Ingress restriction = applies to packets *to* the selected pod
   - Egress restriction = applies to packets *from* the selected pod
   - A pod selected by an Ingress-only policy has unrestricted egress UNTIL an Egress policy also selects it.
4. **For each direction**, list the union of allowed peers across all matching policies, then check whether the actual traffic matches.
5. **Check the protocol+port**: many policies forget to add UDP/53 for DNS, or only match TCP, breaking name resolution.
6. **Account for the CNI-specific extensions**:
   - Calico: `GlobalNetworkPolicy`, `Tier`, `order`, action-explicit (`Deny`)
   - Cilium: `CiliumNetworkPolicy`, `CiliumClusterwideNetworkPolicy`, L7 rules (HTTP/Kafka/gRPC), DNS-based egress
   - Both can drop traffic that a vanilla NetworkPolicy review wouldn't catch
7. **Walk the actual packet path**:
   - Pod IP → service ClusterIP (kube-proxy / IPVS DNAT) → backend pod IP
   - NetworkPolicy is evaluated *after* DNAT in most CNIs, so your egress rule must allow the BACKEND pod (IP or labels), not the service IP
8. **Identify the missing rule** and propose the minimal-blast-radius fix: add egress to `kube-dns`, add egress to `kube-apiserver`, add explicit allow for the namespace pair.
9. **Mark DESTRUCTIVE actions**: deleting policies, switching the default-deny tier, applying cluster-wide policies.

Common traps:
- **DNS broken** after first egress policy: every pod needs egress to `kube-dns` on UDP/53 (and TCP/53 for large responses).
- **kube-apiserver unreachable** for in-cluster controllers: egress to the API server (often on the host network) needs an explicit allow.
- **`namespaceSelector` without `podSelector`** allows ALL pods in that namespace, which is usually too broad.
- **`ipBlock` for "external" misses egress through the service mesh** because traffic is DNAT'd to a pod IP first.
- **Calico ordering**: explicit `Deny` in a higher-priority tier overrides a NetworkPolicy that would have allowed.
- **Cilium FQDN policies**: DNS lookups must succeed first; FQDN cache lag causes intermittent drops.

---

CNI + version: [Calico v3.27 / Cilium 1.15 / Weave / etc.]

Symptom: [DESCRIBE]

Source pod (`-o wide --show-labels`):
```
[PASTE]
```

Target pod / service (`-o wide --show-labels`):
```
[PASTE]
```

NetworkPolicies in source namespace:
```yaml
[PASTE relevant np yamls]
```

NetworkPolicies in target namespace:
```yaml
[PASTE]
```

CNI-specific policies (Calico GNP / Cilium CNP, if any):
```yaml
[PASTE]
```

Reproduction (`kubectl exec` curl/nc output):
```
[PASTE]
```
Why this prompt works
NetworkPolicy debugging is unintuitive because the rules compose unusually: any policy selecting a pod flips that pod to default-deny, but only for the directions listed in that policy's `policyTypes`. Two policies that each “allow” different things allow the union of their rules, not the intersection. Models routinely get this wrong unless they are forced to enumerate all matching policies first.
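The composition rule can be modelled in a few lines. The Python sketch below is a deliberate simplification and not how any CNI implements enforcement: the `Policy` class, its field names, and `direction_allowed` are hypothetical, selectors are reduced to `matchLabels`, and ports, namespaces, and `ipBlock` are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Simplified NetworkPolicy (assumption: label-only selectors, no ports)."""
    name: str
    pod_selector: dict   # matchLabels; {} selects every pod in the namespace
    policy_types: list   # subset of ["Ingress", "Egress"]
    ingress_from: list = field(default_factory=list)  # peer matchLabels dicts
    egress_to: list = field(default_factory=list)     # peer matchLabels dicts


def selects(selector: dict, labels: dict) -> bool:
    """A matchLabels selector matches when every key/value pair is present."""
    return all(labels.get(k) == v for k, v in selector.items())


def direction_allowed(policies, pod_labels, peer_labels, direction):
    """Return (allowed, reason) for one direction of traffic at one pod."""
    # Only policies that select the pod AND list this direction matter.
    matching = [p for p in policies
                if direction in p.policy_types
                and selects(p.pod_selector, pod_labels)]
    if not matching:
        return True, "no policy selects the pod for this direction (default allow)"
    # Union semantics: one matching rule in any policy is enough.
    for p in matching:
        peers = p.ingress_from if direction == "Ingress" else p.egress_to
        if any(selects(peer, peer_labels) for peer in peers):
            return True, f"allowed by {p.name}"
    names = ", ".join(p.name for p in matching)
    return False, f"default deny: selected by {names}, no rule matches the peer"
```

With two ingress-only policies on `app=api`, one allowing `app=frontend` and one allowing `app=prometheus`, both peers are admitted, any other peer is denied, and egress from the pod stays unrestricted because neither policy lists `Egress`.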
How to use it
- List ALL policies matching each end before reasoning about allow/deny — not just the ones the user thinks are relevant.
- Always check DNS first. Once a pod has any egress policy, DNS lookups silently fail unless `kube-dns` egress is explicitly allowed.
- Include the CNI: `calicoctl` and `cilium` have additional policy resources beyond what `kubectl get networkpolicy` shows.
- For Cilium, prefer Hubble flow logs over packet captures: Hubble shows the policy verdict per flow.
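The "check DNS first" step can be partly automated. The sketch below assumes the policies have already been parsed from `kubectl get networkpolicy -o json` and filtered to those selecting the source pod; the function names are hypothetical. It inspects ports only, not peers, so an empty result is necessary but not sufficient for DNS to work.

```python
def egress_restricted(policy: dict) -> bool:
    """True when the policy applies to egress.

    With policyTypes omitted, Egress is implied only if egress rules exist.
    """
    spec = policy.get("spec", {})
    types = spec.get("policyTypes")
    if types is None:
        return bool(spec.get("egress"))
    return "Egress" in types


def rule_allows_port(rule: dict, protocol: str, port: int) -> bool:
    """A rule without `ports` allows every port; otherwise one entry must match."""
    ports = rule.get("ports")
    if not ports:
        return True
    for entry in ports:
        if entry.get("protocol", "TCP") != protocol:  # protocol defaults to TCP
            continue
        start = entry.get("port")
        if start is None:
            return True          # protocol matches, all ports allowed
        if isinstance(start, str):
            continue             # named port: cannot resolve without the pod spec
        if start <= port <= entry.get("endPort", start):
            return True
    return False


def dns_gaps(policies: list) -> list:
    """Return which of UDP/53 and TCP/53 no egress rule allows."""
    restricting = [p for p in policies if egress_restricted(p)]
    if not restricting:
        return []                # egress is unrestricted, nothing to report
    rules = [r for p in restricting for r in (p["spec"].get("egress") or [])]
    return [f"{proto}/53" for proto in ("UDP", "TCP")
            if not any(rule_allows_port(r, proto, 53) for r in rules)]
```

A policy that allows only TCP/443 egress yields `["UDP/53", "TCP/53"]`, which is the typical signature of name resolution breaking right after the first egress policy lands.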
Useful commands
# Kubernetes NetworkPolicy view
kubectl get networkpolicy -A
kubectl get networkpolicy -n <ns> <name> -o yaml
kubectl describe networkpolicy -n <ns> <name>
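# What policies select a pod? (matchLabels only; matchExpressions are not evaluated)
POD_NS=foo; POD_NAME=bar
# Export the pod's labels as JSON so jq can read them through env.POD_LABELS
export POD_LABELS=$(kubectl get pod -n "$POD_NS" "$POD_NAME" -o json | jq -c '.metadata.labels // {}')
kubectl get networkpolicy -n "$POD_NS" -o json | \
jq -r '.items[] | select(.spec.podSelector.matchLabels as $sel | $sel == null or
($sel | to_entries | all(.value == (env.POD_LABELS | fromjson)[.key]))) | .metadata.name'
# (or simpler: read .spec.podSelector on each policy)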
# Calico view
kubectl get globalnetworkpolicy
calicoctl get networkpolicies -A
calicoctl get globalnetworkpolicies -o wide
# Cilium view
kubectl get ciliumnetworkpolicy -A
kubectl get ciliumclusterwidenetworkpolicy
cilium status --verbose
cilium endpoint list   # agent CLI; run inside a cilium agent pod
# Cilium flow observation (best tool for diagnosis); uses the Hubble CLI, e.g. after `cilium hubble port-forward &`
hubble observe --since 5m --to-pod <ns>/<pod>
hubble observe --since 5m --from-pod <ns>/<pod> --verdict DROPPED
hubble observe --pod <ns>/<pod> --output json | jq
# Reproduce from inside the source pod
kubectl exec -n <ns> <src-pod> -- nc -vz <target-svc> <port>
kubectl exec -n <ns> <src-pod> -- nslookup <svc>
kubectl exec -n <ns> <src-pod> -- curl -v --max-time 5 http://<svc>:<port>/
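# Throwaway debug pod in the same namespace (debug only, never in prod).
# It is labelled run=debug, so label-specific policies skip it; namespace-wide policies (podSelector: {}) still apply.
kubectl run -n <ns> debug --image=nicolaka/netshoot --rm -it --restart=Never -- bash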
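The jq one-liner in the command list only handles `matchLabels`. Where policies also use `matchExpressions`, a fuller selector match may be easier to write in a script. The Python sketch below assumes its inputs are the parsed output of `kubectl get networkpolicy -A -o json` and `kubectl get pod <pod> -o json`; `selector_matches` and `policies_selecting` are hypothetical helper names.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Evaluate a Kubernetes LabelSelector against a label map.

    An empty selector ({}) matches everything; matchLabels and
    matchExpressions are ANDed together.
    """
    for key, value in (selector.get("matchLabels") or {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions") or []:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values") or []
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and labels.get(key) in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


def policies_selecting(policy_list: dict, pod: dict) -> list:
    """Names of policies in the pod's namespace whose podSelector matches it."""
    namespace = pod["metadata"]["namespace"]
    labels = pod["metadata"].get("labels") or {}
    return [p["metadata"]["name"]
            for p in policy_list.get("items", [])
            if p["metadata"]["namespace"] == namespace
            and selector_matches(p["spec"].get("podSelector") or {}, labels)]
```

Running this once for the source pod and once for the target pod gives the two policy lists that the rest of the analysis depends on.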
Common findings this catches
- DNS broken after applying first egress policy → add egress rule:

      egress:
        - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
              podSelector:
                matchLabels:
                  k8s-app: kube-dns
          ports:
            - protocol: UDP
              port: 53
            - protocol: TCP
              port: 53

- In-cluster controller can’t reach apiserver → add egress to the apiserver endpoint (often `to: ipBlock: cidr: <node-cidr>` + port 443/6443).
- Cross-namespace traffic works in dev, fails in prod → namespaces lack the label the `namespaceSelector` matches; in Kubernetes 1.21 and later, `kubernetes.io/metadata.name` is auto-applied.
- Policy “allows” service IP but traffic still drops → service IP gets DNAT’d to pod IP before policy evaluation; allow the backend pod IP/labels instead.
- Calico GlobalNetworkPolicy with `order: 0` blocks everything else → explicit deny in a high-priority tier; check tier order.
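The service-IP finding is easy to check offline once the backend pod IPs are known (for example from the Service's endpoints). The sketch below uses Python's standard `ipaddress` module to evaluate `ipBlock` entries, including `except`, against both the ClusterIP and the backend IPs. The function names and the sample addresses in the usage note are illustrative assumptions, and it models only the post-DNAT behaviour the list describes.

```python
import ipaddress


def ip_block_allows(ip_block: dict, dest_ip: str) -> bool:
    """True when dest_ip is inside `cidr` and outside every `except` range."""
    addr = ipaddress.ip_address(dest_ip)
    if addr not in ipaddress.ip_network(ip_block["cidr"], strict=False):
        return False
    return not any(addr in ipaddress.ip_network(ex, strict=False)
                   for ex in ip_block.get("except", []))


def egress_verdict(ip_blocks: list, service_ip: str, backend_ips: list) -> dict:
    """Compare what the rule author expected with what a post-DNAT check sees.

    service_ip_allowed: the ClusterIP matches some ipBlock (the author's intent).
    blocked_backends:   backend pod IPs that no ipBlock covers; these are the
                        destinations that matter once DNAT has happened.
    """
    return {
        "service_ip_allowed": any(ip_block_allows(b, service_ip) for b in ip_blocks),
        "blocked_backends": [ip for ip in backend_ips
                             if not any(ip_block_allows(b, ip) for b in ip_blocks)],
    }
```

For an `ipBlock` of `10.96.0.0/12` (a service range in this example), a ClusterIP such as `10.96.12.34` matches while backends in `10.244.0.0/16` do not, which reproduces the "policy allows the service IP but traffic still drops" symptom.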
Anti-patterns this catches
- `policyTypes: [Ingress]` on a policy that includes `egress:` rules → the egress block is silently ignored.
- `ingress: []` (empty rule list) → allows NO traffic, not all. Use `- {}` (one empty rule) to allow all. Inside a rule, an empty or omitted `from` matches every source.
- Selecting `kube-system` with default-deny → breaks the cluster.
- Using `ipBlock` for in-cluster pod IPs → pod IPs are ephemeral; use selectors.
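Several of these anti-patterns can be detected mechanically from the manifest. The following Python sketch is one possible linter over a parsed NetworkPolicy dict. `lint_policy` and its `pod_cidr` parameter are hypothetical; the pod network range has to be supplied by the caller because it is not recorded in the policy itself.

```python
import ipaddress


def lint_policy(policy: dict, pod_cidr: str = None) -> list:
    """Return human-readable findings for one NetworkPolicy manifest."""
    meta = policy.get("metadata", {})
    spec = policy.get("spec", {})
    types = spec.get("policyTypes")
    findings = []

    # Rules for a direction that policyTypes does not list are ignored.
    if types is not None:
        for direction in ("Ingress", "Egress"):
            if spec.get(direction.lower()) and direction not in types:
                findings.append(
                    f"{direction.lower()} rules ignored: policyTypes omits {direction}")

    # With policyTypes omitted, Ingress always applies; Egress only with rules.
    effective = types if types is not None else (
        ["Ingress"] + (["Egress"] if spec.get("egress") else []))
    if meta.get("namespace") == "kube-system":
        for direction in effective:
            if not spec.get(direction.lower()):
                findings.append(f"default-deny {direction.lower()} in kube-system")

    for direction, key in (("ingress", "from"), ("egress", "to")):
        for rule in spec.get(direction) or []:
            for peer in rule.get(key) or []:
                if "namespaceSelector" in peer and "podSelector" not in peer:
                    findings.append(
                        f"{direction}: namespaceSelector without podSelector "
                        "allows every pod in the matched namespaces")
                cidr = peer.get("ipBlock", {}).get("cidr")
                if cidr and pod_cidr and ipaddress.ip_network(
                        cidr, strict=False).overlaps(
                        ipaddress.ip_network(pod_cidr, strict=False)):
                    findings.append(
                        f"{direction}: ipBlock {cidr} overlaps the pod network; "
                        "prefer selectors")
    return findings
```

An empty list means none of these specific patterns were found; it does not mean the policy is correct.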
When to escalate
- Cluster-wide drops across multiple namespaces — likely a global policy or CNI control-plane issue, not per-app.
- Cilium control-plane issues (`cilium status` shows degraded): engage platform team; this is bigger than one policy.
- Cross-cluster service mesh traffic with NetworkPolicy involved: coordinate with mesh owner; verdicts may live in Envoy, not the CNI.