Kubernetes LoadBalancer / NodePort Service Debug Prompt
Diagnose LoadBalancer service issues — stuck Pending, externalTrafficPolicy: Local pitfalls, source IP preservation, cloud provider quirks, NodePort range collisions.
- Target user
- Kubernetes engineers debugging external traffic to Services
- Difficulty
- Intermediate
- Tools
- Claude, ChatGPT
The prompt
You are a senior Kubernetes engineer who has wired up LoadBalancer Services across AWS (CLB/NLB/ALB), GCP, Azure, MetalLB, and bare-metal. You know `externalTrafficPolicy: Local` is the right answer for source-IP preservation but trades for uneven load distribution.
I will provide:
- The symptom (LB stuck `Pending`, traffic doesn't reach pods, source IP shows as node IP, asymmetric load, health checks failing)
- The Service spec (`kubectl get svc <name> -o yaml`)
- LB controller logs (cloud-controller-manager, MetalLB, AWS Load Balancer Controller, etc.)
- The kube-proxy mode (iptables, ipvs, nftables — depends on version)
- `kubectl get endpoints <svc>` and `kubectl get endpointslices`
Your job:
1. **For Service stuck `Pending`** (no external IP):
- Cloud controller manager running and healthy?
- Quota: subnets, public IPs, ELB limits
- For AWS: AWS Load Balancer Controller might require service annotations (`service.beta.kubernetes.io/aws-load-balancer-type: nlb`)
- For MetalLB: pool exhausted, BGP not peering, L2 announcement issue
- For bare-metal: no controller installed; need MetalLB or similar
2. **For `externalTrafficPolicy: Cluster` vs `Local`**:
- **`Cluster`** (default) — packet may hop to any node, then SNAT to local pod. Source IP lost. Even distribution.
- **`Local`** — packet must land on a node WITH a backend pod, else dropped. Source IP preserved. Uneven distribution (nodes without backends drop).
- **For Local**: cloud LB health check must hit a node that has a pod; otherwise the node is unhealthy in LB's view
- Health check port: usually a special NodePort `healthCheckNodePort`
3. **For "traffic doesn't reach pod"**:
- `kubectl get endpoints <svc>` — empty? Selector doesn't match any pod labels. Or pods not Ready.
- `kubectl get endpointslices` — modern view; same info distributed
- Pod's `readinessProbe` failing → excluded from endpoints
- NetworkPolicy blocking → see [network-policy-debug](/prompts/kubernetes-network-policy-debug/)
4. **For source IP showing as node IP** (NAT):
- `externalTrafficPolicy: Cluster` SNATs to the receiving node
- Solutions:
- `externalTrafficPolicy: Local` (preserves source IP at cost of distribution)
- Use NLB (AWS) with `service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip` (target pods directly)
- PROXY protocol if supported by app
5. **For NodePort range collisions**:
- Default range: 30000-32767 (`--service-node-port-range` on apiserver)
- Two Services trying same `nodePort:` → second one rejected
- Auto-assigned ports avoid collision; manually-set ports must be unique
6. **For kube-proxy modes**:
- **iptables** — rules per Service; large clusters → many rules; reload slow
- **ipvs** — better for >1000 Services; uses LVS kernel
- **nftables** (1.29+ alpha, GA later) — modern replacement for iptables mode
7. **For AWS Load Balancer Controller**:
- `kubectl logs -n kube-system deploy/aws-load-balancer-controller`
- Service annotations control LB type, scheme, target type
- IRSA / IAM permissions essential
8. **For MetalLB**:
- L2 mode: announces from one node per Service via ARP; failover on node down
- BGP mode: announces to peer routers; requires switch config
- `kubectl -n metallb-system logs -l app=metallb` shows decisions
Mark DESTRUCTIVE: changing `externalTrafficPolicy` live (brief gap in traffic during kube-proxy reconcile), removing the LB while it's in use (cloud LB delete is irreversible from K8s side), changing NodePort range (existing Services break).
---
Service + namespace: [DESCRIBE]
Symptom: [DESCRIBE]
Service spec:
```yaml
[PASTE]
```
`kubectl get endpoints <svc>` / `endpointslices`:
```
[PASTE]
```
Pod readiness state (`kubectl get pods -l <selector>`):
```
[PASTE]
```
LB controller logs:
```
[PASTE]
```
kube-proxy mode: [iptables / ipvs / nftables]
Why this prompt works
LoadBalancer Services touch four layers: K8s Service object, cloud-controller (or MetalLB), kube-proxy, pod readiness. A single misconfig at any layer drops traffic, and the symptom doesn’t say which. This prompt walks them.
How to use it
- Verify the Service has external IP (not stuck Pending).
- Verify endpoints exist — empty endpoints = pod selector or readiness issue, not networking.
- For source IP issues, decide between
Local(preserves) andCluster(distributes). - For cloud-specific, include the cloud controller / LB controller logs.
Useful commands
# Service state
kubectl get svc <name>
kubectl describe svc <name>
kubectl get svc <name> -o yaml
# Endpoints
kubectl get endpoints <name>
kubectl get endpointslices -l kubernetes.io/service-name=<name>
# Pods + readiness
kubectl get pods -l <selector> -o wide
kubectl describe pod <pod>
# External test
curl -v <external-ip>:<port>/
# Or with header for x-forwarded
curl -v -H "X-Forwarded-For: 1.2.3.4" <external-ip>:<port>/
# kube-proxy
kubectl -n kube-system get pods -l k8s-app=kube-proxy
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100
# AWS Load Balancer Controller
kubectl -n kube-system logs deploy/aws-load-balancer-controller --tail=200
# MetalLB
kubectl -n metallb-system logs -l app=metallb --tail=200
# Test from inside cluster
kubectl run test --rm -it --image=nicolaka/netshoot --restart=Never -- \
curl -v http://<service>.<ns>.svc.cluster.local
externalTrafficPolicy choice
# Cluster (default): even distribution, source IP lost
apiVersion: v1
kind: Service
metadata:
name: web
spec:
type: LoadBalancer
externalTrafficPolicy: Cluster
selector: { app: web }
ports:
- { port: 80, targetPort: 8080 }
# Local: source IP preserved, only nodes with backends serve
apiVersion: v1
kind: Service
metadata:
name: web
spec:
type: LoadBalancer
externalTrafficPolicy: Local
selector: { app: web }
ports:
- { port: 80, targetPort: 8080 }
For Local, ensure backend pods are spread across nodes that the LB targets — otherwise drops.
AWS NLB with target-type ip (preserves source IP without Local)
apiVersion: v1
kind: Service
metadata:
name: web
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
type: LoadBalancer
selector: { app: web }
ports:
- { port: 443, targetPort: 8443 }
MetalLB BGP-mode example
# IPAddressPool
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: prod-pool
namespace: metallb-system
spec:
addresses:
- 192.168.10.0/24
---
# BGPPeer
apiVersion: metallb.io/v1beta1
kind: BGPPeer
metadata:
name: top-of-rack
namespace: metallb-system
spec:
peerAddress: 192.168.0.1
peerASN: 65001
myASN: 65002
---
# BGPAdvertisement
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
name: prod
namespace: metallb-system
spec:
ipAddressPools: [prod-pool]
Common findings this catches
- Service stuck Pending forever → no cloud controller (bare-metal); install MetalLB.
endpointsempty → selector doesn’t match pod labels OR pods all not Ready.- Source IP is node IP →
Clusterpolicy SNATs; switch toLocalor use NLB ip-target-type. Localpolicy + uneven backend distribution → some LB nodes marked unhealthy; spread pods.- NodePort assigned > 32767 → range exhausted; raise
--service-node-port-range(cluster admin). - AWS LB Controller fails to create LB → IRSA missing permissions; check controller logs.
- MetalLB L2 not announcing → speaker pod not on the node hosting the IP, or no Service with that pool.
- Connection works ONCE then fails →
sessionAffinity: ClientIPwithexternalTrafficPolicy: Localand multi-node load.
When to escalate
- Cloud quota limits hit — coordinate with cloud team; LB count, public IPs, ELB limits.
- MetalLB BGP not peering → talk to network team; verify switch config + ASN.
- Source IP preservation requirement that can’t use
Local(e.g., need true cluster-wide distribution) — consider PROXY protocol or service mesh.
Related prompts
-
Kubernetes Ingress Troubleshooting Prompt
Diagnose Ingress routing failures, controller misconfiguration, TLS issues, 404/502/503 cascades, and path-vs-host mismatches across NGINX, Traefik, Contour, and HAProxy controllers.
-
Kubernetes NetworkPolicy Debug Prompt
Diagnose why pod-to-pod, pod-to-service, or pod-to-external traffic is being dropped by NetworkPolicy — Calico, Cilium, Weave, or upstream defaults.
-
Kubernetes Pod Troubleshooting Prompt
Diagnose any misbehaving pod — pending, evicted, networking-broken, storage-stuck, or just plain slow — with a structured AI walkthrough.