CloudOps
AI for Kubernetes & Helm

Kubernetes LoadBalancer / NodePort Service Debug Prompt

Diagnose LoadBalancer service issues — stuck Pending, externalTrafficPolicy: Local pitfalls, source IP preservation, cloud provider quirks, NodePort range collisions.

Target user
Kubernetes engineers debugging external traffic to Services
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior Kubernetes engineer who has wired up LoadBalancer Services across AWS (CLB/NLB/ALB), GCP, Azure, MetalLB, and bare-metal. You know `externalTrafficPolicy: Local` is the right answer for source-IP preservation, but it comes at the cost of uneven load distribution.

I will provide:
- The symptom (LB stuck `Pending`, traffic doesn't reach pods, source IP shows as node IP, asymmetric load, health checks failing)
- The Service spec (`kubectl get svc <name> -o yaml`)
- LB controller logs (cloud-controller-manager, MetalLB, AWS Load Balancer Controller, etc.)
- The kube-proxy mode (iptables, ipvs, nftables — depends on version)
- `kubectl get endpoints <svc>` and `kubectl get endpointslices`

Your job:

1. **For Service stuck `Pending`** (no external IP):
   - Cloud controller manager running and healthy?
   - Quota: subnets, public IPs, ELB limits
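   - For AWS: the AWS Load Balancer Controller acts on Services annotated `service.beta.kubernetes.io/aws-load-balancer-type: external` or with `spec.loadBalancerClass: service.k8s.aws/nlb`; the value `nlb` is handled by the legacy in-tree cloud provider instead, so check which controller is supposed to own the Service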
   - For MetalLB: pool exhausted, BGP not peering, L2 announcement issue
   - For bare-metal: no controller installed; need MetalLB or similar
2. **For `externalTrafficPolicy: Cluster` vs `Local`**:
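   - **`Cluster`** (default): the LB may send the packet to any node; kube-proxy SNATs it to that node's IP and forwards it to any backend pod, possibly on another node. Source IP lost. Even distribution.
   - **`Local`**: the packet is only delivered to a backend pod on the receiving node; nodes without a backend drop it. Source IP preserved. Per-pod distribution can be uneven, because the LB balances across nodes, not pods.
   - **For Local**: the cloud LB health-checks every node; nodes without a Ready local backend fail the check and drop out of rotation (by design). If every node fails, check pod readiness first.
   - Health check port: `spec.healthCheckNodePort`, allocated automatically for `Local`; kube-proxy answers on it with HTTP 200 when the node has a Ready local endpoint and 503 otherwise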
3. **For "traffic doesn't reach pod"**:
   - `kubectl get endpoints <svc>` — empty? Selector doesn't match any pod labels. Or pods not Ready.
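   - `kubectl get endpointslices -l kubernetes.io/service-name=<svc>`: the EndpointSlice view that kube-proxy actually consumes (the v1 Endpoints API is deprecated as of 1.33)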
   - Pod's `readinessProbe` failing → excluded from endpoints
   - NetworkPolicy blocking → see [network-policy-debug](/prompts/kubernetes-network-policy-debug/)
4. **For source IP showing as node IP** (NAT):
   - `externalTrafficPolicy: Cluster` SNATs to the receiving node
   - Solutions:
     - `externalTrafficPolicy: Local` (preserves source IP at cost of distribution)
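     - Use NLB (AWS) with `service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip` (targets pods directly); for TCP listeners also enable the target-group attribute `preserve_client_ip.enabled=true`, which is off by default for IP targets
     - PROXY protocol, if the app (or a proxy in front of it) can parse it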
5. **For NodePort range collisions**:
   - Default range: 30000-32767 (`--service-node-port-range` on apiserver)
   - Two Services trying the same `nodePort:` → the second one is rejected (`provided port is already allocated`)
   - Auto-assigned ports avoid collision; manually-set ports must be unique
6. **For kube-proxy modes**:
   - **iptables**: rules per Service; large clusters → many rules; reloads slow
   - **ipvs**: better for >1000 Services; uses the kernel's IPVS (LVS) load balancer
   - **nftables** (alpha in 1.29, beta in 1.31, GA in 1.33): modern replacement for iptables mode
7. **For AWS Load Balancer Controller**:
   - `kubectl logs -n kube-system deploy/aws-load-balancer-controller`
   - Service annotations control LB type, scheme, target type
   - IRSA / IAM permissions essential
8. **For MetalLB**:
   - L2 mode: announces from one node per Service via ARP; failover on node down
   - BGP mode: announces to peer routers; requires switch config
   - `kubectl -n metallb-system logs -l app=metallb` shows decisions

Mark DESTRUCTIVE: changing `externalTrafficPolicy` live (brief gap in traffic while kube-proxy and the LB health checks reconcile), deleting the Service/LB while it's in use (the cloud LB and its address are released; a recreated Service usually gets a new IP or DNS name), changing the NodePort range (existing Services break).

---

Service + namespace: [DESCRIBE]
Symptom: [DESCRIBE]
Service spec:
```yaml
[PASTE]
```
`kubectl get endpoints <svc>` / `endpointslices`:
```
[PASTE]
```
Pod readiness state (`kubectl get pods -l <selector>`):
```
[PASTE]
```
LB controller logs:
```
[PASTE]
```
kube-proxy mode: [iptables / ipvs / nftables]

Why this prompt works

LoadBalancer Services touch four layers: the K8s Service object, the cloud controller (or MetalLB), kube-proxy, and pod readiness. A single misconfig at any layer drops traffic, and the symptom doesn’t say which layer failed. This prompt checks each layer in turn.

How to use it

  1. Verify the Service has an external IP (not stuck Pending).
  2. Verify endpoints exist — empty endpoints = pod selector or readiness issue, not networking.
  3. For source IP issues, decide between Local (preserves) and Cluster (distributes).
  4. For cloud-specific issues, include the cloud controller / LB controller logs.

Useful commands

```bash
# Service state
kubectl get svc <name>
kubectl describe svc <name>
kubectl get svc <name> -o yaml

# Endpoints
kubectl get endpoints <name>
kubectl get endpointslices -l kubernetes.io/service-name=<name>

# Pods + readiness
kubectl get pods -l <selector> -o wide
kubectl describe pod <pod>

# External test
curl -v http://<external-ip>:<port>/
# Send a client-supplied X-Forwarded-For to see whether the app (or an L7 proxy) trusts it
curl -v -H "X-Forwarded-For: 1.2.3.4" http://<external-ip>:<port>/

# kube-proxy
kubectl -n kube-system get pods -l k8s-app=kube-proxy
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100

# AWS Load Balancer Controller
kubectl -n kube-system logs deploy/aws-load-balancer-controller --tail=200

# MetalLB
kubectl -n metallb-system logs -l app=metallb --tail=200

# Test from inside cluster
kubectl run test --rm -it --image=nicolaka/netshoot --restart=Never -- \
  curl -v http://<service>.<ns>.svc.cluster.local
```

externalTrafficPolicy choice

```yaml
# Cluster (default): even distribution, source IP lost
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  selector: { app: web }
  ports:
  - { port: 80, targetPort: 8080 }
```

```yaml
# Local: source IP preserved, only nodes with backends serve
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector: { app: web }
  ports:
  - { port: 80, targetPort: 8080 }
```

With Local, spread backend pods across the nodes the LB targets. Otherwise traffic concentrates on the few nodes that have pods, and every node without a pod fails the LB health check.
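One way to get that spread is a topology spread constraint on the Deployment behind the Service. A minimal sketch, assuming a Deployment named `web` whose pods carry `app: web` (matching the selector above); the image and replica count are placeholders. `ScheduleAnyway` makes the spread a preference, so pods still schedule when nodes are short; `DoNotSchedule` would make it a hard rule.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      # Spread replicas across nodes so more LB targets pass the Local health check
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway   # soft preference; use DoNotSchedule to enforce
        labelSelector:
          matchLabels: { app: web }
      containers:
      - name: web
        image: example.com/web:latest       # placeholder image
        ports:
        - containerPort: 8080
```

Even with a perfect spread, the LB still balances per node, so per-pod load is only even when each targeted node runs the same number of replicas.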

AWS NLB with target-type ip (preserves source IP without Local)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    # IP targets over TCP do not preserve the client IP unless this attribute is enabled
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "preserve_client_ip.enabled=true"
spec:
  type: LoadBalancer
  selector: { app: web }
  ports:
  - { port: 443, targetPort: 8443 }
```
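Recent AWS Load Balancer Controller versions also accept `spec.loadBalancerClass: service.k8s.aws/nlb` in place of the `aws-load-balancer-type: external` annotation. A sketch of an equivalent Service in that form, assuming the controller version in your cluster supports the field. `loadBalancerClass` cannot be changed once set, so choose one form when the Service is created.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    # IP targets over TCP don't preserve the client IP by default
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "preserve_client_ip.enabled=true"
spec:
  type: LoadBalancer
  # Hands the Service to the AWS Load Balancer Controller instead of the in-tree provider
  loadBalancerClass: service.k8s.aws/nlb
  selector: { app: web }
  ports:
  - { port: 443, targetPort: 8443 }
```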

MetalLB BGP-mode example

```yaml
# IPAddressPool
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.0/24
---
# BGPPeer (served as metallb.io/v1beta2 in current MetalLB releases)
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: top-of-rack
  namespace: metallb-system
spec:
  peerAddress: 192.168.0.1
  peerASN: 65001
  myASN: 65002
---
# BGPAdvertisement
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: prod
  namespace: metallb-system
spec:
  ipAddressPools: [prod-pool]
```
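For L2 mode, the IPAddressPool pairs with an L2Advertisement instead of BGPPeer/BGPAdvertisement. A minimal sketch; `lan-pool`, `lan`, and the address range are placeholders and must be unused addresses on the nodes' own L2 segment. As with BGP, a pool that no advertisement references hands out IPs that are never announced.

```yaml
# IPAddressPool for L2 mode (placeholder range on the nodes' subnet)
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.20.240-192.168.20.250
---
# L2Advertisement: speakers answer ARP (IPv4) / NDP (IPv6) for IPs from this pool
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan
  namespace: metallb-system
spec:
  ipAddressPools: [lan-pool]
```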

Common findings this catches

  • Service stuck Pending forever → no cloud controller (bare-metal); install MetalLB.
  • endpoints empty → selector doesn’t match pod labels OR pods all not Ready.
  • Source IP is node IP → Cluster policy SNATs; switch to Local, or use an AWS NLB with ip target type and client IP preservation enabled.
  • Local policy + uneven backend distribution → some LB nodes marked unhealthy; spread pods.
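  • Service create rejected with `provided port is not in the valid range` or `range is full` → manual nodePort outside --service-node-port-range, or the range is exhausted; pick a port in range, drop the manual nodePort, or raise the range (cluster admin).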
  • AWS LB Controller fails to create LB → IRSA missing permissions; check controller logs.
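  • MetalLB L2 not announcing → speaker pod not running on the node hosting the IP, or no L2Advertisement references the pool (IP assigned but never announced).
  • Connection works ONCE then fails → sessionAffinity: ClientIP with externalTrafficPolicy: Local and multi-node load; kube-proxy tracks affinity per node, so it breaks when the LB sends one client to several nodes.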
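For the NodePort findings, a sketch of a Service that pins its NodePort. The port must fall inside the apiserver's `--service-node-port-range` (default 30000-32767) and must not already be allocated to another Service, or the apiserver rejects the create. `web-nodeport` and `30080` are placeholders; omitting `nodePort` lets the apiserver pick a free port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector: { app: web }
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080   # must be in range and unused cluster-wide; omit to auto-assign
```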

When to escalate

  • Cloud quota limits hit — coordinate with cloud team; LB count, public IPs, ELB limits.
  • MetalLB BGP not peering → talk to network team; verify switch config + ASN.
  • Source IP preservation requirement that can’t use Local (e.g., need true cluster-wide distribution) — consider PROXY protocol or service mesh.
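A minimal PROXY protocol sketch for AWS, assuming the AWS Load Balancer Controller owns the Service: `aws-load-balancer-proxy-protocol: "*"` enables PROXY protocol v2 on the NLB target groups, so the client address travels in a header even under `externalTrafficPolicy: Cluster`. The backend must be configured to expect PROXY protocol; a server that isn't will reject or misparse every connection.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
    # Prepend a PROXY protocol v2 header carrying the original client address
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # keep even distribution; client IP rides in the PROXY header
  selector: { app: web }
  ports:
  - { port: 443, targetPort: 8443 }
```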
