Kubernetes Service Traffic Policy Routing Design Prompt
Design Service internalTrafficPolicy and externalTrafficPolicy settings that keep traffic node-local for latency or preserve the client source IP, without silently blackholing traffic when no local endpoint exists.
- Target user: Platform and network engineers tuning Kubernetes Service routing in production
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes platform engineer who has tuned Service traffic policies for latency, source-IP preservation, and cost, and who has debugged the blackholes they can cause.

I will provide:
- The Service spec (`type`, `internalTrafficPolicy`, `externalTrafficPolicy`, ports)
- The goal (preserve client source IP, reduce cross-zone/cross-node hops, node-local routing)
- Endpoint distribution (how many pods, spread across nodes/zones) and the observed symptom, if any

Your job:
1. **Separate the two knobs**: explain that `internalTrafficPolicy` governs traffic that originates inside the cluster, while `externalTrafficPolicy` governs traffic that arrives from outside through a NodePort or LoadBalancer Service; they solve different problems and are set independently.
2. **externalTrafficPolicy: Local vs Cluster**: lay out the tradeoff. `Local` preserves the client source IP and avoids a second hop, but drops traffic that arrives at a node with no ready local endpoint (and skews LB load toward nodes with fewer pods). `Cluster` SNATs and balances evenly, but hides the client IP and can add a second hop. Tie the choice to the stated goal.
3. **internalTrafficPolicy: Local**: show when node-local routing cuts latency and cross-zone cost (e.g. node-local DaemonSet backends), and state the hard requirement that every node running clients also runs a ready endpoint, or traffic from that node blackholes.
4. **Health-check interaction**: for `externalTrafficPolicy: Local` on a `type: LoadBalancer` Service, explain the `healthCheckNodePort` that the LB probes to learn which nodes have local endpoints, and why a misconfigured or blocked health check causes uneven or dropped traffic.
5. **Blackhole prevention**: identify the failure mode where a policy assumes a local endpoint that is not guaranteed (rolling updates, autoscaling, topology spread), and recommend a DaemonSet or topology constraint to back the assumption.
6. **Validate**: give the test that confirms source-IP preservation (the IP observed server-side) and the test that confirms no node blackholes during a rollout.
Output as: (a) the recommended Service spec with both policies set and justified, (b) the endpoint-placement requirement the policy depends on, (c) the source-IP and blackhole verification tests, (d) the rollback if traffic drops.

Default to caution: do not set `*TrafficPolicy: Local` unless you can guarantee a healthy local endpoint on every relevant node; otherwise the policy silently drops traffic.
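
Companion note (not part of the prompt text): before filling in the endpoint distribution, it may help to check the "healthy local endpoint on every relevant node" condition yourself. The sketch below is a minimal illustration of that check, not a Kubernetes API call. The function name, the node names, and the input shape (ready endpoint count per node, plus the nodes that receive traffic) are all assumptions made for this example; in practice you would derive them from your own cluster inventory.

```python
from typing import Dict, Iterable, List


def nodes_without_local_endpoint(
    ready_endpoints_per_node: Dict[str, int],
    traffic_nodes: Iterable[str],
) -> List[str]:
    """Return the nodes that receive traffic but have no ready local endpoint.

    Under a `Local` traffic policy, traffic handled on these nodes is dropped
    rather than forwarded to an endpoint on another node.
    """
    return sorted(
        node
        for node in set(traffic_nodes)
        # A node missing from the mapping is treated as having zero endpoints.
        if ready_endpoints_per_node.get(node, 0) == 0
    )


# Hypothetical inventory: node names and counts are invented for illustration.
ready = {"node-a": 2, "node-b": 0, "node-c": 1}
clients = ["node-a", "node-b", "node-d"]

at_risk = nodes_without_local_endpoint(ready, clients)
print(at_risk)  # ['node-b', 'node-d']
```

A non-empty result means the placement requirement is not met for those nodes. Re-running the same check against the endpoint counts expected mid-rollout (for example with one pod per node temporarily unready) gives a rough preview of the rolling-update failure mode the prompt asks about.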