Zero-Drop Rollouts With ProxyTerminatingEndpoints

“We drop maybe two hundred requests every deploy, but it’s only a blip, so nobody’s prioritized it.” I’ve heard a version of that sentence on three different teams, and it’s almost always the same root cause hiding behind a shrug. The deploy succeeds, the new pods come up healthy, and a small number of in-flight requests get a connection reset because the old pod stopped serving before every node’s kube-proxy stopped sending it traffic. It’s a race, it’s narrow, and it’s completely fixable once you understand the termination sequence.

The fix is partly drain hygiene on the pod and — for the trickier externalTrafficPolicy: Local case — a feature called ProxyTerminatingEndpoints that stops kube-proxy from blackholing the last serving endpoint on a draining node.

The termination sequence, step by step

When a pod is deleted (which a rollout does to every old pod), several things happen roughly in parallel:

1. The pod is marked Terminating, and the kubelet sends SIGTERM to the container.
2. The EndpointSlice controller updates the pod's entry to ready: false, terminating: true, with serving staying true for as long as the pod keeps passing its readiness probe.
3. Every node's kube-proxy watches EndpointSlices and, once it observes the update, stops sending new connections to that pod.

The drop happens in the gap between step 1 and step 3. If the application reacts to SIGTERM by closing its listener immediately, it stops accepting connections while some kube-proxies — which haven’t yet observed the EndpointSlice change — are still routing new ones to it. Those connections get refused.

The preStop sleep that closes the gap

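The standard fix is to make the pod keep serving until every kube-proxy has caught up. You do that with a preStop hook that sleeps. The kubelet runs preStop before it sends SIGTERM, so the sleep delays the signal itself; pair it with a terminationGracePeriodSeconds long enough to cover the sleep plus draining in-flight work. The exec form shown here needs /bin/sh and sleep inside the container image, which minimal and distroless images often lack: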

spec:
  terminationGracePeriodSeconds: 45
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 15"]

The sequence becomes: pod marked Terminating → preStop sleep starts (pod keeps serving) → EndpointSlice updated → all kube-proxies stop sending new traffic → sleep ends → SIGTERM → app drains in-flight requests → exits. The 15 seconds is a placeholder; size it to the EndpointSlice propagation you actually observe, not a guess. Too short and you still drop; too long and every pod shutdown drags, slowing rollouts.
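To size the sleep from observation rather than a guess, you can watch the Service's EndpointSlice conditions while you delete a pod. This is a minimal sketch, assuming kubectl access and a Service named web (substitute your own). It only shows when the EndpointSlice object changes; each node's kube-proxy still needs time to reprogram after that, which kube-proxy reports in its kubeproxy_network_programming_duration_seconds metric. Treat the delay you see here as a lower bound and add margin.

```shell
#!/bin/sh
# Sketch: print each endpoint's EndpointSlice conditions for one Service,
# prefixed with a timestamp, once per second. Delete a pod while it runs and
# note when that pod flips to ready=false / terminating=true.
# An empty value means the controller left that condition unset.

# One snapshot: "<pod> ready=... serving=... terminating=..." per endpoint.
show_conditions() {
  svc="$1"
  kubectl get endpointslices -l "kubernetes.io/service-name=${svc}" \
    -o jsonpath='{range .items[*].endpoints[*]}{.targetRef.name} ready={.conditions.ready} serving={.conditions.serving} terminating={.conditions.terminating}{"\n"}{end}'
}

# Repeat until interrupted (Ctrl-C), stamping each line with wall-clock time.
watch_conditions() {
  svc="$1"
  while :; do
    ts=$(date +%H:%M:%S)
    show_conditions "$svc" | sed "s/^/${ts} /"
    sleep 1
  done
}
```

Run `watch_conditions web` in one terminal and `kubectl delete pod <pod>` in another; the gap between the delete and the flip is the controller side of the propagation delay.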

The grace period has to exceed preStop duration plus drain time, or the kubelet SIGKILLs the pod mid-request — re-introducing the very drops you’re trying to remove.
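The budget rule is simple arithmetic, and a few lines of shell can check a proposed set of numbers. This is a sketch: the drain figure is whatever your slowest in-flight request needs, and the default 5-second margin is a hypothetical safety buffer, not a Kubernetes setting.

```shell
#!/bin/sh
# Sketch: check a shutdown budget. The grace-period clock starts when the pod
# is deleted and also covers the preStop hook, so it must exceed
# preStop + drain. MARGIN is a hypothetical buffer, not a Kubernetes field.

# usage: grace_budget PRESTOP_SECONDS DRAIN_SECONDS GRACE_SECONDS [MARGIN_SECONDS]
grace_budget() {
  prestop=$1
  drain=$2
  grace=$3
  margin=${4:-5}
  needed=$((prestop + drain + margin))
  if [ "$grace" -ge "$needed" ]; then
    echo "ok: grace ${grace}s covers preStop ${prestop}s + drain ${drain}s + margin ${margin}s = ${needed}s"
    return 0
  fi
  echo "too short: grace ${grace}s < ${needed}s; kubelet may SIGKILL mid-drain"
  return 1
}
```

With the manifest's numbers, `grace_budget 15 20 45` passes (40s needed), while `grace_budget 15 30 45` fails because 50s would be needed.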

The externalTrafficPolicy: Local blackhole

Drain hygiene fixes the common case. There's a sharper one with externalTrafficPolicy: Local, which is popular because it preserves the client's source IP by routing only to pods on the node where the traffic arrived. The problem: if a node has exactly one endpoint for the Service and that pod starts terminating, kube-proxy on that node has no ready local endpoint and blackholes the traffic, dropping it on the floor until the external load balancer's health check marks the node unhealthy and stops sending to it.

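This is what ProxyTerminatingEndpoints fixes. It is a kube-proxy feature gate, beta and on by default since Kubernetes v1.26 and GA in v1.28. With it, when a node has no ready local endpoints, kube-proxy routes to local endpoints that are terminating but still serving instead of blackholing, bridging the window until the load balancer's health check pulls the node out of rotation. "Still serving" means the pod has to keep passing its readiness probe while it drains for this fallback to apply. It's the difference between a graceful drain and a cliff for Local-policy Services.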

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserves client IP; needs terminating-endpoint routing to drain cleanly
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
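To see which nodes are exposed to the single-endpoint case, you can count endpoints per node from the Service's EndpointSlices, which record the hosting node in each endpoint's nodeName. A sketch assuming kubectl access; it counts terminating endpoints too, so run it between rollouts, and on dual-stack clusters each address family gets its own slices, so every count doubles.

```shell
#!/bin/sh
# Sketch: find nodes hosting exactly one endpoint of a Service. Under
# externalTrafficPolicy: Local, a single terminating pod on such a node leaves
# kube-proxy there with no ready local endpoint.
# Caveats: terminating endpoints are counted too, and dual-stack Services have
# one EndpointSlice per address family, which doubles every count.

# Print "<count> <node>" for every node that hosts an endpoint of the Service.
endpoints_per_node() {
  svc="$1"
  kubectl get endpointslices -l "kubernetes.io/service-name=${svc}" \
    -o jsonpath='{range .items[*].endpoints[*]}{.nodeName}{"\n"}{end}' \
    | grep -v '^$' | sort | uniq -c
}

# Print only the nodes whose count is exactly 1.
single_endpoint_nodes() {
  endpoints_per_node "$1" | awk '$1 == 1 { print $2 }'
}
```

`single_endpoint_nodes web` lists the nodes where one terminating pod is enough to trigger the blackhole if terminating-endpoint routing is unavailable.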

If you can’t rely on terminating-endpoint routing and the source IP isn’t load-bearing, switching to externalTrafficPolicy: Cluster restores availability — at the cost of an extra hop and SNAT that hides the real client IP. That’s a real trade-off for anything that logs or allowlists by client address.

Verify it with a curl you can actually trust

Theory is cheap; the only proof is a continuous request stream through a real rollout. Run a tight curl loop against the Service and trigger a deploy:

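end=$(( $(date +%s) + 300 ))   # stop after ~5 min (longer than the rollout); sort prints nothing until the loop ends
while [ "$(date +%s)" -lt "$end" ]; do
  curl -s -o /dev/null --max-time 2 -w "%{http_code}\n" http://web.example.com/healthz   # 000 = no HTTP response
done | sort | uniq -c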

Zero non-200s across a full rollout means the drain is working. A 000 (curl got no HTTP response at all: refused, reset, or timed out) or any 5xx means the preStop sleep is too short, the grace period is too tight, or a Local-policy node is blackholing.
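Reading the tally by eye works, but a small filter makes the verdict scriptable, for example as a gate in a staging pipeline. A sketch: check_tally is a hypothetical helper name, and it treats anything other than 200 as a failure, matching the criterion above.

```shell
#!/bin/sh
# Sketch: turn the "sort | uniq -c" tally from the curl loop into pass/fail.
# Each input line looks like "   2981 200" (count, then HTTP code).
# curl writes 000 when no HTTP response arrived (refused, reset, timed out).

check_tally() {
  awk '
    NF < 2 { next }                      # skip blank or malformed lines
    { total += $1 }
    $2 != "200" {
      bad += $1
      printf "non-200: %s x%d\n", $2, $1
    }
    END {
      if (total == 0) { print "no requests recorded"; exit 1 }
      printf "%d of %d requests were not 200\n", bad, total
      exit (bad > 0)                     # status 1 if anything but 200 appeared
    }'
}
```

Redirect the loop's tally to a file (tally.txt is a placeholder name) and run `check_tally < tally.txt`; a non-zero exit status means the rollout dropped requests.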

Prompt: Here is a Service (externalTrafficPolicy: Local, type LoadBalancer) and its Deployment with no preStop and a 10s grace period. We drop ~200 requests per rollout. Walk the termination timeline, identify where the drop happens, and produce fixed manifests plus a verification curl loop. Explanation and YAML only — no apply.

Output (excerpt): Drop occurs because SIGTERM closes the listener before EndpointSlice propagation, and on single-endpoint nodes the Local policy blackholes. Fix: add a preStop sleep sized to observed propagation, raise terminationGracePeriodSeconds above preStop + drain, and rely on terminating-endpoint routing for the Local policy. Verify with a continuous curl loop through a rollout expecting zero non-200s.

This is a good fit for an AI assistant because the failure is a timeline-reasoning problem, not a code bug — the model lays out the sequence, points at the gap, and drafts the manifests. I keep it advisory: grace-period mistakes can leave pods hanging on shutdown, so I apply the changes and watch the curl loop myself. Related service-networking debugging is in the Kubernetes & Helm guides, and graceful-shutdown patterns are in the prompt library.

Wrapping up

The “we drop a few requests every deploy” problem is a race between pod termination and kube-proxy convergence, and it yields to two things: a preStop sleep that keeps the pod serving until every proxy has dropped it, sized to real propagation rather than a guess, and — for externalTrafficPolicy: Local Services — terminating-endpoint routing so a draining node’s last endpoint doesn’t blackhole. Prove the fix with a continuous curl through a rollout, and let an AI assistant reason through the timeline while you keep the disruptive changes in human hands. More rollout and networking guides are in the Kubernetes & Helm guides.