Kubernetes Error Guide: 'dial tcp <ip>:<port>: i/o timeout' Connection Timeout
Fix 'dial tcp <ip>:<port>: i/o timeout' in Kubernetes: NetworkPolicy denials, cloud security groups, CNI MTU mismatch, and cross-node pod connectivity.
- #kubernetes-helm
- #troubleshooting
- #errors
- #networking
Exact Error Message
An application, probe, or controller cannot reach another endpoint and the connection attempt hangs until it times out:
Error: failed calling webhook "validate.kyverno.svc": failed to call webhook:
Post "https://kyverno-svc.kyverno.svc:443/validate": dial tcp 10.244.3.17:443: i/o timeout
Liveness probe failed: Get "http://10.244.2.9:8080/healthz": dial tcp 10.244.2.9:8080: i/o timeout
dial tcp <ip>:<port>: i/o timeout means the client opened a socket and sent the TCP SYN, but no SYN-ACK ever came back before the deadline. The packet went into a black hole.
What the Error Means
A TCP connection starts with a SYN; the server replies SYN-ACK. i/o timeout means the SYN was sent but nothing returned in time — the connection never even reached the handshake’s second step. This is a silent drop, fundamentally different from connection refused (something actively rejected you) and no route to host (the network said the address is unreachable). Silence almost always means a firewall, policy, or path-MTU problem is dropping packets without sending a reset.
Because the client already had a target IP and port and emitted the SYN, DNS and the local stack are working. The failure is in the path between source and destination: a NetworkPolicy, a cloud security group, or a broken CNI overlay. An MTU mismatch is a close relative: the small handshake packets usually get through, and the connection then stalls on the first full-size packet.
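The failure modes distinguished above can be told apart from the error text alone. The following is a minimal sketch of that triage, not part of any Kubernetes tooling: the function name `classify_dial_error` and the category labels are hypothetical, and the patterns assume Go-style error strings like the ones quoted in this guide.

```shell
#!/bin/sh
# classify_dial_error: map a Go-style dial error string to a failure category.
# The category labels are illustrative, not Kubernetes terminology.
classify_dial_error() {
  case "$1" in
    # A "lookup" error is a DNS failure, even when it also ends in a timeout.
    *"lookup "*)            echo "dns" ;;
    # No answer to the SYN: something on the path dropped it silently.
    *"i/o timeout"*)        echo "silent-drop" ;;
    # The destination host answered with a reset.
    *"connection refused"*) echo "rejected" ;;
    # The network reported the address as unreachable.
    *"no route to host"*)   echo "unreachable" ;;
    *)                      echo "unknown" ;;
  esac
}

classify_dial_error 'dial tcp 10.244.2.9:8080: i/o timeout'   # prints: silent-drop
```

Only the `silent-drop` category is covered by the rest of this guide; the other labels correspond to the entries under Related Errors.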
Common Causes
- NetworkPolicy denial: a default-deny or restrictive NetworkPolicy silently drops traffic to the target pod/namespace.
- Cloud security group / firewall: node or subnet security groups don’t allow the pod/overlay port (e.g. VXLAN 4789, WireGuard, or the app port between nodes).
- MTU mismatch: the CNI overlay MTU is larger than the underlying network allows, so full-size packets (typically the first large data segment or a TLS handshake record, rather than the small SYN) are silently dropped.
- Cross-node overlay broken: the CNI tunnel between two nodes is down, so same-node traffic works but cross-node times out.
- Wrong target IP: a stale endpoint or hand-typed pod IP that no longer exists.
- Destination overloaded: the backend’s accept queue is full and SYNs are dropped under load.
How to Reproduce the Error
Apply a default-deny ingress policy, then try to reach a pod in that namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: secure
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
kubectl apply -f default-deny.yaml
kubectl run probe --image=nicolaka/netshoot --rm -it --restart=Never -- \
curl -m 5 http://<pod-ip-in-secure-ns>:8080/
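curl: (28) Failed to connect to 10.244.4.5 port 8080 after 5001 ms: Timeout was reached
The policy drops the SYN, so curl gives up at its 5-second limit (exit code 28; the exact wording varies by curl version). A Go client making the same connection logs dial tcp 10.244.4.5:8080: i/o timeout.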
Diagnostic Commands
# Does a NetworkPolicy select the destination pod/namespace?
kubectl get networkpolicy -A
kubectl describe networkpolicy -n <DEST-NS> <NAME>
# Confirm the target IP is a real, ready endpoint
kubectl get endpoints <SERVICE> -n <DEST-NS>
kubectl get pod -o wide -n <DEST-NS> | grep <DEST-IP>
# Test reachability from a debug pod on the SAME node, then a DIFFERENT node
kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never -- \
sh -c 'nc -vz -w3 <DEST-IP> <PORT>; ping -c2 -M do -s 1472 <DEST-IP>'
# Check the CNI MTU configured on the node interfaces (read-only)
ip link show | grep -i mtu
# CNI agent health on both source and destination nodes
kubectl get pods -n kube-system -o wide | grep -E 'calico|cilium|flannel'
The key experiment is same-node vs. cross-node: if same-node works and cross-node times out, suspect the overlay, security groups, or MTU rather than a policy.
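The reachability test above does not by itself control where the debug pod is scheduled. One way to pin it, sketched here under assumptions, is to pass a `nodeName` through `kubectl run --overrides`. The helper names `node_override` and `reach_from_node` and the node name `node-a` are hypothetical; the helpers only print the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# node_override: print a kubectl --overrides JSON fragment that pins a pod to one node.
node_override() {
  printf '{"apiVersion":"v1","spec":{"nodeName":"%s"}}\n' "$1"
}

# reach_from_node: print (not run) the kubectl command that tests DEST_IP:PORT
# from a netshoot pod scheduled on NODE. Arguments: NODE DEST_IP PORT.
reach_from_node() {
  node=$1 dest=$2 port=$3
  printf "kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never --overrides='%s' -- nc -vz -w3 %s %s\n" \
    "$(node_override "$node")" "$dest" "$port"
}

# Example: node-a is a placeholder; use names from `kubectl get nodes`.
reach_from_node node-a 10.244.2.9 8080
```

Running the printed command once with the destination pod's own node and once with a different node gives the same-node and cross-node results to compare.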
Step-by-Step Resolution
1. Rule out NetworkPolicy first. List policies in the destination namespace. A podSelector: {} with policyTypes: [Ingress] and no matching ingress rule denies everything. Add an explicit ingress rule allowing the source pods/namespace, or scope the deny more narrowly.
2. Test same-node vs. cross-node. From a netshoot pod on the destination’s node, nc -vz the target. If that works but the same test from another node times out, the per-pod policy is fine and the problem is the inter-node path (steps 3-4).
3. Open security groups / firewall. Ensure node and subnet firewalls allow the CNI’s encapsulation port (VXLAN UDP 4789, Geneve, IP-in-IP/protocol 4, or WireGuard UDP) and the app port between nodes. Cloud security groups blocking the overlay port produce exactly this timeout for cross-node traffic.
4. Check MTU. A handshake usually fits in a small packet, but the first large data segment can be dropped if the overlay MTU is too high for the substrate. Test with a do-not-fragment ping at the payload size:
ping -c2 -M do -s 1472 <DEST-IP> # 1472+28 = 1500; lower until it succeeds
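If small pings pass but large ones fail, lower the CNI MTU to fit the underlying network. The overlay MTU must be the underlay MTU minus the encapsulation overhead, which is about 50 bytes for VXLAN over IPv4 and differs for other encapsulations.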
5. Verify the target is real and ready. Confirm the destination IP is an actual ready endpoint, not a stale pod IP. Use the Service/Endpoints, not a hard-coded address.
6. Re-test end to end. After the fix, the dial should complete promptly:
kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never -- nc -vz -w3 <DEST-IP> <PORT>
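For step 1, an explicit allow rule might look like the sketch below. It is an illustration rather than a recommended policy: the helper name `allow_ingress_policy` and the policy name it generates are hypothetical, and the selector assumes namespaces carry the `kubernetes.io/metadata.name` label that recent Kubernetes versions set automatically. The function only prints the manifest.

```shell
#!/bin/sh
# allow_ingress_policy: print a NetworkPolicy manifest that admits traffic on one
# TCP port from every pod in a source namespace. Arguments: DEST_NS SOURCE_NS PORT.
allow_ingress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-$2
  namespace: $1
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: $2
      ports:
        - protocol: TCP
          port: $3
EOF
}

# Review the output first, then pipe it to kubectl:
#   allow_ingress_policy secure default 8080 | kubectl apply -f -
```

Because NetworkPolicies are additive, this rule can sit alongside the default-deny policy from the reproduction section; narrowing `podSelector` or adding a `podSelector` under `from` tightens it further.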
Prevention and Best Practices
- Adopt NetworkPolicies deliberately: when you add default-deny, add the matching allow rules in the same change and test connectivity immediately.
- Document and enforce the firewall/security-group ports your CNI overlay needs; treat them as cluster prerequisites.
- Set the CNI MTU correctly for your environment (especially nested overlays and VPNs) and validate with do-not-fragment pings.
- Always connect through Services/DNS, not hard-coded pod IPs, so endpoints stay current.
- Monitor cross-node pod connectivity synthetically so overlay breakage is caught before applications are affected.
Related Errors
- dial tcp connect: no route to host — the network actively reports the address unreachable, not a silent drop.
- connection to the server was refused — something actively rejected the connection.
- CNI request failed with status 400 — pods failing to get networking at all.
- lookup no such host on cluster.local — when the failure is DNS rather than TCP.
Frequently Asked Questions
How is i/o timeout different from connection refused? A timeout means your SYN got no answer at all — it was silently dropped (firewall, policy, or black-hole route). connection refused means a host answered with a TCP reset because nothing is listening on that port. Timeout points at the path; refused points at the destination port.
Same-node works but cross-node times out. What’s wrong? Almost always the inter-node overlay: a blocked encapsulation port in a security group, a broken CNI tunnel, or an MTU mismatch. Pod-scoped NetworkPolicy would block both, so a same/cross-node difference points away from policy.
Could DNS be the cause? No — by the time you see dial tcp <ip>:<port>, the name already resolved to an IP. If resolution were failing you would see a lookup ... no such host error instead.
Why does a small request succeed but a large one time out? Classic MTU mismatch. The small handshake fits, but the first full-size data packet exceeds the path MTU and is dropped (often because the do-not-fragment bit is set). Lowering the overlay MTU fixes it.
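The MTU arithmetic used in this guide can be sketched as two small helpers. The function names are hypothetical, and the 50-byte figure assumes VXLAN over IPv4; other encapsulations need a different overhead value.

```shell
#!/bin/sh
# ping_payload_for_mtu: ICMP payload size that exactly fills an IPv4 packet of the
# given MTU (20-byte IPv4 header + 8-byte ICMP header = 28 bytes of overhead).
ping_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# overlay_mtu: largest pod-interface MTU for a given underlay MTU and
# encapsulation overhead. Arguments: UNDERLAY_MTU OVERHEAD_BYTES.
overlay_mtu() {
  echo $(( $1 - $2 ))
}

ping_payload_for_mtu 1500   # prints 1472, the value used with ping -s above
overlay_mtu 1500 50         # prints 1450, assuming 50 bytes of VXLAN-over-IPv4 overhead
```

A do-not-fragment ping between pods should therefore succeed with a payload of `ping_payload_for_mtu "$(overlay_mtu 1500 50)"` bytes on such a network, and fail for anything larger.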
My NetworkPolicy looks permissive but traffic still drops. Why? NetworkPolicies are additive allow-lists: once any policy selects a pod for a direction, everything in that direction that no policy explicitly allows is dropped. Check that an allow rule actually matches the source’s labels, namespace, and port, and check egress policies that select the source pod as well as ingress policies on the destination. Also confirm your CNI actually enforces NetworkPolicy; some setups need it enabled.