Kubernetes Error Guide: 'dial tcp <ip>:<port>: i/o timeout'

Exact Error Message

An application, probe, or controller cannot reach another endpoint and the connection attempt hangs until it times out:

Error: failed calling webhook "validate.kyverno.svc": failed to call webhook:
Post "https://kyverno-svc.kyverno.svc:443/validate": dial tcp 10.244.3.17:443: i/o timeout

Liveness probe failed: Get "http://10.244.2.9:8080/healthz": dial tcp 10.244.2.9:8080: i/o timeout

dial tcp <ip>:<port>: i/o timeout means the client opened a socket and sent the TCP SYN, but no SYN-ACK ever came back before the deadline. The packet went into a black hole.
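When triaging from logs, it can help to pull the destination IP and port out of the message before running any checks. The following is a minimal sketch in POSIX shell; the function name dial_target is illustrative rather than part of any tool, and it only handles IPv4 literals like the ones in the examples above.

```shell
# Print "<ip> <port>" from a Go-style dial timeout message (IPv4 only).
# dial_target is a hypothetical helper name, not part of kubectl.
# Prints nothing if the message is not a dial timeout.
dial_target() {
  printf '%s\n' "$1" |
    sed -n 's/.*dial tcp \([0-9.]*\):\([0-9]*\): i\/o timeout.*/\1 \2/p'
}
```

The two fields can then be passed to the reachability tests later in this guide in place of <DEST-IP> and <PORT>.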

What the Error Means

A TCP connection starts with a SYN; the server replies SYN-ACK. i/o timeout means the SYN was sent but nothing returned in time, so the connection never reached the handshake’s second step. This is a silent drop, fundamentally different from connection refused (something actively rejected you) and no route to host (the network said the address is unreachable). Silence almost always means a firewall, NetworkPolicy, or broken overlay path is dropping packets without sending a reset.

Because the client already has a target IP and port and the SYN was emitted, DNS and the local stack are working. The failure is in the path between source and destination: a NetworkPolicy, a cloud security group, or a CNI overlay issue. An MTU mismatch rarely blocks the dial itself, because a SYN is far smaller than any realistic MTU; it usually shows up just after the connection opens, as a TLS-handshake or read timeout, which is why it is still covered in this guide.
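The distinction between these failure modes can be written down as a small lookup. This is only a sketch: triage_hint is an illustrative name, and the patterns are the message fragments discussed in this guide, not an exhaustive list of what a client may log.

```shell
# Map a client error message to the layer worth checking first.
# triage_hint is a hypothetical helper; the patterns are the message
# fragments covered in this guide.
triage_hint() {
  case "$1" in
    *"lookup "*|*"no such host"*) echo "dns" ;;          # name resolution failed
    *"connection refused"*)       echo "destination" ;;  # host answered with a reset
    *"no route to host"*)         echo "routing" ;;      # network reported unreachable
    *"i/o timeout"*)              echo "path" ;;         # silent drop on the way
    *)                            echo "unknown" ;;
  esac
}
```

The DNS pattern is checked first on purpose, so that a timed-out lookup is not mistaken for a dropped TCP connection.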

Common Causes

NetworkPolicy denial: a default-deny or restrictive NetworkPolicy silently drops traffic to the target pod/namespace.
Cloud security group / firewall: node or subnet security groups don’t allow the CNI’s encapsulation traffic (e.g. VXLAN on UDP 4789 or 8472, or WireGuard) or the app port between nodes.
MTU mismatch: the CNI overlay MTU is larger than the underlying network allows, so full-size packets are silently dropped. The small SYN usually gets through, so this tends to appear as a timeout just after connecting (TLS handshake, first large request) rather than on the dial itself.
Cross-node overlay broken: the CNI tunnel between two nodes is down, so same-node traffic works but cross-node times out.
Wrong target IP: a stale endpoint or hand-typed pod IP that no longer exists.
Destination overloaded: the backend’s accept queue is full and SYNs are dropped under load.

How to Reproduce the Error

Apply a default-deny ingress policy, then try to reach a pod in that namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: secure
spec:
  podSelector: {}
  policyTypes: ["Ingress"]

kubectl apply -f default-deny.yaml
kubectl run probe --image=nicolaka/netshoot --rm -it --restart=Never -- \
  curl -m 5 http://<pod-ip-in-secure-ns>:8080/

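curl: (28) Connection timed out after 5001 milliseconds

The SYN is dropped by the policy and the connection attempt times out. curl reports this as error 28 in its own wording, which varies by curl version; a Go client hitting the same drop logs dial tcp <ip>:<port>: i/o timeout. The reproduction only works if the cluster’s CNI enforces NetworkPolicy.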

Diagnostic Commands

# Does a NetworkPolicy select the destination pod/namespace?
kubectl get networkpolicy -A
kubectl describe networkpolicy -n <DEST-NS> <NAME>

# Confirm the target IP is a real, ready endpoint
kubectl get endpoints <SERVICE> -n <DEST-NS>
kubectl get pod -o wide -n <DEST-NS> | grep <DEST-IP>

# Test reachability from a debug pod pinned to the destination's node, then repeat with a DIFFERENT node name
kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"<NODE>"}}' -- \
  sh -c 'nc -vz -w3 <DEST-IP> <PORT>; ping -c2 -M do -s 1472 <DEST-IP>'

# Check the CNI MTU configured on the node interfaces (read-only)
ip link show | grep -i mtu

# CNI agent health on both source and destination nodes
kubectl get pods -n kube-system -o wide | grep -E 'calico|cilium|flannel'

The key experiment is same-node vs. cross-node: if same-node works and cross-node times out, suspect the overlay, security groups, or MTU rather than a policy.

Step-by-Step Resolution

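1. Rule out NetworkPolicy first. List policies in the destination namespace. A podSelector: {} with policyTypes: [Ingress] and no ingress rules denies all inbound traffic to every pod in that namespace. Add an explicit ingress rule allowing the source pods/namespace, or scope the deny more narrowly. Also check the source namespace for a policy that restricts egress, since that drops the SYN before it leaves the client.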
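As an illustration of an explicit allow rule, the sketch below prints a NetworkPolicy manifest that admits one source namespace on one TCP port. The function name and the policy name are made up for this example, and it assumes namespaces carry the kubernetes.io/metadata.name label that current Kubernetes versions set automatically. Narrow the podSelector if only some pods should be reachable.

```shell
# Print a NetworkPolicy allowing ingress to every pod in DEST_NS from
# every pod in SRC_NS on one TCP port.
# allow_ingress_manifest and the policy name are illustrative.
# Assumes namespaces have the automatic kubernetes.io/metadata.name label.
allow_ingress_manifest() {
  dest_ns=$1
  src_ns=$2
  port=$3
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-${src_ns}
  namespace: ${dest_ns}
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ${src_ns}
      ports:
        - protocol: TCP
          port: ${port}
EOF
}
```

For the reproduction above, where the probe pod runs in the default namespace, allow_ingress_manifest secure default 8080 | kubectl apply -f - would add the rule; reviewing the printed manifest first is a reasonable habit.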

2. Test same-node vs. cross-node. From a netshoot pod on the destination’s node, nc -vz the target. If that works but the same test from another node times out, the per-pod policy is fine and the problem is the inter-node path (steps 3-4).
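The reasoning in this step can be captured as a small decision helper. It is a sketch, not a diagnosis: node_test_verdict is an illustrative name, and it takes the two nc exit codes however you choose to capture them. Keep in mind that a debug pod has different labels and usually a different namespace than the real client, so a NetworkPolicy may treat it differently.

```shell
# Turn the two probe results into a next step. Arguments are the exit
# codes of `nc -vz` from the same-node pod and the cross-node pod
# (0 means the connection opened). node_test_verdict is an illustrative name.
node_test_verdict() {
  same_rc=$1
  cross_rc=$2
  if [ "$same_rc" -eq 0 ] && [ "$cross_rc" -ne 0 ]; then
    echo "inter-node"        # overlay, security groups, or MTU (steps 3-4)
  elif [ "$same_rc" -ne 0 ] && [ "$cross_rc" -ne 0 ]; then
    echo "policy-or-target"  # NetworkPolicy, stale IP, or overloaded backend
  elif [ "$same_rc" -eq 0 ] && [ "$cross_rc" -eq 0 ]; then
    echo "client-specific"   # debug pods get through; compare them with the real client
  else
    echo "inconclusive"      # cross-node works but same-node fails; re-run the probes
  fi
}
```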

3. Open security groups / firewall. Ensure node and subnet firewalls allow the CNI’s encapsulation traffic (VXLAN on UDP 4789, or UDP 8472 with Flannel and Cilium defaults; Geneve on UDP 6081; IP-in-IP as IP protocol 4; or the CNI’s WireGuard UDP port) and the app port between nodes. Cloud security groups blocking the overlay port produce exactly this timeout for cross-node traffic.

4. Check MTU. A handshake usually fits in a small packet, but the first large data segment can be dropped if the overlay MTU is too high for the substrate. Test with a do-not-fragment ping at the payload size:

ping -c2 -M do -s 1472 <DEST-IP>   # 1472+28 = 1500; lower until it succeeds

If small pings pass but large ones fail, lower the CNI MTU to fit the underlying network (overlay encapsulation needs ~50 bytes of headroom).
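The arithmetic behind the ping size and the overlay MTU is easy to get wrong by a few bytes, so here is a sketch of it. The function names are illustrative, and the overhead values are typical IPv4 figures (VXLAN 50, IP-in-IP 20, WireGuard 60) that should be checked against your CNI’s documentation.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
df_ping_payload() {
  echo $(( $1 - 28 ))
}

# Pod-interface MTU that leaves room for encapsulation on the underlay.
# Overheads are typical IPv4 values, not read from any cluster.
overlay_mtu() {
  underlay=$1
  case "$2" in
    vxlan)     overhead=50 ;;
    ipip)      overhead=20 ;;
    wireguard) overhead=60 ;;
    *) echo "unknown encapsulation: $2" >&2; return 1 ;;
  esac
  echo $(( underlay - overhead ))
}
```

On a 1500-byte underlay with VXLAN, overlay_mtu 1500 vxlan gives 1450, and df_ping_payload 1450 gives 1422 as the largest -s value that should pass between pods.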

5. Verify the target is real and ready. Confirm the destination IP is an actual ready endpoint, not a stale pod IP. Use the Service/Endpoints, not a hard-coded address.
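One way to make this check repeatable is to compare the IP from the error against the Service’s ready addresses. In the sketch below, both function names are illustrative; the first needs cluster access, while the second is a plain string check.

```shell
# List the ready endpoint IPs behind a Service, space-separated.
# Arguments: service name, namespace. Needs kubectl and cluster access.
ready_endpoint_ips() {
  kubectl get endpoints "$1" -n "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Succeed if the IP in $1 appears as a whole word in the
# space-separated list in $2.
ip_in_list() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}
```

For example, ip_in_list <DEST-IP> "$(ready_endpoint_ips <SERVICE> <DEST-NS>)" || echo "stale or not-ready target" flags an address that is no longer behind the Service.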

6. Re-test end to end. After the fix, the dial should complete promptly:

kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never -- nc -vz -w3 <DEST-IP> <PORT>

Prevention and Best Practices

Adopt NetworkPolicies deliberately: when you add default-deny, add the matching allow rules in the same change and test connectivity immediately.
Document and enforce the firewall/security-group ports your CNI overlay needs; treat them as cluster prerequisites.
Set the CNI MTU correctly for your environment (especially nested overlays and VPNs) and validate with do-not-fragment pings.
Always connect through Services/DNS, not hard-coded pod IPs, so endpoints stay current.
Monitor cross-node pod connectivity synthetically so overlay breakage is caught before applications are affected.

Related Errors

dial tcp connect: no route to host — the network actively reports the address unreachable, not a silent drop.
connection to the server was refused — something actively rejected the connection.
CNI request failed with status 400 — pods failing to get networking at all.
lookup no such host on cluster.local — when the failure is DNS rather than TCP.

Frequently Asked Questions

How is i/o timeout different from connection refused? A timeout means your SYN got no answer at all — it was silently dropped (firewall, policy, or black-hole route). connection refused means a host answered with a TCP reset because nothing is listening on that port. Timeout points at the path; refused points at the destination port.

Same-node works but cross-node times out. What’s wrong? Almost always the inter-node overlay: a blocked encapsulation port in a security group, a broken CNI tunnel, or an MTU mismatch. Pod-scoped NetworkPolicy would block both, so a same/cross-node difference points away from policy.

Could DNS be the cause? No — by the time you see dial tcp <ip>:<port>, the name already resolved to an IP. If resolution were failing you would see a lookup ... no such host error instead.

Why does a small request succeed but a large one time out? Classic MTU mismatch. The small handshake fits, but the first full-size data packet exceeds the path MTU and is dropped (often because the do-not-fragment bit is set). Lowering the overlay MTU fixes it.

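My NetworkPolicy looks permissive but traffic still drops. Why? Standard NetworkPolicies are additive: once any policy selects a pod, only traffic matched by some allow rule gets through, and another policy cannot take an allow away. So check that your allow rule really matches this traffic: the source pod and namespace labels, the port and protocol, and whether an egress policy on the source side is dropping the packet first. CNI-specific policy types (for example Calico or Cilium cluster-wide policies) can add explicit denies on top. Also confirm your CNI actually enforces NetworkPolicy; some setups need it enabled.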
