Kubernetes Error Guide: 'context deadline exceeded' API & Webhook Timeouts
Fix 'context deadline exceeded' in Kubernetes: diagnose slow API servers, sluggish etcd, down admission webhooks, network latency, low client timeouts, and DNS.
Exact Error Message
context deadline exceeded is Go’s way of saying a request ran past its deadline before it completed. In Kubernetes it surfaces in several places — the kubectl client, the API server, admission webhooks, and controllers — but the wording is consistent:
Unable to connect to the server: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request
Internal error occurred: failed calling webhook "validate.kyverno.svc": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/validate?timeout=10s": context deadline exceeded
E0625 14:02:11.337182 1 leaderelection.go:332] error retrieving resource lock kube-system/cert-manager-controller: context deadline exceeded
This guide covers the broad API/client/webhook/etcd timeout. The probe-specific case — readiness and liveness checks failing with this same wording — is covered separately in Kubernetes probe failed: context deadline exceeded.
What the Error Means
A context.Context in Go carries a deadline. When a Kubernetes component makes a network call — kubectl to the API server, the API server to a webhook, the API server to etcd — it attaches a deadline derived from a timeout. If the call does not return before the deadline, the context is cancelled and the in-flight operation fails with context deadline exceeded.
The key thing to identify is which call timed out:
- Client to API server: kubectl or a controller could not get a response in time. The cause is in the path between client and apiserver, or the apiserver itself is slow.
- API server to webhook: an admission or conversion webhook backend did not answer within its timeoutSeconds, so the apiserver gave up. With failurePolicy: Fail, every create/update matching the webhook’s rules is now blocked.
- API server to etcd: the apiserver’s read/write against etcd ran past its deadline, and the apiserver returns Timeout to its caller.
The wording is generic, so always read the full message (and the URL inside webhook errors) to learn which hop failed.
Common Causes
- API server overloaded or slow: high request volume, expensive LIST calls without field/label selectors, or a memory-pressured apiserver increases latency until requests exceed their deadline.
- etcd slow: the apiserver’s backing store is the most common hidden cause. Slow disk fsync or a leader election stalls reads and writes, which the apiserver surfaces as Timeout/context deadline exceeded.
- Admission/conversion webhook backend down or slow: a validating, mutating, or CRD conversion webhook whose pods are down, scaled to zero, or slow to respond blocks the apiserver call. With failurePolicy: Fail, this hard-fails every matching request.
- Network latency or MTU mismatch: packet loss, a broken MTU between nodes/overlay, or a saturated link adds enough latency that requests exceed their deadline intermittently.
- Client timeout set too low: kubectl --request-timeout or a controller’s configured timeout is shorter than the operation legitimately needs (large list, slow webhook).
- DNS resolution failures: the apiserver cannot resolve a webhook’s in-cluster Service name, or the client cannot resolve the apiserver host, so the connection stalls until the deadline.
How to Reproduce the Error
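Register a validating webhook that points at a backend that does not exist (or is too slow), with failurePolicy: Fail. Every matching write then fails. A backend that answers slower than timeoutSeconds reproduces context deadline exceeded reliably; a Service with no ready endpoints can instead fail fast with connection refused, depending on how kube-proxy handles Services without endpoints.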
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: slow-validator
webhooks:
  - name: validate.example.svc
    clientConfig:
      service:
        name: validator        # Service has no ready endpoints
        namespace: default
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["configmaps"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    timeoutSeconds: 5
kubectl create configmap repro-cm --from-literal=a=b
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "validate.example.svc": failed to call webhook: Post "https://validator.default.svc:443/validate?timeout=5s": context deadline exceeded
Diagnostic Commands
Turn up kubectl verbosity to see exactly which request stalls and how long it waited:
kubectl get nodes --v=8 --request-timeout=10s
GET https://10.0.0.10:6443/api/v1/nodes
...
I0625 14:05:31.882 round_trippers.go:466] curl -v -XGET ... 'https://10.0.0.10:6443/api/v1/nodes'
... (no response) ...
Unable to connect to the server: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Hit the apiserver’s health endpoints directly, bypassing webhooks and most controllers:
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/healthz/etcd'
kubectl get --raw='/livez/ping'
[+]ping ok
[+]etcd ok
[-]etcd-readiness failed: reason withheld
readyz check failed
List webhook configurations and their failure policies — a misbehaving webhook is the most common blocker:
kubectl get validatingwebhookconfigurations -o custom-columns=NAME:.metadata.name,FAIL:'.webhooks[*].failurePolicy',TIMEOUT:'.webhooks[*].timeoutSeconds'
kubectl get mutatingwebhookconfigurations -o wide
kubectl get apiservices | grep -v Local | grep -i false # aggregated APIs that are unavailable
Check apiserver and etcd request latency from the metrics endpoint:
kubectl get --raw='/metrics' | grep -E 'apiserver_request_duration_seconds_bucket|etcd_request_duration_seconds' | tail
kubectl get --raw='/metrics' | grep apiserver_admission_webhook_admission_duration_seconds_sum
Step-by-Step Resolution
- Identify the failing hop. Run the failing command with --v=8. A kubectl-side "Client.Timeout exceeded while awaiting headers" points at the client→apiserver path; an "Error from server (Timeout)" points at the apiserver itself, most often its calls to etcd; a "failed calling webhook" message points at apiserver→webhook.
- If it is a webhook, find the backend and confirm it is reachable:
kubectl get endpoints -n kyverno kyverno-svc
kubectl get pods -n kyverno -l app.kubernetes.io/name=kyverno
If the backend has no ready endpoints, scale or fix it. As an emergency unblock, relax or delete the webhook so cluster writes can proceed:
kubectl delete validatingwebhookconfiguration slow-validator
Then restore it with failurePolicy: Ignore or a namespaceSelector that excludes critical namespaces (kube-system) once the backend is healthy.
- If it is etcd, check etcd health and disk latency (see the dedicated etcdserver: request timed out guide). On a managed cluster, this is the provider’s responsibility; open a ticket with the latency evidence.
- If it is the apiserver under load, find which resources and verbs dominate the request volume and reduce expensive calls:
kubectl get --raw='/metrics' | grep '^apiserver_request_total' | sort -k2 -g | tail
Add field/label selectors to hot LIST/WATCH controllers, tune API Priority and Fairness (FlowSchema and PriorityLevelConfiguration objects) to limit noisy clients, and scale the control plane if it is self-managed.
- If it is the client, raise the timeout for legitimately slow operations and verify connectivity:
kubectl get pods --all-namespaces --request-timeout=60s
curl -k --max-time 5 https://10.0.0.10:6443/livez
- If it is DNS, confirm the apiserver can resolve in-cluster Service names used by webhooks, and that CoreDNS is healthy:
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl run dnstest --rm -it --image=busybox --restart=Never -- nslookup kyverno-svc.kyverno.svc
Prevention and Best Practices
- Set webhook failurePolicy: Ignore for non-critical policies, and always add a namespaceSelector that excludes kube-system so a webhook outage cannot brick the control plane.
- Keep webhook timeoutSeconds low (1-5s) and run webhook backends with multiple replicas, a PodDisruptionBudget, and resource requests so they stay responsive.
- Monitor etcd_disk_wal_fsync_duration_seconds and apiserver_request_duration_seconds with alerts; etcd disk latency is the leading cause of cluster-wide context deadline exceeded errors.
- Use API Priority and Fairness and avoid unbounded LIST calls in controllers; add label/field selectors and use watch-based shared informers instead of polling.
- Pin a sane --request-timeout in automation, but never paper over a real latency problem by simply raising it.
Related Errors
- Kubernetes probe failed: context deadline exceeded — the same wording from readiness/liveness probes rather than the API path.
- The connection to the server was refused — the apiserver is down entirely rather than slow.
- etcdserver: request timed out — the etcd backend that frequently causes apiserver timeouts.
- Error from server (ServiceUnavailable): an aggregated API or webhook backend is unreachable.
Frequently Asked Questions
Does context deadline exceeded mean my change was rejected?
Not necessarily. For writes, the Error from server (Timeout) message explicitly warns the request “may still be processing.” The apiserver may have persisted the change to etcd even though it could not return a response in time. Re-check the object’s state with kubectl get before retrying a create.
A webhook is timing out and I can’t apply anything — how do I recover fast?
Delete or relax the offending ValidatingWebhookConfiguration/MutatingWebhookConfiguration. With failurePolicy: Fail, a down webhook blocks every matching write, including the fix itself. Removing the webhook configuration is the standard emergency break-glass, then you restore it once the backend is healthy.
Why does the same kubectl command work sometimes and fail other times?
Intermittent timeouts usually mean variable latency — etcd fsync spikes, periodic leader elections, network packet loss, or an autoscaling webhook backend. Look at the distribution of apiserver_request_duration_seconds, not just averages, and correlate failures with etcd and webhook latency.
Should I just raise --request-timeout?
Only for operations that are legitimately slow, like a cluster-wide LIST of many objects. If a normally fast call is timing out, a higher client timeout hides the symptom while the underlying apiserver, etcd, or webhook latency keeps degrading. Fix the slow hop first.
Can DNS really cause this?
Yes. If the apiserver cannot resolve a webhook’s Service DNS name, the connection hangs until the webhook deadline and fails as context deadline exceeded. Likewise, if a client cannot resolve the apiserver host it stalls until the client deadline. Always test name resolution along the failing hop.