DevOps AI ToolKit
AI for Kubernetes & Helm By James Joyner IV · 9 min read

Kubernetes Error Guide: 'context deadline exceeded' API & Webhook Timeouts

Fix 'context deadline exceeded' in Kubernetes: diagnose slow API servers, sluggish etcd, down admission webhooks, network latency, low client timeouts, and DNS.

  • #kubernetes
  • #troubleshooting
  • #errors
  • #api

Exact Error Message

context deadline exceeded is Go’s way of saying a request ran past its deadline before it completed. In Kubernetes it surfaces in several places — the kubectl client, the API server, admission webhooks, and controllers — but the wording is consistent:

Unable to connect to the server: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request
Internal error occurred: failed calling webhook "validate.kyverno.svc": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/validate?timeout=10s": context deadline exceeded
E0625 14:02:11.337182  1 leaderelection.go:332] error retrieving resource lock kube-system/cert-manager-controller: context deadline exceeded

This guide covers the broad API/client/webhook/etcd timeout. The probe-specific case — readiness and liveness checks failing with this same wording — is covered separately in Kubernetes probe failed: context deadline exceeded.

What the Error Means

A context.Context in Go carries a deadline. When a Kubernetes component makes a network call — kubectl to the API server, the API server to a webhook, the API server to etcd — it attaches a deadline derived from a timeout. If the call does not return before the deadline, the context is cancelled and the in-flight operation fails with context deadline exceeded.
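
The deadline mechanism can be mimicked in a shell as a rough analogue. This is not how Kubernetes implements it: the sketch assumes GNU coreutils `timeout` is installed, and `with_deadline` is a hypothetical helper name.

```shell
# with_deadline SECONDS CMD...: run CMD and give up after SECONDS.
# Hypothetical helper. Relies on coreutils `timeout`, which exits 124
# when the time limit is reached before the command finishes.
with_deadline() {
  secs=$1
  shift
  timeout "$secs" "$@"
  rc=$?
  if [ "$rc" -eq 124 ]; then
    # The command was still in flight when the deadline passed.
    echo "deadline of ${secs}s exceeded: $*" >&2
  fi
  return "$rc"
}
```

As with a Go context, the caller only learns that the deadline passed, not whether the work on the other side eventually completed.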

The key thing to identify is which call timed out:

  • Client to API server: kubectl or a controller could not get a response in time. The cause is in the path between client and apiserver, or the apiserver itself is slow.
  • API server to webhook: an admission or conversion webhook backend did not answer within its timeoutSeconds, so the apiserver gave up. With failurePolicy: Fail, every create/update for the matching resource is now blocked.
  • API server to etcd: the apiserver’s read/write against etcd ran past its deadline; the apiserver returns Timeout to its caller.

The wording is generic, so always read the full message (and the URL inside webhook errors) to learn the hop that failed.
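
The mapping from message text to failing hop can be sketched as a small shell helper. The function names `classify_hop` and `webhook_url` are hypothetical, and the patterns cover only the sample messages shown in this guide, so treat the result as a first guess rather than a diagnosis.

```shell
# classify_hop MESSAGE: print a best guess at which hop timed out.
# Patterns are illustrative and follow the mapping used in this guide.
classify_hop() {
  case "$1" in
    *"failed calling webhook"*)          echo "apiserver-to-webhook" ;;
    *"Error from server (Timeout)"*)     echo "apiserver-to-etcd" ;;
    *"Unable to connect to the server"*) echo "client-to-apiserver" ;;
    *"error retrieving resource lock"*)  echo "client-to-apiserver" ;;
    *"context deadline exceeded"*)       echo "unknown" ;;
    *)                                   echo "not-a-timeout" ;;
  esac
}

# webhook_url MESSAGE: print the URL quoted after Post in a webhook error.
# Prints nothing when the message contains no such URL.
webhook_url() {
  printf '%s\n' "$1" | sed -n 's/.*Post "\([^"]*\)".*/\1/p'
}
```

Feeding the Kyverno message from the first section to `webhook_url` yields the Service host, port, and timeout the apiserver used, which is usually enough to find the backend.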

Common Causes

  • API server overloaded or slow: high request volume, expensive LIST calls without field/label selectors, or a memory-pressured apiserver increases latency until requests exceed their deadline.
  • etcd slow: the apiserver’s backing store is a common hidden cause. Slow disk fsync or a leader election stalls reads and writes, which the apiserver surfaces as Timeout / context deadline exceeded.
  • Admission/conversion webhook backend down or slow: a validating, mutating, or CRD conversion webhook whose pods are down, scaled to zero, or slow to respond blocks the apiserver call. With failurePolicy: Fail, this hard-fails every matching request.
  • Network latency or MTU mismatch: packet loss, a broken MTU between nodes/overlay, or a saturated link adds enough latency that requests exceed their deadline intermittently.
  • Client timeout set too low: kubectl --request-timeout or a controller’s configured timeout is shorter than the operation legitimately needs (large list, slow webhook).
  • DNS resolution failures: the apiserver cannot resolve a webhook’s in-cluster Service name, or the client cannot resolve the apiserver host, so the connection stalls until the deadline.

How to Reproduce the Error

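Register a validating webhook with failurePolicy: Fail whose Service exists but whose backend never answers (no ready endpoints, or a handler slower than timeoutSeconds). Every matching write then fails. Depending on the network setup, a Service with no endpoints may be refused immediately instead of timing out, and a Service that does not exist at all is reported as not found rather than as a timeout.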

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: slow-validator
webhooks:
  - name: validate.example.svc
    clientConfig:
      service:
        name: validator        # Service has no ready endpoints
        namespace: default
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["configmaps"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    timeoutSeconds: 5

kubectl create configmap repro-cm --from-literal=a=b
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "validate.example.svc": failed to call webhook: Post "https://validator.default.svc:443/validate?timeout=5s": context deadline exceeded

Diagnostic Commands

Turn up kubectl verbosity to see exactly which request stalls and how long it waited:

kubectl get nodes --v=8 --request-timeout=10s
GET https://10.0.0.10:6443/api/v1/nodes
...
I0625 14:05:31.882 round_trippers.go:466] curl -v -XGET ... 'https://10.0.0.10:6443/api/v1/nodes'
... (no response) ...
Unable to connect to the server: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Hit the apiserver’s health endpoints directly, bypassing webhooks and most controllers:

kubectl get --raw='/readyz?verbose'
kubectl get --raw='/healthz/etcd'
kubectl get --raw='/livez/ping'
[+]ping ok
[+]etcd ok
[-]etcd-readiness failed: reason withheld
readyz check failed
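
To pull only the failing checks out of verbose health output, a small filter is enough. This sketch assumes the one-check-per-line format shown above, with failures marked `[-]`; `failed_checks` is a hypothetical helper name.

```shell
# failed_checks: read verbose health-check output on stdin and print
# the name of every check marked [-], one per line.
failed_checks() {
  sed -n 's/^\[-]\([^ ]*\).*/\1/p'
}

# Possible usage against the raw endpoint (address taken from the examples
# in this guide; adjust to your apiserver):
#   curl -sk 'https://10.0.0.10:6443/readyz?verbose' | failed_checks
```

The raw endpoint is suggested in the usage comment because kubectl may wrap a failing health response in an error message instead of printing the body line by line.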

List webhook configurations and their failure policies — a misbehaving webhook is the most common blocker:

kubectl get validatingwebhookconfigurations -o custom-columns=NAME:.metadata.name,FAIL:'.webhooks[*].failurePolicy',TIMEOUT:'.webhooks[*].timeoutSeconds'
kubectl get mutatingwebhookconfigurations -o wide
kubectl get apiservices | grep -v Local | grep -i false   # aggregated APIs that are unavailable
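
The custom-columns table from the first command can be filtered to show only configurations that can block writes. This is a sketch: `blocking_webhooks` is a hypothetical helper, and it assumes the column order NAME, FAIL, TIMEOUT used above, with multiple webhooks in one configuration joined by commas.

```shell
# blocking_webhooks: read the NAME/FAIL/TIMEOUT table on stdin and print the
# name of every configuration where at least one webhook uses failurePolicy Fail.
blocking_webhooks() {
  awk 'NR > 1 && $2 ~ /Fail/ { print $1 }'
}

# Possible usage:
#   kubectl get validatingwebhookconfigurations \
#     -o custom-columns=NAME:.metadata.name,FAIL:'.webhooks[*].failurePolicy',TIMEOUT:'.webhooks[*].timeoutSeconds' \
#     | blocking_webhooks
```

Each name it prints is a candidate for the endpoint checks in the resolution steps.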

Check apiserver and etcd request latency from the metrics endpoint:

kubectl get --raw='/metrics' | grep -E 'apiserver_request_duration_seconds_bucket|etcd_request_duration_seconds' | tail
kubectl get --raw='/metrics' | grep apiserver_admission_webhook_admission_duration_seconds_sum
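
The `_sum` series is easier to read next to its matching `_count` series, since sum divided by count gives the mean latency per webhook. The sketch below assumes the standard Prometheus text format and that both series carry a `name` label; `avg_webhook_latency` is a hypothetical helper name.

```shell
# avg_webhook_latency: read Prometheus text metrics on stdin and print
# "<webhook name> <mean seconds>" for each admission webhook.
avg_webhook_latency() {
  awk '
    /^apiserver_admission_webhook_admission_duration_seconds_(sum|count)[{]/ {
      # Pull the value of the name="..." label out of the label set.
      if (!match($0, /name="[^"]*"/)) next
      name = substr($0, RSTART + 6, RLENGTH - 7)
      # The sample value is the last whitespace-separated field.
      if ($0 ~ /_seconds_sum[{]/) sum[name] += $NF
      else                        cnt[name] += $NF
    }
    END {
      for (n in cnt) if (cnt[n] > 0) printf "%s %.3f\n", n, sum[n] / cnt[n]
    }
  ' | sort
}

# Possible usage:
#   kubectl get --raw='/metrics' | avg_webhook_latency
```

A mean close to the webhook's timeoutSeconds suggests that many calls are hitting the deadline. The histogram buckets give a better view of the tail than the mean does.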

Step-by-Step Resolution

  1. Identify the failing hop. Run the failing command with --v=8. A kubectl-side Client.Timeout exceeded while awaiting headers points at the client→apiserver path; an Error from server (Timeout) usually points at apiserver→etcd or at an overloaded apiserver; a failed calling webhook message points at apiserver→webhook.

  2. If it is a webhook, find the backend and confirm it is reachable:

    kubectl get endpoints -n kyverno kyverno-svc
    kubectl get pods -n kyverno -l app.kubernetes.io/name=kyverno

    If the backend has no ready endpoints, scale or fix it. As an emergency unblock, relax or delete the webhook so cluster writes can proceed:

    kubectl delete validatingwebhookconfiguration slow-validator

    Then restore it with failurePolicy: Ignore or a namespaceSelector that excludes critical namespaces (kube-system) once the backend is healthy.

  3. If it is etcd, check etcd health and disk latency (see the dedicated etcdserver: request timed out guide). On a managed cluster, this is the provider’s responsibility — open a ticket with the latency evidence.

  4. If it is the apiserver under load, find the busiest verbs and resources and reduce expensive calls:

    kubectl get --raw='/metrics' | grep '^apiserver_request_total' | sort -k2 -g | tail

    Add field/label selectors to hot LIST/WATCH controllers, enable API Priority and Fairness limits for noisy clients, and scale the control plane if it is self-managed.

  5. If it is the client, raise the timeout for legitimately slow operations and verify connectivity:

    kubectl get pods --all-namespaces --request-timeout=60s
    curl -k --max-time 5 https://10.0.0.10:6443/livez

  6. If it is DNS, confirm the apiserver can resolve in-cluster Service names used by webhooks, and that CoreDNS is healthy:

    kubectl -n kube-system get pods -l k8s-app=kube-dns
    kubectl run dnstest --rm -it --image=busybox --restart=Never -- nslookup kyverno-svc.kyverno.svc
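
For the emergency unblock in step 2, it helps to save the webhook configuration before deleting it so it can be restored later. This is a minimal sketch under stated assumptions: `break_glass` is a hypothetical helper, it handles only ValidatingWebhookConfiguration objects, and it writes the backup into a directory you choose.

```shell
# break_glass NAME [BACKUP_DIR]: save a ValidatingWebhookConfiguration to a
# file, then delete it. The delete runs only if the backup is non-empty.
# Prints the backup file path on success.
break_glass() {
  name=$1
  dir=${2:-.}
  file="$dir/$name.backup.yaml"
  kubectl get validatingwebhookconfiguration "$name" -o yaml > "$file" || return 1
  [ -s "$file" ] || return 1
  # Send kubectl's own confirmation to stderr so stdout carries only the path.
  kubectl delete validatingwebhookconfiguration "$name" >&2 || return 1
  echo "$file"
}

# Possible usage:
#   break_glass slow-validator /tmp
```

The saved manifest may contain server-populated fields such as resourceVersion and uid, which may need to be removed before it is applied again.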

Prevention and Best Practices

  • Set webhook failurePolicy: Ignore for non-critical policies, and always add a namespaceSelector that excludes kube-system so a webhook outage cannot brick the control plane.
  • Keep webhook timeoutSeconds low (1-5s) and run webhook backends with multiple replicas, a PodDisruptionBudget, and resource requests so they stay responsive.
  • Monitor etcd_disk_wal_fsync_duration_seconds and apiserver_request_duration_seconds with alerts; etcd disk latency is a frequent cause of cluster-wide context deadline exceeded.
  • Use API Priority and Fairness and avoid unbounded LIST calls in controllers; add label/field selectors and use informers with resync, not polling.
  • Pin a sane --request-timeout in automation, but never paper over a real latency problem by simply raising it.
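
The first two recommendations can be combined in one manifest. The sketch below prints a variant of the reproduction webhook with failurePolicy: Ignore, a short timeout, and a namespaceSelector that skips kube-system. `render_webhook` and its arguments are hypothetical, the `.example.svc` webhook name is a placeholder, and the selector relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set on every namespace.

```shell
# render_webhook NAME SERVICE NAMESPACE [TIMEOUT_SECONDS]
# Print a ValidatingWebhookConfiguration that fails open and never
# intercepts requests in kube-system. Timeout defaults to 3 seconds.
render_webhook() {
  cat <<EOF
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: $1
webhooks:
  - name: $1.example.svc
    clientConfig:
      service:
        name: $2
        namespace: $3
        path: /validate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["configmaps"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore
    timeoutSeconds: ${4:-3}
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system"]
EOF
}

# Possible usage:
#   render_webhook slow-validator validator default | kubectl apply -f -
```

Failing open trades enforcement for availability, so it suits advisory policies better than security-critical ones.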

Frequently Asked Questions

Does context deadline exceeded mean my change was rejected? Not necessarily. For writes, the Error from server (Timeout) message explicitly warns the request “may still be processing.” The apiserver may have persisted the change to etcd even though it could not return a response in time. Re-check the object’s state with kubectl get before retrying a create.
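
A check-before-retry wrapper might look like the following sketch. `create_once` is a hypothetical helper; it treats any failure of `kubectl get` as "object absent", which is an assumption: a `get` that itself times out would also fall through to the create.

```shell
# create_once KIND NAME CMD...: run CMD only if KIND/NAME does not exist yet.
create_once() {
  kind=$1
  name=$2
  shift 2
  if kubectl get "$kind" "$name" >/dev/null 2>&1; then
    # The earlier, timed-out create was persisted after all.
    echo "$kind/$name already exists; skipping create"
    return 0
  fi
  "$@"
}

# Possible usage:
#   create_once configmap repro-cm kubectl create configmap repro-cm --from-literal=a=b
```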

A webhook is timing out and I can’t apply anything — how do I recover fast? Delete or relax the offending ValidatingWebhookConfiguration/MutatingWebhookConfiguration. With failurePolicy: Fail, a down webhook blocks every matching write, including the fix itself. Removing the webhook configuration is the standard emergency break-glass, then you restore it once the backend is healthy.

Why does the same kubectl command work sometimes and fail other times? Intermittent timeouts usually mean variable latency — etcd fsync spikes, periodic leader elections, network packet loss, or an autoscaling webhook backend. Look at the distribution of apiserver_request_duration_seconds, not just averages, and correlate failures with etcd and webhook latency.

Should I just raise --request-timeout? Only for operations that are legitimately slow, like a cluster-wide LIST of many objects. If a normally fast call is timing out, a higher client timeout hides the symptom while the underlying apiserver, etcd, or webhook latency keeps degrading. Fix the slow hop first.

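Can DNS really cause this? Yes. If a client cannot resolve the apiserver host, it stalls until the client deadline, and if the apiserver cannot resolve the hostname of a webhook registered by URL, the call hangs until the webhook deadline and fails as context deadline exceeded. Webhooks registered by Service reference are usually resolved by kube-apiserver to the Service’s cluster IP without going through cluster DNS, so for those check the Service and its endpoints as well. Always test name resolution along the failing hop.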
