Kubernetes Error Guide: 'failed calling webhook' Admission Webhook Backend Failures
Fix 'Internal error occurred: failed calling webhook' in Kubernetes: diagnose webhook backend down, context deadline exceeded, connection refused, and x509 failures.
Exact Error Message
When the apiserver cannot reach (or cannot trust) an admission webhook backend, the request to create or update an object fails with failed calling webhook. The trailing clause names the transport-level cause:
Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://webhook-svc.platform.svc:443/validate?timeout=10s": context deadline exceeded
Other backends produce the same wrapper with a different tail:
Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://webhook-svc.platform.svc:443/validate": dial tcp 10.96.4.21:443: connect: connection refused
Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://webhook-svc.platform.svc:443/validate": x509: certificate signed by unknown authority
The constant part is failed calling webhook "<name>": failed to call webhook. The variable tail (context deadline exceeded, connection refused, no endpoints available, or x509: ...) is the actual diagnosis.
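Because the tail carries the diagnosis, triage can be scripted. The function below is a minimal sketch that maps the trailing clause to a coarse cause. The function name and the labels it prints (timeout, refused, no-endpoints, tls) are made up for illustration and are not Kubernetes terminology.

```shell
# classify_webhook_error: hypothetical helper, not part of kubectl.
# Prints a coarse cause label for a 'failed calling webhook' message.
classify_webhook_error() {
  case "$1" in
    *"context deadline exceeded"*) echo "timeout" ;;
    *"connection refused"*)        echo "refused" ;;
    *"no endpoints available"*)    echo "no-endpoints" ;;
    *"x509:"*)                     echo "tls" ;;
    *)                             echo "unknown" ;;
  esac
}
```

One possible use is to feed it the stderr of a failing command, for example classify_webhook_error "$(kubectl apply -f pod.yaml 2>&1)", where pod.yaml is a placeholder manifest.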
What the Error Means
A ValidatingWebhookConfiguration or MutatingWebhookConfiguration registers an external HTTPS endpoint that the apiserver must call during admission, before an object is persisted. When you create a Pod (or whatever the webhook matches), the apiserver makes an outbound TLS POST to the webhook service and waits for an allow/deny (or a patch) response.
failed calling webhook is not the webhook saying “denied” — that would be admission webhook ... denied the request. This error means the apiserver could not get an answer at all: it could not connect, the call timed out, or it could not verify the backend’s TLS certificate. Whether this blocks your request depends on the webhook’s failurePolicy: Fail (default) turns an unreachable webhook into a hard error that breaks admissions; Ignore lets the request proceed.
The danger is operational: a webhook with failurePolicy: Fail that matches core resources can wedge the cluster if its backend is down — you cannot create the very pods needed to bring the backend back. Reading the trailing clause tells you whether to fix networking, the backend, the timeout, or the CA bundle.
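To find the webhooks that could wedge a cluster this way, it helps to list every webhook entry whose failurePolicy is Fail. The sketch below assumes kubectl access to the cluster; the function name is hypothetical, and the output columns (resource kind, webhook name, timeout) are just one convenient layout.

```shell
# list_fail_webhooks: hypothetical helper. Prints every validating and
# mutating webhook entry whose failurePolicy is Fail, one per line:
#   <resource kind> <webhook name> <timeoutSeconds>s
list_fail_webhooks() {
  for kind in validatingwebhookconfigurations mutatingwebhookconfigurations; do
    kubectl get "$kind" \
      -o jsonpath='{range .items[*]}{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\t"}{.timeoutSeconds}{"\n"}{end}{end}' |
      awk -F'\t' -v kind="$kind" '$2 == "Fail" { print kind "\t" $1 "\t" $3 "s" }'
  done
}
```

Each line it prints is a webhook that turns a backend outage into rejected requests, so those are the entries whose rules and selectors deserve the closest review.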
Common Causes
- Backend pods down / no endpoints: the webhook Service has zero ready endpoints ("no endpoints available for service").
- "connection refused": backend is up but not listening on the target port, or a NetworkPolicy blocks the apiserver.
- "context deadline exceeded": backend is slow; the response exceeds the webhook’s timeoutSeconds (max 30s).
- "x509: certificate signed by unknown authority": the caBundle in the webhook config does not match the backend’s serving cert (rotation drift, cert-manager re-issue).
- Expired serving cert: "x509: certificate has expired" on the webhook’s TLS cert.
- Wrong Service/path/port: clientConfig points at a nonexistent service or wrong path.
- Namespace/object selector too broad: the webhook intercepts system namespaces and self-deadlocks.
How to Reproduce the Error
Register a webhook that points at a backend that does not exist, with the default failurePolicy: Fail:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: dead-webhook
webhooks:
  - name: vpod.kb.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        name: nonexistent-svc
        namespace: default
        path: /validate
kubectl apply -f dead-webhook.yaml
kubectl run probe --image=busybox:1.36 --restart=Never -- true
Error from server (InternalError): Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://nonexistent-svc.default.svc:443/validate?timeout=10s": service "nonexistent-svc" not found
Every pod create now fails until you delete or fix the webhook.
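To undo the reproduction, removing the webhook configuration is enough to restore pod creation. A minimal cleanup sketch, assuming the object names used above (dead-webhook and probe); the function name is hypothetical.

```shell
# cleanup_repro: hypothetical helper that removes the reproduction objects:
# the dead-webhook configuration and the probe pod, if either exists.
cleanup_repro() {
  kubectl delete validatingwebhookconfiguration dead-webhook --ignore-not-found
  kubectl delete pod probe --ignore-not-found
}
```

The webhook in this example only matches pods, so deleting the configuration itself is not blocked by the dead backend.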
Diagnostic Commands
# List webhook configs and their failurePolicy / target service
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
kubectl get validatingwebhookconfiguration <NAME> -o yaml | grep -A8 clientConfig
# Does the backend Service have ready endpoints?
kubectl get endpoints -n <NS> <WEBHOOK_SVC>
kubectl get pods -n <NS> -l <BACKEND_SELECTOR> -o wide
# Backend health and TLS
kubectl logs -n <NS> deploy/<WEBHOOK_BACKEND> --tail=50
# Inspect the configured caBundle; compare its subject, issuer, and dates with the cert the backend actually serves
kubectl get validatingwebhookconfiguration <NAME> \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -noout -subject -issuer -dates
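# apiserver-side view of the failed call (apiserver managed by systemd)
journalctl -u kube-apiserver --no-pager 2>/dev/null | grep -i 'failed calling webhook'
# If the apiserver runs as a static pod (for example on kubeadm clusters), read its pod logs instead
kubectl logs -n kube-system -l component=kube-apiserver --tail=500 | grep -i 'failed calling webhook'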
First check endpoints — a webhook Service with zero ready endpoints is the most common cause. If endpoints exist, the trailing clause (x509 vs deadline vs refused) tells you whether to look at certs, latency, or networking.
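When the tail is an x509 error, one way to test trust directly is to check whether the serving certificate chains to the configured caBundle. The sketch below assumes the backend's serving cert is stored in a kubernetes.io/tls Secret under the tls.crt key (the cert-manager convention); the function name and its three arguments are hypothetical. If tls.crt also carries intermediates, openssl verify only checks the first certificate in the file, so treat a failure as a lead rather than proof.

```shell
# check_webhook_chain: hypothetical helper. Verifies that the serving cert in
# a TLS Secret chains to the caBundle of the first webhook entry.
# Arguments: <webhook-config-name> <namespace> <tls-secret-name>
check_webhook_chain() {
  tmp="$(mktemp -d)" || return 1
  kubectl get validatingwebhookconfiguration "$1" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d > "$tmp/ca.pem"
  kubectl get secret -n "$2" "$3" \
    -o jsonpath='{.data.tls\.crt}' | base64 -d > "$tmp/tls.crt"
  # openssl exits non-zero when the cert does not validate against the bundle
  openssl verify -CAfile "$tmp/ca.pem" "$tmp/tls.crt"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

A non-zero exit status suggests the caBundle and the serving cert have drifted apart, which points at the CA bundle rather than networking or latency.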
Step-by-Step Resolution
1. Read the trailing clause. It is the diagnosis. Route on it: no endpoints/refused → backend or network; deadline exceeded → latency/timeout; x509 → CA bundle or cert.
2. Restore the backend. If endpoints are empty, the webhook pods are down or unschedulable. Check why (CrashLoop, FailedScheduling) and get them Ready. Endpoints appearing fixes no endpoints available.
3. Fix connection refused. Backend is running but not reachable on the port. Confirm the container listens on the clientConfig port and that no NetworkPolicy blocks apiserver-origin traffic to that namespace.
4. Fix context deadline exceeded. The backend is too slow. Profile it, scale it for HA, and raise timeoutSeconds modestly (cap is 30s). A slow webhook on a hot resource adds latency to every matching admission.
5. Fix x509 failures. The caBundle no longer matches the serving cert, which is common after cert-manager rotation. Re-sync the CA bundle (cert-manager’s cainjector or your own automation), and confirm the validity dates with openssl x509. For “expired”, renew the serving cert.
6. Break a deadlock. If failurePolicy: Fail plus a broad selector has wedged the cluster, delete the webhook configuration to restore admissions, then re-apply it correctly with namespaceSelector exclusions for kube-system and an Ignore policy where appropriate.
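For step 6, it is safer to save the configuration before deleting it, so the webhook can be restored once the backend is healthy. A minimal sketch, assuming kubectl access; the function name and the backup file name are hypothetical.

```shell
# break_glass_delete: hypothetical helper. Saves a webhook configuration to a
# local YAML file, then deletes it so admissions resume.
# Arguments: <kind> <name>
#   kind is validatingwebhookconfiguration or mutatingwebhookconfiguration
break_glass_delete() {
  backup="webhook-backup-$2.yaml"
  kubectl get "$1" "$2" -o yaml > "$backup" || return 1
  # Refuse to delete if nothing was saved
  [ -s "$backup" ] || { echo "backup is empty, not deleting" >&2; return 1; }
  kubectl delete "$1" "$2" || return 1
  echo "backup saved to $backup"
}
```

Before re-applying the saved file, strip server-set fields such as metadata.resourceVersion and metadata.uid. If the configuration is managed by Helm or an operator, it may be recreated automatically, so the deletion may need to be paired with pausing that controller.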
Prevention and Best Practices
- Run webhook backends with multiple replicas and a PodDisruptionBudget so a node drain never zeroes endpoints.
- Scope webhooks tightly with namespaceSelector/objectSelector; never let them intercept kube-system or their own namespace.
- Use failurePolicy: Ignore for non-critical webhooks so a backend outage degrades gracefully instead of blocking admissions.
- Automate caBundle injection (cert-manager cainjector) and alert on cert expiry to prevent x509 drift.
- Keep timeoutSeconds low and the backend fast; webhooks are in the synchronous admission path of every matching request.
- Test webhook outages in staging: confirm the cluster still functions when the backend is down.
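The scoping advice above can be sketched as a JSON Patch that excludes kube-system and the backend's own namespace. This assumes namespaces carry the kubernetes.io/metadata.name label, which recent Kubernetes versions set automatically; the function name is hypothetical, and the patch targets only the first webhook entry and overwrites any namespaceSelector already present.

```shell
# exclude_ns_patch: hypothetical helper. Prints a JSON Patch that sets a
# namespaceSelector on the first webhook entry, excluding kube-system and the
# namespace passed as $1 (for example, the webhook backend's own namespace).
exclude_ns_patch() {
  printf '[{"op":"add","path":"/webhooks/0/namespaceSelector","value":{"matchExpressions":[{"key":"kubernetes.io/metadata.name","operator":"NotIn","values":["kube-system","%s"]}]}}]\n' "$1"
}
```

It could then be applied with kubectl patch validatingwebhookconfiguration <NAME> --type=json -p "$(exclude_ns_patch <NS>)". Review the printed patch first, since it replaces the selector rather than merging with it.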
Related Errors
- "admission webhook denied the request": the webhook answered with a denial; a different, intentional outcome.
- "x509: certificate signed by unknown authority": the trust failure behind the x509 webhook variant.
- "probe failed: context deadline exceeded": the same timeout symptom in a different subsystem.
Frequently Asked Questions
Is failed calling webhook the same as the webhook denying my request?
No. A denial is admission webhook "<name>" denied the request: <reason> and means the backend ran and said no. failed calling webhook means the apiserver never got a usable answer: a transport, timeout, or TLS failure.

Why did deleting one webhook config unblock my whole cluster?
A webhook with failurePolicy: Fail matching core resources blocks every create/update when its backend is unreachable, including the pods needed to repair the backend. Removing the configuration restores admissions so you can fix it safely.

My webhook worked yesterday and now throws x509. What changed?
Almost always a certificate rotation: cert-manager re-issued the backend’s serving cert but the caBundle in the webhook config was not updated to match. Re-sync the CA bundle and the handshake succeeds again.

Should I just set failurePolicy: Ignore everywhere?
Only for non-security-critical webhooks. Ignore lets requests through when the backend is down, which is safer for availability but unsafe for policy/security webhooks where bypass is unacceptable. Choose per webhook.

Why does a slow webhook affect unrelated workloads?
Admission webhooks run synchronously in the request path. A webhook matching pods adds its latency to every pod create cluster-wide, so a slow backend degrades all matching admissions, not just one app.