Kubernetes Error Guide: 'failed calling webhook' Admission Webhook Backend Failures

Exact Error Message

When the apiserver cannot reach (or cannot trust) an admission webhook backend, the request to create or update an object fails with failed calling webhook. The trailing clause names the transport-level cause:

Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://webhook-svc.platform.svc:443/validate?timeout=10s": context deadline exceeded

Other backends produce the same wrapper with a different tail:

Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://webhook-svc.platform.svc:443/validate": dial tcp 10.96.4.21:443: connect: connection refused

Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://webhook-svc.platform.svc:443/validate": x509: certificate signed by unknown authority

The constant is failed calling webhook "<name>": failed to call webhook. The variable — context deadline exceeded, connection refused, no endpoints available, or x509: ... — is the actual diagnosis.
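As a rough illustration of routing on that trailing clause, the sketch below maps a message to a coarse category. The function name classify_webhook_error and the category labels are invented for this example; only the matched substrings come from the messages shown above.

```shell
# Map the trailing clause of a "failed calling webhook" message to a
# coarse category. Category names are illustrative, not Kubernetes terms.
classify_webhook_error() {
  # Anything without the wrapper (for example a denial) is out of scope.
  case "$1" in
    *"failed calling webhook"*) ;;
    *) echo "not-a-call-failure"; return 0 ;;
  esac
  case "$1" in
    *"no endpoints available"*)    echo "no-endpoints" ;;
    *"connection refused"*)        echo "refused" ;;
    *"context deadline exceeded"*) echo "timeout" ;;
    *"x509:"*)                     echo "tls" ;;
    *)                             echo "other" ;;
  esac
}

# Usage: pass one line of kubectl's error output as the only argument, e.g.
#   classify_webhook_error "$(kubectl apply -f pod.yaml 2>&1 | tail -n 1)"
```

The wrapper check runs first so that a genuine denial is never mistaken for a transport failure.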

What the Error Means

A ValidatingWebhookConfiguration or MutatingWebhookConfiguration registers an external HTTPS endpoint that the apiserver must call during admission, before an object is persisted. When you create a Pod (or whatever the webhook matches), the apiserver makes an outbound TLS POST to the webhook service and waits for an allow/deny (or a patch) response.

failed calling webhook is not the webhook saying “denied”; that would be admission webhook ... denied the request. This error means the apiserver could not get an answer at all: it could not connect, the call timed out, or it could not verify the backend’s TLS certificate. Whether this blocks your request depends on the webhook’s failurePolicy: Fail (the default in admissionregistration.k8s.io/v1) turns an unreachable webhook into a hard error that rejects the request; Ignore lets the request proceed.

The danger is operational: a webhook with failurePolicy: Fail that matches core resources can wedge the cluster if its backend is down — you cannot create the very pods needed to bring the backend back. Reading the trailing clause tells you whether to fix networking, the backend, the timeout, or the CA bundle.
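To see which webhooks could wedge a cluster in this way, one option is to list every webhook with its failurePolicy and keep the ones set to Fail. This is a minimal sketch: the helper names webhook_policies and only_fail are invented here, and the output carries only the webhook name, not the configuration it belongs to.

```shell
# Print "<webhook name><TAB><failurePolicy>" for every webhook in both
# configuration kinds. Requires kubectl access to the cluster.
webhook_policies() {
  for kind in validatingwebhookconfigurations mutatingwebhookconfigurations; do
    kubectl get "$kind" \
      -o jsonpath='{range .items[*]}{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\n"}{end}{end}'
  done
}

# Read the TAB-separated lines from stdin and print only the names of
# webhooks whose policy is Fail.
only_fail() {
  awk -F '\t' '$2 == "Fail" { print $1 }'
}

# Usage: webhook_policies | only_fail
```

Each name it prints is a candidate for a closer look at its rules and selectors.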

Common Causes

Backend pods down / no endpoints: the webhook Service has zero ready endpoints (no endpoints available for service).
connection refused: the backend is up but not listening on the target port, or a NetworkPolicy or firewall rejects apiserver traffic (policies that silently drop packets usually surface as a timeout instead).
context deadline exceeded: the backend is slow or unreachable; no response arrived within the webhook’s timeoutSeconds (default 10s, max 30s).
x509: certificate signed by unknown authority: the caBundle in the webhook config does not match the backend’s serving cert (rotation drift, cert-manager re-issue).
Expired serving cert: x509: certificate has expired on the webhook’s TLS cert.
Wrong Service/path/port: clientConfig points at a nonexistent service or wrong path.
Namespace/object selector too broad: the webhook intercepts system namespaces, or the pods of its own backend, and self-deadlocks.

How to Reproduce the Error

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: dead-webhook
webhooks:
  - name: vpod.kb.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        name: nonexistent-svc
        namespace: default
        path: /validate

kubectl apply -f dead-webhook.yaml
kubectl run probe --image=busybox:1.36 --restart=Never -- true

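Error from server (InternalError): Internal error occurred: failed calling webhook "vpod.kb.io": failed to call webhook: Post "https://nonexistent-svc.default.svc:443/validate?timeout=10s": service "nonexistent-svc" not found

Every pod create now fails until you delete or fix the webhook (kubectl delete validatingwebhookconfiguration dead-webhook). Because the Service does not exist at all, the tail typically reads service "..." not found; a Service that exists but has zero ready endpoints produces no endpoints available for service instead.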

Diagnostic Commands

# List webhook configs and their failurePolicy / target service
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
kubectl get validatingwebhookconfiguration <NAME> -o yaml | grep -A8 clientConfig

# Does the backend Service have ready endpoints?
kubectl get endpoints -n <NS> <WEBHOOK_SVC>
kubectl get pods -n <NS> -l <BACKEND_SELECTOR> -o wide

# Backend health and TLS
kubectl logs -n <NS> deploy/<WEBHOOK_BACKEND> --tail=50

# Compare the configured caBundle against the cert the backend actually serves
kubectl get validatingwebhookconfiguration <NAME> \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -noout -subject -issuer -dates

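# apiserver-side view of the failed call (systemd-managed apiserver)
journalctl -u kube-apiserver --no-pager 2>/dev/null | grep -i 'failed calling webhook'
# On kubeadm-style clusters the apiserver runs as a static pod; read its logs instead
kubectl logs -n kube-system -l component=kube-apiserver --tail=200 | grep -i 'failed calling webhook'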

First check endpoints — a webhook Service with zero ready endpoints is the most common cause. If endpoints exist, the trailing clause (x509 vs deadline vs refused) tells you whether to look at certs, latency, or networking.
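The endpoint check can be reduced to a single number. This is a minimal sketch with invented helper names (ready_endpoint_ips, count_ips); the namespace and Service in the usage line are the ones from the example error message, so substitute your own.

```shell
# Print the ready endpoint IPs (space-separated) behind a Service.
# Not-ready addresses are listed under .notReadyAddresses and are excluded.
ready_endpoint_ips() {
  kubectl get endpoints -n "$1" "$2" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Count whitespace-separated IPs on stdin; prints 0 for empty input.
count_ips() {
  awk '{ n += NF } END { print n + 0 }'
}

# Usage:
#   ready_endpoint_ips platform webhook-svc | count_ips
```

A result of 0 points at the backend pods or their readiness probes rather than at certificates or timeouts.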

Step-by-Step Resolution

1. Read the trailing clause. It is the diagnosis. Route on it: no endpoints/refused → backend or network; deadline exceeded → latency/timeout; x509 → CA bundle or cert.

2. Restore the backend. If endpoints are empty, the webhook pods are down or unschedulable. Check why (CrashLoop, FailedScheduling) and get them Ready. Endpoints appearing fixes no endpoints available.

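3. Fix connection refused. Backend is running but not reachable on the port. Confirm that clientConfig.service.port (default 443) matches a port on the Service, that the Service's targetPort is the port the container actually listens on, and that no NetworkPolicy blocks apiserver-origin traffic to that namespace.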

4. Fix context deadline exceeded. The backend is too slow. Profile it, scale it for HA, and raise timeoutSeconds modestly (cap is 30s). A slow webhook on a hot resource adds latency to every matching admission.

5. Fix x509 failures. The caBundle no longer matches the serving cert — common after cert-manager rotation. Re-sync the CA bundle (cert-manager’s ca-injector or your automation), and confirm the dates with openssl x509. For “expired”, renew the serving cert.

6. Break a deadlock. If failurePolicy: Fail plus a broad selector has wedged the cluster, delete the webhook configuration to restore admissions, then re-apply it correctly with namespaceSelector exclusions for kube-system and an Ignore policy where appropriate.
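For the namespaceSelector exclusions in step 6, one possible approach is a JSON patch that excludes namespaces by name. The sketch below only builds the patch text; the helper name exclude_ns_patch is invented, it targets the first webhook entry only (/webhooks/0), it overwrites any existing namespaceSelector, and it assumes namespaces carry the kubernetes.io/metadata.name label that recent Kubernetes versions set automatically.

```shell
# Build a JSON patch that makes webhooks[0] skip the given namespaces.
# Arguments: one or more namespace names.
exclude_ns_patch() {
  values=""
  for ns in "$@"; do
    # Append each name as a quoted JSON string, comma-separated.
    values="${values:+$values,}\"$ns\""
  done
  printf '[{"op":"add","path":"/webhooks/0/namespaceSelector","value":{"matchExpressions":[{"key":"kubernetes.io/metadata.name","operator":"NotIn","values":[%s]}]}}]\n' "$values"
}

# Usage (<NAME> is a placeholder for your configuration's name):
#   kubectl patch validatingwebhookconfiguration <NAME> --type=json \
#     -p "$(exclude_ns_patch kube-system)"
```

Review the generated patch before applying it, since replacing a selector on a policy webhook changes what it enforces.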

Prevention and Best Practices

Run webhook backends with multiple replicas and a PodDisruptionBudget so a node drain never zeroes endpoints.
Scope webhooks tightly with namespaceSelector/objectSelector; never let them intercept kube-system or their own namespace.
Use failurePolicy: Ignore for non-critical webhooks so a backend outage degrades gracefully instead of blocking admissions.
Automate caBundle injection (cert-manager ca-injector) and alert on cert expiry to prevent x509 drift.
Keep timeoutSeconds low and the backend fast; webhooks are in the synchronous admission path of every matching request.
Test webhook outages in staging: confirm the cluster still functions when the backend is down.

Related Errors

admission webhook denied the request: the webhook answered with a denial; a different, intentional outcome.
x509: certificate signed by unknown authority: the trust failure behind the x509 webhook variant.
probe failed: context deadline exceeded: the same timeout symptom in a different subsystem.

Frequently Asked Questions

Is failed calling webhook the same as the webhook denying my request? No. A denial is admission webhook "<name>" denied the request: <reason> and means the backend ran and said no. failed calling webhook means the apiserver never got a usable answer — a transport, timeout, or TLS failure.

Why did deleting one webhook config unblock my whole cluster? A webhook with failurePolicy: Fail matching core resources blocks every create/update when its backend is unreachable — including the pods needed to repair the backend. Removing the configuration restores admissions so you can fix it safely.

My webhook worked yesterday and now throws x509. What changed? Almost always a certificate rotation: cert-manager re-issued the backend’s serving cert but the caBundle in the webhook config was not updated to match. Re-sync the CA bundle and the handshake succeeds again.
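One way to confirm that kind of drift is to check whether the configured CA bundle still validates the serving certificate. This is a sketch under assumptions: the helper name ca_trusts_cert is invented, the serving cert is assumed to live in a TLS Secret (shown as the placeholder <TLS_SECRET>), and only the first certificate in the file is verified, so chains with intermediates need extra handling.

```shell
# Succeed if the certificate in cert_file ($2) verifies against ca_file ($1).
# Matches on openssl's ": OK" line instead of relying on exit status alone.
ca_trusts_cert() {
  openssl verify -CAfile "$1" "$2" 2>/dev/null | grep -q ': OK$'
}

# Usage sketch (<NAME>, <NS>, <TLS_SECRET> are placeholders):
#   kubectl get validatingwebhookconfiguration <NAME> \
#     -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d > ca.pem
#   kubectl get secret -n <NS> <TLS_SECRET> \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > tls.crt
#   ca_trusts_cert ca.pem tls.crt && echo "bundle validates cert" || echo "mismatch or expired"
```

A failure here can also mean the serving cert has expired, so check the dates before concluding that the bundle is stale.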

Should I just set failurePolicy: Ignore everywhere? Only for non-security-critical webhooks. Ignore lets requests through when the backend is down, which is safer for availability but unsafe for policy/security webhooks where bypass is unacceptable. Choose per webhook.

Why does a slow webhook affect unrelated workloads? Admission webhooks run synchronously in the request path. A webhook matching pods adds its latency to every pod create cluster-wide, so a slow backend degrades all matching admissions, not just one app.
