Kubernetes Troubleshooting Toolkit
Fix Kubernetes deployment, pod, ingress, YAML, and cluster issues faster with prompts, validators, runbooks, and incident workflows built for real DevOps engineers.
Top Kubernetes errors
Start with the most common production issues and troubleshooting paths.
Client.Timeout exceeded while awaiting headers
Fix 'net/http: request canceled (Client.Timeout exceeded while awaiting headers)' in Kubernetes: diagnose slow apiserver, load…
CNI request failed with status 400
Fix 'networkPlugin cni failed: CNI request failed with status 400: failed to delegate add' in Kubernetes: Calico/Cilium/Flannel…
connect: connection refused
Fix dial tcp connection refused between pods and Services: app not listening, wrong targetPort, and readiness gaps. Distinct fr…
SERVFAIL
Fix CoreDNS SERVFAIL in Kubernetes: broken upstream resolvers, the loop plugin, and forward misconfiguration. Distinct from NXD…
rpc error: code = DeadlineExceeded
Fix 'rpc error: code = DeadlineExceeded, context deadline exceeded' in Kubernetes CSI attach/mount: slow cloud APIs, throttling…
DaemonSet does not have minimum availability
Fix a DaemonSet that won't run on every node: untolerated taints, nodeSelector mismatches, insufficient resources, and maxUnava…
dial tcp <ip>:<port>: i/o timeout
Fix 'dial tcp <ip>:<port>: i/o timeout' in Kubernetes: NetworkPolicy denials, cloud security groups, CNI MTU mismatch, and cros…
Error from server (AlreadyExists)
Fix Error from server (AlreadyExists) in kubectl: create vs apply, leftover resources, immutable fields, and ownership conflict…
Best Kubernetes prompts
Use these prompts to turn symptoms, logs, and config into a structured troubleshooting plan.
Kubernetes etcd Health, Backup & Restore
Operate the etcd backing store — health checks, snapshot backup, defragmentation, leader election issues, restore from snapshot.
Kubernetes Cluster Upgrade Pre-Flight Planning
Pre-upgrade safety review of a Kubernetes cluster going N → N+1 (or N+2 skip) — deprecated APIs, removed features, control-plane & node ordering, workload compatibility.
Helm Chart Review
Get a senior-engineer review of a Helm chart — values hygiene, template correctness, security defaults, upgrade safety.
Kubernetes Pod Troubleshooting
Diagnose any misbehaving pod — pending, evicted, networking-broken, storage-stuck, or just plain slow — with a structured AI walkthrough.
Free Kubernetes tools
Validate, troubleshoot, or analyze your configuration before production changes.
Kubernetes manifest validator
Catch missing fields, deprecated APIs, and bad Service/Ingress structure before you apply.
Open validatorYAML validator
Line-accurate YAML syntax and indentation checks for any manifest or values file.
Open validatorAI Incident Response Assistant
Paste pod events and logs, get a structured triage plan.
Start triageKubernetes runbook
Use a repeatable checklist for production troubleshooting.
A repeatable path for pods that won’t start, schedule, or stay healthy.
- 1 Confirm pod state and events (kubectl get pods -A, kubectl describe pod)
- 2 Check deployment rollout status and replica counts
- 3 Inspect container logs and readiness/liveness probes
- 4 Validate the manifest YAML (indentation, apiVersion, required fields)
- 5 Confirm Service/Ingress routing and DNS resolution in-cluster