You are a senior Kubernetes engineer who has debugged DNS issues across CoreDNS configurations: `ndots` traps, search domain explosion, cache misses, NXDOMAIN floods. You know that "DNS slow" in K8s is usually 5 sequential lookups, not a slow upstream.

I will provide:
- The symptom (resolution fails, slow lookups, intermittent NXDOMAIN, lookups for `<svc>.cluster.local.<something>.cluster.local`)
- `kubectl -n kube-system get pods -l k8s-app=kube-dns` and `kubectl -n kube-system get cm coredns -o yaml`
- A reproduction from a pod: `kubectl exec <pod> -- nslookup <name>` or `dig <name>`
- The pod's `/etc/resolv.conf` content
- CoreDNS logs: `kubectl -n kube-system logs -l k8s-app=kube-dns --tail=200`
- Pod's `dnsPolicy` (`ClusterFirst`, `Default`, `ClusterFirstWithHostNet`, `None`)

Your job:

1. **Understand the lookup mechanics**:
   - Pod's `/etc/resolv.conf` typically has:
     ```
     nameserver 10.96.0.10       # CoreDNS Service ClusterIP
     search <ns>.svc.cluster.local svc.cluster.local cluster.local <node-search>
     options ndots:5
     ```
   - **`ndots:5`** means: if the name has fewer than 5 dots and no trailing dot, try it with each search domain appended first, then try the name as given
   - **Looking up `google.com`** with `ndots:5` → 4 NXDOMAIN attempts (`google.com.<ns>.svc.cluster.local`, `google.com.svc.cluster.local`, `google.com.cluster.local`, `google.com.<node-search>`) before the real `google.com` query
   - Slow external lookups in K8s are usually this. Either lower `ndots` (to 2 or 1) or use FQDN (`google.com.`)
2. **Common failure modes**:
   - **CoreDNS pods CrashLoopBackOff** → entire cluster DNS down; check pod logs
   - **CoreDNS pods OK but resolution fails** → pod's nameserver wrong, or network policy blocking
   - **Intermittent failures** → conntrack race on UDP DNS (well-known issue); fix with TCP DNS or NodeLocal DNS
   - **NXDOMAIN floods on external names** → ndots:5 + search domains; reduce ndots
   - **Slow startup** for apps doing many DNS lookups → search domain explosion
3. **For "service can't reach service"**:
   - In-cluster name: `<svc>` (in same NS), `<svc>.<ns>`, or `<svc>.<ns>.svc.cluster.local`
   - Verify service exists: `kubectl get svc <svc> -n <ns>`
   - Verify endpoints (`kubectl get endpoints <svc> -n <ns>`); empty means selector mismatch or no ready pods
4. **For NodeLocal DNS Cache** (recommended):
   - DaemonSet `node-local-dns` runs on each node
   - Pod's `/etc/resolv.conf` points to local DNS (e.g., `169.254.20.10`)
   - Avoids UDP-conntrack races; caches negative responses
   - Configured via `--cluster-dns` on kubelet for new nodes (required when kube-proxy runs in IPVS mode)
5. **CoreDNS Corefile** common config:
   ```
   .:53 {
       errors
       health
       ready
       kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
       }
       prometheus :9153
       forward . /etc/resolv.conf {
           max_concurrent 1000
       }
       cache 30
       loop
       reload
       loadbalance
   }
   ```
   - Tune `cache` (30s in this Corefile) for hot lookups
   - `forward` defines upstream; can be `8.8.8.8` instead of `/etc/resolv.conf`
6. **For `dnsPolicy: ClusterFirstWithHostNet`**: required for pods with `hostNetwork: true` to use CoreDNS instead of node's DNS
7. **For `dnsConfig: nameservers:` override**: pod-level DNS config; useful for specific apps

Mark DESTRUCTIVE: editing CoreDNS Corefile without test (whole-cluster DNS outage if broken), deleting CoreDNS pods without ensuring others are healthy.

---

Symptom: [DESCRIBE]

Pod's `/etc/resolv.conf`:
```
[PASTE — from kubectl exec <pod> -- cat /etc/resolv.conf]
```

Reproduction (nslookup output):
```
[PASTE]
```

CoreDNS pod status and logs:
```
[PASTE]
```

CoreDNS Corefile:
```
[PASTE `kubectl -n kube-system get cm coredns -o yaml`]
```

Pod's `dnsPolicy` + `dnsConfig`:
```yaml
[PASTE]
```

Why this prompt works

DNS issues in K8s have specific cluster-side causes (CoreDNS, ndots, search domains) layered on top of normal DNS. Models tend to chase “the DNS server is slow” when the issue is the resolver doing 5 lookups per name. This prompt walks the resolver path.
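
A rough sketch of that resolver path, assuming glibc-style search handling (other resolvers, such as musl, differ in detail). The helper name `expand_queries` is hypothetical, and the search domains in the example call, including the node domain `example.internal`, are placeholders. It prints the names a stub resolver would try, in order, for a given name, `ndots` value, and search list.

```shell
# expand_queries NAME NDOTS SEARCH_DOMAIN...
# Prints, one per line, the fully qualified names a glibc-style stub
# resolver would query for NAME (illustrative model, not a real resolver).
expand_queries() {
  name=$1
  ndots=$2
  shift 2

  # A trailing dot marks the name as absolute: no search expansion at all.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac

  # Count the dots in the name by deleting every other character.
  only_dots=$(printf '%s' "$name" | tr -cd '.')
  dots=${#only_dots}

  # With at least NDOTS dots, the name is tried as given first.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi

  # Each search domain is appended in order.
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done

  # With fewer than NDOTS dots, the name as given is tried last.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}

# Example: an external name from a pod in the "default" namespace.
expand_queries google.com 5 \
  default.svc.cluster.local svc.cluster.local cluster.local example.internal
```

With four search domains, `google.com` yields five queries, the first four of which would normally come back NXDOMAIN. Passing `google.com.` yields a single query, and real resolvers typically issue each query for both A and AAAA records.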

How to use it

Always include the pod’s /etc/resolv.conf — it reveals ndots, search, nameserver.
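Test with dig +short for clean output; use dig +trace to follow delegation from the root servers. Note that dig ignores the search list unless you pass +search, so use nslookup or getent hosts to see what the application's resolver sees.
Test both an in-cluster name and an external name; they fail along different paths.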
Check CoreDNS logs, not just the application’s logs.
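
As a quick way to pull those three fields out of a pasted resolv.conf, here is a small sketch. The helper name `summarize_resolv_conf` and its output format are made up for illustration, and it assumes one `search` line with no trailing comment. When no `ndots` option is present it reports the resolver default of 1.

```shell
# summarize_resolv_conf: reads resolv.conf text on stdin and prints the
# nameservers, the number of search domains, and the effective ndots.
summarize_resolv_conf() {
  awk '
    $1 == "nameserver" { ns = (ns == "" ? $2 : ns "," $2) }
    $1 == "search"     { nsearch = NF - 1 }
    $1 == "options"    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ndots:/) { split($i, kv, ":"); ndots = kv[2] }
      }
    }
    END {
      # resolv.conf(5): ndots defaults to 1 when the option is absent.
      if (ndots == "") ndots = 1
      printf "nameservers=%s search_domains=%d ndots=%s\n", ns, nsearch, ndots
    }
  '
}

# Usage against a live pod (placeholders as elsewhere on this page):
# kubectl exec -n <ns> <pod> -- cat /etc/resolv.conf | summarize_resolv_conf
```

A result such as `search_domains=4 ndots=5` is the combination that turns each external lookup into several sequential queries.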

Useful commands

# In-pod diagnostics
kubectl exec -n <ns> <pod> -- cat /etc/resolv.conf
kubectl exec -n <ns> <pod> -- nslookup kubernetes.default
kubectl exec -n <ns> <pod> -- nslookup kubernetes.default 10.96.0.10
kubectl exec -n <ns> <pod> -- nslookup google.com
kubectl exec -n <ns> <pod> -- getent hosts <name>

# Debug pod with dig
kubectl run dns-test --rm -it --image=nicolaka/netshoot --restart=Never -- bash
# Inside:
# dig +short kubernetes.default.svc.cluster.local
# dig +trace google.com
# nslookup -type=any kubernetes.default

# CoreDNS health
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system describe pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=200
kubectl -n kube-system get cm coredns -o yaml

# CoreDNS metrics (Prometheus endpoint :9153)
kubectl -n kube-system port-forward <coredns-pod> 9153:9153
curl localhost:9153/metrics | grep coredns

# Service / endpoint
kubectl get svc <svc> -n <ns>
kubectl get endpoints <svc> -n <ns>

# NodeLocal DNS
kubectl -n kube-system get pods -l k8s-app=node-local-dns
kubectl -n kube-system get cm node-local-dns -o yaml

# Conntrack (DNS UDP issues)
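kubectl exec -n <ns> <pod> -- conntrack -L | grep "dport=53"
# (conntrack prints ports as sport=/dport=, so grep for dport=53 rather than ":53";
#  requires conntrack tools in the container, so this usually needs a debug pod)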

Resolution test pattern

# Comprehensive DNS test
kubectl run dnsdebug --rm -it --image=nicolaka/netshoot --restart=Never -- bash -c '
echo "--- resolv.conf ---"
cat /etc/resolv.conf
echo "--- in-cluster (short name) ---"
nslookup kubernetes
echo "--- in-cluster (FQDN) ---"
nslookup kubernetes.default.svc.cluster.local
echo "--- external (with search expansion) ---"
nslookup google.com
echo "--- external (FQDN, no expansion) ---"
nslookup google.com.
echo "--- direct to CoreDNS Service ---"
dig @10.96.0.10 +short kubernetes.default.svc.cluster.local
'

Pod-level DNS tweak (for slow startup apps)

spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"                        # lower than default 5
  dnsPolicy: ClusterFirst              # default
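
To try the same setting on an existing workload without editing its manifest, one option is a merge patch. The sketch below assumes the workload is a Deployment; the helper name `ndots_patch` is hypothetical, and the `kubectl patch` line is left as a comment because applying it triggers a rollout.

```shell
# ndots_patch VALUE: prints a JSON merge patch that sets the pod template's
# dnsConfig ndots option to VALUE.
ndots_patch() {
  printf '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"%s"}]}}}}}\n' "$1"
}

# Preview the patch before applying it.
ndots_patch 2

# Apply (placeholders; this restarts the Deployment's pods):
# kubectl -n <ns> patch deployment <name> --type merge -p "$(ndots_patch 2)"
```

A JSON merge patch replaces the whole `options` list, so any other resolver options already set on the pod template would need to be included in the patch as well.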

NodeLocal DNS deployment (recommended)

# Pod uses local DNS cache at 169.254.20.10 instead of CoreDNS Service IP
# Eliminates UDP conntrack races + caches all responses including NXDOMAIN
# Deployed as DaemonSet in kube-system

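# Kubelet args (on each node; needed when kube-proxy runs in IPVS mode,
# since in iptables mode NodeLocal DNS also listens on the kube-dns Service IP):
# --cluster-dns=169.254.20.10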

Common findings this catches

External name lookups slow → ndots:5 + 4 search domains; use FQDN (google.com.) or lower ndots.
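<svc>.cluster.local.svc.cluster.local in logs → the app looked up <svc>.cluster.local (not a valid service name) and the resolver appended a search domain because the name has fewer than 5 dots and no trailing dot; use <svc>.<ns>.svc.cluster.local. instead.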
NXDOMAIN flood for <svc> in different namespace → using short name; needs <svc>.<ns> form.
Intermittent failures under load → conntrack UDP race; install NodeLocal DNS.
CoreDNS forwarding loop detected in logs → upstream pointing back to CoreDNS, often because the node's /etc/resolv.conf lists a local stub resolver such as 127.0.0.53; check forward config.
CoreDNS pod replicas too few for cluster size → scale up; aim for 2 per 1000 pods baseline.
Pod's nameserver does not match the kube-dns Service ClusterIP, or the Service is missing → kubelet --cluster-dns mismatch.
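
To spot NXDOMAIN floods in practice, it can help to rank the names that fail most often. The sketch below makes two assumptions: the CoreDNS `log` plugin is enabled (the common default Corefile only enables `errors`, so queries are not logged otherwise), and log lines use the plugin's default format with the query in double quotes followed by the response code. The helper name `top_nxdomain_names` is hypothetical.

```shell
# top_nxdomain_names: reads CoreDNS query logs on stdin and prints
# "<count> <name>" for every name answered with NXDOMAIN, most frequent first.
top_nxdomain_names() {
  awk -F'"' '
    # Field 2 is the quoted query ("A IN name. udp ..."); field 3 starts
    # with the response code.
    $3 ~ /^ NXDOMAIN / { split($2, q, " "); count[q[3]]++ }
    END { for (name in count) printf "%d %s\n", count[name], name }
  ' | sort -rn
}

# Usage:
# kubectl -n kube-system logs -l k8s-app=kube-dns --tail=2000 | top_nxdomain_names | head
```

Names that end in `.svc.cluster.local.` but start with an external hostname are the search-expansion pattern described above.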

When to escalate

Cluster-wide DNS outage with CoreDNS healthy — upstream issue (node DNS, cloud VPC); engage networking.
Suspected CoreDNS plugin bug — file upstream, capture logs and Corefile.
Conntrack DNS race issues persist after NodeLocal DNS deployment — kernel/conntrack tuning; pull in platform team.
