Kubernetes CoreDNS Debugging Prompt

Diagnose Kubernetes DNS issues — CoreDNS not resolving, ndots traps, search domain explosion, NXDOMAIN floods, conntrack DNS races.

Target user
Kubernetes engineers debugging in-cluster DNS
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior Kubernetes engineer who has debugged DNS issues across CoreDNS configurations — `ndots` traps, search domain explosion, cache misses, NXDOMAIN floods. You know that "DNS slow" in K8s is usually 5 sequential lookups, not a slow upstream.

I will provide:
- The symptom (resolution fails, slow lookups, intermittent NXDOMAIN, lookups for `<svc>.cluster.local.<something>.cluster.local`)
- `kubectl -n kube-system get pods -l k8s-app=kube-dns` and `kubectl -n kube-system get cm coredns -o yaml`
- A reproduction from a pod: `kubectl exec <pod> -- nslookup <name>` or `dig <name>`
- The pod's `/etc/resolv.conf` content
- CoreDNS logs: `kubectl -n kube-system logs -l k8s-app=kube-dns --tail=200`
- Pod's `dnsPolicy` (`ClusterFirst`, `Default`, `ClusterFirstWithHostNet`, `None`)

Your job:

1. **Understand the lookup mechanics**:
   - Pod's `/etc/resolv.conf` typically has:
     ```
     nameserver 10.96.0.10        # CoreDNS Service ClusterIP
     search <ns>.svc.cluster.local svc.cluster.local cluster.local <node-search>
     options ndots:5
     ```
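   - **`ndots:5`** means: a name with fewer than 5 dots is tried with each search domain appended first, and as-is only after those attempts fail
   - **Looking up `google.com`** with `ndots:5` → 3 NXDOMAIN attempts (`google.com.<ns>.svc.cluster.local`, `google.com.svc.cluster.local`, `google.com.cluster.local`) before `google.com` itself is queried; a node search domain adds one more, and resolvers typically send each attempt for both A and AAAA
   - Slow external lookups in K8s are usually this. Either lower `ndots` (to 2 or 1) or use an absolute name with a trailing dot (`google.com.`)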
2. **Common failure modes**:
   - **CoreDNS pods CrashLoopBackOff** → entire cluster DNS down; check pod logs
   - **CoreDNS pods OK but resolution fails** → pod's nameserver wrong, or network policy blocking
   - **Intermittent failures** → conntrack race on UDP DNS (well-known issue); fix with TCP DNS or NodeLocal DNS
   - **NXDOMAIN floods on external names** → ndots:5 + search domains; reduce ndots
   - **Slow startup** for apps doing many DNS lookups → search domain explosion
3. **For "service can't reach service"**:
   - In-cluster name: `<svc>` (in same NS), `<svc>.<ns>`, or `<svc>.<ns>.svc.cluster.local`
   - Verify service exists: `kubectl get svc <svc> -n <ns>`
   - Verify endpoints (`kubectl get endpoints <svc> -n <ns>`); empty means a selector mismatch or no Ready pods behind the Service
4. **For NodeLocal DNS Cache** (recommended):
   - DaemonSet `node-local-dns` runs on each node
   - Pod's `/etc/resolv.conf` points to local DNS (e.g., `169.254.20.10`)
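   - Avoids the DNAT/conntrack path for pod-to-cache queries and upgrades cache-to-CoreDNS traffic to TCP, which sidesteps most UDP conntrack races; caches negative responses
   - With kube-proxy in iptables mode the cache also listens on the kube-dns Service IP, so pods need no change; in IPVS mode, set `--cluster-dns` on the kubelet to the local address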
5. **CoreDNS Corefile** common config:
   ```
   .:53 {
       errors
       health
       ready
       kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
          ttl 30
       }
       prometheus :9153
       forward . /etc/resolv.conf {
          max_concurrent 1000
       }
       cache 30
       loop
       reload
       loadbalance
   }
   ```
   - `cache 30` caps cached TTLs at 30s (the value in the stock Kubernetes Corefile); raise it for hot lookups at the cost of staler answers
   - `forward` defines upstream; can be `8.8.8.8` instead of `/etc/resolv.conf`
6. **For `dnsPolicy: ClusterFirstWithHostNet`** — required for pods with `hostNetwork: true` to use CoreDNS instead of node's DNS
7. **For `dnsConfig: nameservers:` override** — pod-level DNS config; useful for specific apps

Mark DESTRUCTIVE: editing CoreDNS Corefile without test (whole-cluster DNS outage if broken), deleting CoreDNS pods without ensuring others are healthy.

---

Symptom: [DESCRIBE]
Pod's `/etc/resolv.conf`:
```
[PASTE — from kubectl exec <pod> -- cat /etc/resolv.conf]
```
Reproduction (nslookup output):
```
[PASTE]
```
CoreDNS pod status and logs:
```
[PASTE]
```
CoreDNS Corefile:
```
[PASTE `kubectl -n kube-system get cm coredns -o yaml`]
```
Pod's `dnsPolicy` + `dnsConfig`:
```yaml
[PASTE]
```

Why this prompt works

DNS issues in K8s have specific cluster-side causes (CoreDNS, ndots, search domains) layered on top of normal DNS. Models tend to chase “the DNS server is slow” when the issue is the resolver doing 5 lookups per name. This prompt walks the resolver path.
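
The resolver path described above can be sketched as a small shell function. This is a rough model of glibc-style search-list expansion, not the resolver's actual code; the function name `expand_candidates` and the sample namespace `foo` are illustrative assumptions.

```shell
# Print the names a glibc-style resolver would try, in order.
# Usage: expand_candidates <name> <ndots> <search-domain>...
expand_candidates() {
  name=$1; ndots=$2; shift 2
  # A trailing dot marks the name as absolute: no search expansion.
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;
  esac
  # Count dots by deleting every other character.
  dots=$(printf '%s' "$name" | tr -cd '.')
  count=${#dots}
  # With at least ndots dots, the name is tried as-is first.
  if [ "$count" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
  for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
  # Otherwise the literal name is the last attempt.
  if [ "$count" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

# Sample call: google.com from a pod in a hypothetical namespace "foo".
expand_candidates google.com 5 foo.svc.cluster.local svc.cluster.local cluster.local
```

Each printed line corresponds to one round of queries, so the length of the output is a quick estimate of how many lookups a single name costs under a given `ndots` and search list.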

How to use it

  1. Always include the pod’s /etc/resolv.conf — it reveals ndots, search, nameserver.
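  2. Test with dig +short for clean output. dig ignores the search list unless you pass +search, and dig +trace walks the delegation from the root servers instead of asking CoreDNS, so use it only for upstream questions.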
  3. Test BOTH in-cluster name and external name — different failure paths.
  4. Check CoreDNS logs, not just the application’s logs.
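
To see at a glance what a pasted `/etc/resolv.conf` implies, something like the following could be used. It is a minimal sketch: `summarize_resolv` is a hypothetical helper name, and the fallback of `ndots=1` assumes the glibc default when no `options ndots:` entry is present.

```shell
# Summarize a resolv.conf read from stdin: nameservers, search-domain count, ndots.
summarize_resolv() {
  awk '
    $1 == "nameserver" { ns = ns (ns ? "," : "") $2 }
    $1 == "search"     { nsearch = NF - 1 }
    $1 == "options"    {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) { split($i, kv, ":"); ndots = kv[2] }
    }
    END {
      # Assumption: fall back to 1 (the glibc default) when ndots is not set.
      printf "nameserver=%s search_domains=%d ndots=%s\n", ns, nsearch, (ndots == "" ? "1" : ndots)
    }
  '
}

# Sample input; in a cluster, pipe in: kubectl exec -n <ns> <pod> -- cat /etc/resolv.conf
printf 'nameserver 10.96.0.10\nsearch default.svc.cluster.local svc.cluster.local cluster.local\noptions ndots:5\n' | summarize_resolv
```

A high `ndots` combined with several search domains is the pattern to look for before suspecting CoreDNS itself.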

Useful commands

# In-pod diagnostics
kubectl exec -n <ns> <pod> -- cat /etc/resolv.conf
kubectl exec -n <ns> <pod> -- nslookup kubernetes.default
kubectl exec -n <ns> <pod> -- nslookup kubernetes.default 10.96.0.10
kubectl exec -n <ns> <pod> -- nslookup google.com
kubectl exec -n <ns> <pod> -- getent hosts <name>

# Debug pod with dig
kubectl run dns-test --rm -it --image=nicolaka/netshoot --restart=Never -- bash
# Inside:
# dig +short kubernetes.default.svc.cluster.local
# dig +trace google.com
# nslookup -type=any kubernetes.default

# CoreDNS health
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system describe pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=200
kubectl -n kube-system get cm coredns -o yaml

# CoreDNS metrics (Prometheus endpoint :9153)
kubectl -n kube-system port-forward <coredns-pod> 9153:9153
curl localhost:9153/metrics | grep coredns

# Service / endpoint
kubectl get svc <svc> -n <ns>
kubectl get endpoints <svc> -n <ns>

# NodeLocal DNS
kubectl -n kube-system get pods -l k8s-app=node-local-dns
kubectl -n kube-system get cm node-local-dns -o yaml

# Conntrack (DNS UDP issues)
kubectl exec -n <ns> <pod> -- conntrack -L | grep ":53"
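# (requires conntrack tools and NET_ADMIN in the container; Service DNAT entries live in the node's network namespace, so a privileged debug pod on the node usually shows more, and `conntrack -S` there reports insert_failed counters)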
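
When testing a Service name with the commands above, it can help to build the absolute form explicitly so search expansion is out of the picture. A minimal sketch, assuming the default cluster domain `cluster.local`; `svc_fqdn` is a hypothetical helper name.

```shell
# Build the absolute DNS name of a Service.
# Usage: svc_fqdn <service> <namespace> [cluster-domain]
svc_fqdn() {
  svc=$1; ns=$2; domain=${3:-cluster.local}
  # The trailing dot makes the name absolute, so the resolver skips the search list.
  printf '%s.%s.svc.%s.\n' "$svc" "$ns" "$domain"
}

svc_fqdn kubernetes default
# Possible use from a pod with dig available:
# dig +short "$(svc_fqdn kubernetes default)"
```

If the absolute name resolves but the short name does not, the problem is in the pod's search list or namespace, not in CoreDNS.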

Resolution test pattern

# Comprehensive DNS test
kubectl run dnsdebug --rm -it --image=nicolaka/netshoot --restart=Never -- bash -c '
echo "--- resolv.conf ---"
cat /etc/resolv.conf
echo "--- in-cluster (short name) ---"
nslookup kubernetes
echo "--- in-cluster (FQDN) ---"
nslookup kubernetes.default.svc.cluster.local
echo "--- external (with search expansion) ---"
nslookup google.com
echo "--- external (FQDN, no expansion) ---"
nslookup google.com.
echo "--- direct to CoreDNS Service ---"
dig @10.96.0.10 +short kubernetes.default.svc.cluster.local
'

Pod-level DNS tweak (for slow startup apps)

spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"                        # lower than default 5
  dnsPolicy: ClusterFirst              # default
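
For an existing workload, the same `ndots` setting could be applied with a JSON merge patch instead of editing the manifest. This is a sketch under assumptions: `ndots_patch` is a hypothetical helper, and `my-namespace` / `my-app` are placeholder names. A merge patch replaces the whole `dnsConfig.options` list, so any other options already set would need to be included.

```shell
# Print a JSON merge patch that sets ndots in a workload's pod template.
# Usage: ndots_patch <value>
ndots_patch() {
  printf '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"%s"}]}}}}}\n' "$1"
}

ndots_patch 2
# Apply it (placeholder names; this changes the pod template and triggers a rollout):
# kubectl -n my-namespace patch deployment my-app --type=merge -p "$(ndots_patch 2)"
```

The value is quoted as a string because `dnsConfig.options[].value` is a string field in the pod spec.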

NodeLocal DNS Cache (for conntrack races)

# Pod uses local DNS cache at 169.254.20.10 instead of CoreDNS Service IP
# Avoids most UDP conntrack races + caches responses, including NXDOMAIN
# Deployed as DaemonSet in kube-system

# Kubelet args (on each node; needed when kube-proxy runs in IPVS mode):
# --cluster-dns=169.254.20.10

Common findings this catches

  • External name lookups slow → ndots:5 + 4 search domains; use FQDN (google.com.) or lower ndots.
  • <svc>.cluster.local.svc.cluster.local in logs → app queried <svc>.cluster.local (missing <ns>.svc), and the resolver appended a search domain; use <svc>.<ns>.svc.cluster.local.
  • NXDOMAIN flood for <svc> in different namespace → using short name; needs <svc>.<ns> form.
  • Intermittent failures under load → conntrack UDP race; install NodeLocal DNS.
  • CoreDNS forwarding loop detected in logs → upstream pointing back to CoreDNS, often because the node's /etc/resolv.conf lists a local stub resolver such as 127.0.0.53; check forward config.
  • CoreDNS pod replicas too few for cluster size → scale up; aim for 2 per 1000 pods baseline.
  • No CoreDNS Service or wrong ClusterIP → kubelet --cluster-dns mismatch.
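
The replica rule of thumb above (2 per 1000 pods) can be turned into quick arithmetic. This sketch reads the baseline linearly, as one replica per 500 pods rounded up, with a floor of 2 for availability; that interpretation and the name `coredns_replicas` are assumptions, not an official sizing formula.

```shell
# Estimate a CoreDNS replica count from the "2 per 1000 pods" baseline.
# Usage: coredns_replicas <pod-count>
coredns_replicas() {
  pods=$1
  # Ceiling of pods / 500 using integer arithmetic.
  replicas=$(( (pods + 499) / 500 ))
  # Never go below 2 replicas, so one can restart without a DNS outage.
  if [ "$replicas" -lt 2 ]; then replicas=2; fi
  printf '%s\n' "$replicas"
}

coredns_replicas 2500
# Possible use (requires cluster access):
# kubectl -n kube-system scale deployment coredns --replicas="$(coredns_replicas 2500)"
```

Treat the result as a starting point and confirm against CoreDNS CPU usage and request latency from the metrics endpoint.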

When to escalate

  • Cluster-wide DNS outage with CoreDNS healthy — upstream issue (node DNS, cloud VPC); engage networking.
  • Suspected CoreDNS plugin bug — file upstream, capture logs and Corefile.
  • Conntrack DNS race issues persist after NodeLocal DNS deployment — kernel/conntrack tuning; pull in platform team.
