Kubernetes Ingress Troubleshooting Prompt
Diagnose Ingress routing failures, controller misconfiguration, TLS issues, 404/502/503 cascades, and path-vs-host mismatches across NGINX, Traefik, Contour, and HAProxy controllers.
- Target user: Kubernetes platform engineers and SREs
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Kubernetes platform engineer with deep experience operating Ingress controllers (NGINX, Traefik, Contour, HAProxy) in production at scale. You can read controller logs and trace a request from the LoadBalancer all the way to the pod.

I will provide:
- The symptom (404 from the wrong page, 502/503 from the controller, wrong TLS cert, host mismatch, broken websocket, incorrect path rewrite)
- The Ingress YAML (`kubectl get ingress <name> -o yaml`)
- The Service YAML it points to (`kubectl get svc <svc> -o yaml`)
- The Endpoints (`kubectl get endpoints <svc>`)
- Which Ingress controller and version (`kubectl get pods -n ingress-nginx`, etc.)
- The IngressClass (`kubectl get ingressclass`)
- Controller pod logs filtered to the affected host
- A reproducible `curl -v` showing the failing request

Your job:
1. **Walk the request path** end to end:
   - DNS → cloud LoadBalancer → Ingress controller pod (NodePort or hostNetwork) → controller config matching by host+path → upstream Service → Endpoints (pod IPs) → pod listening on the port
2. **Identify which hop fails**:
   - DNS resolves to the wrong IP? `dig <host>` and `kubectl get svc <ingress-controller> -o wide`
   - LoadBalancer not provisioned? Cloud quota, security group, or the controller's LoadBalancer Service stuck in `Pending`
   - Controller doesn't match the host+path? IngressClass missing, host typo, or the controller's `--watch-namespace` excluding this Ingress
   - 502/503 from the controller? Upstream Service has no endpoints (selector mismatch) or the upstream pod is returning errors
   - 404 from the application? Ingress routed correctly but the path prefix was not stripped (rewrite missing)
3. **Check IngressClass matching**:
   - `spec.ingressClassName` on the Ingress must match `metadata.name` on an IngressClass
   - The IngressClass must reference a controller (`spec.controller`) that's actually installed
   - Default IngressClass: only one should have `ingressclass.kubernetes.io/is-default-class: "true"`
4. **Check Service → Pod mapping**:
   - `kubectl get endpoints <svc>` empty? The Service selector doesn't match the pod labels, or the pods aren't ready
   - Does the Service `targetPort` match the pod's actual `containerPort`? A mismatch (e.g. 80 vs 8080, or a named port the container doesn't declare) is common
   - Is the pod's `readinessProbe` failing? Not-ready pods are listed under `notReadyAddresses` in Endpoints and receive no traffic
5. **TLS specifics**:
   - cert-manager: `kubectl describe certificate` and `kubectl describe certificaterequest`
   - SNI mismatch: the controller serves its default cert when the host doesn't match any TLS entry
   - HSTS / HTTPS-redirect annotations causing browser-cached 301/308 loops
   - Self-signed CA in the cluster issuer not trusted by the client
6. **Annotation pitfalls** (vary by controller):
   - NGINX path rewrite (`nginx.ingress.kubernetes.io/rewrite-target`) strips or doesn't strip the prefix depending on `pathType` and on whether the path uses regex capture groups, with no warning
   - `pathType: ImplementationSpecific` behaves differently per controller; prefer `Prefix` or `Exact`
   - `proxy-body-size` and `proxy-read-timeout` default to 1MB and 60s: common causes of 413s on large uploads and 504s on long polls
   - Websocket support: `proxy-read-timeout: "3600"` + `proxy-send-timeout` + Upgrade header pass-through
7. **Output specifics**:
   - `404 Not Found` from the controller (with controller-branded HTML) → no Ingress matched
   - `404` from the app → Ingress matched but app routing rejected the path
   - `502 Bad Gateway` → controller couldn't reach upstream (pod down, port wrong)
   - `503 Service Unavailable` → no endpoints (Service selector wrong, all pods unready)
   - `504 Gateway Timeout` → upstream too slow; raise `proxy-read-timeout`

Mark anything DESTRUCTIVE clearly: deleting an IngressClass, recreating a cert-manager Order, restarting the controller during peak.

---
Ingress controller + version: [ingress-nginx 1.10 / traefik 3.0 / etc.]

Symptom + status code: [DESCRIBE, including the exact curl output]

Ingress YAML:
```yaml
[PASTE]
```

Service YAML:
```yaml
[PASTE]
```

Endpoints:
```
[PASTE kubectl get endpoints <svc> -o yaml]
```

IngressClass(es):
```yaml
[PASTE]
```

Controller logs (filtered to host):
```
[PASTE]
```

Reproduction (`curl -v`):
```
[PASTE]
```
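Step 3 of the prompt (IngressClass matching) can be checked mechanically before you paste anything. A minimal sketch, assuming bash and kubectl access to the cluster; `check_ingress_class` and `default_ingressclasses` are hypothetical helper names, and Ingresses that still rely on the deprecated `kubernetes.io/ingress.class` annotation are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: confirm that an Ingress is claimed by an existing IngressClass.
# Does NOT handle the deprecated kubernetes.io/ingress.class annotation.

# Print the names of IngressClasses annotated as the cluster default, one per line.
default_ingressclasses() {
  kubectl get ingressclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.ingressclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
    | awk -F'\t' '$2 == "true" { print $1 }'
}

# Usage: check_ingress_class <ingress> <namespace>; returns non-zero on a problem.
check_ingress_class() {
  local ing="$1" ns="$2" cls controller defaults count
  cls=$(kubectl get ingress "$ing" -n "$ns" -o jsonpath='{.spec.ingressClassName}')

  if [ -z "$cls" ]; then
    # No class set: the Ingress relies on exactly one default IngressClass.
    defaults=$(default_ingressclasses)
    count=$(printf '%s' "$defaults" | grep -c .)
    if [ "$count" -eq 0 ]; then
      echo "FAIL: $ns/$ing has no ingressClassName and there is no default IngressClass"
      return 1
    elif [ "$count" -gt 1 ]; then
      echo "WARN: $count IngressClasses are marked default:" $defaults
      return 1
    fi
    echo "OK: $ns/$ing has no ingressClassName; falls back to default IngressClass $defaults"
    return 0
  fi

  # Class set: it must name an existing IngressClass.
  controller=$(kubectl get ingressclass "$cls" -o jsonpath='{.spec.controller}' 2>/dev/null)
  if [ -z "$controller" ]; then
    echo "FAIL: ingressClassName '$cls' does not match any IngressClass"
    return 1
  fi
  echo "OK: $ns/$ing -> IngressClass $cls (controller $controller); confirm that controller is running"
}
```

Run it as `check_ingress_class <ing> <ns>`. An "OK" only proves the objects line up; the controller named in `spec.controller` still has to be installed and watching the Ingress's namespace.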
Why this prompt works
Ingress failures have many possible root causes (DNS, LB, controller, Ingress object, Service, Endpoints, pod, TLS) and the HTTP error code is rarely diagnostic on its own. A 502 means “I couldn’t reach upstream” — but is upstream wrong, slow, or just unhealthy? This prompt walks the request path systematically.
How to use it
- Always include a `curl -v` reproduction. The status code and response headers tell you which hop returned the error.
- Include the controller logs filtered to your host. NGINX logs every request with the matched upstream; Traefik shows routing decisions.
- Include `kubectl get endpoints <svc>`. “Service has no endpoints” is the #1 cause of 503, and it’s invisible from the Ingress YAML.
- State explicitly whether TLS is involved. TLS adds a whole class of failures (cert, SNI, HSTS).
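The inputs listed above can be gathered in one pass. A minimal sketch, assuming bash, kubectl and curl on your workstation, and an ingress-nginx install in the `ingress-nginx` namespace with the `ingress-nginx-controller` Deployment; `CONTROLLER_NS` and `CONTROLLER_DEPLOY` are hypothetical override variables for other layouts, and `collect_ingress_context` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: bundle the prompt's inputs into one paste-ready text block.

_section() { printf '\n===== %s =====\n' "$1"; }

# Usage: collect_ingress_context <ingress> <service> <namespace> <host> <url>
collect_ingress_context() {
  local ing="$1" svc="$2" ns="$3" host="$4" url="$5"
  local cns="${CONTROLLER_NS:-ingress-nginx}"                    # assumed controller namespace
  local cdeploy="${CONTROLLER_DEPLOY:-ingress-nginx-controller}" # assumed Deployment name

  _section "Ingress YAML";     kubectl get ingress "$ing" -n "$ns" -o yaml
  _section "Service YAML";     kubectl get svc "$svc" -n "$ns" -o yaml
  _section "Endpoints";        kubectl get endpoints "$svc" -n "$ns" -o yaml
  _section "IngressClass(es)"; kubectl get ingressclass -o yaml
  _section "Controller pods";  kubectl get pods -n "$cns" -o wide

  _section "Controller logs (filtered to $host)"
  kubectl logs -n "$cns" "deploy/$cdeploy" --tail=500 | grep -F -- "$host"

  _section "Reproduction (curl -v $url)"
  # curl writes -v output to stderr; fold it into stdout so it lands in the bundle.
  curl -v -sS -o /dev/null --max-time 15 "$url" 2>&1
  return 0
}
```

For example, `collect_ingress_context web web-svc shop shop.example.com https://shop.example.com/api > ingress-context.txt`. Review the file before pasting it anywhere: controller logs can contain client IPs and query strings.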
Useful commands
```bash
# Ingress + Service + Endpoints
kubectl get ingress -A
kubectl describe ingress <ing> -n <ns>
kubectl get svc <svc> -n <ns> -o wide
kubectl get endpoints <svc> -n <ns>
kubectl get pods -l <selector-from-svc> -n <ns> --show-labels

# IngressClass
kubectl get ingressclass

# Controller pods + logs
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=200 | grep <host>

# Test from outside the cluster (-k skips certificate verification; drop it to see TLS errors)
curl -v -k --resolve <host>:443:<lb-ip> https://<host>/path

# Test bypassing the controller (direct to pod)
POD_IP=$(kubectl get pod <pod> -n <ns> -o jsonpath='{.status.podIP}')
kubectl run debug -n <ns> --rm -it --restart=Never --image=nicolaka/netshoot -- curl -v http://$POD_IP:<port>/

# Check pod readiness
kubectl get pod <pod> -n <ns> -o jsonpath='{.status.conditions}'
kubectl describe pod <pod> -n <ns>   # look for readiness probe failures under Events

# cert-manager (if used)
kubectl get certificate -A
kubectl describe certificate <cert> -n <ns>
kubectl describe certificaterequest -n <ns>
kubectl describe order -n <ns>       # ACME orders
kubectl describe challenge -n <ns>   # ACME challenges

# NGINX-specific: dump the rendered config
kubectl exec -n ingress-nginx <controller-pod> -- cat /etc/nginx/nginx.conf | grep -A20 <host>

# Traefik-specific: dashboard or API (the dashboard port depends on your Helm chart version)
kubectl port-forward -n traefik deploy/traefik 9000:9000
# then visit localhost:9000/dashboard/
```
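The endpoints check is the one most worth automating, because an empty Endpoints object and an all-unready one both surface as 503 but need different fixes. A minimal sketch, assuming bash and kubectl; `endpoints_summary` is a hypothetical helper name, and it reads the core/v1 Endpoints object rather than EndpointSlices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: separate "selector matches nothing" from "pods exist but are unready".
# Usage: endpoints_summary <service> <namespace>
endpoints_summary() {
  local svc="$1" ns="$2" ready notready
  ready=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w)
  notready=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}' | wc -w)
  ready=$((ready)); notready=$((notready))   # normalise wc padding on BSD/macOS

  echo "ready=$ready notReady=$notready"
  if [ "$ready" -gt 0 ]; then
    echo "Service has ready endpoints; look at targetPort and the app itself (502/504)."
  elif [ "$notready" -gt 0 ]; then
    echo "Pods match the selector but none are ready: check readinessProbe (expect 503)."
  else
    # Also printed if the Service or Endpoints object does not exist at all.
    echo "No endpoints at all: Service selector probably matches no pods (expect 503)."
  fi
}
```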
Status-code differential
| Status | Likely cause | First diagnostic |
|---|---|---|
| 404 (controller-branded) | No Ingress matched the host+path | kubectl get ing -A, check host typos |
| 404 (app-branded) | Ingress matched, app routing rejects | path rewrite annotations |
| 502 Bad Gateway | Controller can’t reach upstream pod | kubectl get endpoints, pod readiness |
| 503 Service Unavailable | No ready endpoints behind Service | Service selector vs pod labels |
| 504 Gateway Timeout | Upstream too slow | Raise proxy-read-timeout annotation; profile app |
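| 400 Bad Request | Plain HTTP sent to the HTTPS port, or Host header / SNI mismatch | Check URL scheme and port; compare Host header with the cert |
| TLS cert wrong (browser warning) | SNI fallback to default cert | cert-manager status; check Ingress `spec.tls[].hosts` |
| 301/308 redirect loop | HTTPS-redirect loop (e.g. TLS terminated at the LB) or browser-cached HSTS | Reproduce with `curl -v` (no HSTS cache); check `ssl-redirect` / `force-ssl-redirect` annotations |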
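The table can be applied to a raw response: dump the headers with curl and let a small function pull out the status line and `Server` header. A sketch under stated assumptions: bash, headers on stdin, and `classify_response` as a hypothetical helper name. The hints only restate the table, and the `Server` header alone may not tell controller from app, since both can run NGINX.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Feed it a header dump, for example:
#   curl -sS -o /dev/null -D - --resolve <host>:443:<lb-ip> https://<host>/path | classify_response
classify_response() {
  local status="" server="" line hint
  while IFS= read -r line; do
    line="${line%$'\r'}"   # HTTP header lines end in CRLF
    case "$line" in
      HTTP/*) status=$(printf '%s' "$line" | awk '{print $2}') ;;  # last status line wins
      [Ss]erver:*) server="${line#*:}"; server="${server# }" ;;
    esac
  done

  case "$status" in
    404) hint="No Ingress matched (controller page) or app rejected the path (app page)" ;;
    502) hint="Controller could not reach the upstream pod: endpoints, targetPort, readiness" ;;
    503) hint="No ready endpoints behind the Service: selector vs pod labels, readiness" ;;
    504) hint="Upstream too slow: proxy-read-timeout, then profile the app" ;;
    413) hint="Request body over proxy-body-size (NGINX default 1m)" ;;
    400) hint="Plain HTTP to the HTTPS port, or Host/SNI mismatch" ;;
    301|308) hint="Redirect: if it loops, check ssl-redirect / force-ssl-redirect" ;;
    "") hint="No HTTP status line: failure before HTTP (DNS, LB, TLS handshake)" ;;
    *) hint="Not in the differential table" ;;
  esac
  printf 'status=%s server=%s\nhint: %s\n' "${status:-none}" "${server:-unknown}" "$hint"
}
```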
Common findings this catches
- Service `selector: app=myapp`, pod labels `app: myapp, version: v2` → match works (the selector only needs to be a subset of the labels). But `selector: app=myapp,version=v1` while pods are `v2` → no endpoints. Most common cause of 503.
- Ingress missing `ingressClassName` AND no default IngressClass → no controller picks it up. Silent.
- NGINX `rewrite-target: /` with `pathType: Prefix` and path `/api(/|$)(.*)`: capture groups don’t work with `Prefix`; you need `ImplementationSpecific`.
- Ingress on Traefik with NGINX-style annotations → annotations silently ignored.
- cert-manager `Certificate` says Ready but the Ingress still serves the default cert → the Ingress’s `spec.tls[].secretName` doesn’t match the Certificate’s `spec.secretName`.
- WebSocket connections drop after 60s → default `proxy-read-timeout: "60"`; raise it for WS endpoints.
- `POST` requests > 1MB return 413 → NGINX default `proxy-body-size: 1m`; raise it with the annotation.
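The first finding comes down to one rule: every `key=value` in the Service selector must appear in the pod's labels, and extra pod labels don't matter. A minimal sketch of that equality-based rule, assuming bash; `selector_matches` is a hypothetical helper, and set-based selectors (`In`, `NotIn`, `Exists`) are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: equality-based label-selector matching.
# Usage: selector_matches "<k=v,k=v>" "<k=v,k=v>"   (selector first, pod labels second)
selector_matches() {
  local selector="$1" labels="$2" pair
  local IFS=','
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;   # this key=value is present on the pod
      *) return 1 ;;    # one missing pair is enough to exclude the pod
    esac
  done
  return 0
}
```

Both inputs use the `key=value,key=value` form printed in the SELECTOR column of `kubectl get svc -o wide` and the LABELS column of `kubectl get pods --show-labels`, so the two columns can be pasted in directly.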
Path-type cheatsheet (NGINX)
```yaml
# Exact match
- path: /api
  pathType: Exact    # matches only "/api", not "/api/" or "/api/foo"

# Prefix match
- path: /api
  pathType: Prefix   # matches "/api", "/api/", "/api/v1/users" (not "/apiv2")

# Regex / capture (NGINX only)
- path: /api(/|$)(.*)
  pathType: ImplementationSpecific
  # Pair with annotations on the Ingress:
  #   nginx.ingress.kubernetes.io/rewrite-target: /$2
  #   nginx.ingress.kubernetes.io/use-regex: "true"
```
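The `Exact` and `Prefix` rules above are easy to get wrong by one slash. A sketch of them, following the Kubernetes Ingress spec: `Exact` is a case-sensitive string comparison, while `Prefix` compares whole `/`-separated path elements and ignores a trailing slash. `path_matches` is a hypothetical helper; `ImplementationSpecific` and regex paths are left to the controller and not modelled.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Usage: path_matches <pathType> <ingress-path> <request-path>
# Returns 0 on match, 1 on no match, 2 when the behaviour is controller-specific.
path_matches() {
  local type="$1" path="$2" req="$3" p r
  case "$type" in
    Exact)
      [ "$req" = "$path" ] ;;
    Prefix)
      p="$path"; r="$req"
      while [[ "$p" == */ ]]; do p="${p%/}"; done   # "/api/" behaves like "/api"
      while [[ "$r" == */ ]]; do r="${r%/}"; done
      [ -z "$p" ] && return 0                        # path "/" matches every request
      # Element-wise: /api matches /api and /api/x, but not /apiv2.
      [ "$r" = "$p" ] || [[ "$r" == "$p"/* ]] ;;
    *)
      echo "pathType $type: behaviour depends on the controller" >&2
      return 2 ;;
  esac
}
```

For example, `path_matches Prefix /api /apiv2` fails while `path_matches Prefix /api /api/v1/users` succeeds, which is why a `Prefix` path never needs a trailing `/*`.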
WebSocket-friendly NGINX annotations
```yaml
annotations:
  nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  # No header annotations needed: recent ingress-nginx versions pass Upgrade/Connection through by default
```
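To confirm the upgrade actually reaches the backend, send a WebSocket handshake and look for `101 Switching Protocols`. A minimal sketch, assuming bash and curl; `ws_handshake_ok` is a hypothetical helper, the URL is yours to supply, and the `Sec-WebSocket-Key` is the sample value from RFC 6455 (any base64-encoded 16-byte value works).

```bash
#!/usr/bin/env bash
# Hypothetical helper. Usage: ws_handshake_ok https://<host>/<ws-path>
# Prints the response status line; returns 0 only on "HTTP/1.1 101".
ws_handshake_ok() {
  local url="$1" status_line
  # --http1.1: the Upgrade mechanism exists only in HTTP/1.1.
  # --max-time: after a 101 the connection stays open, so bound the wait.
  status_line=$(curl -sS -i -N --http1.1 --max-time 5 \
      -H "Connection: Upgrade" \
      -H "Upgrade: websocket" \
      -H "Sec-WebSocket-Version: 13" \
      -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
      "$url" 2>/dev/null | head -n 1)
  echo "${status_line%$'\r'}"
  [[ "$status_line" == "HTTP/1.1 101"* ]]
}
```

A 101 here followed by drops at exactly 60s points at the timeout annotations; anything other than 101 means the handshake itself is being rejected or not proxied.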
When to escalate
- LoadBalancer Service stuck in `Pending` → cloud-side; check IAM, quotas, and security groups before assuming a Kubernetes issue.
- cert-manager hitting Let’s Encrypt rate limits → coordinate; move to the staging issuer while debugging, or switch to DNS-01 if HTTP-01 validation is what keeps failing.
- Controller crash-looping → check memory limits (NGINX especially is RAM-hungry under high concurrency; look for `OOMKilled` in `kubectl describe pod`) and consider an HPA on the controller deployment.
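Two of these conditions can be confirmed from kubectl before escalating. A minimal sketch, assuming bash, kubectl, and the default ingress-nginx Helm names (`ingress-nginx-controller` Service in the `ingress-nginx` namespace); `lb_address`, `controller_oom_pods` and `escalation_check` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the escalation checks above.

# Print the LoadBalancer's external IP or hostname; empty means still Pending
# (or the Service is not type LoadBalancer at all, e.g. NodePort/hostNetwork setups).
lb_address() {
  kubectl get svc "$1" -n "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[*].ip}{.status.loadBalancer.ingress[*].hostname}'
}

# List controller pods whose last container termination reason was OOMKilled.
controller_oom_pods() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage: escalation_check [controller-service] [controller-namespace]
escalation_check() {
  local svc="${1:-ingress-nginx-controller}" cns="${2:-ingress-nginx}" addr oom
  addr=$(lb_address "$svc" "$cns")
  if [ -z "$addr" ]; then
    echo "LoadBalancer $cns/$svc has no address yet (Pending): check cloud quotas, IAM, security groups"
  else
    echo "LoadBalancer $cns/$svc -> $addr"
  fi
  oom=$(controller_oom_pods "$cns")
  [ -n "$oom" ] && echo "OOMKilled controller pods:" $oom
  return 0
}
```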
Related prompts
- Kubernetes NetworkPolicy Debug Prompt: Diagnose why pod-to-pod, pod-to-service, or pod-to-external traffic is being dropped by NetworkPolicy (Calico, Cilium, Weave, or upstream defaults).
- Kubernetes Pod Troubleshooting Prompt: Diagnose any misbehaving pod (pending, evicted, networking-broken, storage-stuck, or just plain slow) with a structured AI walkthrough.
- Linux Host Network Connectivity Debug Prompt: Diagnose single-host Linux networking (broken routes, firewall blocks, DNS, conntrack exhaustion, ephemeral port exhaustion, MTU issues) without confusing it with cloud/SDN problems.