Kubernetes Ingress Troubleshooting Prompt

Diagnose Ingress routing failures, controller misconfiguration, TLS issues, 404/502/503 cascades, and path-vs-host mismatches across NGINX, Traefik, Contour, and HAProxy controllers.

Target user: Kubernetes platform engineers and SREs
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Kubernetes platform engineer with deep experience operating Ingress controllers — NGINX, Traefik, Contour, HAProxy — in production at scale. You can read controller logs and trace a request from the LoadBalancer all the way to the pod.

I will provide:
- The symptom (unexpected 404 or wrong backend, 502/503 from the controller, wrong TLS certificate, host mismatch, broken WebSocket, incorrect path rewrite)
- The Ingress YAML (`kubectl get ingress <name> -o yaml`)
- The Service YAML it points to (`kubectl get svc <svc> -o yaml`)
- The Endpoints (`kubectl get endpoints <svc>`)
- Which Ingress controller and version (`kubectl get pods -n ingress-nginx`, etc.)
- The IngressClass (`kubectl get ingressclass`)
- Controller pod logs filtered to the affected host
- A reproducible curl with `-v` showing the failing request

Your job:

1. **Walk the request path** end to end:
   - DNS → cloud LoadBalancer → Ingress controller pod (NodePort or hostNetwork) → controller config matching by host+path → upstream Service → Endpoints (pod IPs) → pod listening on the port
2. **Identify which hop fails**:
   - DNS resolves to the wrong IP? `dig <host>` and `kubectl get svc <ingress-controller> -o wide`
   - LoadBalancer not provisioned? Cloud quota, security group, or controller LoadBalancer service in `Pending`
   - Controller doesn't match the host+path? IngressClass missing, host typo, controller's `--watch-namespace` excluding this Ingress
   - 502/503 from controller? Upstream Service has no endpoints (selector mismatch) or upstream pod returning errors
   - 404 from the application? Ingress routed correctly but path prefix not stripped (rewrite missing)
3. **Check IngressClass matching**:
   - `spec.ingressClassName` on the Ingress must match `metadata.name` on an IngressClass
   - The IngressClass must reference a controller (`spec.controller`) that's actually installed
   - Default IngressClass: only one should have `ingressclass.kubernetes.io/is-default-class: "true"`
4. **Check Service → Pod mapping**:
   - `kubectl get endpoints <svc>` empty? Service selector doesn't match pod labels, or pods not ready
   - Does the Service `targetPort` match the port the container actually listens on? A wrong number, or a named port the pod doesn't define, is a common mismatch
   - Is the pod's `readinessProbe` failing? Endpoints exclude not-ready pods
5. **TLS specifics**:
   - cert-manager: `kubectl describe certificate` and `kubectl describe certificaterequest`
   - SNI mismatch: controller serves a default cert when host doesn't match any
   - HSTS / HTTPS-redirect annotations causing browser-cached 301 loops
   - Self-signed CA in cluster issuer not trusted by the client
6. **Annotation pitfalls** (vary by controller):
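   - NGINX `nginx.ingress.kubernetes.io/rewrite-target` replaces the entire matched path, so `rewrite-target: /` sends every request to `/`; to strip only a prefix, use capture groups in the path (e.g. `/api(/|$)(.*)` with `rewrite-target: /$2`) and `pathType: ImplementationSpecific`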
   - `pathType: ImplementationSpecific` behaves differently per controller; prefer `Prefix` or `Exact`
   - `proxy-body-size` defaults to 1m and `proxy-read-timeout` to 60s: common causes of 413 on large uploads and 504 on long-polling requests
   - Websocket support: `proxy-read-timeout: "3600"` + `proxy-send-timeout` + appropriate Upgrade header pass-through
7. **Output specifics**:
   - `404 Not Found` from controller (with controller-branded HTML) → no Ingress matched
   - `404` from app → Ingress matched but app routing rejected
   - `502 Bad Gateway` → controller couldn't reach upstream (pod down, port wrong)
   - `503 Service Unavailable` → no endpoints (Service selector wrong, all pods unready)
   - `504 Gateway Timeout` → upstream too slow; raise `proxy-read-timeout`

Mark anything DESTRUCTIVE clearly: deleting IngressClass, recreating cert-manager order, restarting controller during peak.

---

Ingress controller + version: [nginx 1.10 / traefik 3.0 / etc.]
Symptom + status code: [DESCRIBE — include exact curl output]
Ingress YAML:
```yaml
[PASTE]
```
Service YAML:
```yaml
[PASTE]
```
Endpoints:
```
[PASTE kubectl get endpoints <svc> -o yaml]
```
IngressClass(es):
```yaml
[PASTE]
```
Controller logs (filtered to host):
```
[PASTE]
```
Reproduction (`curl -v`):
```
[PASTE]
```

Why this prompt works

Ingress failures have many possible root causes (DNS, LB, controller, Ingress object, Service, Endpoints, pod, TLS) and the HTTP error code is rarely diagnostic on its own. A 502 means “I couldn’t reach upstream” — but is upstream wrong, slow, or just unhealthy? This prompt walks the request path systematically.

How to use it

  1. Always include a curl -v reproduction. Status code + response headers tell you which hop returned the error.
  2. Include the controller logs filtered to your host. NGINX prints every request with the matched upstream; Traefik shows routing decisions.
  3. Include kubectl get endpoints <svc>. “Service has no endpoints” is the #1 cause of 503, and it’s invisible from the Ingress YAML.
  4. Mention TLS or no TLS explicitly. TLS adds a whole class of failures (cert, SNI, HSTS).
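
As a rough illustration of step 1, the helper below condenses a `curl -v` transcript to the two fields that usually show which hop answered: the status code and the `Server` header. The function name `summarize_curl` is hypothetical, and the sketch assumes the standard `curl -v` layout in which response header lines start with `< `.

```shell
# Summarize a `curl -v` transcript read from stdin.
# Prints "status=<code> server=<value>"; after redirects the last response wins.
summarize_curl() {
  awk '
    { gsub(/\r/, "") }                                   # drop CRs from header lines
    $1 == "<" && $2 ~ /^HTTP\// { status = $3 }          # e.g. "< HTTP/1.1 503 ..."
    $1 == "<" && tolower($2) == "server:" { server = $3 }
    END { print "status=" status " server=" (server == "" ? "unknown" : server) }
  '
}
```

Because `curl -v` writes headers to stderr, a typical call would look like `curl -v -o /dev/null https://<host>/path 2>&1 | summarize_curl`.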

Useful commands

# Ingress + Service + Endpoints
kubectl get ingress -A
kubectl describe ingress <ing> -n <ns>
kubectl get svc <svc> -n <ns> -o wide
kubectl get endpoints <svc> -n <ns>
kubectl get pods -l <selector-from-svc> -n <ns> --show-labels

# IngressClass
kubectl get ingressclass

# Controller pods + logs
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=200 | grep <host>

# Test from outside cluster
curl -v -k --resolve <host>:443:<lb-ip> https://<host>/path

# Test bypassing controller (direct to pod)
POD_IP=$(kubectl get pod <pod> -o jsonpath='{.status.podIP}')
kubectl run debug --rm -it --image=nicolaka/netshoot -- curl -v http://$POD_IP:<port>/

# Inspect pod readiness (conditions and probe events)
kubectl get pod <pod> -o jsonpath='{.status.conditions}'
kubectl describe pod <pod>          # check ReadinessProbe Events

# cert-manager (if used)
kubectl get certificate -A
kubectl describe certificate <cert> -n <ns>
kubectl describe certificaterequest -n <ns>
kubectl describe order -n <ns>      # ACME orders
kubectl describe challenge -n <ns>  # ACME challenges

# NGINX-specific: dump the runtime config
kubectl exec -n ingress-nginx <controller-pod> -- cat /etc/nginx/nginx.conf | grep -A20 <host>

# Traefik-specific: dashboard or API
kubectl port-forward -n traefik svc/traefik 9000:9000
# then visit localhost:9000/dashboard/

Status-code differential

| Status | Likely cause | First diagnostic |
|---|---|---|
| 404 (controller-branded) | No Ingress matched the host+path | kubectl get ing -A, check host typos |
| 404 (app-branded) | Ingress matched, app routing rejects | path rewrite annotations |
| 502 Bad Gateway | Controller can’t reach upstream pod | kubectl get endpoints, pod readiness |
| 503 Service Unavailable | No ready endpoints behind Service | Service selector vs pod labels |
| 504 Gateway Timeout | Upstream too slow | Raise proxy-read-timeout annotation; profile app |
| 400 Bad Request | TLS SNI mismatch, or HTTP→HTTPS confusion | Check Host header vs cert |
| TLS cert wrong (browser warning) | SNI fallback to default cert | Cert-manager status; check Ingress spec.tls.hosts |
| 301 redirect loop | HSTS or HTTPS-redirect conflict | Strip HSTS annotation temporarily |
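
The same differential can be kept as a small lookup in a runbook script. This is only a sketch of the table above: the function name `first_check` and the `controller`/`app` argument (whoever branded the error page) are assumptions for illustration, not part of any tool.

```shell
# Map an HTTP status, plus who branded the response, to the first check
# suggested in the status-code table. Wording follows the table loosely.
first_check() {
  status=$1
  origin=${2:-controller}   # "controller" or "app", judged from the response body
  case "$status" in
    404)
      if [ "$origin" = "app" ]; then
        echo "review path rewrite annotations"
      else
        echo "kubectl get ing -A; check host and path"
      fi ;;
    502) echo "kubectl get endpoints; check pod readiness" ;;
    503) echo "compare Service selector with pod labels" ;;
    504) echo "raise proxy-read-timeout; profile the app" ;;
    400) echo "check Host header against the certificate" ;;
    *)   echo "not in the table; walk the request path hop by hop" ;;
  esac
}
```

For example, `first_check 404 app` points at rewrite annotations, while `first_check 404` points at host and path matching.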

Common findings this catches

  • Service selector app=myapp against pod labels app=myapp,version=v2 → match works (the selector is a subset of the labels). But selector app=myapp,version=v1 while pods are labeled version=v2 → no endpoints. Most common cause of 503.
  • Ingress missing ingressClassName AND no default IngressClass → no controller picks it up. Silent.
  • NGINX rewrite-target: / with pathType: Prefix and path /api(/|$)(.*) — capture groups don’t work with Prefix; need ImplementationSpecific.
  • Ingress on Traefik with NGINX-style annotations → annotations silently ignored.
  • cert-manager Certificate says Ready but Ingress still serves default cert → Ingress TLS secret name doesn’t match certificate’s secret.
  • WebSocket connections drop after 60s → default proxy-read-timeout: "60"; raise for WS endpoints.
  • POST requests > 1MB return 413 → NGINX default proxy-body-size: 1m; raise with annotation.
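
The subset rule behind the first finding can be sketched in a few lines of shell. This models only equality-based selectors, and the function name `selector_matches` is hypothetical; both arguments use the comma-separated key=value form that `kubectl get svc -o wide` (SELECTOR column) and `kubectl get pods --show-labels` (LABELS column) print.

```shell
# Return 0 when every key=value pair in the selector also appears in the
# pod's labels (subset match), 1 otherwise.
selector_matches() {
  selector=$1
  labels=$2
  old_ifs=$IFS
  IFS=,                                   # split the selector on commas
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;                     # this pair is present, keep going
      *) IFS=$old_ifs; return 1 ;;        # one missing pair means no match
    esac
  done
  IFS=$old_ifs
  return 0
}
```

With the values from the finding, `selector_matches "app=myapp" "app=myapp,version=v2"` succeeds, while `selector_matches "app=myapp,version=v1" "app=myapp,version=v2"` fails, which is the case that leaves the Service without endpoints.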

Path-type cheatsheet (NGINX)

# Exact match
- path: /api
  pathType: Exact            # matches only "/api" — not "/api/" or "/api/foo"

# Prefix match
- path: /api
  pathType: Prefix           # matches "/api", "/api/", "/api/v1/users"

# Regex / capture (NGINX only)
- path: /api(/|$)(.*)
  pathType: ImplementationSpecific
# Pair with annotation:
# nginx.ingress.kubernetes.io/rewrite-target: /$2
# nginx.ingress.kubernetes.io/use-regex: "true"
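
To make the cheatsheet concrete, the sketch below models the Exact and Prefix rules as the Kubernetes Ingress spec describes them (Prefix compares whole "/"-separated segments) and applies the regex rewrite by hand with `sed`. The function names are hypothetical, and a real controller may differ in edge cases, so treat this as an approximation rather than NGINX's implementation.

```shell
# Return 0 if the request path matches the rule under the given pathType.
path_matches() {
  path_type=$1
  rule=$2
  request=$3
  case "$path_type" in
    Exact)
      [ "$request" = "$rule" ] ;;
    Prefix)
      rule=${rule%/}                    # a trailing slash on the rule is ignored
      [ -z "$rule" ] && return 0        # rule "/" matches every request path
      case "$request" in
        "$rule" | "$rule"/*) return 0 ;;
        *) return 1 ;;                  # "/apiary" does not match "/api"
      esac ;;
    *)
      return 2 ;;                       # ImplementationSpecific: controller-defined
  esac
}

# Mimic path /api(/|$)(.*) with rewrite-target /$2: keep only capture group 2.
rewrite_api_path() {
  printf '%s\n' "$1" | sed -E 's#^/api(/|$)(.*)#/\2#'
}
```

Here `rewrite_api_path /api/v1/users` yields `/v1/users`, and `rewrite_api_path /api` yields `/`.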

WebSocket-friendly NGINX annotations

annotations:
  nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  # Headers — NGINX preserves Upgrade/Connection by default in recent versions
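
Before tuning timeouts it can help to confirm that the upgrade handshake itself passes through the controller. The sketch below is one way to do that with `curl`; the function names are hypothetical, the URL is whatever WebSocket endpoint your app exposes, and the key is the sample nonce from RFC 6455.

```shell
# Send a WebSocket upgrade request through the Ingress and print the response
# headers. Needs network access to the cluster, so it is only defined here.
ws_probe() {
  url=$1   # e.g. https://<host>/ws (placeholder endpoint)
  curl -sS -i -N --http1.1 --max-time 5 \
    -H 'Connection: Upgrade' \
    -H 'Upgrade: websocket' \
    -H 'Sec-WebSocket-Version: 13' \
    -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
    "$url"
}

# Read response headers on stdin; succeed only if the first line is a 101.
upgrade_accepted() {
  head -n 1 | grep -q '^HTTP/1\.1 101'
}
```

A call such as `ws_probe https://<host>/ws | upgrade_accepted && echo "handshake ok"` separates a failed handshake from connections that open and then drop at the read timeout.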

When to escalate

  • LoadBalancer Service stuck in Pending → cloud-side; check IAM, quotas, security groups before assuming a K8s issue.
  • Cert-manager hitting Let’s Encrypt rate limits → coordinate; either move to staging issuer temporarily or use a DNS-01 alternative.
  • Controller crash-looping → look at memory limits (NGINX especially is RAM-hungry under high concurrency) and consider HPA on the controller deployment.
