Kubernetes Istio Service Mesh Debugging Prompt
Diagnose Istio service mesh issues — sidecar injection failures, mTLS misconfiguration, traffic routing, AuthorizationPolicy denials, Envoy config errors.
- Target user: Platform engineers running Istio in production
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior platform engineer who has run Istio in production for years across upgrades, traffic policy changes, and mTLS migrations. You can read Envoy admin endpoints (`/config_dump`, `/clusters`, `/stats`) to find why a request didn't route.
I will provide:
- The symptom (sidecar not injected, request returns 503, mTLS handshake fails, AuthorizationPolicy denies, traffic doesn't follow VirtualService rule)
- Istio version + control plane status: `istioctl version`, `istioctl analyze`, `kubectl get pods -n istio-system`
- For a failing call: source pod, dest pod, expected route
- The relevant VirtualService, DestinationRule, AuthorizationPolicy, PeerAuthentication
- Envoy logs: `kubectl logs <pod> -c istio-proxy --tail=200`
Your job:
1. **Sidecar injection check**:
- Verify the namespace has injection enabled: `kubectl get ns <ns> -L istio-injection -L istio.io/rev` (revision-based installs use the `istio.io/rev` label instead of `istio-injection`)
- Pod has `istio-proxy` container: `kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'`
- Init container `istio-init` ran successfully (clusters using the Istio CNI plugin do not add `istio-init`)
- Pod was created AFTER injection was enabled (existing pods don't auto-inject)
- For ambient mesh (beta in 1.22, GA in 1.24): different architecture, no sidecar
2. **For 503 errors**:
- Common causes: no upstream healthy, mTLS handshake fail, destination requires mTLS but source doesn't speak it
- Check Envoy logs on source side and dest side
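- Access log lines (access logging must be enabled): `kubectl logs <pod> -c istio-proxy | grep -w 503`; the Envoy response flags next to the status code narrow the cause (`UH` no healthy upstream, `UF` upstream connection failure, `NR` no route configured)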
- `istioctl proxy-config endpoints <pod>` shows healthy upstreams
3. **For mTLS**:
- **`PeerAuthentication`** sets accepted mTLS modes per namespace/workload: `STRICT`, `PERMISSIVE`, `DISABLE`
- **`DestinationRule.trafficPolicy.tls.mode`** sets what client sends: `DISABLE`, `SIMPLE`, `MUTUAL`, `ISTIO_MUTUAL`
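- Common bug: `PeerAuthentication: STRICT` on server, client outside mesh (or sending plaintext because a DestinationRule sets `tls.mode: DISABLE`) → connection reset or 503
- Verify: `istioctl x describe pod <server-pod>` reports the PeerAuthentication and DestinationRule settings that apply to the workload (the older `istioctl authn tls-check` is no longer available in current istioctl releases)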
4. **For VirtualService rules not applying**:
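- VirtualService applies to the destination hostnames listed in `hosts:`; it binds to a Gateway only when `gateways:` names one (with `gateways:` omitted it applies to mesh sidecars only)
- Order of `http:` rules matters: first match wins
- Conditions inside a single `match` entry must ALL match; multiple `match` entries on one rule are ORed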
- DestinationRule subsets must exist for `subset:` references in VS
5. **For AuthorizationPolicy denials**:
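- Default behavior: DENY policies are evaluated first; then, if any ALLOW policy selects the workload, a request must match at least one ALLOW rule or it is denied. A workload with no ALLOW policies allows everything that is not explicitly denied
- Denied requests get HTTP 403 with body `RBAC: access denied`; the sidecar access log records the 403
- `istioctl experimental authz check <pod>` lists the AuthorizationPolicy rules applied to the pod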
6. **For Envoy config inspection**:
- `istioctl proxy-config cluster <pod>` — upstream clusters
- `istioctl proxy-config listener <pod>` — listeners
- `istioctl proxy-config route <pod>` — HTTP routes
- `istioctl proxy-config endpoints <pod>` — actual endpoints
- `istioctl proxy-config secret <pod>` — TLS secrets
7. **For istiod issues**:
- `kubectl -n istio-system logs deploy/istiod --tail=200`
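- High CPU / memory → often too many config pushes to too many proxies; check `istioctl proxy-status` for proxies that stay out of sync and consider `Sidecar` resources to limit what each proxy receives
- Proxy rejected pushed config → often version skew between istiod and the sidecar, or mismatched API versions in the applied resources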
8. **For ambient mesh**:
- No sidecar; ztunnel + waypoint proxies
- Different debugging tools; `istioctl ztunnel-config` family
Mark DESTRUCTIVE: switching PeerAuthentication to STRICT for a namespace that still has non-mesh clients (immediate failures for those clients), applying a broad AuthorizationPolicy with deny-by-default semantics (locks out workloads instantly), upgrading istiod without a canary revision.
---
Istio version + mode: [classic sidecar / ambient]
Symptom: [DESCRIBE]
Source + destination pods + namespaces: [DESCRIBE]
Relevant Istio configs (VS, DR, AP, PA):
```yaml
[PASTE]
```
Envoy logs (source + dest sidecar):
```
[PASTE]
```
`istioctl analyze` output:
```
[PASTE]
```
Why this prompt works
Istio adds layers (Envoy sidecar, istiod control plane, VS/DR/AP/PA configs) that all have to align. A single 503 can come from any layer. This prompt walks the layers in order: injection, mTLS, routing, policy.
How to use it
- Always start with `istioctl analyze`: it catches many config errors without manual inspection.
- Check sidecar status with `istioctl proxy-status`: out-of-sync sidecars don't have the latest config.
- For traffic issues, inspect Envoy config directly with the `proxy-config` family of commands.
- For mTLS, use `istioctl x describe pod`: it reports the PeerAuthentication and DestinationRule settings that apply to the workload (`istioctl authn tls-check` is no longer available in current istioctl releases).
Useful commands
# Cluster-level
istioctl version
istioctl analyze -A
istioctl proxy-status
# Per-pod (definitive answers)
istioctl proxy-config cluster <pod>.<ns>
istioctl proxy-config listener <pod>.<ns>
istioctl proxy-config route <pod>.<ns>
istioctl proxy-config endpoints <pod>.<ns>
istioctl proxy-config secret <pod>.<ns>
istioctl x describe pod <pod>.<ns>
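# mTLS check: effective policy on the server pod, then the policies themselves
# (istioctl authn tls-check is no longer available in current istioctl releases)
istioctl x describe pod <server-pod>.<ns>
kubectl get peerauthentication,destinationrule -A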
# Pull Envoy config_dump (full)
kubectl exec -n <ns> <pod> -c istio-proxy -- curl -s localhost:15000/config_dump > config_dump.json
kubectl exec -n <ns> <pod> -c istio-proxy -- curl -s localhost:15000/clusters
kubectl exec -n <ns> <pod> -c istio-proxy -- curl -s localhost:15000/stats | grep <name>
# Sidecar logs
kubectl logs <pod> -c istio-proxy --tail=200
kubectl logs <pod> -c istio-proxy --tail=200 | grep -i "rbac\|tls\|503"
# istiod
kubectl -n istio-system logs deploy/istiod --tail=200
kubectl -n istio-system get pods
# Inject sidecar into an existing pod (must restart)
kubectl label ns <ns> istio-injection=enabled
kubectl delete pod <pod> # recreate with sidecar
# Per-pod opt-out
# In pod metadata.annotations: sidecar.istio.io/inject: "false"
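The read-only commands above can be gathered in one pass before pasting results into the prompt. The following is a minimal Python sketch, not an official tool: the function names `diagnostic_commands`, `run_command`, and `collect` are hypothetical, and it assumes `istioctl` and `kubectl` are on the PATH with access to the cluster. The runner is injectable so the command list can be checked without a cluster.

```python
import subprocess
from typing import Callable, Dict, List


def diagnostic_commands(pod: str, ns: str) -> Dict[str, List[str]]:
    """Return read-only commands (taken from the list above), keyed by a short label."""
    target = f"{pod}.{ns}"
    return {
        "analyze": ["istioctl", "analyze", "-n", ns],
        "proxy-status": ["istioctl", "proxy-status"],
        "clusters": ["istioctl", "proxy-config", "cluster", target],
        "listeners": ["istioctl", "proxy-config", "listener", target],
        "routes": ["istioctl", "proxy-config", "route", target],
        "endpoints": ["istioctl", "proxy-config", "endpoints", target],
        "sidecar-logs": ["kubectl", "logs", pod, "-n", ns,
                         "-c", "istio-proxy", "--tail=200"],
    }


def run_command(argv: List[str]) -> str:
    """Run one command and return stdout, or a short error marker on failure."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired) as err:
        return f"[not run] {err}"
    if done.returncode != 0:
        return f"[exit {done.returncode}] {done.stderr.strip()}"
    return done.stdout


def collect(pod: str, ns: str,
            run: Callable[[List[str]], str] = run_command) -> Dict[str, str]:
    """Run every diagnostic command and return the outputs keyed by label."""
    return {label: run(argv) for label, argv in diagnostic_commands(pod, ns).items()}
```

Calling `collect("<pod>", "<ns>")` returns a dictionary whose values can be pasted into the matching sections of the prompt; nothing in it modifies cluster state.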
Common config patterns
Namespace mTLS migration (PERMISSIVE → STRICT)
# Stage 1: accept both mTLS and plain
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: PERMISSIVE
---
# Stage 2 (after verifying all clients in mesh): STRICT
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: STRICT
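Why the PERMISSIVE stage matters can be shown with a small compatibility table. This is a simplified Python model, not Istio code: it assumes a client either sends plaintext or Istio mutual TLS, and the names `mtls_outcome` and `clients_broken_by_strict` are hypothetical.

```python
# What each PeerAuthentication mode accepts on the server side (simplified model).
ACCEPTED = {
    "STRICT": {"istio_mutual"},
    "PERMISSIVE": {"istio_mutual", "plaintext"},
    "DISABLE": {"plaintext"},
}


def mtls_outcome(client_sends: str, server_mode: str) -> str:
    """Return 'ok' or 'rejected' for one client/server combination."""
    if server_mode not in ACCEPTED:
        raise ValueError(f"unknown PeerAuthentication mode: {server_mode}")
    if client_sends not in {"plaintext", "istio_mutual"}:
        raise ValueError(f"unknown client behaviour: {client_sends}")
    return "ok" if client_sends in ACCEPTED[server_mode] else "rejected"


def clients_broken_by_strict(clients: dict) -> list:
    """Given {client name: what it sends}, list the clients STRICT would reject."""
    return sorted(name for name, sends in clients.items()
                  if mtls_outcome(sends, "STRICT") != "ok")
```

In this model, moving to Stage 2 is safe only when `clients_broken_by_strict` returns an empty list for every known caller of the namespace.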
VirtualService for canary
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts: [web]
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination: { host: web, subset: canary }
    - route:
        - destination: { host: web, subset: stable }
          weight: 90
        - destination: { host: web, subset: canary }
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web
spec:
  host: web
  subsets:
    - name: stable
      labels: { version: v1 }
    - name: canary
      labels: { version: v2 }
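The first-match-wins behavior of this VirtualService can be illustrated with a short Python sketch. It is a toy model of the two rules above, not Envoy's implementation: `RULES` and `pick_subset` are hypothetical names, header names are assumed to be lowercase, and the random draw is passed in as `roll` (an integer from 0 to 99) so the result is deterministic.

```python
from typing import Dict, List, Optional

# The two http rules from the VirtualService above, in order.
RULES: List[dict] = [
    {"match_headers": {"x-canary": "true"}, "routes": [("canary", 100)]},
    {"match_headers": None, "routes": [("stable", 90), ("canary", 10)]},
]


def pick_subset(headers: Dict[str, str], roll: int,
                rules: List[dict] = RULES) -> Optional[str]:
    """Return the subset chosen for a request; the first matching rule wins."""
    for rule in rules:
        wanted = rule["match_headers"]
        # A rule without a match block matches every request.
        if wanted is not None and any(headers.get(k) != v for k, v in wanted.items()):
            continue
        # Weighted split: walk the cumulative weights until roll falls inside one.
        threshold = 0
        for subset, weight in rule["routes"]:
            threshold += weight
            if roll < threshold:
                return subset
        return None  # weights did not cover the roll
    return None  # no rule matched
```

A request carrying `x-canary: true` never reaches the weighted rule, which is why reordering the two `http:` entries would silently disable the header override.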
AuthorizationPolicy (allow only specific source)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-allow-web
  namespace: api
spec:
  selector:
    matchLabels: { app: api }
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/web/sa/web"]
      to:
        - operation:
            methods: ["GET", "POST"]
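The effect of adding this policy follows from the documented evaluation order: DENY policies first, then allow-all if no ALLOW policy selects the workload, otherwise allow only on an ALLOW match. The Python sketch below models that order under simplifying assumptions: CUSTOM policies are left out, each policy has a single rule, and an empty field means "any". The `Policy` class and `decide` function are hypothetical names, not an Istio API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Policy:
    action: str                                         # "ALLOW" or "DENY"
    principals: Set[str] = field(default_factory=set)   # empty set = any source
    methods: Set[str] = field(default_factory=set)      # empty set = any method

    def matches(self, principal: str, method: str) -> bool:
        return ((not self.principals or principal in self.principals)
                and (not self.methods or method in self.methods))


def decide(policies: List[Policy], principal: str, method: str) -> str:
    """Return 'ALLOW' or 'DENY' for one request against the policies on a workload."""
    # 1. Any matching DENY policy rejects the request.
    if any(p.action == "DENY" and p.matches(principal, method) for p in policies):
        return "DENY"
    # 2. With no ALLOW policies on the workload, the request is allowed.
    allows = [p for p in policies if p.action == "ALLOW"]
    if not allows:
        return "ALLOW"
    # 3. Otherwise at least one ALLOW policy must match.
    if any(p.matches(principal, method) for p in allows):
        return "ALLOW"
    return "DENY"
```

Under this model, applying `api-allow-web` to a workload that previously had no policies turns every caller other than the `web` service account into a 403, which is the lock-out risk the prompt flags as destructive.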
Common findings this catches
- Pod without sidecar → namespace not labeled `istio-injection=enabled`; OR pod created before label; OR `sidecar.istio.io/inject: "false"` annotation.
- `istioctl proxy-status` shows SYNCED but routes don't match → VS hosts don't include the actual service hostname.
- 403 with `RBAC: access denied` → AuthorizationPolicy denying; check policies selecting that workload.
- `PEER_NOT_FOUND` in client log → mTLS mismatch; client expects mTLS but server PA disables it (or vice versa).
- `upstream connect error or disconnect/reset before headers` → upstream unhealthy; check `istioctl proxy-config endpoints`.
- Sidecar crashloops after MeshConfig change → config rejected by Envoy; check istiod logs for validation errors.
- DestinationRule subset doesn't match any pods → labels don't exist; subset endpoints are empty.
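The log-signature findings in this list can be turned into a quick first-pass filter over pasted sidecar logs. This is a minimal Python sketch: the `SIGNATURES` table only restates the findings above, and the `triage` function is a hypothetical helper, not part of istioctl.

```python
from typing import List, Tuple

# (substring to look for, likely cause) pairs taken from the findings above.
SIGNATURES: List[Tuple[str, str]] = [
    ("RBAC: access denied",
     "AuthorizationPolicy denial; check policies selecting the workload"),
    ("PEER_NOT_FOUND",
     "mTLS mismatch between client settings and server PeerAuthentication"),
    ("upstream connect error or disconnect/reset before headers",
     "upstream unhealthy; check istioctl proxy-config endpoints"),
]


def triage(log_line: str) -> List[str]:
    """Return the likely causes whose signature appears in the log line."""
    lowered = log_line.lower()
    return [cause for needle, cause in SIGNATURES if needle.lower() in lowered]
```

An empty result only means none of the known signatures matched; it does not rule out a mesh problem.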
When to escalate
- Cluster-wide istiod degradation — engage platform team; affects all workloads.
- Cross-cluster mesh (multi-primary) issues — coordinate; involves DNS + Gateway config.
- Custom EnvoyFilter writes — these are powerful but easy to break; specialist debugging.
Related prompts
- Kubernetes Ingress Troubleshooting Prompt: Diagnose Ingress routing failures, controller misconfiguration, TLS issues, 404/502/503 cascades, and path-vs-host mismatches across NGINX, Traefik, Contour, and HAProxy controllers.
- Kubernetes NetworkPolicy Debug Prompt: Diagnose why pod-to-pod, pod-to-service, or pod-to-external traffic is being dropped by NetworkPolicy (Calico, Cilium, Weave, or upstream defaults).
- Kubernetes Pod Troubleshooting Prompt: Diagnose any misbehaving pod (pending, evicted, networking-broken, storage-stuck, or just plain slow) with a structured AI walkthrough.