Kubernetes Istio Service Mesh Debugging Prompt

Diagnose Istio service mesh issues — sidecar injection failures, mTLS misconfiguration, traffic routing, AuthorizationPolicy denials, Envoy config errors.

Target user
Platform engineers running Istio in production
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior platform engineer who has run Istio in production for years across upgrades, traffic policy changes, and mTLS migrations. You can read Envoy admin endpoints (`/config_dump`, `/clusters`, `/stats`) to find why a request didn't route.

I will provide:
- The symptom (sidecar not injected, request returns 503, mTLS handshake fails, AuthorizationPolicy denies, traffic doesn't follow VirtualService rule)
- Istio version + control plane status: `istioctl version`, `istioctl analyze`, `kubectl get pods -n istio-system`
- For a failing call: source pod, dest pod, expected route
- The relevant VirtualService, DestinationRule, AuthorizationPolicy, PeerAuthentication
- Envoy logs: `kubectl logs <pod> -c istio-proxy --tail=200`

Your job:

1. **Sidecar injection check**:
   - Verify the namespace has injection enabled: `kubectl get ns <ns> -L istio-injection`
   - Pod has `istio-proxy` container: `kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'`
   - Init container `istio-init` ran successfully
   - Pod was created AFTER injection was enabled (existing pods don't auto-inject)
   - For ambient mesh (1.22+): different architecture, no sidecar
2. **For 503 errors**:
   - Common causes: no upstream healthy, mTLS handshake fail, destination requires mTLS but source doesn't speak it
   - Check Envoy logs on source side and dest side
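   - Envoy logs: `kubectl logs <pod> -c istio-proxy | grep ' 503 '` (the default text access log has no `response_code` label; with JSON access logs, grep for `response_code` instead)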
   - `istioctl proxy-config endpoints <pod>` shows healthy upstreams
3. **For mTLS**:
   - **`PeerAuthentication`** sets accepted mTLS modes per namespace/workload: `STRICT`, `PERMISSIVE`, `DISABLE`
   - **`DestinationRule.trafficPolicy.tls.mode`** sets what client sends: `DISABLE`, `SIMPLE`, `MUTUAL`, `ISTIO_MUTUAL`
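   - Common bug: `PeerAuthentication` in `STRICT` mode on the server while the client is outside the mesh (or its DestinationRule sets `DISABLE`) → 503 from a client sidecar, connection reset for a non-mesh client
   - Verify: `istioctl x describe pod <server-pod>` reports the effective PeerAuthentication mode and any DestinationRule TLS settings (`istioctl authn tls-check` is not available in current Istio releases)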
4. **For VirtualService rules not applying**:
   - VirtualService binds to a Gateway or to a Service via `hosts:`
   - Order of `http:` rules matters — first match wins
   - `match` conditions must ALL match for a rule
   - DestinationRule subsets must exist for `subset:` references in VS
5. **For AuthorizationPolicy denials**:
   - Default behavior: if any ALLOW policy selects a workload, traffic must match one of its rules or it is denied (DENY policies are evaluated before ALLOW policies)
   - A denied request returns HTTP 403 with the body `RBAC: access denied`
   - `istioctl experimental authz check <pod>` lists the policies applied to the workload
6. **For Envoy config inspection**:
   - `istioctl proxy-config cluster <pod>` — upstream clusters
   - `istioctl proxy-config listener <pod>` — listeners
   - `istioctl proxy-config route <pod>` — HTTP routes
   - `istioctl proxy-config endpoints <pod>` — actual endpoints
   - `istioctl proxy-config secret <pod>` — TLS secrets
7. **For istiod issues**:
   - `kubectl -n istio-system logs deploy/istiod --tail=200`
   - High CPU / memory → too many configs being pushed; check `istioctl x describe pod <pod>` for warnings
   - Pilot agent rejected config → mismatched API versions
8. **For ambient mesh**:
   - No sidecar; ztunnel + waypoint proxies
   - Different debugging tools; `istioctl ztunnel-config` family

Mark DESTRUCTIVE: changing PeerAuthentication STRICT for a namespace with non-mesh clients (immediate 503s), applying broad AuthorizationPolicy with deny defaults (locks out workloads instantly), upgrading istiod without canary.

---

Istio version + mode: [classic sidecar / ambient]
Symptom: [DESCRIBE]
Source + destination pods + namespaces: [DESCRIBE]
Relevant Istio configs (VS, DR, AP, PA):
```yaml
[PASTE]
```
Envoy logs (source + dest sidecar):
```
[PASTE]
```
`istioctl analyze` output:
```
[PASTE]
```

Why this prompt works

Istio adds layers (Envoy sidecar, istiod control plane, VS/DR/AP/PA configs) that all have to align. A single 503 can come from any layer. This prompt walks the layers in order: injection, mTLS, routing, policy.

How to use it

  1. Always start with istioctl analyze — catches many config errors without manual inspection.
  2. Check sidecar status with istioctl proxy-status — out-of-sync sidecars don’t have latest config.
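  3. For traffic issues, inspect Envoy config directly with the istioctl proxy-config family of commands.
  4. For mTLS, use istioctl x describe pod: it reports the effective PeerAuthentication mode and any DestinationRule TLS settings (istioctl authn tls-check is not available in current Istio releases).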
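
The first two steps can be captured in a small shell function so that every investigation starts from the same evidence. This is a minimal sketch, not part of the original prompt: the function name `mesh_triage` and the output directory are hypothetical, it assumes `istioctl` is on the PATH, and it assumes `proxy-status` marks out-of-sync proxies as `STALE` or `NOT SENT`, which may vary between Istio versions.

```shell
# Hypothetical helper: run the cluster-level checks in order and keep the output.
mesh_triage() {
  out_dir="${1:-./istio-triage}"
  mkdir -p "$out_dir" || return 1

  # Save each command's output (stdout and stderr) for later comparison.
  istioctl version      > "$out_dir/version.txt"      2>&1
  istioctl analyze -A   > "$out_dir/analyze.txt"      2>&1
  istioctl proxy-status > "$out_dir/proxy-status.txt" 2>&1

  # Surface proxies that have not acknowledged the latest config.
  grep -E 'STALE|NOT SENT' "$out_dir/proxy-status.txt" \
    || echo "no STALE or NOT SENT proxies reported"
}
```

Calling `mesh_triage ./triage` leaves three text files in `./triage` that can be pasted into the prompt template.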

Useful commands

# Cluster-level
istioctl version
istioctl analyze -A
istioctl proxy-status

# Per-pod (definitive answers)
istioctl proxy-config cluster <pod>.<ns>
istioctl proxy-config listener <pod>.<ns>
istioctl proxy-config route <pod>.<ns>
istioctl proxy-config endpoints <pod>.<ns>
istioctl proxy-config secret <pod>.<ns>
istioctl x describe pod <pod>.<ns>

# mTLS check (istioctl authn tls-check is not available in current Istio releases)
istioctl x describe pod <server-pod>.<ns>    # effective PeerAuthentication + DestinationRule TLS mode

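# Pull Envoy config_dump (full)
kubectl exec -n <ns> <pod> -c istio-proxy -- curl -s localhost:15000/config_dump > config_dump.json
# Distroless proxy images ship without curl; pilot-agent can issue the same request
kubectl exec -n <ns> <pod> -c istio-proxy -- pilot-agent request GET config_dump > config_dump.json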
kubectl exec -n <ns> <pod> -c istio-proxy -- curl -s localhost:15000/clusters
kubectl exec -n <ns> <pod> -c istio-proxy -- curl -s localhost:15000/stats | grep <name>

# Sidecar logs
kubectl logs <pod> -c istio-proxy --tail=200
kubectl logs <pod> -c istio-proxy --tail=200 | grep -i "rbac\|tls\|503"

# istiod
kubectl -n istio-system logs deploy/istiod --tail=200
kubectl -n istio-system get pods

# Inject sidecar into an existing pod (injection happens only at pod creation, so the pod must be recreated)
kubectl label ns <ns> istio-injection=enabled
kubectl delete pod <pod> -n <ns>    # its controller recreates it with the sidecar

# Per-pod opt-out
# In pod metadata.annotations: sidecar.istio.io/inject: "false"
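
When a sidecar log is long, summarizing response codes and Envoy response flags is often faster than reading it line by line. The sketch below assumes Istio's default text access log format, in which the response code and response flags are the first two fields after the quoted request line. The function name `summarize_access_log` is hypothetical, and a custom or JSON log format would need a different parser.

```shell
# Hypothetical helper: count "<response code> <response flags>" pairs from
# Envoy access log lines read on stdin (default Istio text format assumed).
summarize_access_log() {
  awk -F'"' '
    {
      # Field 3 is the text right after the quoted request line,
      # for example: " 503 UF upstream_reset_before_response_started ..."
      n = split($3, a, " ")
      if (n >= 2 && a[1] ~ /^[0-9]+$/) count[a[1] " " a[2]]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -rn
}
```

A possible invocation is `kubectl logs <pod> -c istio-proxy --tail=1000 | summarize_access_log`. Lines that are not access log entries are skipped.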

Common config patterns

Namespace mTLS migration (PERMISSIVE → STRICT)

# Stage 1: accept both mTLS and plain
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: PERMISSIVE

# Stage 2 (after verifying all clients in mesh): STRICT
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: STRICT
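
Before moving from Stage 1 to Stage 2, it helps to list pods that still run without a sidecar, since they will be cut off once STRICT applies. The following is a rough sketch under stated assumptions: the function names are hypothetical, the proxy container is assumed to be named `istio-proxy` (as a regular container, or as an init container when native sidecars are enabled), and clients outside the cluster will not show up at all.

```shell
# Hypothetical filter: reads "<pod> <container> <container> ..." lines on stdin
# and prints the names of pods that have no container called istio-proxy.
pods_missing_sidecar() {
  awk '{
    found = 0
    for (i = 2; i <= NF; i++) if ($i == "istio-proxy") found = 1
    if (!found && NF >= 1) print $1
  }'
}

# Hypothetical helper: list pod names plus regular and init container names
# for one namespace, one pod per line.
list_pod_containers() {
  kubectl get pods -n "$1" -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{" "}{.spec.initContainers[*].name}{"\n"}{end}'
}
```

Running `list_pod_containers my-app | pods_missing_sidecar` for every namespace that hosts clients of `my-app` gives a list to resolve before switching the mode.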

VirtualService for canary

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts: [web]
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination: { host: web, subset: canary }
  - route:
    - destination: { host: web, subset: stable }
      weight: 90
    - destination: { host: web, subset: canary }
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web
spec:
  host: web
  subsets:
  - name: stable
    labels: { version: v1 }
  - name: canary
    labels: { version: v2 }
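
To confirm that each subset above actually resolves to pods, the client sidecar's endpoint table can be filtered by the Envoy cluster name Istio generates for the subset. This is a sketch with hypothetical function names. The port (`80`), the namespace (`default`), and the `cluster.local` domain suffix are assumptions, because the example does not state them.

```shell
# Build the Envoy cluster name for a service subset:
#   outbound|<port>|<subset>|<service>.<namespace>.svc.cluster.local
subset_cluster_name() {
  port="$1"; subset="$2"; service="$3"; namespace="$4"
  printf 'outbound|%s|%s|%s.%s.svc.cluster.local\n' "$port" "$subset" "$service" "$namespace"
}

# List the endpoints a client sidecar holds for that subset.
# An empty table suggests the subset labels match no ready pods.
subset_endpoints() {
  client_pod="$1"; shift
  istioctl proxy-config endpoints "$client_pod" --cluster "$(subset_cluster_name "$@")"
}
```

For example, `subset_endpoints <client-pod>.<ns> 80 canary web default` should show at least one healthy endpoint if pods labeled `version: v2` exist.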

AuthorizationPolicy (allow only specific source)

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-allow-web
  namespace: api
spec:
  selector:
    matchLabels: { app: api }
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/web/sa/web"]
    to:
    - operation:
        methods: ["GET", "POST"]
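
A policy like this commonly denies traffic because the caller runs under a different service account than the one named in `principals`, or because the connection is not using mTLS, which `principals` requires. The sketch below derives the principal a caller pod would present. The function names are hypothetical and the trust domain is assumed to be `cluster.local` unless passed explicitly.

```shell
# Build the principal string used in AuthorizationPolicy `principals`
# (trust domain defaults to cluster.local).
expected_principal() {
  namespace="$1"; service_account="$2"; trust_domain="${3:-cluster.local}"
  printf '%s/ns/%s/sa/%s\n' "$trust_domain" "$namespace" "$service_account"
}

# Print the principal a given caller pod would present over mTLS,
# based on the service account it actually runs as.
caller_principal() {
  pod="$1"; namespace="$2"
  sa="$(kubectl get pod "$pod" -n "$namespace" -o jsonpath='{.spec.serviceAccountName}')" || return 1
  expected_principal "$namespace" "$sa"
}
```

If `caller_principal <web-pod> web` prints `cluster.local/ns/web/sa/default`, the pod is using the `default` service account and will not match the rule above.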

Common findings this catches

  • Pod without sidecar → namespace not labeled istio-injection=enabled; OR pod created before label; OR sidecar.istio.io/inject=false annotation.
  • istioctl proxy-status shows SYNCED but routes don’t match → VS hosts don’t include the actual service hostname.
  • 403 with RBAC: access denied → AuthorizationPolicy denying; check policies selecting that workload.
  • PEER_NOT_FOUND in client log → mTLS mismatch; client expects mTLS but server PA disables it (or vice versa).
  • upstream connect error or disconnect/reset before headers → upstream unhealthy; check istioctl proxy-config endpoints.
  • Sidecar crashloops after MeshConfig change → config rejected by Envoy; check istiod logs for validation errors.
  • DestinationRule subset doesn’t match any pods → labels don’t exist; subset endpoints are empty.
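
Several of these findings can be told apart by the Envoy response flag that follows the status code in the access log. The lookup below is a small illustrative sketch, not an exhaustive list: the function name is hypothetical, it covers only four common flags, and the suggested next steps are this page's commands, not official guidance.

```shell
# Hypothetical lookup: map an Envoy access log response flag to a likely
# layer and a next step from this page.
explain_response_flag() {
  case "$1" in
    UH) echo "no healthy upstream hosts: check istioctl proxy-config endpoints" ;;
    UF) echo "upstream connection failure: check mTLS settings and destination health" ;;
    NR) echo "no route configured: check VirtualService hosts and proxy-config route" ;;
    UC) echo "upstream connection termination: check the destination sidecar and app logs" ;;
    -)  echo "no Envoy flag set: the code came from the application or a filter such as RBAC" ;;
    *)  echo "unrecognized flag '$1': see the Envoy access log documentation" ;;
  esac
}
```

For instance, `explain_response_flag UH` points at empty or unhealthy endpoints, which matches the DestinationRule subset finding above.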

When to escalate

  • Cluster-wide istiod degradation — engage platform team; affects all workloads.
  • Cross-cluster mesh (multi-primary) issues — coordinate; involves DNS + Gateway config.
  • Custom EnvoyFilter writes — these are powerful but easy to break; specialist debugging.
