EKS IRSA and Networking Troubleshooting Prompt
Diagnose why EKS pods cannot assume IAM roles, pull images, get IPs, or reach AWS APIs by tracing IRSA, the VPC CNI, and the OIDC trust chain.
- Target user: Platform and SRE engineers running workloads on Amazon EKS
- Difficulty: Advanced
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior EKS platform engineer. You debug EKS by separating three distinct planes (Kubernetes RBAC, AWS IAM via IRSA, and pod networking via the VPC CNI), and you confirm which plane is failing before touching anything.
I will provide:
- The symptom (AccessDenied to an AWS API, ImagePullBackOff, pod stuck Pending, DNS/timeout to AWS endpoints): [SYMPTOM]
- The ServiceAccount annotation and the IAM role trust policy: [SA_AND_TRUST_POLICY]
- Relevant pod/controller logs and `kubectl describe pod` output: [LOGS_AND_DESCRIBE]
- Cluster networking facts (VPC CNI version, subnet IPs free, security groups for pods if used): [NETWORKING_FACTS]
Do the following, numbered:
1. Classify the failure plane: is this an IAM/IRSA problem (the AWS SDK call is denied or the token is missing), an image/registry problem (ECR auth or networking), or a CNI/IP problem (no address, no route, DNS)? Quote the log line that decides it.
2. For IRSA failures, verify the full chain: the ServiceAccount `eks.amazonaws.com/role-arn` annotation matches the role; the role's trust policy `Condition` pins the correct OIDC provider URL AND the `sub` of `system:serviceaccount:<ns>:<sa>`; the pod actually mounted the projected token; and the role's permission policy allows the called action. Identify which link is broken.
3. For ImagePullBackOff, distinguish ECR auth (node role lacks `ecr:GetAuthorizationToken` or the pull actions) from networking (no NAT route and no VPC endpoints for the ECR API and DKR endpoints, or for S3, which serves the image layers) from a wrong tag.
4. For Pending/no-IP, check VPC CNI IP exhaustion (free IPs in the subnet and the per-node ENI/IP limits), subnet capacity, and security-groups-for-pods misconfiguration.
Output as:
(a) the decided failure plane with the evidence line,
(b) a chain-of-trust checklist for that plane marking the broken link,
(c) the minimal corrective change (annotation, trust-policy condition, subnet, or CNI setting),
(d) a verification command (e.g. exec into the pod and call `aws sts get-caller-identity`).
Never grant the node role broad permissions to work around a missing IRSA binding; never widen pod security groups beyond what the workload needs. Keep changes scoped and reviewed before applying to a production cluster.
Why this prompt works
EKS failures are hard because three independent control planes overlap on a single pod: Kubernetes RBAC decides what the pod can do inside the cluster, IRSA decides what AWS APIs it can call, and the VPC CNI decides whether it gets a routable IP at all. The same symptom can come from different planes: an ImagePullBackOff can stem from missing ECR permissions or from a missing route, and a failed AWS API call can be a permissions denial or a network timeout. The first and most valuable move is therefore classifying which plane is actually failing. This prompt forces that classification with a quoted log line before any fix is proposed.
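The classification step can be approximated with a small keyword triage over the log lines. This is a minimal sketch, not a complete classifier: the rule table `PLANE_RULES`, the plane names, and the function `classify_plane` are hypothetical, and the patterns are only examples of strings that commonly appear in kubelet, CNI, and SDK output. Image rules are checked first because a failed pull can also contain an authorization message.

```python
import re

# Hypothetical keyword rules, checked in order; extend them for your own logs.
PLANE_RULES = [
    ("image_registry", re.compile(r"ImagePullBackOff|ErrImagePull|Failed to pull image")),
    ("network_cni", re.compile(r"failed to assign an IP address|i/o timeout|no such host")),
    ("iam_irsa", re.compile(r"AccessDenied|not authorized to perform|InvalidIdentityToken")),
]


def classify_plane(log_lines):
    """Return (plane, evidence_line) for the first line matching any rule."""
    for line in log_lines:
        for plane, pattern in PLANE_RULES:
            if pattern.search(line):
                return plane, line.strip()
    # No rule matched: leave the decision to a human reading the logs.
    return "unknown", None


if __name__ == "__main__":
    sample = [
        "starting worker",
        "An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied",
    ]
    print(classify_plane(sample))
```

The returned evidence line plays the same role as the quoted log line the prompt asks for: it records why a plane was chosen, so the decision can be challenged before any change is made.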
The IRSA chain-of-trust is the richest source of subtle bugs. The binding only works if four things line up: the ServiceAccount annotation, the role’s trust-policy condition pinning the cluster OIDC provider and the exact sub, the projected token mount, and the role’s permission policy. A mismatch in any single link typically surfaces as the same generic AccessDenied, so engineers often attach the policy to the node role to make the error vanish, which silently grants every pod on the node those rights. The prompt names this anti-pattern as a guardrail and walks the chain link by link instead.
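Three of the four links can be checked offline from the ServiceAccount annotations and the trust policy JSON. The sketch below assumes the usual IRSA trust-policy shape (a `Federated` principal ending in `oidc-provider/<issuer>` and a `<issuer>:sub` condition key); the function name `check_irsa_links` and its parameters are hypothetical. The projected token mount and the permission policy still have to be verified against the live pod and role.

```python
from fnmatch import fnmatchcase

ROLE_ARN_ANNOTATION = "eks.amazonaws.com/role-arn"


def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_irsa_links(sa_annotations, role_arn, trust_policy, issuer, namespace, service_account):
    """Report which offline-checkable IRSA links hold.

    issuer is the cluster OIDC provider URL without the https:// prefix.
    """
    expected_sub = f"system:serviceaccount:{namespace}:{service_account}"
    result = {
        "annotation": sa_annotations.get(ROLE_ARN_ANNOTATION) == role_arn,
        "oidc_provider": False,
        "sub": False,
    }
    for stmt in _as_list(trust_policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in _as_list(stmt.get("Action")):
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict):
            continue
        federated = _as_list(principal.get("Federated"))
        if not any(arn.endswith(f":oidc-provider/{issuer}") for arn in federated):
            continue
        result["oidc_provider"] = True
        condition = stmt.get("Condition", {})
        exact = _as_list(condition.get("StringEquals", {}).get(f"{issuer}:sub"))
        # fnmatchcase only approximates IAM StringLike wildcard matching.
        globs = _as_list(condition.get("StringLike", {}).get(f"{issuer}:sub"))
        if expected_sub in exact or any(fnmatchcase(expected_sub, g) for g in globs):
            result["sub"] = True
    return result
```

A result with `annotation` and `oidc_provider` true but `sub` false usually points at a namespace or ServiceAccount name mismatch in the trust policy, which is a one-line fix to the condition rather than a reason to touch the node role.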
The networking branches close the loop. ImagePullBackOff and Pending pods are frequently misdiagnosed as auth issues when the real cause is CNI IP exhaustion or a missing route to the ECR endpoints. By giving the model the free-IP and endpoint facts up front and ending with a concrete in-pod verification command, the prompt produces a diagnosis the engineer can confirm directly, keeping them in control of any change to a live cluster.
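The IP-exhaustion branch comes down to simple arithmetic. The sketch below uses the commonly cited per-node pod ceiling for the VPC CNI in secondary-IP mode, assuming prefix delegation and custom networking are not in use; the function names and the three cause labels are hypothetical, and the ENI and address counts must come from the instance type's published limits.

```python
def max_pods(enis, ipv4_per_eni):
    """Per-node pod ceiling in secondary-IP mode.

    Each ENI's primary address is not handed to pods; the +2 is commonly
    explained as room for host-network pods such as aws-node and kube-proxy.
    """
    return enis * (ipv4_per_eni - 1) + 2


def pending_cause(free_subnet_ips, pods_on_node, node_max_pods):
    """Rough triage for a pod stuck without an IP (a heuristic, not a diagnosis)."""
    if free_subnet_ips <= 0:
        return "subnet_exhausted"
    if pods_on_node >= node_max_pods:
        return "node_ip_limit"
    return "look_elsewhere"


if __name__ == "__main__":
    # Assumed limits for an instance type with 3 ENIs and 10 IPv4 addresses each.
    ceiling = max_pods(enis=3, ipv4_per_eni=10)
    print(ceiling, pending_cause(free_subnet_ips=0, pods_on_node=12, node_max_pods=ceiling))
```

When the result is `look_elsewhere`, the remaining candidates from the prompt are security-groups-for-pods misconfiguration and ordinary scheduling constraints, which this arithmetic cannot see.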