Kubernetes Workload Identity & IRSA Hardening Prompt
Replace long-lived cloud credentials in pods with short-lived federated identity — IRSA on EKS, Workload Identity on GKE, or Azure Workload Identity — and audit ServiceAccount token usage for over-broad trust.
- Target user: Cloud platform engineers eliminating static secrets from workloads
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a cloud security engineer who has migrated fleets of pods off baked-in access keys onto OIDC-federated workload identity, and who has cleaned up the trust-policy mistakes that make the migration pointless.

Provide:
- Cloud provider (EKS / GKE / AKS) and how pods currently get cloud credentials
- OIDC provider status (cluster issuer URL, and whether the issuer is registered as an identity provider in the cloud account)
- Which workloads need which cloud permissions
- Any third-party controllers that assume a node-wide role

Walk me through hardening:
1. **The model**: explain the token exchange. A projected ServiceAccount JWT (bound to the pod, short TTL, audience-scoped) is exchanged for short-lived cloud credentials through the cloud's STS/OIDC trust. Contrast this with the old node-instance-role anti-pattern, where every pod on a node inherits that node's permissions.
2. **Map SA → cloud role**: produce the per-provider wiring, whether EKS IRSA (the `eks.amazonaws.com/role-arn` annotation plus an IAM trust policy scoped to `sub` = `system:serviceaccount:<ns>:<sa>` and `aud` = `sts.amazonaws.com`), a GKE Workload Identity binding, or an Azure federated identity credential.
3. **Tighten the trust policy**: address the most common mistake, which is a trust policy with a wildcard `sub` or no audience condition. Either one lets any ServiceAccount in the cluster assume the role. Show a correctly scoped condition and how to verify it.
4. **Audit existing tokens**: find pods still using static keys, ServiceAccounts with `automountServiceAccountToken` enabled when they don't call the Kubernetes API, and over-permissioned roles (right-size them to the cloud API calls actually observed).
5. **Token hygiene**: projected token TTL and rotation, disabling legacy non-expiring ServiceAccount token Secrets, and blocking fallback to the node role.
6. **Blast-radius test**: from inside a pod, prove it can do only what it should and cannot assume a sibling workload's role.
Output: (a) the per-provider SA-to-cloud-role manifests + trust policy, (b) a corrected vs vulnerable trust-policy diff, (c) an audit script/queries to find static-key and over-mounted pods, (d) a least-privilege checklist, (e) a migration order (lowest-risk workloads first).
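The token exchange is easier to follow when you look at the token the pod actually presents. The minimal Python sketch below base64url-decodes a projected ServiceAccount JWT and flags the properties that the hygiene step cares about. It does not verify the signature, because that is the cloud STS's job. The default path is where the EKS pod identity webhook mounts the IRSA token, and `AWS_WEB_IDENTITY_TOKEN_FILE` normally points to it. The expected audience and the 24-hour TTL ceiling are assumptions, so adjust them per provider and policy.

```python
import base64
import json
import os
import sys

# Assumed default: the IRSA webhook mounts the token here and exports this env var.
DEFAULT_TOKEN_PATH = os.environ.get(
    "AWS_WEB_IDENTITY_TOKEN_FILE",
    "/var/run/secrets/eks.amazonaws.com/serviceaccount/token",
)


def decode_jwt_claims(token: str) -> dict:
    """Return the JWT payload as a dict WITHOUT verifying the signature."""
    try:
        _header, payload, _signature = token.strip().split(".")
    except ValueError as exc:
        raise ValueError("not a three-part JWT") from exc
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def token_findings(claims: dict, expected_aud: str = "sts.amazonaws.com",
                   max_ttl_seconds: int = 86400) -> list[str]:
    """Flag claims that undermine short-lived, audience-scoped, pod-bound identity."""
    findings = []
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if expected_aud not in audiences:
        findings.append(f"audience {audiences} does not include {expected_aud!r}")
    if "exp" not in claims:
        findings.append("no exp claim: looks like a legacy non-expiring Secret token")
    elif "iat" in claims and claims["exp"] - claims["iat"] > max_ttl_seconds:
        findings.append(f"TTL of {claims['exp'] - claims['iat']}s exceeds {max_ttl_seconds}s")
    # Bound projected tokens carry a kubernetes.io claim that names the pod they belong to.
    if "pod" not in claims.get("kubernetes.io", {}):
        findings.append("token is not bound to a pod object")
    return findings


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_TOKEN_PATH
    if os.path.exists(path):
        with open(path) as fh:
            claims = decode_jwt_claims(fh.read())
        print("sub:", claims.get("sub"))
        for finding in token_findings(claims):
            print("FINDING:", finding)
    else:
        print(f"no token at {path}; is the ServiceAccount annotated for IRSA?")
```

Run it inside the pod. A clean IRSA token prints its `sub` and no findings. A legacy Secret-based token trips all three checks.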
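The per-provider wiring differs mainly in where the ServiceAccount's identity string lives. The sketch below renders the Kubernetes-side metadata and the cloud-side subject for each provider so you can compare them:

- EKS: the `eks.amazonaws.com/role-arn` annotation plus the trust-policy `sub`/`aud`.
- GKE: the `iam.gke.io/gcp-service-account` annotation plus the `PROJECT_ID.svc.id.goog[NAMESPACE/KSA]` member granted `roles/iam.workloadIdentityUser` (the Google service account impersonation style only).
- AKS: the `azure.workload.identity/client-id` annotation, the `azure.workload.identity/use` pod label, and the federated credential's issuer, subject, and audience.

The `Wiring` class and `wiring_for` function are hypothetical helpers. All IDs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Wiring:
    """Hypothetical summary of one provider's SA-to-cloud-identity wiring."""
    sa_annotations: dict = field(default_factory=dict)
    pod_labels: dict = field(default_factory=dict)
    cloud_side: dict = field(default_factory=dict)


def wiring_for(provider: str, namespace: str, sa: str, **ids: str) -> Wiring:
    """Render the identity strings each provider expects for one ServiceAccount."""
    k8s_subject = f"system:serviceaccount:{namespace}:{sa}"
    if provider == "eks":
        return Wiring(
            sa_annotations={"eks.amazonaws.com/role-arn": ids["role_arn"]},
            cloud_side={"trust_sub": k8s_subject, "trust_aud": "sts.amazonaws.com"},
        )
    if provider == "gke":
        # cluster_project owns the cluster's workload identity pool (PROJECT.svc.id.goog).
        return Wiring(
            sa_annotations={"iam.gke.io/gcp-service-account": ids["gsa_email"]},
            cloud_side={
                "role": "roles/iam.workloadIdentityUser",
                "member": f"serviceAccount:{ids['cluster_project']}.svc.id.goog[{namespace}/{sa}]",
            },
        )
    if provider == "aks":
        return Wiring(
            sa_annotations={"azure.workload.identity/client-id": ids["client_id"]},
            pod_labels={"azure.workload.identity/use": "true"},
            cloud_side={
                "issuer": ids["oidc_issuer_url"],
                "subject": k8s_subject,
                "audiences": ["api://AzureADTokenExchange"],
            },
        )
    raise ValueError(f"unknown provider {provider!r}")
```

Feed the `cloud_side` values into the IAM trust policy, `gcloud iam service-accounts add-iam-policy-binding`, or `az identity federated-credential create`. Put the annotations and labels into the ServiceAccount and pod manifests.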
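On EKS, you can check the difference between a vulnerable and a corrected trust policy mechanically. The sketch below does two things:

- It builds a trust policy pinned to one ServiceAccount, using `StringEquals` on both `:sub` and `:aud`.
- It lints an existing policy for a missing `:sub` condition, a `StringLike` wildcard in `:sub`, or a missing `:aud` condition.

`oidc_provider` is the cluster issuer URL without `https://`. The account ID, provider ID, and workload names are placeholders. The linter inspects only these conditions, not full IAM evaluation.

```python
import json


def build_irsa_trust_policy(account_id: str, oidc_provider: str,
                            namespace: str, service_account: str) -> dict:
    """Trust policy that lets exactly one ServiceAccount assume the role via IRSA."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


def lint_trust_policy(policy: dict) -> list[str]:
    """Report over-broad OIDC trust: missing/wildcard :sub, missing :aud."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    for i, stmt in enumerate(statements):
        principal = stmt.get("Principal", {})
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict) or "Federated" not in principal:
            continue  # only OIDC-federated Allow statements are in scope for this sketch
        subs, has_aud = [], False
        for operator, pairs in stmt.get("Condition", {}).items():
            for key, value in pairs.items():
                values = value if isinstance(value, list) else [value]
                if key.endswith(":sub"):
                    subs.extend((operator, v) for v in values)
                elif key.endswith(":aud"):
                    has_aud = True
        if not subs:
            findings.append(f"statement {i}: no :sub condition; any ServiceAccount from this issuer can assume the role")
        for operator, v in subs:
            # Wildcards only expand under StringLike; under StringEquals they are literal.
            if "StringLike" in operator and ("*" in v or "?" in v):
                findings.append(f"statement {i}: wildcard :sub {v!r} under {operator}")
        if not has_aud:
            findings.append(f"statement {i}: no :aud condition")
    return findings


if __name__ == "__main__":
    # Placeholder IDs: substitute your account ID and cluster issuer host/path.
    corrected = build_irsa_trust_policy(
        "111122223333", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0123456789",
        "payments", "ledger-writer")
    print(json.dumps(corrected, indent=2))
```

Running `lint_trust_policy` over every role whose trust policy names the cluster's OIDC provider gives the verification step. An empty list means the role is pinned to one `sub` and the STS audience.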
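A starting point for the audit is to scan the output of `kubectl get pods -A -o json`. The sketch below flags two things:

- Containers whose env sets common static-credential variables. The variable list is an assumed heuristic and is not exhaustive.
- Pods that mount a projected `serviceAccountToken` with no `audience`. That is the Kubernetes API token added by automounting. Audience-scoped tokens, such as the IRSA one, are not flagged.

Credentials delivered as files or through `envFrom`, and legacy Secret-based token volumes on older clusters, are not detected by this check. The script name `audit_pods.py` is hypothetical.

```python
import json
import sys

# Heuristic: env vars that usually carry long-lived cloud credentials.
STATIC_CREDENTIAL_ENV = {
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",  # usually a path to a JSON key file
    "AZURE_CLIENT_SECRET",
}


def mounts_api_token(spec: dict) -> bool:
    """True if a projected serviceAccountToken without an audience is mounted."""
    for volume in spec.get("volumes", []):
        for source in volume.get("projected", {}).get("sources", []):
            token = source.get("serviceAccountToken")
            if token is not None and not token.get("audience"):
                return True
    return False


def audit_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (namespace/pod, finding) pairs for a `kubectl get pods -o json` document."""
    findings = []
    for pod in pod_list.get("items", []):
        meta, spec = pod.get("metadata", {}), pod.get("spec", {})
        ref = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
        for container in spec.get("containers", []) + spec.get("initContainers", []):
            for env in container.get("env", []):
                if env.get("name") in STATIC_CREDENTIAL_ENV:
                    findings.append((ref, f"container {container.get('name')} sets {env['name']}"))
        if mounts_api_token(spec):
            sa = spec.get("serviceAccountName", "default")
            findings.append((ref, f"Kubernetes API token mounted (SA {sa}); confirm it calls the API"))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pods -A -o json > pods.json && python audit_pods.py pods.json
    with open(sys.argv[1]) as fh:
        for ref, finding in audit_pods(json.load(fh)):
            print(f"{ref}: {finding}")
```

Pods flagged only for the API token are candidates for `automountServiceAccountToken: false`. On EKS, IRSA mounts its own audience-scoped token, so that setting does not break the cloud credentials.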