
Kubernetes Workload Identity & IRSA Hardening Prompt

Replace long-lived cloud credentials in pods with short-lived federated identity — IRSA on EKS, Workload Identity on GKE, or Azure Workload Identity — and audit ServiceAccount token usage for over-broad trust.

Target user: Cloud platform engineers eliminating static secrets from workloads
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a cloud security engineer who has migrated fleets of pods off baked-in access keys onto OIDC-federated workload identity, and has cleaned up the trust-policy mistakes that make it pointless.

Provide:
- Cloud provider (EKS / GKE / AKS) and how pods currently get cloud credentials
- The OIDC provider status (cluster issuer URL, whether the IdP is registered)
- Which workloads need which cloud permissions
- Any third-party controllers that assume a node-wide role

Walk me through hardening:

1. **The model**: explain the token exchange. A projected ServiceAccount JWT (pod-bound, short-TTL, audience-scoped) is exchanged through the provider's STS/OIDC federation for short-lived cloud credentials. Contrast this with the node-instance-role anti-pattern, where every pod on a node inherits that node's permissions.

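2. **Map SA → cloud role**: produce the per-provider wiring. For EKS IRSA, the `eks.amazonaws.com/role-arn` annotation plus an IAM trust policy whose conditions pin `<issuer>:sub` to `system:serviceaccount:<ns>:<sa>` and `<issuer>:aud` to `sts.amazonaws.com`. For GKE Workload Identity, the `iam.gke.io/gcp-service-account` annotation plus a `roles/iam.workloadIdentityUser` binding for that ServiceAccount. For Azure Workload Identity, a federated credential whose subject is the same `system:serviceaccount:<ns>:<sa>` string.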

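3. **Tighten the trust policy**: the most common mistake is a trust policy with a wildcard `sub` (for example a `StringLike` match on `system:serviceaccount:*`), which lets any ServiceAccount in the cluster assume the role, or with no `aud` condition, which leaves the token's intended audience unchecked by the policy. Show a correctly scoped condition and how to verify it.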

4. **Audit existing tokens**: find pods that still use static keys, ServiceAccounts that leave `automountServiceAccountToken` enabled for pods that never call the Kubernetes API, and over-permissioned roles (right-size them to the cloud API calls actually observed).

5. **Token hygiene**: cover projected token TTL and rotation, disabling legacy non-expiring SA tokens, and preventing pods from falling back to the node's role.

6. **Blast-radius test**: from inside a pod, prove that it can do only what it should and cannot assume a sibling workload's role.
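For readers who want a sense of what the trust-policy check in step 3 looks like in practice, here is a minimal Python sketch. It assumes the IAM trust policy has already been loaded as a dictionary, and it only looks at `sts:AssumeRoleWithWebIdentity` statements. The function name `find_trust_policy_issues`, the issuer ID `EXAMPLE`, the account number, and the `payments:api` ServiceAccount are all hypothetical. It is a heuristic, not a full IAM policy evaluator (for instance, it does not expand an `sts:*` action).

```python
from typing import Any

WEB_IDENTITY_ACTION = "sts:AssumeRoleWithWebIdentity"


def _as_list(value: Any) -> list:
    """IAM JSON allows a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_trust_policy_issues(policy: dict[str, Any]) -> list[str]:
    """Flag web-identity statements with a wildcard or missing sub, or no aud."""
    issues: list[str] = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        if WEB_IDENTITY_ACTION not in _as_list(stmt.get("Action")):
            continue
        sub_checks, aud_checks = [], []
        # Condition keys look like "<issuer>:sub" and "<issuer>:aud".
        for operator, pairs in stmt.get("Condition", {}).items():
            for key, value in pairs.items():
                checks = [(operator, v) for v in _as_list(value)]
                if key.endswith(":sub"):
                    sub_checks.extend(checks)
                elif key.endswith(":aud"):
                    aud_checks.extend(checks)
        if not sub_checks:
            issues.append(f"statement {index}: no sub condition")
        for operator, value in sub_checks:
            # "*" and "?" only act as wildcards outside StringEquals.
            if operator != "StringEquals" and ("*" in value or "?" in value):
                issues.append(
                    f"statement {index}: wildcard sub {value!r} via {operator}"
                )
        if not aud_checks:
            issues.append(f"statement {index}: no aud condition")
    return issues


# Hypothetical issuer and account, for illustration only.
ISSUER = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
PRINCIPAL = {"Federated": f"arn:aws:iam::111122223333:oidc-provider/{ISSUER}"}

VULNERABLE = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": PRINCIPAL,
        "Action": WEB_IDENTITY_ACTION,
        "Condition": {
            "StringLike": {f"{ISSUER}:sub": "system:serviceaccount:*:*"},
        },
    }],
}

SCOPED = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": PRINCIPAL,
        "Action": WEB_IDENTITY_ACTION,
        "Condition": {
            "StringEquals": {
                f"{ISSUER}:sub": "system:serviceaccount:payments:api",
                f"{ISSUER}:aud": "sts.amazonaws.com",
            },
        },
    }],
}

if __name__ == "__main__":
    for name, policy in (("vulnerable", VULNERABLE), ("scoped", SCOPED)):
        print(name, find_trust_policy_issues(policy))
```

The two sample policies double as a vulnerable-versus-corrected pair: the first matches every ServiceAccount in the cluster and never checks the audience, while the second pins both claims with `StringEquals`.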

Output: (a) the per-provider SA-to-cloud-role manifests + trust policy, (b) a corrected vs vulnerable trust-policy diff, (c) an audit script/queries to find static-key and over-mounted pods, (d) a least-privilege checklist, (e) a migration order (lowest-risk workloads first).
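As a rough illustration of the audit script requested in output (c), the sketch below scans the parsed JSON from `kubectl get pods -A -o json` for two signals: well-known static-credential environment variables, and pods that do not explicitly turn off token automounting. The function name `audit_pods` and the finding labels are hypothetical, and both checks are heuristics. In particular, the automount check reads only the pod spec, so it ignores a ServiceAccount-level `automountServiceAccountToken: false`, and `GOOGLE_APPLICATION_CREDENTIALS` may point at a non-static credential file.

```python
import json
import sys
from typing import Any

# Environment variable names that usually indicate baked-in credentials.
STATIC_KEY_ENV_NAMES = {
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "AZURE_CLIENT_SECRET",
}


def audit_pods(pod_list: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (namespace/name, finding, detail) tuples for a pod list."""
    findings: list[tuple[str, str, str]] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        ref = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"

        # Check regular and init containers for static-credential env vars.
        containers = (spec.get("containers") or []) + (
            spec.get("initContainers") or []
        )
        for container in containers:
            for env in container.get("env") or []:
                if env.get("name") in STATIC_KEY_ENV_NAMES:
                    findings.append((
                        ref,
                        "static-credential-env",
                        f"{container.get('name', '?')}: {env['name']}",
                    ))

        # Heuristic: only an explicit pod-level false counts as disabled.
        if spec.get("automountServiceAccountToken") is not False:
            findings.append((
                ref,
                "token-automounted",
                "automountServiceAccountToken not explicitly false",
            ))
    return findings


if __name__ == "__main__":
    # Usage: kubectl get pods -A -o json | python audit_pods.py
    for ref, finding, detail in audit_pods(json.load(sys.stdin)):
        print(f"{ref}\t{finding}\t{detail}")
```

The output is tab-separated so it can be sorted or grouped by namespace when deciding which workloads to migrate first.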
