Writing Least-Privilege IAM Policies With AI From CloudTrail

A service account in our staging account had AdministratorAccess attached to it for eight months. Nobody put it there maliciously — someone was debugging a deploy at 1am, slapped admin on the role to unblock themselves, and the cleanup ticket died in the backlog. I found it during a routine review, and the honest reason it survived so long is that writing the correct least-privilege policy by hand is miserable work. You have to know every API the workload calls, map each one to an action, get the resource ARNs right, and then iterate when something breaks in production at the worst possible time.

This is the kind of grind AI is genuinely good at — not because it knows your security posture, but because it can read a pile of CloudTrail events and turn them into a structured policy faster than you can. The catch, and it’s a big one: AI will happily invent actions that don’t exist and over-scope resources to make the policy “work.” So you ground it in real evidence and you verify every line. Here’s how I do it.

Start from what the role actually did

The whole trick is to never ask AI to guess what permissions a workload needs. Instead, you give it the ground truth: the actual API calls the role made, pulled from CloudTrail. If your org has CloudTrail going to S3 with Athena on top, that’s the cleanest source. Query the last 30–90 days of activity for the principal in question.

SELECT eventsource, eventname, count(*) AS calls
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%role/staging-deploy-role%'
  AND eventtime > '2026-03-21'
  AND errorcode IS NULL
GROUP BY eventsource, eventname
ORDER BY calls DESC;

The errorcode IS NULL filter matters — you don’t want to grant permissions for calls that were failing anyway and weren’t load-bearing. If you don’t have Athena set up, aws cloudtrail lookup-events works for lower-volume roles, though it only covers 90 days and management events:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=ASIA... \
  --start-time 2026-03-21T00:00:00Z \
  --query 'Events[].CloudTrailEvent' --output json \
  | jq -r '.[] | fromjson | select(.errorCode == null) | "\(.eventSource) \(.eventName)"' \
  | sort | uniq -c | sort -rn

Now you have a deduplicated list of event source and event name pairs, most of which map directly onto service:Action pairs. That list is the contract. AI’s job is to translate it into a policy, not to expand it.
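A minimal sketch of shaping that list before it goes into a prompt, in Python. It assumes each row is an (eventSource, eventName) pair from one of the queries above, and it naively takes the first label of the event source as the IAM service prefix. That guess is wrong for some services, and the PREFIX_OVERRIDES table is a hypothetical, incomplete example, so treat the output as candidates to verify rather than confirmed IAM actions.

```python
from collections import Counter

# Hypothetical, incomplete overrides: event sources whose first label is not
# the IAM service prefix. Extend after checking the Service Authorization Reference.
PREFIX_OVERRIDES = {"monitoring.amazonaws.com": "cloudwatch"}


def candidate_actions(rows):
    """Collapse (eventSource, eventName) rows into candidate service:Action counts."""
    counts = Counter()
    for event_source, event_name in rows:
        # Naive assumption: "ecs.amazonaws.com" -> "ecs", unless overridden.
        prefix = PREFIX_OVERRIDES.get(event_source, event_source.split(".")[0])
        counts[f"{prefix}:{event_name}"] += 1
    return counts


# Illustrative rows, not real log data.
rows = [
    ("ecs.amazonaws.com", "UpdateService"),
    ("ecs.amazonaws.com", "UpdateService"),
    ("ecr.amazonaws.com", "BatchGetImage"),
    ("monitoring.amazonaws.com", "DescribeAlarms"),
]
for action, calls in candidate_actions(rows).most_common():
    print(calls, action)
```

Keeping the call counts next to each candidate helps during review: an action seen once in 90 days deserves a second look before it is granted permanently.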

Hand the evidence to AI, not the imagination

Paste the call list into your model with a tight, constrained prompt. The constraints are doing all the work here.

You are writing an AWS IAM policy. Below is the exact list of API calls a role made over 30 days, taken from CloudTrail. Produce a least-privilege identity policy that grants ONLY these actions. Rules: (1) Use only actions that map directly to the eventName values listed; do not add related or “convenience” actions. (2) Where the eventName differs from the IAM action name (e.g. ListObjects is authorized by s3:ListBucket), use the correct IAM action. (3) Group actions by service into separate statements. (4) For resources, use the placeholder string "CONFIRM" rather than * so I can scope each one. (5) Flag any eventName you are not confident maps to a single IAM action.

That last rule is the one people skip, and it’s the most valuable. CloudTrail’s eventName does not always equal the IAM action: S3’s ListObjects and ListObjectsV2 calls are authorized by s3:ListBucket, for example, and some events, such as ConsoleLogin, have no IAM action at all. Forcing the model to surface its uncertainty turns a silent guess into a review item.

A typical first draft comes back looking like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EcrPullForDeploy",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ],
      "Resource": "arn:aws:ecr:us-east-1:111122223333:repository/staging/*"
    },
    {
      "Sid": "EcsDeploy",
      "Effect": "Allow",
      "Action": [
        "ecs:UpdateService",
        "ecs:DescribeServices",
        "ecs:RegisterTaskDefinition"
      ],
      "Resource": "*"
    }
  ]
}
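Before reviewing a draft like this by eye, it can help to diff it mechanically against the evidence. The sketch below assumes you kept the observed service:Action list from CloudTrail as a Python set; the function name and sample data are hypothetical. It reports any allowed action the model added that was not observed, including wildcards such as ecs:*.

```python
import json


def unexpected_actions(policy_json, observed):
    """Return allowed actions in the policy that are not in the observed set."""
    statements = json.loads(policy_json)["Statement"]
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    observed_lower = {a.lower() for a in observed}  # IAM actions are case-insensitive
    extra = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        extra.update(a for a in actions if a.lower() not in observed_lower)
    return sorted(extra)


# Illustrative data: the model slipped in a "convenience" action.
observed = {"ecs:UpdateService", "ecs:DescribeServices"}
draft = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ecs:UpdateService", "ecs:DescribeServices", "ecs:ListServices"],
        "Resource": "*",
    }],
})
print(unexpected_actions(draft, observed))  # ['ecs:ListServices']
```

An empty result only means the draft stays inside the evidence. It says nothing about whether each action name is real or each resource is scoped correctly.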

The human part: scope resources and kill the wildcards

The draft above has the actions right but the resources wrong, and that’s exactly the division of labor I want. Two things need fixing. ecr:GetAuthorizationToken doesn’t support resource-level permissions, which you can confirm in the Service Authorization Reference, so it only works with Resource: "*". Sitting in a statement scoped to a repository ARN, as it does in the draft, it is never actually granted; move it to its own statement. ecs:UpdateService is the opposite case: it does support resource scoping, and leaving it as * means this role can redeploy every service in the account. Ask AI which actions support resource-level permissions, then verify against the docs and tighten:

{
  "Sid": "EcsDeploy",
  "Effect": "Allow",
  "Action": ["ecs:UpdateService", "ecs:DescribeServices"],
  "Resource": "arn:aws:ecs:us-east-1:111122223333:service/staging-cluster/*",
  "Condition": {
    "ArnEquals": { "ecs:cluster": "arn:aws:ecs:us-east-1:111122223333:cluster/staging-cluster" }
  }
}

RegisterTaskDefinition can’t be scoped to a service ARN, so it gets its own statement; check the Service Authorization Reference for which resource types, if any, it accepts. Keeping actions with different scoping rules in separate statements is good hygiene the model won’t do on its own.
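To make sure no wildcard survives the scoping pass unnoticed, a small check like the following can list every Allow statement whose Resource is still a bare "*". This is a sketch under the assumption that the policy is already parsed into a Python dict with a list of statements; the function name and sample policy are hypothetical.

```python
def wildcard_statements(policy):
    """Return (Sid, actions) for Allow statements whose Resource includes a bare "*"."""
    hits = []
    for stmt in policy["Statement"]:
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if stmt.get("Effect") == "Allow" and "*" in resources:
            actions = stmt.get("Action", [])
            if isinstance(actions, str):
                actions = [actions]
            hits.append((stmt.get("Sid", "<no Sid>"), actions))
    return hits


# Illustrative policy, shaped like a first draft from the model.
draft = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "EcrPullForDeploy", "Effect": "Allow",
         "Action": ["ecr:BatchGetImage"],
         "Resource": "arn:aws:ecr:us-east-1:111122223333:repository/staging/*"},
        {"Sid": "EcsDeploy", "Effect": "Allow",
         "Action": ["ecs:UpdateService", "ecs:RegisterTaskDefinition"],
         "Resource": "*"},
    ],
}
for sid, actions in wildcard_statements(draft):
    print(f"{sid}: justify or scope {', '.join(actions)}")
```

Each hit is a question, not necessarily a defect: some actions only work with "*", and those should end up in a statement of their own with a Sid that says so.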

Validate before you ever attach it

Never assume a generated policy is valid, even syntactically. Two AWS-native checks catch most problems for free:

# Grammar and best-practice checks; each finding is typed ERROR, SECURITY_WARNING, WARNING, or SUGGESTION
aws accessanalyzer validate-policy \
  --policy-type IDENTITY_POLICY \
  --policy-document file://staging-deploy-policy.json

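# Confirm the policy actually permits the call you care about.
# The simulator needs a value for any condition key the policy tests.
aws iam simulate-custom-policy \
  --policy-input-list file://staging-deploy-policy.json \
  --action-names ecs:UpdateService \
  --resource-arns arn:aws:ecs:us-east-1:111122223333:service/staging-cluster/web \
  --context-entries ContextKeyName=ecs:cluster,ContextKeyValues=arn:aws:ecs:us-east-1:111122223333:cluster/staging-cluster,ContextKeyType=string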

validate-policy reports an action that doesn’t exist as an error, which is your safety net against a hallucinated action sneaking through, and it raises warnings for some risky patterns. It won’t flag every wildcard resource, so the scoping pass above is still on you. simulate-custom-policy confirms the real calls still pass before you swap admin off.
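If you want the validation step to fail a pipeline rather than rely on someone reading the output, a sketch like this can gate on the findings. It assumes the validate-policy response was saved as JSON and that each finding carries findingType, issueCode, and findingDetails fields. The sample below is hand-written for illustration: its issue codes and messages are not captured CLI output. Which finding types count as blocking is also an assumption to adjust.

```python
import json

# Assumption: errors and security warnings block; warnings and suggestions do not.
BLOCKING = {"ERROR", "SECURITY_WARNING"}


def blocking_findings(validate_output):
    """Return (type, issue code, details) for findings that should block the rollout."""
    findings = json.loads(validate_output).get("findings", [])
    return [
        (f["findingType"], f["issueCode"], f["findingDetails"])
        for f in findings
        if f["findingType"] in BLOCKING
    ]


# Hand-written sample, not real validator output.
sample = json.dumps({
    "findings": [
        {"findingType": "ERROR", "issueCode": "INVALID_ACTION",
         "findingDetails": "The action ecs:DeployService does not exist."},
        {"findingType": "SUGGESTION", "issueCode": "EXAMPLE_SUGGESTION",
         "findingDetails": "Illustrative suggestion text."},
    ]
})
for finding_type, code, details in blocking_findings(sample):
    print(f"{finding_type} {code}: {details}")
```

In a CI job, a non-empty result would be the signal to exit non-zero before the policy is ever attached.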

Ship it behind a safety net

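Don’t leave AdministratorAccess attached while you test: IAM allows a call if any attached policy allows it, so admin would hide every gap in the new policy. Instead, swap the policies in staging, keep the command to re-attach admin handy as your rollback, and watch CloudTrail for AccessDenied events for a few days. AI is good at reading those too: feed it the denied calls and it’ll point to the statement that needs extending. When the denials stop, retire the rollback and close the ticket.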
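The denied calls are easy to pull out before handing them to a model. The sketch below assumes you already have CloudTrail records as parsed dicts, for example from the jq pipeline or an Athena export, and that denials appear under one of the error codes listed. That set is an assumption and may need extending for the services you use.

```python
from collections import Counter

# Assumed set of denial error codes; services are not fully consistent here.
DENIED_CODES = {
    "AccessDenied",
    "AccessDeniedException",
    "UnauthorizedOperation",
    "Client.UnauthorizedOperation",
}


def denied_calls(records):
    """Count denied calls per (eventSource, eventName)."""
    counts = Counter()
    for record in records:
        if record.get("errorCode") in DENIED_CODES:
            counts[(record["eventSource"], record["eventName"])] += 1
    return counts


# Illustrative records, not real log data.
records = [
    {"eventSource": "ecs.amazonaws.com", "eventName": "ListTasks", "errorCode": "AccessDeniedException"},
    {"eventSource": "ecs.amazonaws.com", "eventName": "ListTasks", "errorCode": "AccessDeniedException"},
    {"eventSource": "ecs.amazonaws.com", "eventName": "UpdateService"},
    {"eventSource": "ecr.amazonaws.com", "eventName": "BatchGetImage", "errorCode": "ThrottlingException"},
]
for (source, name), calls in denied_calls(records).most_common():
    print(calls, source, name)
```

A denial is not automatically a permission to add. Check that the call is one the workload genuinely needs before extending a statement.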

The mindset that matters: AI compresses the tedious translation from “what did this thing do” to “what policy expresses that,” but it has zero knowledge of your blast radius or your trust boundaries. CloudTrail is the source of truth, AWS’s own validators are the gate, and you are the one who decides what Resource is allowed to be. If you’re hunting for over-permissioned principals more broadly, the same evidence-first approach shows up in finding public cloud exposure with AI. And if you want a starting prompt library for this kind of work, I keep a running set in the prompts collection.
