Building a Safe Bulk Resource Tagging Workflow With AI

Last quarter I opened our cost report and found a line item I could not explain: roughly $4,100 a month sitting in a bucket labeled, helpfully, untagged. No owner. No cost center. No environment. Just a pile of EBS volumes, a few NAT gateways, and an RDS instance someone spun up for a demo that never got torn down. Nobody on the team would claim it, because nobody could prove who launched it. That is the FinOps version of a haunted house, and the only way out is tags.

The temptation is to point an LLM at the account, say “tag everything sensibly,” and hand it credentials. Do not do that. AI is a fast junior engineer: brilliant at pattern-matching naming conventions and drafting code, terrible at understanding the blast radius of a mistake. The workflow below treats the model exactly that way. It does the tedious inference; a human owns every decision that writes to the account.

Start With a Tagging Policy, Not a Prompt

Before any automation, you need a schema. If you cannot describe a “correctly tagged” resource in a config file, the AI cannot either, and neither can your auditor. We settled on three required keys and a handful of optional ones.

# tagging-policy.yaml
required_tags:
  owner:        # email or team alias, e.g. platform@corp.io
    pattern: "^[a-z0-9._%+-]+@corp\\.io$"
  cost-center:  # finance code
    pattern: "^CC-[0-9]{4}$"
  env:          # lifecycle stage
    allowed: [prod, staging, dev, sandbox]

optional_tags:
  - service        # logical app name
  - data-class     # public, internal, confidential

enforcement:
  scope:
    accounts: ["1234-prod", "5678-shared"]
    regions:  ["us-east-1", "us-west-2"]
  protected_tag_keys: ["owner", "data-class"]  # never auto-overwrite

That scope block matters more than it looks. It is your blast-radius boundary. Every script in this workflow should refuse to touch anything outside the named accounts and regions, so a typo in a tag value cannot cascade across your whole organization. This is the same discipline I covered in blast-radius scoping for AI-driven automation: decide what the job is allowed to reach before you decide what it should do.
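A minimal sketch of that boundary check, assuming the account aliases in the policy have been mapped to 12-digit account IDs somewhere else. The `in_scope` helper and the placeholder account ID are hypothetical and not part of the scripts in this post:

```python
# Hypothetical helper: decide whether an ARN falls inside the declared scope.
# Assumes the standard layout arn:partition:service:region:account-id:resource

def in_scope(arn, allowed_account_ids, allowed_regions):
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        return False  # malformed ARN: refuse rather than guess
    region, account_id = parts[3], parts[4]
    return account_id in allowed_account_ids and region in allowed_regions

# Placeholder values for illustration only
ALLOWED_ACCOUNTS = {"111111111111"}
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}

print(in_scope("arn:aws:ec2:us-east-1:111111111111:volume/vol-0a1",
               ALLOWED_ACCOUNTS, ALLOWED_REGIONS))
print(in_scope("arn:aws:ec2:eu-west-1:111111111111:volume/vol-0b2",
               ALLOWED_ACCOUNTS, ALLOWED_REGIONS))
```

Failing closed is the design choice here: an ARN with an empty region or account field (some global services use those) is treated as out of scope until someone decides otherwise.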

Audit Read-Only First

The first script reads. It never writes. Run it under an IAM role that physically cannot mutate anything, so even a bug is harmless.

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "TagAuditReadOnly",
    "Effect": "Allow",
    "Action": [
      "ec2:DescribeInstances",
      "ec2:DescribeVolumes",
      "rds:DescribeDBInstances",
      "tag:GetResources"
    ],
    "Resource": "*"
  }]
}

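The Resource Groups Tagging API is the fastest way to find gaps across services at once. One caveat: its get_resources call only returns resources that carry tags now or carried them at some point, so anything that has never been tagged needs a per-service describe call to surface (the role above already allows those). The inventory script: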

# audit.py: read-only inventory of resources missing required tags
import boto3, yaml, csv, json, sys

with open("tagging-policy.yaml") as f:
    policy = yaml.safe_load(f)
required = set(policy["required_tags"])  # owner, cost-center, env
scope = policy["enforcement"]["scope"]

def find_untagged(region):
    """Yield (arn, missing keys, existing tags) for each under-tagged resource."""
    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    paginator = client.get_paginator("get_resources")
    for page in paginator.paginate(ResourcesPerPage=100):
        for r in page["ResourceTagMappingList"]:
            existing = {t["Key"]: t["Value"] for t in r["Tags"]}
            missing = required - set(existing)
            if missing:
                yield r["ResourceARN"], sorted(missing), existing

writer = csv.writer(sys.stdout)
writer.writerow(["arn", "missing_tags", "existing_tags"])
for region in scope["regions"]:  # only the regions declared in the policy
    for arn, missing, existing in find_untagged(region):
        # JSON keeps the existing tags machine-readable for later steps
        writer.writerow([arn, "|".join(missing), json.dumps(existing, sort_keys=True)])

Now you have a CSV of exactly what is broken, scoped to the regions you declared and to whichever account the audit role lives in. No guessing, no surprises. This inventory is the input to everything that follows.

Pro Tip: Run the audit on a schedule and diff it week over week. A sudden spike in untagged resources usually means a new pipeline or team is provisioning outside your IaC. The tag gap is a leading indicator of shadow infrastructure.
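The week-over-week diff can be sketched as below. It assumes two audit CSVs with the `arn` column that audit.py writes; the function names and file paths are illustrative, not part of the original scripts:

```python
import csv

def load_rows(path):
    """Read an audit CSV produced by audit.py into a list of dict rows."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def diff_audits(previous_rows, current_rows):
    """Compare two audit runs by ARN: what is newly untagged, what got fixed."""
    previous = {row["arn"] for row in previous_rows}
    current = {row["arn"] for row in current_rows}
    return {
        "new": sorted(current - previous),       # appeared since last run
        "resolved": sorted(previous - current),  # tagged or deleted since last run
    }

# Usage (hypothetical file names):
#   report = diff_audits(load_rows("audit-week1.csv"), load_rows("audit-week2.csv"))
#   print(len(report["new"]), "newly untagged resources")
```

A growing "new" list with few "resolved" entries is the spike worth chasing.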

Let AI Infer, Never Decide

Here is where the model earns its keep. Resource ARNs and names carry signal: web-prod-use1-07, analytics-staging-spark-driver, joyner-test-ebs. A human reading 600 of those will go cross-eyed. An LLM reads them in seconds and proposes likely values.

The critical rule: the model writes a proposal file, not tags. I feed it the audit CSV and the policy, and ask for structured output I can review.

You are assisting a tagging audit. For each ARN below, infer the most
likely `env` and `owner` from naming conventions ONLY. If a name gives
no clear signal, output "UNKNOWN" — do not guess. Return CSV with columns:
arn, inferred_env, inferred_owner, confidence (high|low), reasoning.
Never output a value for the protected key `data-class`.

I run this through whichever assistant is handy. Tools like Claude or ChatGPT are fine for the inference pass because they never touch the account; they only see resource names I have already exported. The credentials stay on my laptop and in the IAM role, never in the prompt.

The output looks like this, and every low-confidence row gets a human eyeball before it goes anywhere:

arn,inferred_env,inferred_owner,confidence,reasoning
arn:aws:ec2:...:volume/vol-0a1,prod,UNKNOWN,low,"name 'data-vol-1' has no team token"
arn:aws:ec2:...:volume/vol-0b2,staging,analytics@corp.io,high,"'analytics-staging' matches policy"

UNKNOWN is a feature, not a failure. The model abstaining is exactly what you want from a junior who is not sure. If you want a reusable version of this inference prompt with the abstention guardrail baked in, I keep one in the prompt library and a fuller FinOps set in the prompt packs.
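Before a human reviews the proposal file, it may help to reject rows that cannot be valid under the policy at all. A minimal sketch, assuming the column names from the prompt above and the `owner` pattern and `env` values from tagging-policy.yaml; `validate_proposal` is a hypothetical helper, and the rules are hard-coded here only to keep the example self-contained:

```python
import re

# Mirrors the required_tags rules from tagging-policy.yaml shown earlier
OWNER_PATTERN = re.compile(r"^[a-z0-9._%+-]+@corp\.io$")
ALLOWED_ENVS = {"prod", "staging", "dev", "sandbox"}

def validate_proposal(row):
    """Return a list of problems for one proposal row; empty means it passes."""
    problems = []
    env = row.get("inferred_env", "")
    owner = row.get("inferred_owner", "")
    if env != "UNKNOWN" and env not in ALLOWED_ENVS:
        problems.append(f"env {env!r} is not an allowed value")
    if owner != "UNKNOWN" and not OWNER_PATTERN.match(owner):
        problems.append(f"owner {owner!r} does not match the owner pattern")
    if row.get("confidence") not in ("high", "low"):
        problems.append("confidence must be 'high' or 'low'")
    return problems

good = {"inferred_env": "staging", "inferred_owner": "analytics@corp.io", "confidence": "high"}
bad = {"inferred_env": "production", "inferred_owner": "bob", "confidence": "medium"}
print(validate_proposal(good))
print(validate_proposal(bad))
```

This catches a model that invents an environment name or an owner outside the corporate domain. It does not tell you whether a well-formed value is the right one, which is still the reviewer's job.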

The Dry-Run Is the Whole Point

The proposal CSV becomes input to the write script, which defaults to dry-run. Nothing mutates unless a human passes --apply and the proposal has been reviewed. This mirrors the pattern in dry-run and simulation before automated actions: show me precisely what will change before you change it.

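# tag_writer.py: dry-run by default
import boto3, csv, json, argparse, datetime, yaml

ap = argparse.ArgumentParser()
ap.add_argument("--proposals", required=True)
ap.add_argument("--apply", action="store_true")  # off unless explicit
args = ap.parse_args()

with open("tagging-policy.yaml") as f:
    policy = yaml.safe_load(f)
protected = set(policy["enforcement"]["protected_tag_keys"])
backout = []
clients = {}  # one Tagging API client per region, created on demand

def client_for(arn):
    region = arn.split(":")[3]  # arn:partition:service:region:account:resource
    if region not in clients:
        clients[region] = boto3.client(
            "resourcegroupstaggingapi", region_name=region or None)
    return clients[region]

def plan(row, existing_keys):
    tags = {}
    if row["inferred_env"] not in ("", "UNKNOWN"):
        tags["env"] = row["inferred_env"]
    if row["inferred_owner"] not in ("", "UNKNOWN"):
        tags["owner"] = row["inferred_owner"]
    # never overwrite a protected key that already has a value, in either mode
    return {k: v for k, v in tags.items()
            if not (k in protected and k in existing_keys)}

try:
    with open(args.proposals, newline="") as f:
        for row in csv.DictReader(f):
            arn = row["arn"]
            if row["confidence"] == "low":
                print(f"SKIP (low confidence): {arn}")
                continue
            client = client_for(arn)
            # read current tags in both modes so the dry-run matches the apply
            found = client.get_resources(ResourceARNList=[arn])["ResourceTagMappingList"]
            prior = found[0]["Tags"] if found else []
            proposed = plan(row, {t["Key"] for t in prior})
            if not proposed:
                continue
            if not args.apply:
                print(f"DRY-RUN would set {proposed} -> {arn}")
                continue
            # record prior state BEFORE writing: this is the back-out trail
            backout.append({"arn": arn, "prior_tags": prior, "applied": proposed})
            resp = client.tag_resources(ResourceARNList=[arn], Tags=proposed)
            failed = resp.get("FailedResourcesMap", {})  # per-ARN errors do not raise
            if failed:
                print(f"FAILED {arn}: {failed}")
            else:
                print(f"APPLIED {proposed} -> {arn}")
finally:
    # write the back-out even if the run dies partway through
    if args.apply and backout:
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        with open(f"backout-{stamp}.json", "w") as f:
            json.dump(backout, f, indent=2)
        print(f"Back-out written: backout-{stamp}.json")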

Notice two safety behaviors. Low-confidence rows are skipped entirely, not silently tagged. And before any write, the script captures the resource’s current tags into a back-out manifest. You cannot undo what you did not record.

Batch With Rate Limits in Mind

tag_resources accepts up to 20 ARNs per call, and the API will throttle you if you sprint. When you move from 30 demo resources to 3,000 real ones, naive looping gets you a throttling error (ThrottledException, for this API) halfway through, leaving the account half-tagged. Batch and back off.

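import time
from botocore.exceptions import ClientError

# The Tagging API reports throttling as ThrottledException; many other AWS
# services use ThrottlingException, so accept either code.
THROTTLE_CODES = {"ThrottledException", "ThrottlingException"}

def write_batch(client, arns, tags, attempt=1):
    """Tag up to 20 ARNs; return the per-ARN failures the API reports."""
    try:
        resp = client.tag_resources(ResourceARNList=arns, Tags=tags)
        return resp.get("FailedResourcesMap", {})
    except ClientError as e:
        if e.response["Error"]["Code"] in THROTTLE_CODES and attempt <= 5:
            time.sleep(2 ** attempt)        # exponential backoff: 2s, 4s, ... 32s
            return write_batch(client, arns, tags, attempt + 1)
        raise

def chunk(items, n=20):
    """Split a list into slices of at most n items (the API limit is 20)."""
    for i in range(0, len(items), n):
        yield items[i:i + n]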

Group ARNs that share the same tag set, send them in chunks of 20, and back off on throttle. Because writes are scoped per-ARN and the script is restartable from the same proposal CSV, re-running after a partial failure is safe. That restartability is just idempotency applied to tagging: running twice produces the same end state, never double damage.
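The grouping step described above might look like the sketch below. It assumes the plan is a list of (arn, tags) pairs; `group_by_tags` and `batches` are illustrative names, and `chunk` is repeated so the example runs on its own:

```python
from collections import defaultdict

def chunk(items, n=20):
    """Split a list into slices of at most n items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]

def group_by_tags(planned):
    """planned: iterable of (arn, tags_dict). Group ARNs sharing identical tags."""
    groups = defaultdict(list)
    for arn, tags in planned:
        # a sorted tuple of items makes the tag set hashable and order-independent
        groups[tuple(sorted(tags.items()))].append(arn)
    return groups

def batches(planned, size=20):
    """Yield (arns, tags) pairs, each small enough for one tag_resources call."""
    for tag_items, arns in group_by_tags(planned).items():
        for part in chunk(arns, size):
            yield part, dict(tag_items)

# 45 resources headed for env=prod, 3 for env=dev (made-up ARNs)
planned = [(f"arn:example:{i}", {"env": "prod"}) for i in range(45)]
planned += [(f"arn:example:dev{i}", {"env": "dev"}) for i in range(3)]
print([len(arns) for arns, _ in batches(planned)])
```

Each yielded pair can be handed to a retrying writer. The Tagging API is regional, so a real run would presumably also split the groups by the region in each ARN; that is left out here for brevity.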

Pro Tip: Tag your IaC-managed resources in the IaC, not with this script. If Terraform owns a resource, a bulk tag write will drift and get reverted on the next apply. Use this workflow for the orphans and click-ops leftovers, then fix the modules so the gap does not reopen.

The Back-Out Is Not Optional

Every apply run drops a backout-<timestamp>.json with the prior tag state of every resource it touched. Restoring is a short script that reads that file and re-applies the old values (or removes keys that did not exist before).

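# rollback.py: restore the tag state recorded by tag_writer.py --apply
import boto3, json, sys

WRITTEN_KEYS = ("env", "owner")  # the only keys tag_writer.py ever sets

with open(sys.argv[1]) as f:
    entries = json.load(f)
for entry in entries:
    arn = entry["arn"]
    region = arn.split(":")[3] or None  # fourth ARN field is the region
    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    prior = {t["Key"]: t["Value"] for t in entry["prior_tags"]}
    if prior:
        client.tag_resources(ResourceARNList=[arn], Tags=prior)
    # remove keys the writer set that did not exist before the run
    added = [k for k in entry.get("applied", WRITTEN_KEYS) if k not in prior]
    if added:
        client.untag_resources(ResourceARNList=[arn], TagKeys=added)
    print(f"restored {arn}")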

If a review three days later says half the owner values landed wrong, you run one command and you are back to the prior state. No archaeology, no guessing what the tags used to be. The same way an incident response workflow values a clean rollback path, a tagging job is only as safe as its undo.

Wiring It Into a Review Gate

The full loop is: audit (read-only) → AI inference (proposal file) → human review of the CSV → dry-run → approve → apply (records back-out) → verify with a fresh audit. The model never holds production credentials, never writes directly, and never gets the last word. A human approves the proposal before --apply ever runs, and the least-privilege role caps what the script can reach even if I fat-finger a flag. For larger orgs, route the proposal CSV through the same approval mechanism you use for code review so the sign-off is logged.
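One way to make the approval binding is to approve a digest of the proposal file rather than its name, so an edit after sign-off invalidates the approval. This is a sketch of that idea only. The `is_approved` helper and the notion of storing the digest in a review record are assumptions, not something the scripts above already do:

```python
import hashlib

def proposal_digest(data: bytes) -> str:
    """SHA-256 hex digest of the proposal file's raw bytes."""
    return hashlib.sha256(data).hexdigest()

def is_approved(data: bytes, approved_digest: str) -> bool:
    """True only if the file content matches the digest the reviewer signed off on."""
    return proposal_digest(data) == approved_digest.strip().lower()

# Illustration with in-memory content; a real run would read the CSV in "rb" mode
reviewed = b"arn,inferred_env,inferred_owner,confidence,reasoning\n"
signed_off = proposal_digest(reviewed)

print(is_approved(reviewed, signed_off))
print(is_approved(reviewed + b"extra,row,added,high,later\n", signed_off))
```

A write script could refuse --apply unless this check passes, which turns "a human approved it" from a convention into a precondition.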

Conclusion

Bulk tagging is the perfect first AI automation precisely because it is boring, repetitive, and pattern-heavy. The model shines at the drudgery of reading 3,000 resource names and guessing intent. But guessing is all it does. The audit runs read-only, the inference produces a proposal not a mutation, the dry-run shows you everything before it happens, the writes are scoped and rate-aware, and the back-out file means any mistake is one command from undone. Treat the AI as your fastest junior engineer, keep your hand on the approval gate, and that haunted $4,100 line item turns into a clean, attributable, accountable cost report. For more automation patterns in this vein, the automation category collects the rest of the playbook.