Managing Cloud KMS With AI: Rotation, IAM, and CMEK

There’s a category of cloud mistake where the cleanup is the disaster. Cloud KMS is the cleanest example I know. Someone tidying up “unused” keys destroys an old key version, and somewhere a Cloud Storage bucket full of CMEK-encrypted objects becomes permanently unreadable, because the data was encrypted under that exact version and KMS will not decrypt it unless that version still exists and is enabled. Once the scheduled-destruction window closes, there is no restore and no support ticket that brings it back. KMS rewards a specific mental model: keys have versions, data is bound to the version that encrypted it, and rotation creates new versions without migrating old data. Get that model right and key management is routine; get it wrong and you strand data. AI is a useful partner here, but only inside firm guardrails, because the failure mode is irreversible.

Least privilege and separation of duties

KMS IAM splits into two worlds that should rarely overlap: admins who manage keys (roles/cloudkms.admin), and the principals, usually service accounts, that hold roles/cloudkms.cryptoKeyEncrypterDecrypter to actually encrypt and decrypt. The audit findings I see most often are a single principal holding both roles, broad grants at the key-ring level that should sit on one key, and human users holding the encrypt/decrypt role meant for services.

gcloud kms keys get-iam-policy my-key \
  --keyring=my-ring --location=us-central1

Prompt: “Here’s the IAM policy on a Cloud KMS key. Flag any principal that holds both key-admin and cryptoKeyEncrypterDecrypter (separation-of-duties violation), any grant at the key-ring level that should be scoped to a single key, and any human user holding the encrypt/decrypt service role. Recommend a least-privilege layout: service accounts for encrypt/decrypt, a separate audited admin group, and per-key scoping.”

Separation of duties isn’t bureaucratic box-checking here — it’s what stops one compromised credential from both reading data and covering its tracks by managing the key.
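The mechanical part of that audit is easy to cross-check locally, before or after asking the model. A minimal sketch, assuming the policy was exported with gcloud kms keys get-iam-policy ... --format=json (a dict with a bindings list of role/members entries). The function audit_kms_policy and its scope parameter are hypothetical; the sketch ignores IAM conditions and treats only user: members as human.

```python
# Cross-check a Cloud KMS IAM policy for the three common findings.
# Assumes the JSON shape printed by `gcloud kms keys get-iam-policy --format=json`;
# IAM conditions are ignored in this sketch.
from collections import defaultdict

ADMIN_ROLE = "roles/cloudkms.admin"
CRYPTO_ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def audit_kms_policy(policy: dict, scope: str = "key") -> list:
    """Return findings; scope is "key" or "keyring" (hypothetical parameter)."""
    # Collect every role each member holds on this resource.
    roles_by_member = defaultdict(set)
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            roles_by_member[member].add(binding.get("role"))

    findings = []
    for member, roles in sorted(roles_by_member.items()):
        if ADMIN_ROLE in roles and CRYPTO_ROLE in roles:
            findings.append(f"{member}: holds both admin and encrypt/decrypt")
        if CRYPTO_ROLE in roles and member.startswith("user:"):
            findings.append(f"{member}: human user with encrypt/decrypt")
        if CRYPTO_ROLE in roles and scope == "keyring":
            findings.append(f"{member}: encrypt/decrypt granted on the whole key ring")
    return findings


if __name__ == "__main__":
    example = {
        "bindings": [
            {"role": ADMIN_ROLE, "members": ["user:alice@example.com"]},
            {"role": CRYPTO_ROLE, "members": [
                "user:alice@example.com",
                "serviceAccount:app@my-project.iam.gserviceaccount.com",
            ]},
        ]
    }
    for finding in audit_kms_policy(example):
        print(finding)
```

For the example policy it reports alice twice, once for holding both roles and once as a human user with encrypt/decrypt, and leaves the service account alone. Whether a group: member counts as human is a judgment the sketch deliberately leaves to you.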

Rotation that doesn’t strand data

The most dangerous misconception about rotation is that it re-encrypts existing data. It doesn’t. Rotation creates a new primary key version that encrypts new data; everything already encrypted stays on its original version until something rewrites it. So old versions must remain enabled, and automatic rotation is safe precisely because it doesn’t touch existing ciphertext.
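To make that binding concrete, here is a toy model. It performs no real cryptography (the “ciphertext” is just the reversed string tagged with a version number), and ToyKey and its methods are hypothetical. The only point it demonstrates is that rotate() adds a new primary version while earlier ciphertext keeps pointing at the version that produced it, which is how Cloud KMS ciphertext behaves.

```python
# Toy model of key versions: NOT real cryptography.
# It tracks which version "encrypted" each blob to show that rotation
# adds a new primary version and never rewrites existing ciphertext.
from dataclasses import dataclass, field


@dataclass
class ToyKey:
    states: dict = field(default_factory=lambda: {1: "ENABLED"})
    primary: int = 1

    def rotate(self) -> int:
        """Create a new version and make it primary; old versions are untouched."""
        new_version = max(self.states) + 1
        self.states[new_version] = "ENABLED"
        self.primary = new_version
        return new_version

    def encrypt(self, plaintext: str) -> tuple:
        # The "ciphertext" records the version that produced it.
        return (self.primary, plaintext[::-1])

    def decrypt(self, ciphertext: tuple) -> str:
        version, body = ciphertext
        # Decryption needs the original version, not the current primary.
        if self.states.get(version) != "ENABLED":
            raise PermissionError(f"version {version} is {self.states.get(version)}")
        return body[::-1]


if __name__ == "__main__":
    key = ToyKey()
    old = key.encrypt("invoice-2023")   # bound to version 1
    key.rotate()                        # version 2 is now primary
    new = key.encrypt("invoice-2024")   # bound to version 2
    key.states[1] = "DISABLED"          # the "cleanup" mistake
    print(key.decrypt(new))             # prints invoice-2024
    try:
        key.decrypt(old)
    except PermissionError as err:
        print("old data unreadable:", err)  # prints: old data unreadable: version 1 is DISABLED
```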

gcloud kms keys update my-key --keyring=my-ring \
  --location=us-central1 --rotation-period=90d --next-rotation-time=...

Prompt: “We want to set 90-day automatic rotation on this key, which encrypts CMEK data in Cloud Storage and BigQuery. Confirm what rotation actually does — that it creates a new version for new data and does NOT re-encrypt existing data — and explain why old key versions must stay ENABLED for existing data to remain readable. Then give the rotation config and what to monitor.”

That clarification prevents the catastrophic follow-on action: a well-meaning engineer setting rotation, assuming the data has migrated, and then destroying the old versions. The model spelling out that existing data stays on its original version is what stops the disaster before it starts.
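After applying the config, it is worth confirming what actually landed on the key. A minimal sketch, assuming the JSON from gcloud kms keys describe my-key --keyring=my-ring --location=us-central1 --format=json carries the CryptoKey REST fields rotationPeriod (a duration string such as "7776000s", which is 90 days) and nextRotationTime (a UTC timestamp ending in Z). The helpers check_rotation and parse_utc are hypothetical, and the 90-day threshold comes from the prompt above, not from any KMS default.

```python
# Verify a key's automatic-rotation settings from `gcloud kms keys describe --format=json`.
# Field names follow the CryptoKey REST resource (assumption: gcloud emits them unchanged).
from datetime import datetime, timedelta, timezone

MAX_PERIOD = timedelta(days=90)  # policy threshold from the prompt, not a KMS default


def parse_utc(timestamp: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]Z', dropping any fractional seconds."""
    return datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def check_rotation(key: dict, now: datetime) -> list:
    problems = []
    period = key.get("rotationPeriod")
    if period is None:
        problems.append("no automatic rotation configured")
    elif timedelta(seconds=float(period.rstrip("s"))) > MAX_PERIOD:
        problems.append(f"rotation period {period} is longer than 90 days")

    next_time = key.get("nextRotationTime")
    if next_time is not None and parse_utc(next_time) < now:
        # A past nextRotationTime may mean rotation is not happening as expected.
        problems.append(f"nextRotationTime {next_time} is in the past")
    return problems


if __name__ == "__main__":
    described = {"rotationPeriod": "7776000s", "nextRotationTime": "2030-01-01T00:00:00Z"}
    print(check_rotation(described, datetime.now(timezone.utc)))
```

This checks configuration only. It says nothing about which data sits on which version, which is exactly the question the destroy procedure later in this piece has to answer.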

CMEK coverage and the dependency it creates

Customer-managed encryption keys let you control the keys behind managed services, but they also create a hard dependency: if the key version protecting the data is disabled or destroyed, or the service agent loses its grant on the key, the service can no longer read that data once the change takes effect. So CMEK review is partly a coverage check and partly a dependency map.

Prompt: “For each of these services holding sensitive data — Cloud Storage, BigQuery, Compute disks — check whether CMEK is applied and that the right service agent has cryptoKeyEncrypterDecrypter on the key. For each CMEK relationship, state the dependency explicitly: what breaks if this key is disabled or this grant is removed. I want to know exactly what I’d strand.”

Knowing the dependency map is what makes every later decision safe. Before anyone disables a key or revokes a grant, the map tells them what data hangs off it.
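The dependency map itself is a grouping of resources by key plus a grant check, so it can be kept as data rather than rebuilt in every conversation. A minimal sketch, assuming you already have a resource inventory (the name, kms_key and agent fields are a hypothetical schema) and each key’s IAM policy in the get-iam-policy JSON shape. The service agent addresses in the example are illustrative placeholders, and dependency_map and crypto_members are hypothetical helpers.

```python
# Build a CMEK dependency map: which resources would be stranded per key,
# and which resources lack CMEK or a service-agent grant.
# The inventory format (name / kms_key / agent) is a hypothetical schema.
from collections import defaultdict

CRYPTO_ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def crypto_members(policy: dict) -> set:
    """Members holding encrypt/decrypt in a get-iam-policy style dict."""
    return {
        member
        for binding in policy.get("bindings", [])
        if binding.get("role") == CRYPTO_ROLE
        for member in binding.get("members", [])
    }


def dependency_map(resources: list, key_policies: dict):
    stranded_by_key = defaultdict(list)
    problems = []
    for res in resources:
        key = res.get("kms_key")
        if not key:
            problems.append(f"{res['name']}: no CMEK applied")
            continue
        # Everything listed under a key loses access if that key is disabled.
        stranded_by_key[key].append(res["name"])
        if f"serviceAccount:{res['agent']}" not in crypto_members(key_policies.get(key, {})):
            problems.append(f"{res['name']}: {res['agent']} lacks encrypt/decrypt on {key}")
    return dict(stranded_by_key), problems


if __name__ == "__main__":
    key = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"
    inventory = [
        {"name": "gs://billing-archive", "kms_key": key,
         "agent": "service-123@gs-project-accounts.iam.gserviceaccount.com"},
        {"name": "bq:finance.ledger", "kms_key": key,
         "agent": "bq-123@bigquery-encryption.iam.gserviceaccount.com"},
        {"name": "disk:db-1", "kms_key": None, "agent": None},
    ]
    policies = {key: {"bindings": [{"role": CRYPTO_ROLE, "members": [
        "serviceAccount:service-123@gs-project-accounts.iam.gserviceaccount.com"]}]}}
    stranded, problems = dependency_map(inventory, policies)
    print(stranded)
    print(problems)
```

The stranded_by_key output is the “what hangs off this key” answer; the problems list covers both halves of the prompt, missing CMEK and missing service-agent grants.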

Destroy safety: the reversible path

When a key version genuinely needs to go, the safe sequence is disable first, watch for breakage, then schedule destruction with the built-in delay — never immediate destruction.

Prompt: “I believe an old key version is no longer protecting any data and want to remove it. Give me a safe procedure: how to confirm nothing is still encrypted under it, why I should disable it first as a reversible test, and why scheduled destruction with the delay window is safer than immediate destruction. Treat this as potentially data-stranding and tell me what to verify before each step.”
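For Cloud Storage, part of “confirm nothing is still encrypted under it” can be scripted. A minimal sketch, assuming object resources from the Cloud Storage JSON API objects.list, whose kmsKeyName field names the key and, when recorded, the specific version (…/cryptoKeyVersions/N). The helper blocking_objects is hypothetical. A clean result covers only the buckets you fed it, not BigQuery, disks, or that forgotten archive bucket.

```python
# Find Cloud Storage objects that still depend on one key version.
# Input: object resources as returned by the Cloud Storage JSON API objects.list;
# only the "name" and "kmsKeyName" fields are used.


def blocking_objects(objects: list, version_name: str) -> list:
    """Return (object name, reason) pairs that should block disabling version_name."""
    key_name = version_name.split("/cryptoKeyVersions/")[0]
    blockers = []
    for obj in objects:
        kms = obj.get("kmsKeyName")
        if kms == version_name:
            blockers.append((obj["name"], "encrypted under this version"))
        elif kms == key_name:
            # Key recorded without a version: this version cannot be ruled out.
            blockers.append((obj["name"], "version not recorded; verify manually"))
        # Objects on other versions, other keys, or no CMEK are not blockers.
    return blockers


if __name__ == "__main__":
    key = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"
    listing = [
        {"name": "2021/report.csv", "kmsKeyName": key + "/cryptoKeyVersions/1"},
        {"name": "2024/report.csv", "kmsKeyName": key + "/cryptoKeyVersions/3"},
        {"name": "public/logo.png"},  # no CMEK
    ]
    print(blocking_objects(listing, key + "/cryptoKeyVersions/1"))
    # prints [('2021/report.csv', 'encrypted under this version')]
```

Treating a versionless kmsKeyName as a blocker errs on the side of the reversible path: an ambiguous answer should stop the procedure, not let it continue.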

Disabling is reversible — re-enable and access returns. Destruction is not. The scheduled-destruction delay exists as a last chance to catch a mistake, and I use it every time.
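The reversibility argument is easiest to see as the key-version lifecycle. A simplified model, assuming the Cloud KMS states ENABLED, DISABLED, DESTROY_SCHEDULED and DESTROYED (pending-generation and import states are omitted). The transitions mirror the gcloud kms keys versions enable, disable, destroy and restore commands, with restore returning a version to DISABLED; window_expires stands for the automatic end of the delay. The require_disabled_first guard encodes the procedure above, not something KMS enforces, and apply and can_recover are hypothetical helpers.

```python
# Simplified Cloud KMS key-version lifecycle (pending/import states omitted).
TRANSITIONS = {
    ("ENABLED", "disable"): "DISABLED",
    ("DISABLED", "enable"): "ENABLED",
    ("ENABLED", "destroy"): "DESTROY_SCHEDULED",
    ("DISABLED", "destroy"): "DESTROY_SCHEDULED",
    ("DESTROY_SCHEDULED", "restore"): "DISABLED",
    ("DESTROY_SCHEDULED", "window_expires"): "DESTROYED",  # no way back from here
}


def apply(state: str, action: str, require_disabled_first: bool = True) -> str:
    """Return the next state, refusing transitions that do not exist."""
    # Procedural guard from the text: disable and observe before destroying.
    if require_disabled_first and action == "destroy" and state != "DISABLED":
        raise ValueError("disable the version and watch for breakage first")
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a version in state {state}") from None


def can_recover(state: str) -> bool:
    """True if ENABLED is still reachable, i.e. the data can be made readable again."""
    seen, frontier = {state}, [state]
    while frontier:
        current = frontier.pop()
        if current == "ENABLED":
            return True
        for (source, _action), target in TRANSITIONS.items():
            if source == current and target not in seen:
                seen.add(target)
                frontier.append(target)
    return False


if __name__ == "__main__":
    for state in ("DISABLED", "DESTROY_SCHEDULED", "DESTROYED"):
        print(state, "recoverable:", can_recover(state))
    # prints True for DISABLED and DESTROY_SCHEDULED, False for DESTROYED
```

The asymmetry is the whole argument: every state except DESTROYED can still reach ENABLED, which is why the model refuses a direct destroy from ENABLED under its default guard.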

The honest division of labor

AI is strong at the structural KMS work: auditing key IAM for separation-of-duties violations, mapping CMEK dependencies, and laying out a safe rotation or destruction procedure. The rules follow from how versions and grants work, which is well-documented, so the model is reliable on them. What it cannot know is whether some forgotten archive bucket still holds objects encrypted under the version you’re about to destroy — only your inventory knows that.

So the guardrails are absolute: I never destroy a key version or revoke a service agent’s grant on the model’s word alone, and I always prefer disable-then-scheduled-destroy over immediate action. The reusable prompts live in my prompts library, and the GCP with AI series covers the identity layer these grants depend on, including least-privilege IAM for the service accounts that hold encrypt/decrypt rights. With KMS, the slow, reversible path is the professional one.