Envelope Encryption in Practice: DEKs, KEKs, and Containing a Key Compromise

Disk encryption protects data when someone steals the disk. It does nothing when an attacker is inside your application with a valid database connection: at that point the data is plaintext to the very process that has been compromised. Field- and object-level encryption narrows that gap. An attacker who can only read rows, through SQL injection or a leaked database credential, sees ciphertext, and even a compromised service has to go through the KMS to unwrap keys, where that access can be restricted, logged, and monitored. The standard way to do it without melting your KMS bill is envelope encryption: encrypt the data locally with a data key, then encrypt that data key with a key that never leaves the KMS. The pattern is simple. The decisions inside it (how granular the data keys are, how rotation works, how aggressively you cache) are what determine whether a key compromise is a contained incident or a full breach. This guide walks through those decisions.

Why Two Layers of Keys

Calling a KMS to encrypt every record directly does not scale: KMS APIs are rate-limited, add a network round trip to every operation, are billed per call, and cap the payload size you can hand them (AWS KMS Encrypt, for example, accepts at most 4 KB of plaintext). Envelope encryption sidesteps all of that with a two-tier hierarchy:

A data encryption key (DEK) — a symmetric key that does the actual bulk encryption of your data, locally and fast.
A key encryption key (KEK) — a key held inside the KMS that never leaves it, used only to encrypt (wrap) the DEK.

You store the wrapped DEK right alongside the ciphertext. To decrypt, you ask the KMS to unwrap the DEK, then use the plaintext DEK locally. Your data never goes to the KMS: it handles only a small key, and it does one small operation per object no matter how large the object is.

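import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Encrypt path: get a DEK, use it locally, store only the wrapped DEK.
# KEK_ARN, data, object_context (bytes), object_id and store() are app-supplied.
kms = boto3.client("kms")
resp = kms.generate_data_key(KeyId=KEK_ARN, KeySpec="AES_256")
plaintext_dek = resp["Plaintext"]        # 32-byte key: use locally, memory only
wrapped_dek = resp["CiphertextBlob"]     # store next to the ciphertext

nonce = os.urandom(12)                   # 96-bit GCM nonce, never reused with this DEK
ciphertext = AESGCM(plaintext_dek).encrypt(nonce, data, object_context)
del plaintext_dek                        # bytes can't be zeroed in place; drop the reference
store(object_id, nonce, ciphertext, wrapped_dek)   # nonce is needed to decrypt; the KEK never left the KMS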

The generate_data_key call hands back both the plaintext DEK (for immediate local use) and its wrapped form (for storage) in a single round trip, so protecting a new object costs one KMS call and all of the bulk encryption happens locally.
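The snippet above covers only the encrypt path. To see both directions end to end without an AWS account, here is a minimal sketch that swaps the real KMS for a hypothetical in-memory `LocalKms` class (the KEK lives only inside that object) and assumes the `cryptography` package is installed. `encrypt_object`, `decrypt_object`, and the record layout are illustrative names, not a library API.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit GCM nonce


class LocalKms:
    """Hypothetical in-memory stand-in for a KMS; the KEK never leaves this object."""

    def __init__(self) -> None:
        self._kek = AESGCM.generate_key(bit_length=256)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return (plaintext_dek, wrapped_dek), loosely mirroring GenerateDataKey."""
        dek = AESGCM.generate_key(bit_length=256)
        nonce = os.urandom(NONCE_LEN)
        return dek, nonce + AESGCM(self._kek).encrypt(nonce, dek, None)

    def decrypt(self, wrapped_dek: bytes) -> bytes:
        """Unwrap a DEK; raises InvalidTag if the wrapped key was altered."""
        return AESGCM(self._kek).decrypt(
            wrapped_dek[:NONCE_LEN], wrapped_dek[NONCE_LEN:], None)


def encrypt_object(kms: LocalKms, data: bytes, aad: bytes) -> dict:
    """Encrypt path: one KMS call, then bulk encryption happens locally."""
    dek, wrapped_dek = kms.generate_data_key()
    nonce = os.urandom(NONCE_LEN)
    return {"wrapped_dek": wrapped_dek, "nonce": nonce,
            "ciphertext": AESGCM(dek).encrypt(nonce, data, aad)}


def decrypt_object(kms: LocalKms, record: dict, aad: bytes) -> bytes:
    """Decrypt path: ask the KMS to unwrap the DEK, then decrypt locally."""
    dek = kms.decrypt(record["wrapped_dek"])
    return AESGCM(dek).decrypt(record["nonce"], record["ciphertext"], aad)
```

Against AWS, the two `LocalKms` methods would become `kms.generate_data_key(KeyId=..., KeySpec="AES_256")` and `kms.decrypt(CiphertextBlob=wrapped_dek)["Plaintext"]` via boto3; the local AES-GCM steps stay the same.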

DEK Granularity Decides Your Blast Radius

The single most consequential design choice is how many things share a DEK. One DEK for the entire dataset is the cheapest to manage and the worst possible outcome if that key leaks — one compromised DEK exposes everything. A DEK per tenant, per object, or per record means a leaked DEK exposes only that tenant, object, or record, and the rest of the data stays sealed.

The trade is key-management overhead: more DEKs means more wrapped keys to store and more unwrap calls. For most systems, per-tenant or per-object DEKs hit the sweet spot — meaningful blast-radius containment without a key explosion. The mistake to avoid is the convenient default of a single global DEK, because it quietly turns any key leak into a total breach. Decide granularity deliberately, and state the blast radius you are accepting.
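One way to state the blast radius explicitly is to compute it. This stdlib-only sketch assumes a hypothetical record shape (`{"tenant": ..., "id": ...}`) and hypothetical scope functions that map each record to the DEK protecting it; the largest value in the result is the worst-case fraction of your data that a single leaked DEK exposes.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical record shape for illustration: {"tenant": str, "id": int}
Record = dict


def blast_radius(records: Iterable[Record],
                 dek_scope: Callable[[Record], str]) -> dict[str, float]:
    """For each DEK, the fraction of all records exposed if that DEK leaks.

    dek_scope maps a record to the ID of the DEK protecting it, so it
    encodes the granularity decision.
    """
    counts = Counter(dek_scope(r) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dek: n / total for dek, n in counts.items()}


def global_dek(r: Record) -> str:
    return "global"                          # one key for everything


def per_tenant_dek(r: Record) -> str:
    return f"tenant:{r['tenant']}"           # one key per tenant


def per_object_dek(r: Record) -> str:
    return f"object:{r['tenant']}/{r['id']}"  # one key per record
```

With four tenants of 25 records each, the worst case is 1.0 for a global DEK, 0.25 per tenant, and 0.01 per object. Skewed tenants change the picture: a per-tenant DEK for a tenant holding most of the records still exposes most of the records.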

Rotation: Re-Wrap, Don’t Re-Encrypt

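Key rotation sounds expensive until you realize envelope encryption makes the cheap kind possible. Rotating the KEK never requires re-encrypting your data, because the KEK only ever encrypted the small wrapped DEKs. With managed rotation such as AWS KMS automatic rotation, new DEKs are wrapped under the new key version while the KMS retains the older versions, so existing wrapped DEKs still unwrap transparently. If you want existing DEKs moved onto the new version, you re-wrap them (AWS KMS exposes a ReEncrypt operation for this); either way the bulk ciphertext is untouched and only the small wrapped keys change.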

# Enable automatic KEK rotation: new key material wraps new DEKs, old material is kept to unwrap existing ones. No data is re-encrypted.
aws kms enable-key-rotation --key-id "$KEK_ARN"
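What a re-wrap does can be sketched locally. This assumes the `cryptography` package and a hypothetical record that stores a `kek_version` next to the wrapped DEK (a managed KMS typically tracks the key version inside the wrapped blob itself, so you would not store it yourself). `wrap`, `unwrap`, and `rewrap` are illustrative helpers, not a KMS API.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def wrap(kek: bytes, dek: bytes) -> bytes:
    """Encrypt a DEK under one KEK version (nonce prepended to the wrapped key)."""
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, None)


def unwrap(kek: bytes, wrapped_dek: bytes) -> bytes:
    """Decrypt a wrapped DEK; raises InvalidTag under the wrong KEK."""
    return AESGCM(kek).decrypt(wrapped_dek[:12], wrapped_dek[12:], None)


def rewrap(record: dict, keks: dict[int, bytes], new_version: int) -> dict:
    """Move one record's DEK onto a new KEK version.

    Only the small wrapped DEK changes; the nonce and bulk ciphertext are
    carried over byte for byte.
    """
    dek = unwrap(keks[record["kek_version"]], record["wrapped_dek"])
    return {**record, "kek_version": new_version,
            "wrapped_dek": wrap(keks[new_version], dek)}
```

The cost of `rewrap` is one unwrap and one wrap of a 32-byte key per record, regardless of how large the protected object is.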

Rotating the DEKs themselves, which you do periodically and immediately if one is suspected compromised, does require re-encrypting the data those DEKs protect, because the DEK is what actually encrypted the bytes. That is more expensive, which is another argument for sensible DEK granularity: per-object DEKs let you rotate just the affected objects rather than the whole dataset. KEK rotation covers routine hygiene without touching the data; DEK rotation is the heavier operation you schedule sparingly and trigger at once when a specific key is in doubt.
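Re-wrapping would not help after a DEK leak: the leaked key still decrypts the existing ciphertext no matter which KEK wraps it, so the data has to move to a fresh DEK. A minimal sketch, assuming the `cryptography` package and hypothetical `unwrap` / `generate_data_key` callables standing in for the KMS; `rotate_dek`, `rotate_affected`, and the dict-based store are illustrative names.

```python
import os
from typing import Callable, Iterable

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

Unwrap = Callable[[bytes], bytes]                    # wrapped DEK -> plaintext DEK
GenerateDataKey = Callable[[], tuple[bytes, bytes]]  # -> (plaintext DEK, wrapped DEK)


def rotate_dek(record: dict, aad: bytes, unwrap: Unwrap,
               generate_data_key: GenerateDataKey) -> dict:
    """Re-encrypt one object under a brand-new DEK; the old DEK is retired."""
    old_dek = unwrap(record["wrapped_dek"])
    plaintext = AESGCM(old_dek).decrypt(record["nonce"], record["ciphertext"], aad)
    new_dek, new_wrapped = generate_data_key()
    nonce = os.urandom(12)
    return {"wrapped_dek": new_wrapped, "nonce": nonce,
            "ciphertext": AESGCM(new_dek).encrypt(nonce, plaintext, aad)}


def rotate_affected(store: dict, affected_ids: Iterable[str],
                    aad_for: Callable[[str], bytes], unwrap: Unwrap,
                    generate_data_key: GenerateDataKey) -> None:
    """Rotate only the objects whose DEK is in doubt; everything else is untouched."""
    for obj_id in affected_ids:
        store[obj_id] = rotate_dek(store[obj_id], aad_for(obj_id),
                                   unwrap, generate_data_key)
```

Old copies of the ciphertext (backups, replicas, caches) remain readable with the leaked DEK, so rotation contains future exposure rather than undoing past exposure.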

Caching Trades KMS Cost for Exposure Window

To reach your throughput targets, you cache plaintext DEKs in memory so you are not calling the KMS to unwrap on every operation. This is a real and necessary optimization, and it is also a deliberate security trade: a cached plaintext DEK is a key sitting in process memory, and the longer it lives there, the wider the window for a memory-scraping attacker.

The discipline is a short TTL, memory-only storage, and never persisting an unwrapped DEK to disk. A cache hit reduces KMS calls and cost; a long TTL turns your cache into a key store an attacker can harvest. State the TTL explicitly as a security parameter, not just a performance knob — it is both.

Pro Tip: Treat the plaintext-DEK cache like a credential cache, because that is what it is. Memory-only, short TTL, and zeroed on eviction. If your threat model includes an attacker who can read process memory, every second a DEK lingers in cache is a second of exposure — tune the TTL against that, not just against your KMS bill.
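A minimal sketch of that discipline using only the standard library. `DekCache`, the `unwrap` callable, and the injectable `clock` are hypothetical names; keys are held as `bytearray` so they can be overwritten on eviction. In CPython this narrows rather than closes the window, since other copies of the key may exist (in the KMS response, or inside a cipher object).

```python
import time
from typing import Callable


class DekCache:
    """Memory-only plaintext-DEK cache with a short TTL, zeroing keys on eviction."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[bytes, tuple[bytearray, float]] = {}  # wrapped -> (dek, expiry)

    def get(self, wrapped_dek: bytes, unwrap: Callable[[bytes], bytes]) -> bytearray:
        now = self._clock()
        entry = self._entries.get(wrapped_dek)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit: no KMS call
        if entry is not None:
            self._evict(wrapped_dek)             # expired: zero it before refetching
        dek = bytearray(unwrap(wrapped_dek))     # cache miss: one KMS unwrap
        self._entries[wrapped_dek] = (dek, now + self._ttl)
        return dek

    def sweep(self) -> None:
        """Evict every expired entry; call periodically, e.g. from a timer."""
        now = self._clock()
        for key in [k for k, (_, exp) in self._entries.items() if exp <= now]:
            self._evict(key)

    def _evict(self, key: bytes) -> None:
        dek, _ = self._entries.pop(key)
        dek[:] = bytes(len(dek))                 # overwrite the key bytes with zeros
```

Because the TTL is a constructor argument, it shows up in configuration and code review as an explicit security parameter rather than a buried performance default.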

Crypto Hygiene: Use AEAD, Bind the Context

Do not invent cryptography. Use a vetted authenticated cipher (AES-256-GCM is the standard choice) with a nonce that is never reused under the same key, and bind the object's context as additional authenticated data (AAD). The AAD binding matters: it ties the ciphertext to where it belongs, so an attacker cannot take a valid ciphertext for object A and pass it off as object B. GCM provides confidentiality through encryption and integrity through its authentication tag, so any tampering with the ciphertext, or a mismatched AAD, makes decryption fail.
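The AAD check is easy to see in a few lines. This sketch assumes the `cryptography` package; `seal` and `open_sealed` are hypothetical helper names, and the context strings are made-up examples. A ciphertext sealed for object A is rejected when presented as object B, even though the key is correct.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def seal(dek: bytes, plaintext: bytes, context: bytes) -> bytes:
    """AES-256-GCM with a fresh random 96-bit nonce; the context is bound as AAD."""
    nonce = os.urandom(12)
    return nonce + AESGCM(dek).encrypt(nonce, plaintext, context)


def open_sealed(dek: bytes, blob: bytes, context: bytes) -> bytes:
    """Raises InvalidTag if the blob was modified or belongs to another context."""
    return AESGCM(dek).decrypt(blob[:12], blob[12:], context)


dek = AESGCM.generate_key(bit_length=256)
blob_a = seal(dek, b"alice's record", b"tenant-1/object-A")

try:
    open_sealed(dek, blob_a, b"tenant-1/object-B")   # ciphertext swapped onto object B
except InvalidTag:
    print("rejected: ciphertext does not belong to object-B")
```

Random 96-bit nonces are the simplest safe choice, but NIST SP 800-38D caps random-nonce use at 2^32 encryptions per key; fine-grained DEKs keep per-key counts far below that.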

The same rule that governs all good encryption-at-rest design applies here: the primitives are settled, your job is to compose them correctly, and a qualified reviewer should check the final scheme before it guards production data. Rolling your own mode or mishandling nonces is how schemes that look fine in code review fail catastrophically in practice: reusing a GCM nonce under the same key, for example, leaks the XOR of the two plaintexts and lets an attacker forge authentication tags.

Access Control, Failure, and Audit

The KEK is only as protective as the policy on it. Lock the KMS key policy and IAM so only the intended service can call Decrypt — if everything can unwrap, the KMS is a speed bump, not a control. Define fail-closed behavior for new writes when the KMS is unreachable (better to refuse a write than store it unencrypted), log every decrypt call for audit, and alert on anomalous decrypt volume, which is one of the earliest signals of a compromised service exfiltrating data at scale.
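The fail-closed and alerting rules translate into small, testable pieces of application code. A stdlib-only sketch: `KmsUnavailable`, `write_encrypted`, `DecryptRateMonitor`, and the `encrypt` / `persist` callables are hypothetical names, and the fixed-threshold sliding window is one simple anomaly signal, not a complete detection system.

```python
import time
from collections import deque
from typing import Callable


class KmsUnavailable(Exception):
    """Hypothetical error raised by the app's KMS wrapper when the KMS is unreachable."""


def write_encrypted(obj_id: str, data: bytes,
                    encrypt: Callable[[bytes], bytes],
                    persist: Callable[[str, bytes], None]) -> None:
    """Fail closed: if encryption fails, nothing is written, never plaintext."""
    try:
        blob = encrypt(data)
    except KmsUnavailable as exc:
        raise RuntimeError(f"refusing write for {obj_id}: KMS unavailable") from exc
    persist(obj_id, blob)


class DecryptRateMonitor:
    """Sliding-window count of decrypts; flags volume above a fixed threshold."""

    def __init__(self, window_seconds: float, max_decrypts: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_seconds
        self._max = max_decrypts
        self._clock = clock
        self._events: deque = deque()            # timestamps of recent decrypts

    def record(self) -> bool:
        """Log one decrypt; return True if the window is now over the threshold."""
        now = self._clock()
        self._events.append(now)
        while self._events and self._events[0] <= now - self._window:
            self._events.popleft()               # drop decrypts older than the window
        return len(self._events) > self._max
```

On AWS, CloudTrail already records each KMS Decrypt call; an in-process monitor like this is a complement that can react before those logs are reviewed.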

Let AI Draft the Design, Have a Human Verify the Crypto

Envelope encryption has enough moving parts — granularity, rotation, caching, AAD, fail modes — that a model is genuinely useful for drafting the architecture and surfacing the trade-offs you might skip:

Prompt: “Design envelope encryption for per-tenant PII in a multi-tenant app on AWS KMS, ~2,000 decrypts/sec. Specify DEK granularity and the blast radius it gives me, how KEK rotation avoids data re-encryption, a DEK cache TTL with its stated risk, the AEAD cipher and AAD binding, and fail-closed/audit behavior. Flag anything that trades blast radius for convenience, and tell me what a cryptographer should review.”

Output (excerpt): “Per-tenant DEK: a leaked DEK exposes one tenant, not all — accept the overhead of one wrapped DEK per tenant. KEK rotation: enable KMS auto-rotation; data is never re-encrypted, only tenant DEKs re-wrap under the new KEK version. Cache: plaintext DEK TTL ~60s, memory-only, zeroed on eviction — risk: a memory-scrape attacker has up to 60s of exposure per cached tenant key. AES-256-GCM, unique nonce per write, AAD = tenant_id||object_id to prevent cross-object reuse. Cryptographer should verify nonce uniqueness guarantees and the AAD scheme before production.”

The model drafts a coherent design and names the trades, but a cryptographer verifies the cryptographic choices and signs off before the scheme protects real data; the AI is a drafting partner, not the final reviewer. For a structured design pass with the blast-radius and crypto-hygiene guardrails built in, the envelope encryption DEK/KEK prompt walks through the full hierarchy, rotation, and caching trade-offs.
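The AAD line in a draft like the excerpt is a good example of what that review catches. If `tenant_id||object_id` means raw concatenation of variable-length IDs, tenant "ab" with object "c" and tenant "a" with object "bc" produce identical AAD. A minimal stdlib sketch of one common fix, length-prefixing each field; `encode_aad` is a hypothetical helper, and fixed-width IDs or a delimiter that can never appear in an ID would also work.

```python
import struct


def encode_aad(*fields: str) -> bytes:
    """Length-prefix each field so different field splits can never collide."""
    out = bytearray()
    for field in fields:
        raw = field.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw   # 4-byte big-endian length, then bytes
    return bytes(out)
```

With this encoding, `encode_aad("ab", "c")` and `encode_aad("a", "bc")` differ, whereas the naive concatenations `"ab" + "c"` and `"a" + "bc"` are the same string.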
