Keystone Fernet Key Rotation Rollout Prompt
Design a safe Fernet (and Credential) key rotation schedule and distribution mechanism for a multi-node, multi-region Keystone deployment so that live tokens are never silently invalidated.
- Target user: Identity/platform engineers operating HA Keystone
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior OpenStack identity engineer who has rotated Fernet keys across HA Keystone clusters and multi-region deployments without causing a single mass token-invalidation incident.

I will provide:
- Keystone topology (node count, regions, load balancer, deployment tool)
- Current `[fernet_tokens]` config: `max_active_keys`, key repo path, token `expiration`
- How keys are distributed today (rsync, Kolla, OSA, config mgmt, shared FS)
- Rotation cadence and any past incidents (sudden 401s, "Failed to validate token")
- Whether Credential keys and Federation/Receipt keys are also in play

Your job:
1. **Key lifecycle primer**: explain staged (index 0), primary (highest index), and secondary keys, and exactly how `max_active_keys` plus token `expiration` determine the safe minimum rotation interval. Show the formula and a worked example.
2. **Cadence design**: recommend a rotation interval that guarantees a token issued just before a rotation can still be validated until it expires. Call out the classic foot-gun: rotating faster than `max_active_keys` allows, which purges the key that live tokens were encrypted with and orphans them.
3. **Distribution correctness**: the most common multi-node failure is one node rotating locally while the others never receive the new key. Design single-rotator-then-fanout: one node owns `keystone-manage fernet_rotate`, and all nodes receive the identical repo before that node serves the new primary. Cover ordering and atomicity.
4. **Multi-region**: keep key repos consistent (or deliberately independent) per region; explain the implications for token validation across regions and for shared service catalogs.
5. **Credential keys**: separate repo, separate rotation (`keystone-manage credential_rotate` followed by `credential_migrate`); warn that mishandling here breaks application credentials and EC2 creds, and that re-initializing the key repo leaves stored credentials undecryptable.
6. **Rollout & verification**: pre-flight (back up the key repo, confirm clock sync), execute, then validate by issuing and validating tokens on every node and watching for `401`/`Invalid token` spikes.
7. **Automation**: turn this into an idempotent job (timer/cron or deploy tool) with health checks and an abort condition.

Output as:
(a) safe interval calculation,
(b) a single-rotator distribution design,
(c) a step-by-step rotation runbook,
(d) a verification script outline,
(e) rollback if validation fails.

Bias toward: never invalidating a live token, atomic distribution, and tested automation over manual rsync.
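
When reviewing the model's answer to output (a), it can help to have an independent check of the arithmetic. The sketch below assumes the sizing rule from the Keystone Fernet documentation, `max_active_keys = (token_expiration / rotation_frequency) + 2`, where the extra two keys are the staged key and the primary key. The function names `min_rotation_interval` and `required_max_active_keys` are hypothetical helpers for illustration, not part of Keystone, and rejecting `max_active_keys` below 3 is an assumption made here because the rule leaves no room for secondary keys in that case.

```python
import math


def min_rotation_interval(token_expiration_s: int, max_active_keys: int) -> float:
    """Shortest rotation interval (seconds) that keeps unexpired tokens valid.

    Rearranges max_active_keys = (expiration / interval) + 2 to solve for
    the interval. Hypothetical helper, not a Keystone API.
    """
    if max_active_keys < 3:
        # Assumption: with fewer than 3 keys there is no secondary key, so
        # any rotation drops the key that live tokens were encrypted with.
        raise ValueError("max_active_keys must be >= 3 to rotate safely")
    return token_expiration_s / (max_active_keys - 2)


def required_max_active_keys(token_expiration_s: int, rotation_interval_s: int) -> int:
    """Smallest max_active_keys that is safe for a chosen rotation interval."""
    if rotation_interval_s <= 0:
        raise ValueError("rotation interval must be positive")
    # Round up so a partial interval still gets a secondary key to cover it.
    return math.ceil(token_expiration_s / rotation_interval_s) + 2


if __name__ == "__main__":
    day, hour = 24 * 3600, 3600
    # 24 h tokens rotated every 6 h need 24/6 + 2 = 6 active keys.
    print(required_max_active_keys(day, 6 * hour))   # 6
    # With 6 keys and 24 h tokens, rotate no more often than every 6 h.
    print(min_rotation_interval(day, 6) / hour)      # 6.0
```

Rotating more often than the value returned by `min_rotation_interval` is the foot-gun named in step 2; any interval at or above it should be safe under the stated rule, subject to the distribution guarantees in step 3.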