Automating Secrets Rotation Without Taking Down Production
Static credentials that never rotate are a breach waiting to happen. Here's how to automate rotation for database creds, API keys, and certs without a single outage.
- #security
- #hardening
- #secrets
- #rotation
- #vault
- #automation
Every team has them: the database password set during the initial deploy three years ago, the API key pasted into a config map and forgotten, the service-account credential that’s been valid since before half the team was hired. Static, long-lived secrets are the credential equivalent of a door that’s been propped open so long nobody remembers it’s supposed to close.
Rotation fixes this — but the reason rotation doesn’t happen is fear. Everyone has a story about the rotation that took down production at the worst moment. So the real skill isn’t rotating secrets; it’s rotating them without anyone noticing. That requires automation built around one core idea.
The principle: overlap, never swap
The naive approach — invalidate the old secret, set the new one, restart the app — guarantees a window where in-flight requests fail. Don’t do that. Every safe rotation follows an overlap pattern:
- Generate the new credential.
- Make the new credential valid alongside the old one.
- Roll the new credential out to all consumers.
- Confirm nothing is still using the old one.
- Only then, revoke the old credential.
The system accepts both secrets during the overlap window. No request ever hits a moment where neither the old nor the new credential works. This single pattern is the difference between rotation that’s routine and rotation that’s terrifying.
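The five steps above can be sketched as a small driver that refuses to revoke until everything before it has succeeded. This is a minimal illustration, not a definitive implementation: the `Steps` hooks, `Rotate`, and `ErrOldStillInUse` are hypothetical names, and each hook would wrap whatever your provider and secret store actually expose.

```go
package main

import (
	"errors"
	"fmt"
)

// Steps bundles the provider-specific operations of one rotation.
// Every field is a hypothetical hook; wire each to your own tooling.
type Steps struct {
	Generate  func() (string, error) // create the new credential, return its ID
	Activate  func(id string) error  // make it valid alongside the old one
	RollOut   func(id string) error  // deliver it to every consumer
	OldInUse  func() (bool, error)   // report whether anything still uses the old one
	RevokeOld func() error           // retire the old credential
}

// ErrOldStillInUse means the rotation stopped safely before revocation.
var ErrOldStillInUse = errors.New("old credential still in use; not revoking")

// Rotate runs the five overlap steps in order and stops at the first failure.
// The old credential is revoked only after every earlier step has succeeded.
func Rotate(s Steps) (string, error) {
	id, err := s.Generate()
	if err != nil {
		return "", fmt.Errorf("generate: %w", err)
	}
	if err := s.Activate(id); err != nil {
		return id, fmt.Errorf("activate: %w", err)
	}
	if err := s.RollOut(id); err != nil {
		return id, fmt.Errorf("roll out: %w", err)
	}
	inUse, err := s.OldInUse()
	if err != nil {
		return id, fmt.Errorf("usage check: %w", err)
	}
	if inUse {
		return id, ErrOldStillInUse
	}
	if err := s.RevokeOld(); err != nil {
		return id, fmt.Errorf("revoke old: %w", err)
	}
	return id, nil
}

func main() {
	// Stub hooks that only print, to show the order of operations.
	s := Steps{
		Generate:  func() (string, error) { fmt.Println("generate"); return "key-2", nil },
		Activate:  func(id string) error { fmt.Println("activate", id); return nil },
		RollOut:   func(id string) error { fmt.Println("roll out", id); return nil },
		OldInUse:  func() (bool, error) { fmt.Println("check old usage"); return false, nil },
		RevokeOld: func() error { fmt.Println("revoke old"); return nil },
	}
	if _, err := Rotate(s); err != nil {
		fmt.Println("rotation stopped:", err)
	}
}
```

Because any early return leaves the old credential untouched, a failure at any step leaves the system in the overlap state rather than with no working credential.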
Dynamic secrets: rotation by design
The cleanest version of rotation is to never have a static secret at all. Vault’s database secrets engine generates a fresh, short-lived credential on each request and revokes it when the lease expires:
# Configure Vault to mint Postgres creds on demand
vault write database/roles/app-readonly \
    db_name=appdb \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl="1h" \
    max_ttl="24h"
The app requests a credential, gets one that lives for an hour, and Vault cleans it up automatically:
vault read database/creds/app-readonly
# username: v-token-app-readonly-x7d... password: A1a-... lease_duration: 1h
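There’s nothing to rotate because nothing is permanent. A leaked credential expires with its lease: within the hour by default, and never later than the 24-hour max_ttl even if the lease is renewed. You never run a rotation job for these credentials at all. This is the gold standard; reach for it whenever the backend supports it.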
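On the application side, a short-lived credential means the app has to know when to fetch its replacement. The sketch below decodes the JSON that `vault read -format=json database/creds/app-readonly` prints and derives a refresh delay from `lease_duration`. The `Lease` type, `ParseLease`, and `RefreshAfterSeconds` are hypothetical names, the two-thirds margin is an arbitrary choice rather than a Vault recommendation, and the sample payload uses made-up values.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Lease holds the fields of Vault's JSON response that the app needs.
type Lease struct {
	LeaseID       string `json:"lease_id"`
	LeaseDuration int    `json:"lease_duration"` // seconds
	Renewable     bool   `json:"renewable"`
	Data          struct {
		Username string `json:"username"`
		Password string `json:"password"`
	} `json:"data"`
}

// ParseLease decodes the response and rejects one without usable credentials.
func ParseLease(raw []byte) (Lease, error) {
	var l Lease
	if err := json.Unmarshal(raw, &l); err != nil {
		return Lease{}, fmt.Errorf("decode lease: %w", err)
	}
	if l.Data.Username == "" || l.Data.Password == "" {
		return Lease{}, errors.New("lease has no username or password")
	}
	return l, nil
}

// RefreshAfterSeconds returns two thirds of the lease duration, leaving
// the final third as a safety margin (an assumed, tunable ratio).
func (l Lease) RefreshAfterSeconds() int {
	return l.LeaseDuration * 2 / 3
}

func main() {
	// Sample payload with made-up values, shaped like Vault's JSON output.
	raw := []byte(`{"lease_id":"database/creds/app-readonly/example","lease_duration":3600,"renewable":true,"data":{"username":"v-example","password":"example-password"}}`)
	lease, err := ParseLease(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	wait := time.Duration(lease.RefreshAfterSeconds()) * time.Second
	fmt.Printf("user=%s, fetch a replacement in %s\n", lease.Data.Username, wait)
}
```

Fetching the replacement before the lease ends gives the app its own small overlap window: the old credential is still valid while connections move to the new one.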
Rotating what you can’t make dynamic
Not everything supports dynamic issuance. For a static third-party API key, you script the overlap explicitly. Many providers let you have two active keys precisely so you can rotate without downtime:
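#!/usr/bin/env bash
set -euo pipefail
# provider-cli stands in for your provider's CLI; adapt commands and flags.
# 0. Look up the name of the key currently in use, so the right one gets revoked
#    (assumes the key's name is stored next to it as key_name)
OLD_NAME=$(vault kv get -field=key_name secret/app/provider)
NEW_NAME="app-$(date +%Y%m%d)"
# 1. Create the new key (provider allows 2 active keys)
NEW_KEY=$(provider-cli keys create --name "$NEW_NAME" -o json | jq -r .key)
# 2. Write it to the secret store, where consumers read it
vault kv put secret/app/provider api_key="$NEW_KEY" key_name="$NEW_NAME"
# 3. Trigger a rolling restart so pods pick up the new value
kubectl rollout restart deployment/app -n production
kubectl rollout status deployment/app -n production --timeout=300s
# 4. Wait out the drain window, THEN revoke the old key
sleep 600 # drain window
provider-cli keys revoke --name "$OLD_NAME"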
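The rollout status gate matters: because the script runs with set -e, a failed or timed-out rollout aborts it before the revoke step, so you don’t revoke the old key until every pod is confirmed running the new one. The drain window covers any long-lived connection or queued job still holding the old value. A fixed sleep is a wait, not a confirmation; if your provider exposes per-key usage, check it before revoking.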
For consumers that can reload without a restart, even better — a sidecar like Vault Agent or the CSI Secrets Store driver can refresh the mounted secret and signal the app, so rotation doesn’t even cost a deploy.
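Where the provider reports per-key usage, the fixed drain window in the script above can be replaced by an actual check. The following sketch polls until the old key has shown zero uses for several consecutive polls and gives up otherwise. `WaitForDrain`, `ErrNotDrained`, and the `check` callback are hypothetical; how you obtain a usage count depends entirely on your provider, and some offer no such data at all.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotDrained means the old key was still being used when polling ended.
var ErrNotDrained = errors.New("old key still in use after max polls")

// WaitForDrain calls check up to maxPolls times, sleeping interval between
// calls. It returns nil once check has reported zero uses for `quiet`
// consecutive polls; any non-zero reading resets the streak.
func WaitForDrain(check func() (int, error), quiet, maxPolls int, interval time.Duration) error {
	streak := 0
	for i := 0; i < maxPolls; i++ {
		if i > 0 {
			time.Sleep(interval)
		}
		n, err := check()
		if err != nil {
			return fmt.Errorf("usage check: %w", err)
		}
		if n == 0 {
			streak++
			if streak >= quiet {
				return nil
			}
		} else {
			streak = 0
		}
	}
	return ErrNotDrained
}

func main() {
	// Fake usage readings standing in for a provider's per-key usage API.
	readings := []int{4, 1, 0, 0, 0}
	i := 0
	check := func() (int, error) {
		n := readings[i%len(readings)]
		i++
		return n, nil
	}
	if err := WaitForDrain(check, 3, 10, 10*time.Millisecond); err != nil {
		fmt.Println("do not revoke:", err)
		return
	}
	fmt.Println("old key is quiet; safe to revoke")
}
```

Requiring a streak of quiet polls rather than a single zero reading guards against a periodic job that only touches the key every few minutes.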
Don’t break on rotation: build for reload
Half of rotation pain comes from apps that read a secret exactly once at boot and cache it forever. Such an app requires a restart to pick up a new credential, which couples rotation to deploys. Where you control the code, make it re-read on a signal or watch the secret file:
// Reload credentials on SIGHUP instead of requiring a restart.
// Needs imports "os", "os/signal", and "syscall"; creds is the app's own credential holder.
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGHUP)
go func() {
	for range sig {
		creds.Reload() // re-read from mounted secret
	}
}()
An app that can hot-reload its secrets makes rotation a non-event.
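The reload itself deserves a little care: a file that is mid-write or unchanged should not disturb a working credential. Here is one possible shape for the holder behind a reload call, assuming the secret is mounted as a single-value file. `Store`, `Update`, and `ReloadFromFile` are hypothetical names; you would call `ReloadFromFile` from a signal handler or a periodic ticker.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"sync"
)

// Store holds the current secret and lets readers fetch it concurrently.
type Store struct {
	mu  sync.RWMutex
	val string
}

// Get returns the secret currently in use.
func (s *Store) Get() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.val
}

// Update swaps in new content and reports whether the secret changed.
// Empty content is ignored so a truncated file never replaces a working secret.
func (s *Store) Update(raw []byte) bool {
	next := string(bytes.TrimSpace(raw))
	if next == "" {
		return false
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if next == s.val {
		return false
	}
	s.val = next
	return true
}

// ReloadFromFile re-reads the mounted secret file and applies it via Update.
func (s *Store) ReloadFromFile(path string) (bool, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return false, fmt.Errorf("read secret: %w", err)
	}
	return s.Update(raw), nil
}

func main() {
	// A temp file stands in for the mounted secret.
	f, err := os.CreateTemp("", "secret-*")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()

	store := &Store{}
	for _, content := range []string{"old-password\n", "old-password\n", "new-password\n"} {
		if err := os.WriteFile(f.Name(), []byte(content), 0o600); err != nil {
			fmt.Println("error:", err)
			return
		}
		changed, err := store.ReloadFromFile(f.Name())
		fmt.Println("changed:", changed, "err:", err)
	}
}
```

Returning whether the value changed lets the caller decide when to rebuild connection pools or clients, instead of doing so on every signal.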
Make it scheduled and boring
Rotation that depends on someone remembering is rotation that doesn’t happen. Put it on a timer and let the absence of alerts confirm success:
# systemd timer: rotate weekly, off-hours
# (starts the .service unit with the same name as this .timer)
[Timer]
OnCalendar=Sun 03:00
Persistent=true
[Install]
WantedBy=timers.target
Then alert on staleness, not just failure. A dashboard query for “secrets older than 90 days” turns silent drift into a visible, fixable list. The secret that never rotates is invisible until it leaks; a staleness alert makes it loud.
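A staleness check can be as simple as comparing creation timestamps against a cutoff. The sketch below assumes you can export each secret's creation time as an RFC 3339 string (for example, the `created_time` field in Vault KV v2 metadata). `StaleSecrets` and the secret paths in `main` are hypothetical, and how the timestamps are collected is left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// StaleSecrets returns the names of secrets created more than maxAgeDays
// before now, sorted alphabetically. Timestamps are RFC 3339 strings.
func StaleSecrets(created map[string]string, now string, maxAgeDays int) ([]string, error) {
	nowT, err := time.Parse(time.RFC3339, now)
	if err != nil {
		return nil, fmt.Errorf("parse now: %w", err)
	}
	cutoff := nowT.AddDate(0, 0, -maxAgeDays)

	var stale []string
	for name, ts := range created {
		t, err := time.Parse(time.RFC3339, ts)
		if err != nil {
			return nil, fmt.Errorf("secret %q: %w", name, err)
		}
		if t.Before(cutoff) {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	// Made-up inventory; in practice this comes from your secret store.
	created := map[string]string{
		"secret/app/db":       "2021-03-01T09:00:00Z",
		"secret/app/provider": time.Now().UTC().Format(time.RFC3339),
	}
	stale, err := StaleSecrets(created, time.Now().UTC().Format(time.RFC3339), 90)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range stale {
		fmt.Println("older than 90 days:", name)
	}
}
```

Feeding the resulting list into an alert or a dashboard is what turns the 90-day policy from an intention into something visible.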
Test the rotation, including the rollback
The rotation script you’ve never run in anger is a liability, not an asset. Exercise it regularly in staging, and crucially, test the rollback — what happens if the new credential is bad? Your automation should verify the new secret works before revoking the old one:
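# Validate before committing to the new credential.
# PGPASSWORD keeps the password out of the URL, where special characters would break parsing;
# -w stops psql from hanging on a password prompt.
if ! PGPASSWORD="$NEW_PASS" psql -w -h "$DB_HOST" -U "$NEW_USER" -d appdb -c 'SELECT 1' >/dev/null 2>&1; then
  echo "New credential failed validation; aborting, old credential still valid" >&2
  exit 1
fi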
That guard means a botched rotation degrades to “nothing changed” instead of “everything is down.”
Where to start
You don’t have to boil the ocean. Pick your highest-risk static credential — usually a database password or a cloud access key with broad permissions — and automate just that one, end to end, with the overlap pattern. Get the muscle memory, prove it doesn’t cause outages, then template it for the next ten. Reviewing rotation scripts through automated code review before they touch production catches the missing validation gate that turns a routine rotation into an incident, and the wider security hardening guides cover how rotation fits with the rest of your credential hygiene.
The goal is a world where every secret has a known age, a known owner, and an automated path to replacement — so that when one inevitably leaks, the blast radius is hours, not years.
Rotation scripts are templates. Always validate new credentials before revoking old ones, and test rotation and rollback in a non-production environment first.