Redis Distributed Lock and Redlock Design Prompt
Design SET NX PX distributed locks safely, evaluate Redlock and its controversy, and add fencing tokens to avoid split-brain writes.
- Target user: Engineers building distributed mutual exclusion on Redis
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior distributed-systems engineer who has built and debugged Redis locks in production and understands the Redlock debate in depth.
I will provide:
- What resource I need to protect
- My availability and correctness requirements
- My current locking code
Your job:
1. **Single-instance lock**: use `SET lock:<resource> <unique-token> NX PX <ttl>`. NX ensures only one holder; PX auto-expires so a crashed holder cannot deadlock. NEVER use SETNX + separate EXPIRE (non-atomic).
2. **Safe release**: release must be a Lua compare-and-delete: check that the token matches before DEL, so a client cannot delete a lock it no longer owns after TTL expiry.
3. **Choose TTL carefully**: TTL must exceed worst-case critical-section time, or the lock expires mid-work and another client enters.
4. **Explain Redlock**: the multi-node algorithm that acquires the lock on a majority of N independent masters. State the Kleppmann critique (it is NOT safe under GC pauses, clock jumps, or network delays for correctness-critical work) and the antirez rebuttal.
5. **Recommend fencing tokens**: a monotonically increasing token (e.g. from INCR) passed to the protected resource so a stale lock holder's writes are rejected. Locks alone cannot guarantee safety without fencing.
6. **Decide correctness vs efficiency**: if the lock is only an optimization (avoid duplicate work), single-instance SET NX PX is fine. If correctness depends on mutual exclusion, Redis locks are the wrong tool; use a consensus system (ZooKeeper/etcd) with fencing.
7. **Handle renewal**: long jobs need a watchdog that extends PX only while the token still matches (Lua).
8. **Avoid pitfalls**: never use an un-tokened DEL, never retry without a bound, and use jittered backoff on contention.
Mark DESTRUCTIVE: DEL of a lock you do not own, FLUSHALL in tests, disabling PX (permanent deadlock risk).
---
Resource to protect: [DESCRIBE]
Correctness vs efficiency: [DESCRIBE]
Current lock code: [PASTE]
Why this prompt works
Distributed locking on Redis is a minefield of subtly wrong implementations. This prompt makes the model separate the efficiency case (a single SET NX PX is fine) from the correctness case (where Redis locks are insufficient without fencing), and forces an honest treatment of the Redlock controversy instead of cargo-culting the algorithm.
How to use it
- State whether the lock is for efficiency or correctness — this changes everything.
- Provide worst-case critical-section duration so TTL is sane.
- Paste your acquire and release code — most bugs live in release.
- Ask explicitly about fencing tokens if writes must be safe.
Useful commands
# Acquire: atomic set-if-not-exists with TTL and a unique token
redis-cli SET lock:job:report "c3f9-uuid-token" NX PX 30000
# Release: compare-and-delete (only if we still own it)
redis-cli EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then \
return redis.call('DEL', KEYS[1]) else return 0 end" \
1 lock:job:report "c3f9-uuid-token"
# Renew (watchdog): extend only if token matches
redis-cli EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then \
return redis.call('PEXPIRE', KEYS[1], ARGV[2]) else return 0 end" \
1 lock:job:report "c3f9-uuid-token" 30000
# Fencing token source (monotonic)
redis-cli INCR fence:job:report
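The acquire, release, and renew commands above can be wrapped in application code. What follows is a minimal sketch, assuming a client object shaped like redis-py's (a `set(name, value, nx=..., px=...)` method and an `eval(script, numkeys, *keys_and_args)` method). The function names `acquire`, `release`, and `renew` are hypothetical, chosen only for illustration.

```python
import uuid

# Lua scripts mirroring the redis-cli commands: act only if the token matches.
RELEASE_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
else
  return 0
end
"""

RENEW_LUA = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('PEXPIRE', KEYS[1], ARGV[2])
else
  return 0
end
"""


def acquire(client, key, ttl_ms):
    """Try once to take the lock; return the owner token, or None if held."""
    token = uuid.uuid4().hex
    # SET key token NX PX ttl: atomic set-if-absent with expiry.
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None


def release(client, key, token):
    """Delete the lock only if we still own it; True if it was deleted."""
    return client.eval(RELEASE_LUA, 1, key, token) == 1


def renew(client, key, token, ttl_ms):
    """Extend the TTL only if we still own the lock; True if extended."""
    return client.eval(RENEW_LUA, 1, key, token, ttl_ms) == 1
```

A `False` from `renew` means the lock was lost (expired or taken over), so the caller should stop working on the protected resource rather than continue.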
Example config
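# Full lock lifecycle with a fencing token
TOKEN=$(uuidgen)
# 1. Acquire
RESULT=$(redis-cli SET lock:resource:42 "$TOKEN" NX PX 20000)
# -> "OK" means acquired; an empty reply (nil) means someone else holds it
[ "$RESULT" = "OK" ] || exit 1
# 2. Take the fencing token only AFTER a successful acquire, so fence order
#    follows lock order. (Doing SET and INCR in one Lua script is stricter
#    still, since it closes the gap between the two calls.)
FENCE=$(redis-cli INCR fence:resource:42)
# 3. Do work, passing $FENCE to the protected store.
# The store must REJECT any write whose fence < the highest fence it has seen.
# 4. Release atomically by token
redis-cli EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then \
return redis.call('DEL', KEYS[1]) else return 0 end" \
1 lock:resource:42 "$TOKEN"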
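The lifecycle above only protects data if the store itself enforces the fence. The following is a minimal sketch of that check, assuming an in-memory store. `FencedStore` and `StaleFenceError` are hypothetical names, and a real store would need to persist the highest fence next to the data and perform the compare-and-write atomically. This sketch rejects only fences strictly lower than the highest one seen, so a single holder can issue several writes under the same fence.

```python
import threading


class StaleFenceError(Exception):
    """Raised when a write carries a fence lower than one already seen."""


class FencedStore:
    """In-memory stand-in for a resource that checks fencing tokens."""

    def __init__(self):
        self._lock = threading.Lock()
        self._highest_fence = 0
        self._data = {}

    def write(self, fence, key, value):
        # Check and update under one local lock so the comparison is atomic.
        with self._lock:
            if fence < self._highest_fence:
                raise StaleFenceError(
                    f"fence {fence} < highest seen {self._highest_fence}"
                )
            self._highest_fence = fence
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)


store = FencedStore()
store.write(33, "report", "from holder A")
store.write(34, "report", "from holder B")  # B acquired after A's lock expired
try:
    store.write(33, "report", "late write from A")
except StaleFenceError:
    pass  # A's delayed write is rejected; B's value is kept
```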
Common findings this catches
- Un-tokened DEL → deletes another owner’s lock.
- SETNX + EXPIRE → non-atomic; a crash between the two calls leaves a lock with no TTL (deadlock).
- No fencing → stale holder corrupts data after pause.
- TTL too short → double entry into critical section.
- Redlock for correctness → false safety under clock jumps, process pauses, or network delays.
- Unbounded retries → thundering herd on hot lock.
- Runaway renewal → wedged worker holds lock forever.
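The unbounded-retry finding is usually addressed with a capped number of attempts and jittered backoff. The following is a minimal sketch, assuming a caller-supplied `try_acquire` callable that returns a token or `None`. The name `acquire_with_backoff` and its default values are illustrative, not tuned recommendations.

```python
import random
import time


def acquire_with_backoff(try_acquire, max_attempts=5, base_s=0.05, cap_s=1.0,
                         sleep=time.sleep, rng=random.random):
    """Call try_acquire up to max_attempts times with full-jitter backoff.

    Returns the token from try_acquire, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        token = try_acquire()
        if token is not None:
            return token
        if attempt == max_attempts - 1:
            break  # out of attempts; no sleep after the final failure
        # Exponential ceiling, capped, then a random delay below it so that
        # contending clients do not retry in lockstep.
        ceiling = min(cap_s, base_s * (2 ** attempt))
        sleep(rng() * ceiling)
    return None
```

The `sleep` and `rng` parameters exist only so the delay logic can be exercised deterministically. Returning `None` after the last attempt leaves the decision to skip, queue, or fail the job with the caller.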
When to escalate
- Correctness-critical mutual exclusion — move to etcd/ZooKeeper with fencing.
- Cross-region locking — reconsider the whole design.
- Recurring split-brain writes — data platform and reliability review.
Related prompts
- Redis Lua Scripting Review Prompt: Review Redis Lua scripts (EVAL/EVALSHA, atomicity, KEYS vs ARGV, and script safety) to keep server-side logic correct and non-blocking.
- Redis Transactions MULTI/EXEC Design Prompt: Design correct Redis transactions with MULTI/EXEC/WATCH optimistic locking and understand atomicity limits and rollback behavior.
- Redis TTL and Expiration Strategy Prompt: Design TTL hygiene with EXPIRE/PEXPIRE, understand active vs lazy expiry, and avoid immortal keys and expiry-driven latency spikes.