Redis Replication Setup Review Prompt
Review Redis primary/replica topology — replicaof, replica-read-only, sync health, and lag — for read scaling and failover readiness.
- Target user: SREs operating Redis replicas
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE and Redis expert who reviews replication topologies for correctness and failover readiness.

I will provide:
- The topology (how many replicas, chained or flat)
- `INFO replication` from primary and replicas
- Read/write routing at the app

Your job:
1. **Confirm the topology**: each replica is configured with `replicaof <primary-host> <port>` (or `REPLICAOF` at runtime). Chained replicas (replica-of-a-replica) reduce primary load but add lag.
2. **Check sync state**: `INFO replication` → `role`, `master_link_status:up` on replicas, `connected_slaves` on the primary, and per-replica `slaveN:...,state=online,offset=...,lag=...`.
3. **Measure lag**: compare the primary's `master_repl_offset` to each replica's `slave_repl_offset`. A growing gap means the replica is falling behind.
4. **Full vs partial resync**: check `sync_full` and `sync_partial_ok`/`sync_partial_err` (these counters are reported by `INFO stats`, so ask for that output if it is missing). Frequent full syncs are expensive; size `repl-backlog-size` so partial resync survives brief disconnects.
5. **`replica-read-only`**: keep `yes`. Writes made directly to a replica are discarded on the next full sync and leave its data diverged from the primary in the meantime.
6. **Timeouts and buffers**: tune `repl-timeout` and `client-output-buffer-limit replica ...` so a slow replica isn't force-disconnected (triggering a full resync loop).
7. **Diskless sync**: `repl-diskless-sync yes` streams the RDB over the socket, which is useful when disk is slow but network is fast.
8. **Durability note**: replication is async by default; `WAIT <numreplicas> <ms>` blocks until N replicas acknowledge the connection's earlier writes or the timeout expires. This gives stronger guarantees but does not make Redis strongly consistent.

Mark DESTRUCTIVE: writing to a replica with `replica-read-only no` (data diverges), `REPLICAOF NO ONE` promoting the wrong node (split-brain), `FLUSHALL` on a primary (propagates to replicas), and `KEYS *`/`DEBUG` on prod.

---

Topology: [DESCRIBE]
INFO replication (primary + replicas): [PASTE]
App read/write routing: [DESCRIBE]
Why this prompt works
Replication looks healthy until a failover exposes lag, a diverged replica, or a full-resync storm. This prompt reads the real fields (master_link_status and the offsets from INFO replication, sync_full from INFO stats) that reveal whether replicas are actually caught up and whether a failover would be safe, and it enforces replica-read-only to prevent the silent data-divergence bug.
How to use it
- Draw the topology: flat vs chained changes the lag budget.
- Paste INFO replication from every node so offsets can be compared.
- Describe read routing: reads off replicas must tolerate lag.
- Note durability needs: mention if WAIT guarantees are required.
Useful commands
# On the primary
redis-cli INFO replication
# role:master, connected_slaves:N, slave0:...state=online,offset=...,lag=0
redis-cli -h primary INFO replication | grep master_repl_offset
# On a replica
redis-cli -h replica INFO replication | grep -E 'role|master_link_status|slave_repl_offset|master_sync_in_progress'
# Resync accounting
redis-cli INFO stats | grep -E 'sync_full|sync_partial_ok|sync_partial_err'
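# Wait for N replicas to ack earlier writes (stronger durability).
# WAIT only covers writes sent on the same connection, so issue it right after the write;
# run on a fresh connection it returns immediately.
redis-cli WAIT 1 1000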
# Configure a replica at runtime
redis-cli -h replica REPLICAOF primary 6379
redis-cli -h replica CONFIG SET replica-read-only yes
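The offset comparison described in the prompt can be automated. The following is a minimal sketch, not a definitive tool: `replica_lag_bytes` is a hypothetical helper name, and it assumes the primary's `INFO replication` output uses the usual `slaveN:ip=...,port=...,state=...,offset=...,lag=...` layout. It reads that text on stdin and prints how many bytes each attached replica is behind `master_repl_offset`.

```shell
# replica_lag_bytes (hypothetical helper): read a primary's `INFO replication`
# text on stdin and print "<slaveN> <state> <bytes_behind>" per replica.
replica_lag_bytes() {
  # redis-cli emits CRLF line endings when piped, so strip the \r first.
  tr -d '\r' | awk '
    /^master_repl_offset:/ {
      master = substr($0, index($0, ":") + 1)
    }
    /^slave[0-9]+:/ {
      # Split on the first colon only, so IPv6 addresses do not break parsing.
      name = substr($0, 1, index($0, ":") - 1)
      rest = substr($0, index($0, ":") + 1)
      state = "unknown"; off = 0
      n = split(rest, kv, ",")
      for (i = 1; i <= n; i++) {
        split(kv[i], p, "=")
        if (p[1] == "state")  state = p[2]
        if (p[1] == "offset") off = p[2]
      }
      names[++count] = name; states[count] = state; offsets[count] = off
    }
    END {
      # slaveN lines appear before master_repl_offset, so report at the end.
      for (i = 1; i <= count; i++)
        printf "%s %s %d\n", names[i], states[i], master - offsets[i]
    }'
}

# Usage against a live primary (host name is a placeholder):
#   redis-cli -h primary INFO replication | replica_lag_bytes
```

A single sample only shows the gap at one instant; running it repeatedly and watching whether the byte count grows is what distinguishes a replica that is falling behind from one that is merely a few bytes in flight.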
Example config
# redis.conf on a replica
replicaof 10.0.0.10 6379
replica-read-only yes
repl-backlog-size 128mb
repl-backlog-ttl 3600
repl-timeout 60
repl-diskless-sync yes
repl-diskless-sync-delay 5
# Don't force-disconnect a lagging replica too aggressively
client-output-buffer-limit replica 256mb 64mb 60
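One common rule of thumb for `repl-backlog-size` is write rate multiplied by the longest disconnect a partial resync should survive, with some headroom. The sketch below illustrates that arithmetic only; the helper names, the sample offsets, the 60-second outage, and the safety factor of 2 are all assumptions for illustration, not values taken from a real deployment. The write rate is estimated from two `master_repl_offset` samples, since that offset advances by the number of bytes written to the replication stream.

```shell
# write_rate OFFSET_START OFFSET_END SECONDS
# Average replication-stream bytes per second between two offset samples.
write_rate() {
  wr_start=$1; wr_end=$2; wr_secs=$3
  echo $(( (wr_end - wr_start) / wr_secs ))
}

# backlog_bytes RATE_BYTES_PER_SEC OUTAGE_SECONDS [SAFETY_FACTOR]
# Backlog needed to cover an outage of the given length (factor defaults to 2).
backlog_bytes() {
  bb_rate=$1; bb_outage=$2; bb_factor=${3:-2}
  echo $(( bb_rate * bb_outage * bb_factor ))
}

# Made-up samples: master_repl_offset grew from 1000000 to 11000000 in 10 s.
rate=$(write_rate 1000000 11000000 10)   # 1000000 bytes/s
backlog_bytes "$rate" 60                 # prints 120000000
```

With these illustrative numbers the result is roughly 114 MiB. Real traffic is bursty, so the rate is better sampled during peak write load than averaged over a quiet period.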
Common findings this catches
- Growing offset gap → replica lag; reads serve stale data.
- master_link_status:down → broken replication link.
- Frequent sync_full → backlog too small or buffer-limit disconnects.
- replica-read-only no → writes to replica silently diverge.
- Chained replica lag → deep chains amplify staleness.
- No WAIT where durability matters → acked writes can be lost on failover.
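Several of the findings above can be spotted mechanically from a replica's `INFO replication` output. The following is a rough sketch under stated assumptions: `check_replica` is a hypothetical helper, and it relies on the `role`, `master_link_status`, `master_sync_in_progress`, and `slave_read_only` fields as Redis reports them on a replica. It does not cover lag trends or resync counters, which need the primary's offsets and `INFO stats`.

```shell
# check_replica (hypothetical helper): read one replica's `INFO replication`
# text on stdin, print one line per finding, or "ok" when nothing is flagged.
check_replica() {
  # Strip the CRLF line endings that redis-cli emits when piped.
  tr -d '\r' | awk -F: '
    $1 == "role" && $2 != "slave" {
      print "not a replica: role=" $2; bad = 1
    }
    $1 == "master_link_status" && $2 != "up" {
      print "replication link down"; bad = 1
    }
    $1 == "master_sync_in_progress" && $2 == 1 {
      print "sync in progress"; bad = 1
    }
    $1 == "slave_read_only" && $2 == 0 {
      print "replica is writable (replica-read-only no)"; bad = 1
    }
    END { if (!bad) print "ok" }'
}

# Usage against a live replica (host name is a placeholder):
#   redis-cli -h replica INFO replication | check_replica
```

An "ok" here only means none of these four fields looked wrong at the moment of the check; it says nothing about how far behind the replica is.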
When to escalate
- Automatic failover requirements — move to Sentinel or Cluster.
- Cross-region replication lag — needs network/topology redesign.
- Repeated full-resync storms harming the primary — capacity and backlog review.
Related prompts
- Redis Backup and Migration Plan Prompt: Plan Redis backups with BGSAVE/RDB, move keys with DUMP/RESTORE and MIGRATE, and sequence safe version upgrades and data migrations.
- Redis Cluster Sharding Design Prompt: Design Redis Cluster sharding (16384 hash slots, resharding, hash tags, and multi-key operation constraints across shards).
- Redis Persistence RDB/AOF Config Prompt: Configure Redis durability (RDB snapshots vs AOF, appendfsync policy, and hybrid persistence), balancing data safety against latency.
- Redis Sentinel High Availability Design Prompt: Design Redis Sentinel HA (quorum, automatic failover, and client discovery) for resilient primary/replica setups without Cluster.