You are a senior SRE and Redis expert who reviews replication topologies for correctness and failover readiness.

I will provide:
- The topology (how many replicas, chained or flat)
- `INFO replication` from primary and replicas
- Read/write routing at the app

Your job:
1. **Confirm the topology**: each replica is configured with `replicaof <primary-host> <port>` (or `REPLICAOF` at runtime). Chained replicas (replica-of-a-replica) reduce primary load but add lag.
2. **Check sync state**: `INFO replication` → `role`, `master_link_status:up`, `connected_slaves`, and per-replica `slaveN:...,state=online,offset=...,lag=...`.
3. **Measure lag**: compare primary `master_repl_offset` to each replica's `slave_repl_offset`. Growing gap = replica falling behind.
4. **Full vs partial resync**: check `sync_full`, `sync_partial_ok`, and `sync_partial_err` (reported under `INFO stats`). Frequent full syncs are expensive; size `repl-backlog-size` so partial resync survives brief disconnects.
5. **`replica-read-only`**: keep `yes`. Writes to a replica are lost on the next full sync and leave its data diverged from the primary.
6. **Timeouts and buffers**: tune `repl-timeout` and `client-output-buffer-limit replica ...` so a slow replica isn't force-disconnected (triggering a full resync loop).
7. **Diskless sync**: `repl-diskless-sync yes` streams the RDB over the socket, which is useful when disk is slow but network is fast.
8. **Durability note**: replication is async by default; `WAIT <numreplicas> <ms>` blocks until N replicas ack for stronger guarantees.

Mark DESTRUCTIVE: writing to a replica with `replica-read-only no` (data diverges), `REPLICAOF NO ONE` promoting the wrong node (split-brain), `FLUSHALL` on a primary (propagates to replicas), and `KEYS *`/`DEBUG` on prod (they can block or crash the server).

---

Topology: [DESCRIBE]

INFO replication (primary + replicas): [PASTE]

App read/write routing: [DESCRIBE]

Why this prompt works

Replication looks healthy until a failover exposes lag, a diverged replica, or a full-resync storm. This prompt reads the real INFO replication fields — master_link_status, offsets, sync_full — that reveal whether replicas are actually caught up and whether a failover would be safe, and it enforces replica-read-only to prevent the silent data-divergence bug.

How to use it

Draw the topology — flat vs chained changes the lag budget.
Paste INFO replication from every node so offsets can be compared.
Describe read routing — reads off replicas must tolerate lag.
Note durability needs — mention if WAIT guarantees are required.
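
Gathering the output from every node by hand is error-prone, so a small helper can do it. The following is a minimal sketch, not a definitive tool: the function name `collect_replication_info`, the `REDIS_CLI` override variable, and the addresses in the usage comment are all assumptions made for illustration. It expects each node as a `host:port` pair.

```shell
# Sketch: gather INFO replication from several nodes into one labeled report.
# REDIS_CLI is a hypothetical override so the function can be tried without a live server.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

collect_replication_info() {
  # Arguments: one host:port pair per node.
  local node host port
  for node in "$@"; do
    host="${node%%:*}"
    port="${node##*:}"
    printf '### %s\n' "$node"
    # Redis ends each INFO line with \r\n; drop the \r so later grep/awk matches are clean.
    "$REDIS_CLI" -h "$host" -p "$port" INFO replication | tr -d '\r'
    printf '\n'
  done
}

# Usage (placeholder addresses):
#   collect_replication_info 10.0.0.10:6379 10.0.0.11:6379 > replication-report.txt
```

The labeled sections make it easy to paste the whole report into the prompt while keeping track of which node produced which offsets.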

Useful commands

# On the primary
redis-cli INFO replication
# role:master, connected_slaves:N, slave0:...state=online,offset=...,lag=0
redis-cli -h primary INFO replication | grep master_repl_offset

# On a replica
redis-cli -h replica INFO replication | grep -E 'role|master_link_status|slave_repl_offset|master_sync_in_progress'

# Resync accounting
redis-cli INFO stats | grep -E 'sync_full|sync_partial_ok|sync_partial_err'

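# Wait for N replicas to ack the last write (stronger durability).
# WAIT only covers writes sent on the same connection, so issue it right after the
# write in one client session; run standalone like this, it has no prior write to wait for.
redis-cli WAIT 1 1000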

# Configure a replica at runtime
redis-cli -h replica REPLICAOF primary 6379
redis-cli -h replica CONFIG SET replica-read-only yes
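
To turn the offset comparison into a number, the two offsets can be subtracted. This is a minimal sketch that assumes the INFO replication output of the primary and of one replica has been saved to two files; the function names `info_field` and `repl_gap_bytes` are hypothetical helpers, not part of redis-cli.

```shell
# Extract one field value from INFO output read on stdin.
info_field() {
  # $1 = field name, e.g. master_repl_offset
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Byte gap between the primary's offset and a replica's offset.
repl_gap_bytes() {
  # $1 = file holding the primary's INFO replication output
  # $2 = file holding the replica's INFO replication output
  local primary_off replica_off
  primary_off="$(info_field master_repl_offset < "$1")"
  replica_off="$(info_field slave_repl_offset < "$2")"
  if [ -z "$primary_off" ] || [ -z "$replica_off" ]; then
    echo "missing offset field" >&2
    return 1
  fi
  echo $(( primary_off - replica_off ))
}

# Usage (file names are placeholders):
#   redis-cli -h primary INFO replication > primary.txt
#   redis-cli -h replica INFO replication > replica.txt
#   repl_gap_bytes primary.txt replica.txt
```

Because the two snapshots are taken at slightly different moments, a single reading is only approximate. Sampling repeatedly and watching whether the gap grows is what indicates a replica falling behind.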

Example config

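# redis.conf on a replica
# repl-backlog-size and client-output-buffer-limit matter most on the node that feeds
# replicas (the primary, or a mid-chain replica); set them on every node so they still
# apply after a failover.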
replicaof 10.0.0.10 6379
replica-read-only yes
repl-backlog-size 128mb
repl-backlog-ttl 3600
repl-timeout 60
repl-diskless-sync yes
repl-diskless-sync-delay 5
# Don't force-disconnect a lagging replica too aggressively
# (256mb 64mb 60 is the Redis default; raise it if large full syncs get cut off)
client-output-buffer-limit replica 256mb 64mb 60
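
A common rule of thumb for `repl-backlog-size` is the replication write rate multiplied by the longest disconnect a partial resync should survive. The sketch below works through that arithmetic; the function name `backlog_bytes_needed` and the sample numbers are assumptions for illustration, and the result is a starting point to pad generously rather than an exact requirement.

```shell
# Estimate a backlog size from two master_repl_offset samples taken on the primary.
backlog_bytes_needed() {
  # $1 = master_repl_offset at the first sample
  # $2 = master_repl_offset at the second sample
  # $3 = seconds between the two samples
  # $4 = disconnect window to survive, in seconds
  local rate=$(( ($2 - $1) / $3 ))   # replication stream bytes per second
  echo $(( rate * $4 ))
}

# Example: the offset grew from 1000000 to 7000000 in 60 s (100000 bytes/s),
# and replicas should survive a 300 s disconnect.
backlog_bytes_needed 1000000 7000000 60 300   # prints 30000000 (about 30 MB)
```

Sampling during peak write traffic gives a safer estimate than sampling during a quiet period.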

Common findings this catches

Growing offset gap → replica lag; reads serve stale data.
master_link_status:down → broken replication link.
Frequent sync_full → backlog too small or buffer-limit disconnects.
replica-read-only no → writes to replica silently diverge.
Chained replica lag → deep chains amplify staleness.
No WAIT where durability matters → acked writes can be lost on failover.
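
Several of these findings can be flagged mechanically from a replica's INFO replication output. The following is a rough sketch under stated assumptions: `check_replica_info` is a hypothetical helper name, the WARN/FAIL labels are arbitrary, and it reads only the `role`, `master_link_status`, `master_sync_in_progress`, and `slave_read_only` fields.

```shell
# Read one replica's INFO replication output on stdin; print one line per finding.
check_replica_info() {
  tr -d '\r' | awk -F: '
    $1 == "role"                    { role = $2 }
    $1 == "master_link_status"      { link = $2 }
    $1 == "master_sync_in_progress" { syncing = $2 }
    $1 == "slave_read_only"         { ro = $2 }
    END {
      if (role != "slave") print "WARN role is " role ", expected slave"
      if (link != "up")    print "FAIL master_link_status is " link
      if (syncing == "1")  print "WARN sync in progress"
      if (ro == "0")       print "FAIL replica accepts writes (slave_read_only:0)"
    }'
}

# Usage (placeholder host):
#   redis-cli -h replica INFO replication | check_replica_info
```

A healthy replica produces no output, so the function can sit in a loop over all replicas and only the problem nodes will show up.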

When to escalate

Automatic failover requirements — move to Sentinel or Cluster.
Cross-region replication lag — needs network/topology redesign.
Repeated full-resync storms harming the primary — capacity and backlog review.

Related prompts

Redis Backup and Migration Plan Prompt

Redis Cluster Sharding Design Prompt

Redis Persistence RDB/AOF Config Prompt

Redis Sentinel High Availability Design Prompt
