DevOps AI ToolKit

Redis Replication Setup Review Prompt

Review Redis primary/replica topology — replicaof, replica-read-only, sync health, and lag — for read scaling and failover readiness.

Target user
SREs operating Redis replicas
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior SRE and Redis expert who reviews replication topologies for correctness and failover readiness.

I will provide:
- The topology (how many replicas, chained or flat)
- `INFO replication` from primary and replicas
- Read/write routing at the app

Your job:

1. **Confirm the topology**: each replica is configured with `replicaof <primary-host> <port>` in redis.conf (or `REPLICAOF` at runtime). Chained replicas (replica-of-a-replica) reduce primary load but add lag.
2. **Check sync state**: in `INFO replication`, check `role` on every node; on the primary, `connected_slaves` and each `slaveN:...,state=online,offset=...,lag=...` entry; on each replica, `master_link_status:up`.
3. **Measure lag**: compare primary `master_repl_offset` to each replica's `slave_repl_offset`. Growing gap = replica falling behind.
4. **Full vs partial resync**: check `sync_full` and `sync_partial_ok/err`. Frequent full syncs are expensive — size `repl-backlog-size` so partial resync survives brief disconnects.
5. **`replica-read-only`**: keep `yes`. Writes made directly to a replica never reach the primary, so the replica's data diverges, and those writes are discarded when the replica resyncs.
6. **Timeouts and buffers**: tune `repl-timeout` and `client-output-buffer-limit replica ...` so a slow replica isn't force-disconnected, which can trigger a full-resync loop.
7. **Diskless sync**: `repl-diskless-sync yes` streams the RDB over the socket — useful when disk is slow but network is fast.
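8. **Durability note**: replication is async by default; `WAIT <numreplicas> <ms>` blocks until N replicas ack the connection's previous writes or the timeout expires, and returns the number of replicas that acked. Check that return value. `WAIT` reduces the window for losing acked writes on failover but does not make Redis strongly consistent.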

Mark DESTRUCTIVE: writing to a replica with `replica-read-only no` (data diverges), `REPLICAOF NO ONE` promoting the wrong node (split-brain), `FLUSHALL` on a primary (propagates to replicas), and `KEYS *`/`DEBUG` on prod.

---

Topology: [DESCRIBE]
INFO replication (primary + replicas): [PASTE]
App read/write routing: [DESCRIBE]

Why this prompt works

Replication looks healthy until a failover exposes lag, a diverged replica, or a full-resync storm. This prompt reads the real INFO replication fields — master_link_status, offsets, sync_full — that reveal whether replicas are actually caught up and whether a failover would be safe, and it enforces replica-read-only to prevent the silent data-divergence bug.
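
To make "reading the real INFO replication fields" concrete, here is a minimal Python sketch that turns pasted `INFO replication` text into a dictionary and splits a primary-side `slaveN` entry into its parts. The helper names (`parse_info`, `parse_replica_entry`) and the sample text are hypothetical illustrations, not output from a real deployment.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse `key:value` lines from INFO output, skipping blank lines and `# Section` headers."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def parse_replica_entry(value: str) -> dict[str, str]:
    """Split a primary-side `slaveN` value such as `ip=...,port=...,state=online`."""
    return dict(item.split("=", 1) for item in value.split(",") if "=" in item)


# Hypothetical sample; real output contains more fields.
SAMPLE_PRIMARY = """\
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.11,port=6379,state=online,offset=1500,lag=0
master_repl_offset:1500
"""

info = parse_info(SAMPLE_PRIMARY)
replica0 = parse_replica_entry(info["slave0"])
print(info["role"], replica0["state"], replica0["offset"])
```

All values stay as strings here; convert offsets with `int(...)` before comparing them.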

How to use it

  1. Draw the topology — flat vs chained changes the lag budget.
  2. Paste INFO replication from every node so offsets can be compared.
  3. Describe read routing — reads off replicas must tolerate lag.
  4. Note durability needs — mention if WAIT guarantees are required.
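
Once offsets from every node are in hand, the comparison in step 2 is simple arithmetic. The sketch below assumes you have already extracted the primary's `master_repl_offset` and each replica's `slave_repl_offset` as integers; the function names, node names, and numbers are made up for illustration.

```python
def offset_gaps(primary_offset: int, replica_offsets: dict[str, int]) -> dict[str, int]:
    """Bytes of replication stream each replica has not yet applied.

    Snapshots are taken at slightly different times, so a replica can appear
    ahead of the primary; clamp those cases to zero.
    """
    return {name: max(0, primary_offset - off) for name, off in replica_offsets.items()}


def is_growing(samples: list[int]) -> bool:
    """True when the gap increased on every consecutive sample."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


# Hypothetical offsets collected from one primary and two replicas.
gaps = offset_gaps(1_000_000, {"replica-a": 1_000_000, "replica-b": 940_000})
print(gaps)

# Hypothetical gap history for one replica, sampled a few seconds apart.
print(is_growing([1_000, 4_000, 60_000]))
```

A single non-zero gap is normal under write load; the signal worth reporting is a gap that keeps growing across samples.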

Useful commands

# On the primary
redis-cli INFO replication
# role:master, connected_slaves:N, slave0:...state=online,offset=...,lag=0
redis-cli -h primary INFO replication | grep master_repl_offset

# On a replica
redis-cli -h replica INFO replication | grep -E 'role|master_link_status|slave_repl_offset|master_sync_in_progress'

# Resync accounting
redis-cli INFO stats | grep -E 'sync_full|sync_partial_ok|sync_partial_err'

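# Wait up to 1000 ms for 1 replica to ack this connection's previous writes.
# WAIT only covers writes sent on the same connection, so issue it from the
# app's client right after the write; a fresh redis-cli session has no pending writes.
redis-cli WAIT 1 1000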

# Configure a replica at runtime
redis-cli -h replica REPLICAOF primary 6379
redis-cli -h replica CONFIG SET replica-read-only yes
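
The resync counters above are cumulative, so a single reading says little; the change between two readings is what matters. As a rough sketch, the Python below diffs two snapshots of `sync_full`, `sync_partial_ok`, and `sync_partial_err` and reports what share of successful resyncs were full ones. The function names and snapshot values are hypothetical, and the "share" metric is one possible heuristic rather than an official Redis statistic.

```python
COUNTERS = ("sync_full", "sync_partial_ok", "sync_partial_err")


def resync_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change in each resync counter between two snapshots of INFO stats."""
    return {key: after[key] - before[key] for key in COUNTERS}


def full_resync_share(delta: dict[str, int]) -> float:
    """Fraction of completed resyncs in the window that were full syncs."""
    total = delta["sync_full"] + delta["sync_partial_ok"]
    return delta["sync_full"] / total if total else 0.0


# Hypothetical snapshots taken an hour apart on the primary.
before = {"sync_full": 3, "sync_partial_ok": 10, "sync_partial_err": 1}
after = {"sync_full": 7, "sync_partial_ok": 11, "sync_partial_err": 5}

delta = resync_delta(before, after)
print(delta, full_resync_share(delta))
```

A negative delta means the counters were reset between snapshots (for example by a restart), in which case the window should be discarded.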

Example config

# redis.conf on a replica
replicaof 10.0.0.10 6379
replica-read-only yes
repl-backlog-size 128mb
repl-backlog-ttl 3600
repl-timeout 60
repl-diskless-sync yes
repl-diskless-sync-delay 5
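# Replica output buffer: hard limit 256mb, soft limit 64mb sustained for 60 s.
# These match the Redis defaults; raise them if a lagging replica keeps getting disconnected.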
client-output-buffer-limit replica 256mb 64mb 60
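
Whether `repl-backlog-size 128mb` is enough depends on how fast the primary writes. A back-of-the-envelope check, sketched in Python below, divides the backlog size by the write rate to estimate how long a replica can be disconnected and still get a partial resync. The write rate is an assumption you must measure yourself (for example from the growth of `master_repl_offset` over time); `parse_size` and `backlog_window_seconds` are hypothetical helper names.

```python
# Size suffixes as documented at the top of redis.conf.
UNITS = {"k": 1000, "kb": 1024, "m": 1000**2, "mb": 1024**2, "g": 1000**3, "gb": 1024**3}


def parse_size(value: str) -> int:
    """Convert a redis.conf size such as `128mb` or `1024` to bytes."""
    text = value.strip().lower()
    digits = text.rstrip("kmgb")
    suffix = text[len(digits):]
    return int(digits) * (UNITS[suffix] if suffix else 1)


def backlog_window_seconds(backlog_size: str, write_bytes_per_sec: int) -> float:
    """Seconds of replication traffic the backlog holds at a steady write rate."""
    return parse_size(backlog_size) / write_bytes_per_sec


# Assumed write rate of 2 MiB/s; substitute a measured value.
window = backlog_window_seconds("128mb", 2 * 1024 * 1024)
print(window)
```

With these assumed numbers the backlog covers about a minute of disconnection; a replica that is away longer than that falls back to a full resync.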

Common findings this catches

  • Growing offset gap → replica lag; reads serve stale data.
  • master_link_status:down → broken replication link.
  • Frequent sync_full → backlog too small or buffer-limit disconnects.
  • replica-read-only no → writes to replica silently diverge.
  • Chained replica lag → deep chains amplify staleness.
  • No WAIT where durability matters → acked writes can be lost on failover.
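
Several of these findings can be checked mechanically before handing the output to the model. The Python sketch below applies a few of them to one replica's already-parsed `INFO replication` fields plus its `replica-read-only` setting. The function name, message wording, and sample inputs are hypothetical; it covers only the single-node checks, not lag trends or resync rates.

```python
def replica_findings(info: dict[str, str], read_only: str) -> list[str]:
    """Return human-readable findings for one replica; an empty list means no issue found."""
    findings: list[str] = []
    if info.get("role") != "slave":
        findings.append("node does not report role:slave")
    if info.get("master_link_status") != "up":
        findings.append("master_link_status is not up")
    if info.get("master_sync_in_progress") == "1":
        findings.append("sync from primary in progress")
    if read_only != "yes":
        findings.append("replica-read-only is not yes")
    return findings


# Hypothetical parsed fields for a healthy and an unhealthy replica.
healthy = {"role": "slave", "master_link_status": "up", "master_sync_in_progress": "0"}
broken = {"role": "slave", "master_link_status": "down", "master_sync_in_progress": "1"}

print(replica_findings(healthy, "yes"))
print(replica_findings(broken, "no"))
```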

When to escalate

  • Automatic failover requirements — move to Sentinel or Cluster.
  • Cross-region replication lag — needs network/topology redesign.
  • Repeated full-resync storms harming the primary — capacity and backlog review.
