Redis Error Guide: Replica Stuck in Repeated Full Resync

Overview

Redis replication is designed so a briefly-disconnected replica can reattach with a partial resync — the master replays only the commands the replica missed from its in-memory replication backlog. A full resync is the heavyweight fallback: the master forks, produces a full RDB, ships it, and the replica reloads the entire dataset. When partial resync keeps failing, the replica falls into a loop of repeated full resyncs — each one forking the master, saturating the network, and never “catching up.” The sync_full counter in INFO stats climbs steadily, which is the clearest signal.

There is no single error string; the pattern shows up in logs:

# Master log
Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '3f9c...', my replication IDs are 'a1b2...' and '0000...')
Starting BGSAVE for SYNC with target: disk
Background saving started by pid 8123

# Replica log (repeating)
Full resync from master: a1b2...:14311988
MASTER <-> REPLICA sync: Loading DB in memory
Connecting to MASTER 10.0.0.5:6379   # ...and it starts over

The core issue is that partial resync is being refused — usually because the backlog is too small to cover the disconnect, or the master’s replication ID changed.

Symptoms

INFO stats sync_full increments repeatedly; sync_partial_ok stays flat while sync_partial_err grows.
Master CPU/fork and network spike on a cycle; the replica never reaches a stable master_link_status:up for long.
Master log repeatedly shows “Partial resynchronization not accepted” and “Starting BGSAVE for SYNC”.
Replica log loops through “Full resync … Loading DB … Connecting to MASTER”.

redis-cli -h <master> INFO stats | grep -E 'sync_full|sync_partial'

sync_full:47
sync_partial_ok:0
sync_partial_err:46

Common Root Causes

1. `repl-backlog-size` too small for the disconnect window

If the replica is offline longer than the backlog can hold (or write volume is high), the needed offset ages out and only a full resync is possible.

redis-cli -h <master> CONFIG GET repl-backlog-size
redis-cli -h <master> INFO replication | grep -E 'repl_backlog_active|repl_backlog_histlen|master_repl_offset'

repl-backlog-size 1mb    # too small for a busy master
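
As a rough sizing aid, the backlog has to hold at least the bytes the master writes to the replication stream while the replica is away. The sketch below is only arithmetic: the function name `backlog_bytes_needed`, its arguments, and the default 2x safety factor are illustrative assumptions, not Redis settings.

```shell
# backlog_bytes_needed <bytes_per_sec> <disconnect_sec> [safety_factor]
# Prints a suggested repl-backlog-size in bytes.
# The parenthesised body runs in a subshell, so the variables stay local.
backlog_bytes_needed() (
    rate=$1            # replication stream write rate, bytes per second
    window=$2          # longest disconnect you want to survive, seconds
    factor=${3:-2}     # headroom multiplier (assumed default, tune freely)
    echo $(( rate * window * factor ))
)

# Example: ~500 KB/s of writes, 60 s outage, 2x headroom.
backlog_bytes_needed 512000 60 2   # prints 61440000 (about 59 MiB)
```

A result in this range is what motivates a setting such as 64mb rather than the 1mb default.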

2. Replica output buffer limit killing the transfer mid-sync

If the RDB transfer or command stream exceeds client-output-buffer-limit slave, the master drops the replica mid-sync and it restarts full.

redis-cli -h <master> CONFIG GET client-output-buffer-limit
journalctl -u redis --no-pager 2>/dev/null | grep -i 'output buffer'   # run on the master host

client-output-buffer-limit slave 256mb 64mb 60
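
To judge whether a given limit can survive a sync, a simplified model helps: assume the replica's output buffer grows at the master's write rate for the whole transfer and is not drained until the RDB has been sent. Under that assumption, the sketch below checks both the hard limit and the soft limit (which trips only when exceeded continuously for the configured number of seconds). The function name and argument order are hypothetical.

```shell
# replica_buffer_verdict <bytes_per_sec> <transfer_sec> <hard_bytes> <soft_bytes> <soft_sec>
# Prints "drop" if the modelled buffer would trip either limit, else "ok".
# A limit of 0 is treated as disabled, matching the Redis config convention.
replica_buffer_verdict() (
    rate=$1; transfer=$2; hard=$3; soft=$4; soft_sec=$5

    # Bytes queued for the replica by the end of the transfer.
    pending=$(( rate * transfer ))
    if [ "$hard" -gt 0 ] && [ "$pending" -ge "$hard" ]; then
        echo "drop"; exit 0
    fi

    # The soft limit trips if the buffer was already above it soft_sec
    # seconds before the transfer ends (linear growth, so it stays above).
    if [ "$soft" -gt 0 ] && [ $(( (transfer - soft_sec) * rate )) -ge "$soft" ]; then
        echo "drop"; exit 0
    fi
    echo "ok"
)

# 2 MiB/s of writes during a 120 s transfer, limits 256mb 64mb 60:
replica_buffer_verdict 2097152 120 268435456 67108864 60    # prints drop
# Same workload with limits 512mb 128mb 60:
replica_buffer_verdict 2097152 120 536870912 134217728 60   # prints ok
```

The model is deliberately pessimistic; real buffers drain once the replica starts consuming the command stream.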

3. Master replication ID changed (restart / failover)

After a master restart or failover, the replication ID changes. The master also remembers its previous ID (the second ID in the log line below), but if the replica's cached replid matches neither one, partial resync is refused.

Partial resynchronization not accepted: Replication ID mismatch

4. Network instability / `repl-timeout` too low

A flaky link or a short repl-timeout aborts the transfer before the RDB finishes loading, forcing a restart of the whole sync.

redis-cli -h <master> CONFIG GET repl-timeout
redis-cli -h <replica> INFO replication | grep master_link_status

Diagnostic Workflow

Step 1: Confirm the full-resync loop

redis-cli -h <master> INFO stats | grep -E 'sync_full|sync_partial_ok|sync_partial_err'
watch -n2 "redis-cli -h <master> INFO stats | grep sync_full"

A steadily rising sync_full with sync_partial_err climbing = the loop.
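
The same check can be scripted by comparing two snapshots of the counters. This is a minimal sketch: `stat_value` and `resync_verdict` are hypothetical helper names, and the rule used (full resyncs rising while accepted partial resyncs stay flat) is just the heuristic described above. A single increment is normal when a fresh replica attaches, so sample over an interval long enough to cover several cycles.

```shell
# stat_value <field>: read INFO output on stdin and print that field's value.
# INFO lines end in CRLF, so the trailing carriage return is stripped first.
stat_value() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# resync_verdict <full_before> <full_after> <ok_before> <ok_after>
resync_verdict() (
    full_delta=$(( $2 - $1 ))
    ok_delta=$(( $4 - $3 ))
    if [ "$full_delta" -gt 0 ] && [ "$ok_delta" -eq 0 ]; then
        echo "loop suspected: $full_delta full, 0 partial"
    elif [ "$full_delta" -gt 0 ]; then
        echo "mixed: $full_delta full, $ok_delta partial"
    else
        echo "stable"
    fi
)

# Usage against a live master (not executed here):
#   before=$(redis-cli -h <master> INFO stats | stat_value sync_full)
#   sleep 300
#   after=$(redis-cli -h <master> INFO stats | stat_value sync_full)
```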

Step 2: Read both logs for the refusal reason

journalctl -u redis --no-pager | grep -iE 'resync|BGSAVE|Replication ID|output buffer|Loading DB' | tail -30

“Replication ID mismatch” → master restarted/failed over; “output buffer” → buffer limit; “Unable to partial resync … for lack of backlog”, or nothing but repeated BGSAVE → backlog too small.
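
That mapping from log text to likely cause can be sketched as a small filter. The function name `classify_refusal` and its output labels are made up for illustration; the patterns are the log fragments quoted in this guide, and a real log may contain several of them at once, in which case this sketch reports only the first match in its own precedence order.

```shell
# classify_refusal: read master log lines on stdin, print one likely cause.
# The parenthesised body runs in a subshell so "log" does not leak.
classify_refusal() (
    log=$(cat)
    if printf '%s\n' "$log" | grep -qi 'output buffer'; then
        echo "output-buffer-limit"
    elif printf '%s\n' "$log" | grep -qi 'Replication ID mismatch'; then
        echo "replication-id-changed"
    elif printf '%s\n' "$log" | grep -qi 'BGSAVE for SYNC'; then
        # Only repeated BGSAVE lines and no stated reason.
        echo "backlog-too-small"
    else
        echo "unknown"
    fi
)

# Usage (not executed here):
#   journalctl -u redis --no-pager | tail -200 | classify_refusal
```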

Step 3: Check backlog sizing vs. write rate

redis-cli -h <master> INFO replication | grep -E 'repl_backlog_size|repl_backlog_histlen|master_repl_offset'
redis-cli -h <master> INFO stats | grep instantaneous_ops_per_sec

If repl_backlog_histlen is tiny relative to how much the master writes during a disconnect, partial resync cannot succeed.
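
One way to put numbers on this is to sample master_repl_offset twice and divide the backlog size by the resulting byte rate, which gives the disconnect window the backlog can cover. The helpers below are a sketch with hypothetical names; they use integer arithmetic, so results are rounded down.

```shell
# write_rate <offset_t0> <offset_t1> <seconds>
# Average replication stream growth in bytes per second.
write_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# backlog_window <backlog_bytes> <bytes_per_sec>
# Seconds of disconnect the backlog can absorb at that rate.
backlog_window() {
    if [ "$2" -le 0 ]; then
        echo "unbounded"          # idle master: nothing ages out
    else
        echo $(( $1 / $2 ))
    fi
}

# Example: the offset grew by 6,000,000 bytes in 10 s.
rate=$(write_rate 14311988 20311988 10)   # 600000 bytes/s
backlog_window 1048576 "$rate"            # 1mb backlog  -> prints 1
backlog_window 67108864 "$rate"           # 64mb backlog -> prints 111
```

At that write rate a 1mb backlog covers roughly one second of disconnect, so almost any blip forces a full resync.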

Step 4: Check buffer limits and timeouts

redis-cli -h <master> CONFIG GET client-output-buffer-limit
redis-cli -h <master> CONFIG GET repl-timeout
redis-cli -h <master> CONFIG GET repl-diskless-sync

Example Root Cause Analysis

A replica of a write-heavy master never stabilized. sync_full climbed by one every ~90 seconds:

redis-cli -h 10.0.0.5 INFO stats | grep -E 'sync_full|sync_partial_err'

sync_full:47
sync_partial_err:46

The master log showed the transfer being cut off, not an ID mismatch:

Client id=... flags=S ... scheduled to be closed ASAP for overcoming of output buffer limits.
Connection with replica 10.0.0.9:6379 lost.
Starting BGSAVE for SYNC ...

The master's slave output buffer limits (256mb hard, 64mb soft sustained for 60 seconds) were too small for the writes that accumulated during the multi-GB transfer, so the master closed the replica's connection mid-sync every cycle, and each reconnect restarted as a full resync.

Fix: raise the slave output buffer limits and grow the backlog so brief blips heal partially:

redis-cli -h 10.0.0.5 CONFIG SET client-output-buffer-limit "slave 512mb 128mb 60"
redis-cli -h 10.0.0.5 CONFIG SET repl-backlog-size 64mb   # + persist to redis.conf

After the change, the next sync completed, master_link_status:up held, and sync_full stopped incrementing — subsequent blips resolved via sync_partial_ok.

Prevention Best Practices

Size repl-backlog-size for peak write rate × expected disconnect window (tens of MB for busy masters, not the 1 MB default).
Raise client-output-buffer-limit slave so a large RDB transfer plus concurrent writes never trips the limit mid-sync.
Set a generous repl-timeout for high-latency or high-throughput links so transfers are not aborted prematurely.
Consider repl-diskless-sync yes when disk I/O on the master is the bottleneck for producing the transfer RDB.
Use Sentinel/Cluster and a stable topology; frequent master restarts change the replication ID and force full resyncs.
Alert on sync_full rate and sync_partial_err; a rising sync_full is the early warning.

Quick Command Reference

# Confirm the loop
redis-cli -h <master> INFO stats | grep -E 'sync_full|sync_partial_ok|sync_partial_err'

# Why partial resync is refused
journalctl -u redis | grep -iE 'resync|Replication ID|output buffer|BGSAVE|Loading DB' | tail -30

# Backlog sizing vs write rate
redis-cli -h <master> INFO replication | grep -E 'repl_backlog_size|repl_backlog_histlen|master_repl_offset'
redis-cli -h <master> INFO stats | grep instantaneous_ops_per_sec

# Limits & timeouts
redis-cli -h <master> CONFIG GET client-output-buffer-limit
redis-cli -h <master> CONFIG GET repl-timeout

# Remediate
redis-cli -h <master> CONFIG SET repl-backlog-size 64mb
redis-cli -h <master> CONFIG SET client-output-buffer-limit "slave 512mb 128mb 60"

Conclusion

A replica looping on full resync means partial resync keeps being refused, so the master re-ships the whole dataset over and over — hammering fork, CPU, and network without ever stabilizing. The sync_full counter climbing is the signature. Root causes:

repl-backlog-size too small to cover the disconnect window.
client-output-buffer-limit slave too small, killing the transfer mid-sync.
The master’s replication ID changing after a restart/failover.
Network instability or a short repl-timeout aborting transfers.

Read both logs to see why partial resync was rejected, then size the backlog and slave output buffers for your write rate and let the transfer complete. Once partial resync succeeds, brief blips heal via sync_partial_ok and the loop ends.
