You are a senior SRE and Redis expert who designs Sentinel-based high availability.

I will provide:
- The current primary/replica layout
- Number and placement of Sentinels
- How clients connect today

Your job:
1. **Sentinel role**: Sentinels monitor the primary and its replicas, detect failure, promote a replica to primary, and reconfigure the remaining replicas. They do NOT proxy data.
2. **Quorum and count**: run an ODD number of Sentinels (>= 3) across separate failure domains. In `sentinel monitor <name> <ip> <port> <quorum>`, quorum is the number of votes needed to agree the primary is down. A majority of Sentinels must also be reachable to authorize failover.
3. **Failure detection**: `down-after-milliseconds` sets how long a node may be unresponsive before a Sentinel marks it subjectively down (SDOWN). Once quorum Sentinels report SDOWN, the primary is objectively down (ODOWN) → failover.
4. **Failover controls**: `failover-timeout` bounds retries; `parallel-syncs` limits how many replicas resync with the new primary at once (avoid overwhelming it).
5. **Client discovery**: clients ask Sentinel `SENTINEL get-master-addr-by-name <name>` and subscribe to `+switch-master` pub/sub to learn the new primary. Use a Sentinel-aware client library; never hardcode the primary IP.
6. **Auth**: set `sentinel auth-pass`/`requirepass` and `sentinel auth-user` (ACL) consistently across nodes.
7. **Split-brain avoidance**: `min-replicas-to-write`/`min-replicas-max-lag` on the primary make it stop accepting writes if too few replicas are in sync.
8. **Validate**: `SENTINEL master <name>`, `SENTINEL replicas <name>`, `SENTINEL sentinels <name>`, and rehearse a failover in staging.

Mark DESTRUCTIVE: `SENTINEL FAILOVER <name>` in prod without a plan (forces a switch), an even Sentinel count such as 2 (losing either one leaves no majority, so failover cannot be authorized), `FLUSHALL` on the primary, and `KEYS *`/`DEBUG` on prod.

---

Current layout: [DESCRIBE]
Sentinel count/placement: [DESCRIBE]
Client connection method: [DESCRIBE]

Why this prompt works

Sentinel HA fails in predictable ways: even Sentinel counts that can’t reach majority, aggressive timeouts that flap, and clients that hardcode the primary and never notice a failover. This prompt enforces an odd Sentinel count across failure domains, ties down-after/quorum/parallel-syncs to real behavior, and insists on Sentinel-aware client discovery — the three things that make automatic failover actually work.
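The odd-count rule comes down to simple arithmetic, which a short Python sketch can make concrete. The function names here are illustrative, not part of Redis; the model assumes that the Sentinels counted as `reachable` can talk to each other and all see the primary as down.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that is more than half of the deployment."""
    return total_sentinels // 2 + 1


def can_fail_over(total_sentinels: int, reachable: int, quorum: int) -> bool:
    """True if the reachable Sentinels can flag ODOWN and authorize a failover."""
    reaches_odown = reachable >= quorum                  # enough SDOWN votes
    authorized = reachable >= majority(total_sentinels)  # enough votes to act
    return reaches_odown and authorized


# 3 Sentinels, quorum 2: losing one Sentinel still leaves a majority of 2.
three_ok = can_fail_over(total_sentinels=3, reachable=2, quorum=2)
# 2 Sentinels, quorum 1: the survivor reaches ODOWN but is not a majority.
two_ok = can_fail_over(total_sentinels=2, reachable=1, quorum=1)
# 4 Sentinels split 2/2 by a partition: neither side holds 3 votes.
four_ok = can_fail_over(total_sentinels=4, reachable=2, quorum=2)
print(three_ok, two_ok, four_ok)
```

Under these assumptions only the three-Sentinel layout can still fail over after losing a member, which is why an even count adds cost without adding tolerance.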

How to use it

Describe failure domains — Sentinels must span racks/AZs to survive one failing.
State the Sentinel count — it must be odd and at least 3.
Explain how clients find the primary — this is where most outages hide.
Rehearse in staging using SENTINEL FAILOVER before trusting prod.
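When describing how clients find the primary, it helps to compare the current method against the discovery loop a Sentinel-aware client runs. The sketch below is a simplified model of that loop, not any library's implementation: `ask` is a hypothetical callable standing in for "send `SENTINEL get-master-addr-by-name` to this Sentinel", and the addresses are made up.

```python
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def discover_primary(
    sentinels: Iterable[Address],
    ask: Callable[[Address, str], Optional[Address]],
    master_name: str,
) -> Address:
    """Return the primary address reported by the first Sentinel that answers."""
    for sentinel in sentinels:
        try:
            reply = ask(sentinel, master_name)
        except OSError:
            continue  # Sentinel unreachable: try the next one
        if reply is not None:
            return reply  # (host, port) of the current primary
    raise RuntimeError(f"no Sentinel returned a primary for {master_name!r}")


def fake_ask(sentinel: Address, master_name: str) -> Optional[Address]:
    """Stand-in for a network call: the first Sentinel is down, the second answers."""
    if sentinel == ("10.0.0.21", 26379):
        raise ConnectionRefusedError("sentinel down")
    return ("10.0.0.11", 6379)


primary = discover_primary(
    [("10.0.0.21", 26379), ("10.0.0.22", 26379)], fake_ask, "mymaster"
)
print(primary)
```

In a real service this loop usually lives inside the client library (redis-py, for example, ships a `redis.sentinel.Sentinel` class), so the point of the sketch is only to show what "not hardcoding the primary IP" means in practice.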

Useful commands

# Query Sentinel state (port 26379)
redis-cli -p 26379 SENTINEL master mymaster
redis-cli -p 26379 SENTINEL replicas mymaster
redis-cli -p 26379 SENTINEL sentinels mymaster

# Client discovery: current primary address
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

# Watch failover events
redis-cli -p 26379 PSUBSCRIBE '+switch-master' '+odown' '+sdown'

# Adjust monitoring at runtime
redis-cli -p 26379 SENTINEL set mymaster down-after-milliseconds 5000
redis-cli -p 26379 SENTINEL set mymaster parallel-syncs 1

# Rehearse a failover (staging only)
redis-cli -p 26379 SENTINEL FAILOVER mymaster
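A client that subscribes to `+switch-master` receives a space-separated payload of the form `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. A minimal Python sketch of parsing it follows; the `SwitchMaster` type and function name are illustrative, and the sample addresses are invented.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    name: str
    old_host: str
    old_port: int
    new_host: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse the message body of a +switch-master pub/sub event."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected +switch-master payload: {payload!r}")
    name, old_host, old_port, new_host, new_port = parts
    return SwitchMaster(name, old_host, int(old_port), new_host, int(new_port))


event = parse_switch_master("mymaster 10.0.0.10 6379 10.0.0.11 6379")
print(event.new_host, event.new_port)
```

A client would typically drop its connections to the old address and reconnect to `new_host:new_port` when the `name` field matches the master it uses.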

Example config

# sentinel.conf (run 3 of these across separate AZs)
port 26379
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
sentinel auth-pass mymaster <STRONG_PASSWORD>

# On the primary and on each replica, since any replica can be promoted (redis.conf): refuse writes if replicas fall behind
min-replicas-to-write 1
min-replicas-max-lag 10
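The effect of the two `min-replicas-*` settings can be modeled in a few lines of Python. This is a simplified sketch of the rule, not Redis source code: the function name is illustrative, and lag is taken to be the number of seconds since each replica last acknowledged the primary.

```python
from typing import Sequence


def primary_accepts_writes(
    replica_lags_seconds: Sequence[float],
    min_replicas_to_write: int,
    min_replicas_max_lag: int,
) -> bool:
    """Model of the write gate controlled by the min-replicas-* settings."""
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True  # feature disabled: the primary always accepts writes
    # Count replicas whose last acknowledgement is recent enough.
    good = sum(1 for lag in replica_lags_seconds if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# With the example config (1 replica required, max lag 10 seconds):
healthy = primary_accepts_writes([2, 30], 1, 10)   # one replica is in sync
isolated = primary_accepts_writes([30, 45], 1, 10) # every replica is stale
print(healthy, isolated)
```

An old primary cut off from its replicas by a partition falls into the second case once the lag exceeds the limit, so it stops taking writes that the promoted replica would never see.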

Common findings this catches

Even Sentinel count → no majority on partition; failover stalls.
Sentinels co-located with primary → HA dies with that domain.
down-after too low → flapping failovers on brief blips.
Clients hardcode primary IP → never follow the failover.
parallel-syncs too high → new primary overwhelmed by resyncs.
No min-replicas-to-write → primary keeps accepting writes during split-brain.
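The findings above can be turned into a rough checklist in code. The Python sketch below is one possible encoding, not a definitive audit tool: the `Layout` fields are hypothetical, and the two thresholds (`MIN_DOWN_AFTER_MS`, `MAX_PARALLEL_SYNCS`) are assumptions chosen for illustration that should be tuned per environment.

```python
from dataclasses import dataclass
from typing import List

MIN_DOWN_AFTER_MS = 1000  # assumed floor; below this, brief blips may trigger failover
MAX_PARALLEL_SYNCS = 1    # assumed ceiling; matches the example config


@dataclass
class Layout:
    sentinel_domains: List[str]  # failure domain (rack/AZ) of each Sentinel
    primary_domain: str          # failure domain of the current primary
    down_after_ms: int
    parallel_syncs: int
    clients_use_sentinel: bool
    min_replicas_to_write: int


def audit(layout: Layout) -> List[str]:
    """Return the findings from the list above that apply to this layout."""
    findings = []
    count = len(layout.sentinel_domains)
    if count < 3 or count % 2 == 0:
        findings.append("sentinel count is even or below 3")
    # Losing the primary's domain must not also remove the Sentinel majority.
    with_primary = layout.sentinel_domains.count(layout.primary_domain)
    if with_primary >= count // 2 + 1:
        findings.append("Sentinel majority shares the primary's failure domain")
    if layout.down_after_ms < MIN_DOWN_AFTER_MS:
        findings.append("down-after-milliseconds too low")
    if not layout.clients_use_sentinel:
        findings.append("clients hardcode the primary address")
    if layout.parallel_syncs > MAX_PARALLEL_SYNCS:
        findings.append("parallel-syncs too high")
    if layout.min_replicas_to_write == 0:
        findings.append("min-replicas-to-write not set")
    return findings


good = Layout(["az-a", "az-b", "az-c"], "az-a", 5000, 1, True, 1)
bad = Layout(["az-a", "az-a"], "az-a", 200, 5, False, 0)
print(audit(good))
print(len(audit(bad)))
```

The co-location check is deliberately narrow: it flags only the case where losing the primary's domain would also take out the Sentinel majority.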

When to escalate

Sharding needs beyond a single primary — evaluate Redis Cluster.
Cross-region failover — needs a broader DR design.
Zero-data-loss failover requirements — async replication is insufficient alone.

Redis Sentinel High Availability Design Prompt

Related prompts

Redis Cluster Sharding Design Prompt

Redis Connection Pool Tuning Prompt

Redis Persistence RDB/AOF Config Prompt

Redis Replication Setup Review Prompt
