Redis Error Guide: 'CLUSTERDOWN Hash slot not served' — Restore Slot Coverage
Fix CLUSTERDOWN Hash slot not served in Redis Cluster: diagnose unassigned slots, failed masters with no replica, cluster-require-full-coverage, and broken cluster state.
Overview
CLUSTERDOWN Hash slot not served means a Redis Cluster is in a broken state: at least one of the 16384 hash slots has no master serving it, so the cluster cannot serve commands for those slots. Because cluster-require-full-coverage defaults to yes, a single unserved slot makes the whole cluster stop serving, to avoid handing out inconsistent data.
The literal error clients receive:
(error) CLUSTERDOWN Hash slot not served
Sometimes you will also see the broader variant:
(error) CLUSTERDOWN The cluster is down
This is a genuine availability incident, not a client-side or routing issue like MOVED. It means a master went down with no replica to take over, slots were never assigned, or the cluster lost quorum. The diagnosis is about finding which slots are uncovered and why the owning master is gone.
Symptoms
- Commands fail with CLUSTERDOWN Hash slot not served (or "The cluster is down").
- CLUSTER INFO shows cluster_state:fail.
- cluster_slots_assigned is less than 16384, or a master shows fail.
redis-cli -c GET user:1
(error) CLUSTERDOWN Hash slot not served
redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned|cluster_slots_ok'
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:10923
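To turn the counters above into a single number, the following is a minimal sketch that reads CLUSTER INFO output on stdin and prints how many slots are not in the ok state. The function name slots_not_ok is hypothetical, and the sketch assumes the field cluster_slots_ok is present in the input.

```shell
# Hypothetical helper: reads `redis-cli CLUSTER INFO` output on stdin and
# prints 16384 minus cluster_slots_ok. Returns non-zero if the field is missing.
slots_not_ok() {
  # CLUSTER INFO lines may end in CRLF, so strip carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "cluster_slots_ok" { print 16384 - $2; found = 1 }
    END { if (!found) exit 1 }'
}
```

Against a live node this would be used as `redis-cli CLUSTER INFO | slots_not_ok`; with the sample output above it prints 5461.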
Common Root Causes
1. A master failed with no replica to promote
The node owning a slot range died and had no healthy replica, so those slots are unserved.
redis-cli CLUSTER NODES | grep -E 'master|fail'
9a1... 10.0.0.32:6379 master,fail - 0 ... disconnected
master,fail on a node whose slots have no replacement causes CLUSTERDOWN.
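Spotting the missing replica by eye gets tedious on larger clusters. As a rough sketch, the function below (the name failed_masters_without_replica is made up for this example) reads CLUSTER NODES output on stdin and prints each master flagged fail that no healthy replica points at. It assumes the standard field layout: node ID in field 1, address in field 2, flags in field 3, and the master ID in field 4 for replicas.

```shell
# Hypothetical helper: reads `redis-cli CLUSTER NODES` output on stdin and
# prints "<node-id> <address>" for every failed master with no healthy replica.
failed_masters_without_replica() {
  tr -d '\r' | awk '
    {
      # Field 3 holds comma-separated flags such as "master,fail".
      n = split($3, flags, ",")
      is_master = 0; is_replica = 0; is_fail = 0
      for (i = 1; i <= n; i++) {
        if (flags[i] == "master") is_master = 1
        if (flags[i] == "slave" || flags[i] == "replica") is_replica = 1
        if (flags[i] == "fail") is_fail = 1
      }
      if (is_master && is_fail) failed[$1] = $2
      # For replicas, field 4 is the ID of the master they follow.
      if (is_replica && !is_fail) healthy_replicas[$4]++
    }
    END {
      for (id in failed)
        if (!(id in healthy_replicas)) print id, failed[id]
    }'
}
```

Usage would be `redis-cli CLUSTER NODES | failed_masters_without_replica`. Nodes flagged only as fail? (suspected, not yet confirmed) are deliberately ignored here.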
2. Slots never assigned (incomplete cluster setup)
A newly created or partially resharded cluster left some slots unassigned.
redis-cli CLUSTER INFO | grep cluster_slots_assigned
cluster_slots_assigned:16000
redis-cli --cluster check <NODE>:6379
[ERR] Not all 16384 slots are covered by nodes.
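The check above reports that coverage is incomplete but not which slots are missing. One way to list the gaps, sketched here under the assumption that slot ranges start at field 9 of each master line in CLUSTER NODES output, is to mark every assigned slot and print the ranges left over. The function name unassigned_slot_ranges is hypothetical. It reports assignment only, so slots owned by a failed master still count as assigned.

```shell
# Hypothetical helper: reads `redis-cli CLUSTER NODES` output on stdin and
# prints each range of slots (0-16383) that no master line claims.
unassigned_slot_ranges() {
  tr -d '\r' | awk '
    $3 ~ /master/ {
      for (f = 9; f <= NF; f++) {
        if ($f ~ /^\[/) continue          # skip migrating/importing markers
        n = split($f, r, "-")
        lo = r[1] + 0
        hi = (n == 2 ? r[2] : r[1]) + 0   # a single slot has no dash
        for (s = lo; s <= hi; s++) covered[s] = 1
      }
    }
    END {
      start = -1
      for (s = 0; s < 16384; s++) {
        if (!(s in covered)) { if (start < 0) start = s }
        else if (start >= 0) { print start "-" (s - 1); start = -1 }
      }
      if (start >= 0) print start "-16383"
    }'
}
```

Usage would be `redis-cli CLUSTER NODES | unassigned_slot_ranges`; no output means every slot is assigned to some master.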
3. Lost quorum / too many masters down
A majority of the masters is down or unreachable, so the surviving nodes cannot authorize a replica promotion and failover cannot complete.
redis-cli CLUSTER NODES | grep -c 'master,fail'
redis-cli CLUSTER INFO | grep -E 'cluster_size|cluster_known_nodes'
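The majority arithmetic can be made explicit. The sketch below, with the hypothetical name master_quorum_status, counts slot-owning masters in CLUSTER NODES output as seen from the node you queried, and compares the number still up against a simple majority. It treats both fail and fail? as down, which is a conservative assumption.

```shell
# Hypothetical helper: reads `redis-cli CLUSTER NODES` output on stdin and
# reports whether a majority of slot-owning masters is still up.
master_quorum_status() {
  tr -d '\r' | awk '
    # Only masters that own at least one slot (field 9 onward) count.
    $3 ~ /master/ && NF >= 9 {
      total++
      if ($3 ~ /fail/) down++   # matches both "fail" and "fail?"
    }
    END {
      needed = int(total / 2) + 1
      up = total - down
      printf "masters=%d up=%d needed=%d quorum=%s\n",
             total, up, needed, (up >= needed ? "yes" : "no")
    }'
}
```

Usage would be `redis-cli CLUSTER NODES | master_quorum_status`. Running it against nodes on both sides of a suspected partition shows how each side sees the cluster.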
4. Network partition splitting the cluster
A partition isolates masters; the minority side sees slots as unserved.
redis-cli CLUSTER NODES | grep -E 'fail\?|fail|handshake'
Diagnostic Workflow
Step 1: Confirm cluster state and coverage
redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_fail'
cluster_state:fail with cluster_slots_ok < 16384 confirms uncovered slots.
Step 2: Identify the failed/missing nodes
redis-cli CLUSTER NODES | awk '{print $2, $3, $9}'
redis-cli CLUSTER NODES | grep -E 'fail|handshake|noaddr'
Find masters marked fail and note which slot ranges they owned. $9 is only the first range; a node that owns several ranges lists the rest in $10 and later fields.
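Because a master can own more than one slot range, it can help to print every range for each failed master rather than a single field. A minimal sketch, assuming ranges occupy field 9 through the end of the line (the function name failed_master_slot_ranges is made up for this example):

```shell
# Hypothetical helper: reads `redis-cli CLUSTER NODES` output on stdin and
# prints "<address> <range> [<range> ...]" for each master flagged "fail".
failed_master_slot_ranges() {
  tr -d '\r' | awk '
    # Match the exact flag "fail", not the suspected state "fail?".
    $3 ~ /master/ && $3 ~ /(^|,)fail(,|$)/ {
      ranges = ""
      for (f = 9; f <= NF; f++)
        ranges = ranges (ranges == "" ? "" : " ") $f
      print $2, ranges
    }'
}
```

Usage would be `redis-cli CLUSTER NODES | failed_master_slot_ranges`.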
Step 3: Run the cluster check for a coverage report
redis-cli --cluster check <NODE>:6379
[ERR] Not all 16384 slots are covered by nodes.
Step 4: Look at the logs for the failure event
sudo journalctl -u redis-server --no-pager | grep -iE 'FAIL|cluster|slot|down' | tail -20
* Marking node 9a1... as failing (quorum reached).
# Cluster state changed: fail
Step 5: Check per-node reachability
for h in 10.0.0.30 10.0.0.31 10.0.0.32; do redis-cli -h "$h" PING; done
nc -zv 10.0.0.32 16379 # cluster bus port: it speaks a binary protocol, so test TCP reachability instead of PING
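Checking both ports on every node can be wrapped in a small helper. This is a sketch under two assumptions: that nc (netcat) is installed and supports -z and -w, and that the bus port is the data port plus 10000 (the default; a node configured with a custom cluster-port would differ). The function names bus_port and check_node_ports are hypothetical.

```shell
# Hypothetical helper: prints the default cluster bus port for a data port.
bus_port() {
  echo $(( $1 + 10000 ))
}

# Hypothetical helper: checks TCP reachability of the data port and the bus
# port on one host, printing one line per port. Assumes `nc` is available.
check_node_ports() {
  host=$1
  port=${2:-6379}
  for p in "$port" "$(bus_port "$port")"; do
    if nc -z -w 2 "$host" "$p" 2>/dev/null; then
      echo "$host:$p reachable"
    else
      echo "$host:$p UNREACHABLE"
    fi
  done
}
```

For example, `for h in 10.0.0.30 10.0.0.31 10.0.0.32; do check_node_ports "$h"; done` would report both ports per host. A reachable data port with an unreachable bus port usually points at a firewall rule rather than a dead node.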
Example Root Cause Analysis
At 03:22 the cluster starts returning CLUSTERDOWN Hash slot not served for a subset of keys. CLUSTER INFO shows the cluster is in fail state with partial coverage:
redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_ok'
cluster_state:fail
cluster_slots_ok:10923
CLUSTER NODES reveals a master down with no replica behind it:
9a1... 10.0.0.32:6379 master,fail - ... disconnected
# (no replica line references 9a1...)
The host 10.0.0.32 was terminated during an autoscaling event, and this master was deployed without a replica, so there is nothing to promote for its 5461 slots (16384 - 10923). The root cause is a topology gap (a master with no replica), exposed by an instance termination.
Recovery means bringing a node back for that slot range. The fastest path is restarting or replacing the node so it rejoins with its data; if the data is lost, assign the slots to a new node:
# If the node's data survived (e.g. persistent volume), restart it
sudo systemctl start redis-server # on the recovered host
redis-cli CLUSTER INFO | grep cluster_state # -> ok once slots are served
# If the node is gone for good, add a replacement and reassign the empty slots
redis-cli --cluster add-node 10.0.0.35:6379 10.0.0.30:6379
redis-cli --cluster fix 10.0.0.30:6379 # reassign/cover missing slots
Once every slot is served, cluster_state returns to ok and the errors stop. The permanent fix is ensuring every master has at least one replica, so a single node loss triggers automatic failover instead of a full outage, and choosing the cluster-require-full-coverage setting deliberately.
Prevention Best Practices
- Give every master at least one replica so a node loss triggers automatic failover instead of unserved slots.
- Spread masters and their replicas across failure domains (AZs/racks) so one failure cannot take a master and its replica together.
- After any create/reshard, verify full coverage with redis-cli --cluster check: all 16384 slots must be assigned.
- Decide on cluster-require-full-coverage deliberately: yes (default) fails safe by refusing partial service; no keeps serving the covered slots (accepting partial availability).
- Monitor cluster_state, cluster_slots_ok, and master,fail nodes; page immediately on cluster_state:fail.
- Keep the cluster bus port (data port + 10000, so 16379 by default) open between nodes in firewalls so gossip and failover work.
- Keep redis-cli --cluster fix/rebalance in your runbook for recovering coverage.
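The monitoring advice above can be sketched as a small check that a cron job or monitoring agent could run. The function name check_cluster_health and the OK/CRITICAL output format with exit codes 0 and 2 are assumptions, loosely following common monitoring-plugin conventions. It checks cluster_slots_ok as well as cluster_state because, with cluster-require-full-coverage no, the state can stay ok while some slots are unserved.

```shell
# Hypothetical check: reads `redis-cli CLUSTER INFO` output on stdin.
# Prints one status line; exits 0 when healthy and 2 otherwise.
check_cluster_health() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"    { state = $2 }
    $1 == "cluster_slots_ok" { ok = $2 }
    END {
      if (state == "ok" && ok == 16384) {
        print "OK cluster_state=ok slots_ok=16384"
        exit 0
      }
      printf "CRITICAL cluster_state=%s slots_ok=%s\n", state, ok
      exit 2
    }'
}
```

Usage would be `redis-cli CLUSTER INFO | check_cluster_health`, with the exit code driving the alert.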
Quick Command Reference
# State and coverage
redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_fail'
# Failed / missing nodes and their slot ranges
redis-cli CLUSTER NODES | awk '{print $2,$3,$9}'
redis-cli CLUSTER NODES | grep -E 'fail|noaddr|handshake'
# Coverage check and repair
redis-cli --cluster check <NODE>:6379
redis-cli --cluster fix <NODE>:6379
redis-cli --cluster add-node <NEW>:6379 <EXISTING>:6379
# Failure events in the log
sudo journalctl -u redis-server | grep -iE 'FAIL|cluster|slot' | tail
Conclusion
CLUSTERDOWN Hash slot not served is a real availability incident: at least one hash slot has no master, and with cluster-require-full-coverage yes the whole cluster stops serving. The typical root causes are:
- A master failed with no replica to promote.
- Slots were never assigned (incomplete setup or reshard).
- Lost quorum with too many masters down.
- A network partition isolating masters.
Confirm with CLUSTER INFO (cluster_state:fail, cluster_slots_ok < 16384), find the failed nodes and uncovered slot ranges with CLUSTER NODES, and restore coverage by recovering the node or reassigning slots with --cluster fix. The durable prevention is simple: every master needs a replica, spread across failure domains.