DevOps AI ToolKit
AI for Redis · By James Joyner IV · 9 min read

Redis Error Guide: 'CLUSTERDOWN Hash slot not served' — Restore Slot Coverage

Fix CLUSTERDOWN Hash slot not served in Redis Cluster: diagnose unassigned slots, failed masters with no replica, cluster-require-full-coverage, and broken cluster state.

  • #redis
  • #troubleshooting
  • #errors
  • #cluster

Overview

CLUSTERDOWN Hash slot not served means a Redis Cluster is in a broken state: at least one of the 16384 hash slots has no master serving it. By default cluster-require-full-coverage is set to yes, so if any slot is unserved the whole cluster stops accepting queries, including for slots that are still covered, to avoid handing out inconsistent data.

The literal error clients receive:

(error) CLUSTERDOWN Hash slot not served

Sometimes you will also see the broader variant:

(error) CLUSTERDOWN The cluster is down

This is a genuine availability incident, not a client-side or routing issue like MOVED. It means a master went down with no replica to take over, slots were never assigned, or the cluster lost quorum. The diagnosis is about finding which slots are uncovered and why the owning master is gone.

Symptoms

  • Commands fail with CLUSTERDOWN Hash slot not served (or “The cluster is down”).
  • CLUSTER INFO shows cluster_state:fail.
  • cluster_slots_assigned is less than 16384, or cluster_slots_ok is below 16384 because a master is flagged fail.

redis-cli -c GET user:1
(error) CLUSTERDOWN Hash slot not served
redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned|cluster_slots_ok'
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:10923

Common Root Causes

1. A master failed with no replica to promote

The node owning a slot range died and had no healthy replica, so those slots are unserved.

redis-cli CLUSTER NODES | grep -E 'master|fail'
9a1... 10.0.0.32:6379 master,fail - 0 ... disconnected

master,fail on a node whose slots have no replacement causes CLUSTERDOWN.
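
To make this check repeatable, the following is a minimal sketch that reads CLUSTER NODES output on stdin and lists failed masters that still own slots and have no healthy replica pointing at them. The function name failed_masters_without_replica is hypothetical, and the sketch assumes the standard CLUSTER NODES field layout (flags in field 3, master ID in field 4, slots from field 9 onward).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<node-id> <address>" for every master that is
# flagged fail, still owns slots, and has no healthy replica referencing it.
failed_masters_without_replica() {
  tr -d '\r' | awk '
    # A failed master that still lists slot fields (field 9 onward).
    $3 ~ /(^|,)master(,|$)/ && $3 ~ /(^|,)fail(,|$)/ && NF >= 9 { failed[$1] = $2 }
    # Replicas are flagged "slave" (or "replica"); field 4 is their master ID.
    $3 ~ /(^|,)(slave|replica)(,|$)/ && $3 !~ /fail/ { has_replica[$4] = 1 }
    END {
      for (id in failed)
        if (!(id in has_replica)) print id, failed[id]
    }'
}

# Usage (assumes a reachable node at 10.0.0.30):
#   redis-cli -h 10.0.0.30 CLUSTER NODES | failed_masters_without_replica
```

An empty result suggests the failure is not a plain "master without replica" case, so the other root causes are worth checking.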

2. Slots never assigned (incomplete cluster setup)

A newly created or partially resharded cluster left some slots unassigned.

redis-cli CLUSTER INFO | grep cluster_slots_assigned
cluster_slots_assigned:16000
redis-cli --cluster check <NODE>:6379
[ERR] Not all 16384 slots are covered by nodes.
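
The check output says that coverage is incomplete but not where the gaps are. As a rough sketch, the gaps can be derived from CLUSTER NODES by marking every slot owned by a master that is not flagged fail and printing what is left. The function name uncovered_slot_ranges is hypothetical, and the sketch treats masters flagged fail? (suspected, not confirmed) as still serving.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the slot ranges (0-16383) that no non-failed
# master owns, one range per line, based on CLUSTER NODES output on stdin.
uncovered_slot_ranges() {
  tr -d '\r' | awk '
    function print_range(a, b) { print (a == b ? a : a "-" b) }
    $3 ~ /(^|,)master(,|$)/ && $3 !~ /(^|,)fail(,|$)/ {
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue              # skip migrating/importing markers
        n = split($i, r, "-")
        lo = r[1] + 0
        hi = (n == 2 ? r[2] : r[1]) + 0       # a single slot has no "-"
        for (s = lo; s <= hi; s++) served[s] = 1
      }
    }
    END {
      start = -1
      for (s = 0; s < 16384; s++) {
        if (s in served) {
          if (start >= 0) { print_range(start, s - 1); start = -1 }
        } else if (start < 0) start = s
      }
      if (start >= 0) print_range(start, 16383)
    }'
}

# Usage (assumes a reachable node at 10.0.0.30):
#   redis-cli -h 10.0.0.30 CLUSTER NODES | uncovered_slot_ranges
```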

3. Lost quorum / too many masters down

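A majority of the slot-owning masters must be reachable to mark a node as failed and to authorize a replica promotion. If too many masters are down, the survivors cannot reach that majority, so failover cannot complete.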

redis-cli CLUSTER NODES | grep -c 'master,fail'
redis-cli CLUSTER INFO | grep -E 'cluster_size|cluster_known_nodes'
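
The majority arithmetic can be sketched from a single node's view of CLUSTER NODES: count the masters that own slots, count how many of them are flagged fail, and compare the remainder with floor(n/2) + 1. The function name quorum_status is hypothetical, and the result only reflects the view of the node you queried.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarizes whether a majority of slot-owning masters
# is still up, according to the CLUSTER NODES output supplied on stdin.
quorum_status() {
  tr -d '\r' | awk '
    $3 ~ /(^|,)master(,|$)/ && NF >= 9 {       # masters that own at least one slot
      total++
      if ($3 ~ /(^|,)fail(,|$)/) failed++
    }
    END {
      need = int(total / 2) + 1                # majority of slot-owning masters
      up = total - failed
      printf "masters=%d failed=%d up=%d majority=%d quorum=%s\n",
             total, failed, up, need, (up >= need ? "yes" : "no")
    }'
}

# Usage (assumes a reachable node at 10.0.0.30):
#   redis-cli -h 10.0.0.30 CLUSTER NODES | quorum_status
```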

4. Network partition splitting the cluster

A partition isolates masters; the minority side sees slots as unserved.

redis-cli CLUSTER NODES | grep -E 'fail|handshake'   # 'fail' also matches the fail? (suspected failure) flag

Diagnostic Workflow

Step 1: Confirm cluster state and coverage

redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_fail'

cluster_state:fail with cluster_slots_ok < 16384 confirms uncovered slots.

Step 2: Identify the failed/missing nodes

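redis-cli CLUSTER NODES | awk '{printf "%s %s", $2, $3; for (i = 9; i <= NF; i++) printf " %s", $i; print ""}'
redis-cli CLUSTER NODES | grep -E 'fail|handshake|noaddr'

Find masters marked fail and note which slot ranges (fields 9 onward, since a node can own several ranges) they owned.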
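
To size the impact of each failed master, it can help to turn the slot ranges into a slot count. This is a small sketch under the same field-layout assumption (slots from field 9 onward); the function name slot_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every master in the CLUSTER NODES output on stdin,
# prints "<address> <flags> <number of slots owned>".
slot_counts() {
  tr -d '\r' | awk '
    $3 ~ /(^|,)master(,|$)/ {
      count = 0
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue                      # skip migrating/importing markers
        n = split($i, r, "-")
        count += (n == 2 ? r[2] - r[1] + 1 : 1)       # range size, or 1 for a single slot
      }
      print $2, $3, count
    }'
}

# Usage (assumes a reachable node at 10.0.0.30):
#   redis-cli -h 10.0.0.30 CLUSTER NODES | slot_counts | grep fail
```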

Step 3: Run the cluster check for a coverage report

redis-cli --cluster check <NODE>:6379
[ERR] Not all 16384 slots are covered by nodes.

Step 4: Look at the logs for the failure event

sudo journalctl -u redis-server --no-pager | grep -iE 'FAIL|cluster|slot|down' | tail -20
* Marking node 9a1... as failing (quorum reached).
# Cluster state changed: fail

Step 5: Check per-node reachability

for h in 10.0.0.30 10.0.0.31 10.0.0.32; do redis-cli -h "$h" PING; done
nc -zv 10.0.0.32 16379   # cluster bus port: it speaks a binary protocol, so test TCP reachability instead of PING
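
Where nc is not installed, a TCP-level probe of both ports can be sketched in plain bash. This assumes bash's /dev/tcp pseudo-device and the coreutils timeout command are available, and that the bus port uses the default offset (data port + 10000). The function names bus_port, tcp_open, and probe_node are hypothetical.

```shell
#!/usr/bin/env bash
# Default cluster bus port for a given data port (data port + 10000).
bus_port() { echo $(( $1 + 10000 )); }

# Returns 0 if a TCP connection to host:port opens within 2 seconds.
# Assumes bash /dev/tcp support and coreutils timeout.
tcp_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Probes the data port and the matching bus port of one node.
probe_node() {
  local host=$1 port=${2:-6379} p
  for p in "$port" "$(bus_port "$port")"; do
    if tcp_open "$host" "$p"; then
      echo "$host:$p open"
    else
      echo "$host:$p unreachable"
    fi
  done
}

# Usage:
#   for h in 10.0.0.30 10.0.0.31 10.0.0.32; do probe_node "$h" 6379; done
```

A node whose data port answers but whose bus port is unreachable can look healthy to clients while the rest of the cluster marks it as failing.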

Example Root Cause Analysis

At 03:22 the cluster starts returning CLUSTERDOWN Hash slot not served for a subset of keys. CLUSTER INFO shows the cluster is in fail state with partial coverage:

redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_ok'
cluster_state:fail
cluster_slots_ok:10923

CLUSTER NODES reveals a master down with no replica behind it:

9a1... 10.0.0.32:6379 master,fail - ... disconnected
# (no replica line references 9a1...)

The host 10.0.0.32 was terminated during an autoscaling event, and this master was deployed without a replica, so its ~5461 slots have no replica that could be promoted to serve them. The root cause is a topology gap (a master with no replica), exposed by an instance termination.

Recovery means getting a node to serve that slot range again. The fastest path is restarting or replacing the node so it rejoins with its data; if the data is lost, assign the slots to a new node:

# If the node's data survived (e.g. persistent volume), restart it
sudo systemctl start redis-server            # on the recovered host
redis-cli CLUSTER INFO | grep cluster_state  # -> ok once slots are served

# If the node is gone for good, add a replacement and reassign the empty slots
redis-cli --cluster add-node 10.0.0.35:6379 10.0.0.30:6379
redis-cli --cluster fix 10.0.0.30:6379       # reassign/cover missing slots

Once every slot is served, cluster_state returns to ok and the errors stop. The permanent fix is ensuring every master has at least one replica, so a single node loss triggers automatic failover instead of a full outage, and choosing the cluster-require-full-coverage setting deliberately.
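
During recovery it can be useful to poll until the state flips back rather than re-running CLUSTER INFO by hand. The following is a minimal sketch; the function name wait_for_cluster_ok and its argument order are hypothetical, and the command to run is passed in so the same loop works against any node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for_cluster_ok <attempts> <delay-seconds> <command...>
# Runs the command repeatedly until its output contains the line
# "cluster_state:ok" or the attempts run out.
wait_for_cluster_ok() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    # CLUSTER INFO lines end in CRLF, so strip the carriage returns first.
    if "$@" 2>/dev/null | tr -d '\r' | grep -qx 'cluster_state:ok'; then
      echo "cluster_state:ok after $i check(s)"
      return 0
    fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  echo "cluster_state still not ok after $attempts check(s)" >&2
  return 1
}

# Usage (assumes a reachable node at 10.0.0.30):
#   wait_for_cluster_ok 30 2 redis-cli -h 10.0.0.30 CLUSTER INFO
```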

Prevention Best Practices

  • Give every master at least one replica so a node loss triggers automatic failover instead of unserved slots.
  • Spread masters and their replicas across failure domains (AZs/racks) so one failure cannot take a master and its replica together.
  • After any create/reshard, verify full coverage with redis-cli --cluster check — all 16384 slots must be assigned.
  • Decide on cluster-require-full-coverage deliberately: yes (default) fails safe by refusing partial service; no keeps serving the covered slots (accepting partial availability).
  • Monitor cluster_state, cluster_slots_ok, and master,fail nodes; page immediately on cluster_state:fail.
  • Keep the cluster bus port (data port + 10000, so 16379 for the default 6379) open between all cluster nodes in firewalls so gossip and failover work.
  • Keep redis-cli --cluster fix/rebalance in your runbook for recovering coverage.
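
The monitoring bullet above can be sketched as a check that a scheduler or monitoring agent runs against CLUSTER INFO. The function name check_cluster_info and the exit-code convention (0 for healthy, 2 for critical) are assumptions to adapt to your own alerting system.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring check: reads CLUSTER INFO on stdin, prints a
# one-line status, and exits 0 when healthy or 2 when the cluster needs paging.
check_cluster_info() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { assigned = $2 }
    $1 == "cluster_slots_ok"       { ok = $2 }
    END {
      if (state == "ok" && assigned == 16384 && ok == 16384) {
        print "OK cluster_state=ok slots_ok=16384"
        exit 0
      }
      printf "CRITICAL cluster_state=%s slots_assigned=%d slots_ok=%d\n", state, assigned, ok
      exit 2
    }'
}

# Usage (assumes a reachable node at 10.0.0.30):
#   redis-cli -h 10.0.0.30 CLUSTER INFO | check_cluster_info || echo "page on-call"
```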

Quick Command Reference

# State and coverage
redis-cli CLUSTER INFO | grep -E 'cluster_state|cluster_slots_assigned|cluster_slots_ok|cluster_slots_fail'

# Failed / missing nodes and their slot ranges
redis-cli CLUSTER NODES | awk '{printf "%s %s", $2, $3; for (i = 9; i <= NF; i++) printf " %s", $i; print ""}'
redis-cli CLUSTER NODES | grep -E 'fail|noaddr|handshake'

# Coverage check and repair
redis-cli --cluster check <NODE>:6379
redis-cli --cluster fix <NODE>:6379
redis-cli --cluster add-node <NEW>:6379 <EXISTING>:6379

# Failure events in the log
sudo journalctl -u redis-server | grep -iE 'FAIL|cluster|slot' | tail

Conclusion

CLUSTERDOWN Hash slot not served is a real availability incident: at least one hash slot has no master, and with cluster-require-full-coverage yes the whole cluster stops serving. The typical root causes are:

  1. A master failed with no replica to promote.
  2. Slots were never assigned (incomplete setup or reshard).
  3. Lost quorum with too many masters down.
  4. A network partition isolating masters.

Confirm with CLUSTER INFO (cluster_state:fail, cluster_slots_ok < 16384), find the failed nodes and uncovered slot ranges with CLUSTER NODES, and restore coverage by recovering the node or reassigning slots with --cluster fix. The durable prevention is simple: every master needs a replica, spread across failure domains.
