Redis SLOWLOG and Latency Analysis Prompt
Diagnose Redis latency using SLOWLOG, LATENCY monitoring, and latency-monitor-threshold to find slow commands and blocking sources.
- Target user: SREs troubleshooting Redis performance
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior SRE who has root-caused Redis latency incidents using its built-in observability.
I will provide:
- The symptom (p99 spikes, timeouts, periodic stalls)
- My workload and command mix
- Any SLOWLOG / LATENCY / INFO output I have
Your job:
1. **Read the SLOWLOG**: `SLOWLOG GET` shows commands exceeding `slowlog-log-slower-than` (microseconds). Note this measures execution time only, NOT network or queue time. Look for O(N) commands (KEYS, big HGETALL/SMEMBERS/LRANGE 0 -1, large SORT, ZUNIONSTORE) on big collections.
2. **Enable latency monitoring**: set `latency-monitor-threshold <ms>` (0 = off), then `LATENCY LATEST`, `LATENCY HISTORY <event>`, and `LATENCY RESET`. Events include command, fork, expire-cycle, aof-write, etc.
3. **Use LATENCY DOCTOR / LATENCY GRAPH** for a human-readable analysis and spike visualization.
4. **Check intrinsic latency**: `redis-cli --intrinsic-latency <sec>` measures the runtime/OS floor independent of workload (detects noisy-neighbor/CPU issues).
5. **Correlate with INFO**: blocked_clients, instantaneous_ops_per_sec, mem_fragmentation_ratio, latest_fork_usec, rdb/aof rewrite in progress, evicted_keys.
6. **Identify common causes**: O(N) commands, big keys (redis-cli --bigkeys / --memkeys / MEMORY USAGE), fork pauses from RDB/AOF rewrite, swap, THP, network, single-threaded blocking by Lua/long commands.
7. **Remediate**: replace KEYS with SCAN, avoid returning whole large collections, split big keys, tune save/AOF, disable Transparent Huge Pages, pin CPU.
8. **Set thresholds and alerts** so regressions surface early.
Mark DESTRUCTIVE: SLOWLOG RESET (loses evidence mid-incident), CONFIG changes during an incident without a record, DEBUG SLEEP (blocks the server), FLUSHALL.
---
Symptom: [DESCRIBE]
Workload/command mix: [DESCRIBE]
SLOWLOG/LATENCY/INFO output: [PASTE]
Why this prompt works
Redis executes commands on a single thread, so one O(N) command or a fork pause stalls every client. This prompt drives a disciplined diagnosis using SLOWLOG, the LATENCY subsystem, intrinsic-latency measurement, and INFO correlation, while warning that SLOWLOG captures only execution time, so you do not chase the wrong layer.
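To make the "execution time only" caveat concrete, here is a minimal Python sketch of a single-threaded server. It is a toy model, not Redis itself: it assumes all commands arrive at the same instant, ignores network time, and uses a hypothetical `simulate` helper that treats a command as logged when its execution time reaches the threshold.

```python
def simulate(exec_times_us, threshold_us=10_000):
    """Serve commands one at a time on a single thread (toy model).

    Assumption: every command arrives at t=0, so each one waits for the
    total execution time of the commands ahead of it.
    """
    clock_us = 0
    rows = []
    for exec_us in exec_times_us:
        queued_us = clock_us            # time spent waiting behind earlier commands
        clock_us += exec_us
        rows.append({
            "exec_us": exec_us,                    # what SLOWLOG would record
            "observed_us": queued_us + exec_us,    # what the client sees (network excluded)
            "in_slowlog": exec_us >= threshold_us, # only execution time is compared
        })
    return rows


# One 50 ms command sitting between three 100 microsecond commands.
for row in simulate([100, 50_000, 100, 100]):
    print(row)
```

In this model the 100 µs command queued behind the 50 ms one is observed by its client at roughly 50.2 ms, yet only the 50 ms command would appear in SLOWLOG. That is the gap between server-side execution time and client-side latency the prompt warns about.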
How to use it
- Describe the symptom precisely (p99 spikes? periodic stalls? timeouts?).
- Give the command mix so O(N) offenders are considered.
- Paste SLOWLOG/LATENCY/INFO output for concrete analysis.
- Export evidence before resetting any counters.
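One way to export evidence before a reset is to serialize the SLOWLOG entries to JSON Lines. The sketch below assumes entries shaped like the dictionaries a Python client such as redis-py returns from its `slowlog_get()` call (keys `id`, `start_time`, `duration`, `command`). The helper name `slowlog_to_jsonl` and the output field names are illustrative only.

```python
import json


def slowlog_to_jsonl(entries):
    """Serialize SLOWLOG entries (a list of dicts) to a JSON Lines string.

    Assumption: each entry has 'id', 'start_time', 'duration' (microseconds)
    and 'command' (bytes or str); missing keys are exported as null.
    """
    lines = []
    for entry in entries:
        command = entry.get("command", "")
        if isinstance(command, bytes):
            # Commands can carry binary arguments; never fail the export on them.
            command = command.decode("utf-8", errors="replace")
        lines.append(json.dumps({
            "id": entry.get("id"),
            "start_time": entry.get("start_time"),
            "duration_us": entry.get("duration"),
            "command": command,
        }, sort_keys=True))
    return "\n".join(lines)


sample = [{"id": 7, "start_time": 1700000000, "duration": 52000,
           "command": b"KEYS user:*"}]
print(slowlog_to_jsonl(sample))
```

Writing the result to a timestamped file before running `SLOWLOG RESET` keeps the incident record intact.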
Useful commands
# Slow command log (threshold in microseconds)
redis-cli CONFIG SET slowlog-log-slower-than 10000 # log > 10ms
redis-cli SLOWLOG GET 25
redis-cli SLOWLOG LEN
# redis-cli SLOWLOG RESET # only AFTER exporting
# Latency monitoring subsystem (threshold in ms)
redis-cli CONFIG SET latency-monitor-threshold 100
redis-cli LATENCY LATEST
redis-cli LATENCY HISTORY command
redis-cli LATENCY DOCTOR
redis-cli LATENCY RESET
# Runtime/OS latency floor, independent of workload
redis-cli --intrinsic-latency 5
# Find big keys and hot spots
redis-cli --bigkeys
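# Correlate with INFO (blocked_clients is in the clients section, latest_fork_usec in stats)
redis-cli INFO stats | grep -E 'ops_per_sec|evicted_keys|latest_fork_usec'
redis-cli INFO clients | grep blocked_clients
redis-cli INFO persistence | grep -E 'bgsave_in_progress|rewrite_in_progress'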
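When INFO output is pasted rather than grepped, a small parser can flag the fields the prompt asks to correlate. This is a minimal sketch: the helper names are hypothetical, and the thresholds (100 ms fork time, fragmentation ratio 1.5) are illustrative assumptions to tune per deployment, not Redis recommendations.

```python
def parse_info(text):
    """Parse `redis-cli INFO` output ("key:value" lines, "# Section" headers)."""
    info = {}
    for raw in text.splitlines():       # splitlines also handles \r\n endings
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def latency_flags(info, fork_usec_limit=100_000, frag_limit=1.5):
    """Return names of INFO signals worth a closer look (assumed thresholds)."""
    flags = []
    if int(info.get("blocked_clients", 0)) > 0:
        flags.append("blocked_clients")
    if int(info.get("latest_fork_usec", 0)) > fork_usec_limit:
        flags.append("latest_fork_usec")
    if float(info.get("mem_fragmentation_ratio", 1.0)) > frag_limit:
        flags.append("mem_fragmentation_ratio")
    if int(info.get("evicted_keys", 0)) > 0:
        flags.append("evicted_keys")
    if "1" in (info.get("rdb_bgsave_in_progress"), info.get("aof_rewrite_in_progress")):
        flags.append("rewrite_in_progress")
    return flags


pasted = ("# Clients\r\nblocked_clients:2\r\n# Stats\r\nlatest_fork_usec:250000\r\n"
          "evicted_keys:0\r\n# Memory\r\nmem_fragmentation_ratio:1.10\r\n"
          "# Persistence\r\naof_rewrite_in_progress:1\r\n")
print(latency_flags(parse_info(pasted)))
```

A flag is a hint about where to look next, not a diagnosis. For example, `blocked_clients` also counts clients waiting on blocking commands such as BLPOP, which can be normal for queue workloads.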
Example config
# redis.conf: observability + latency-friendly defaults
# microseconds (10000 = 10 ms); keep redis.conf comments on their own line
slowlog-log-slower-than 10000
# entries kept
slowlog-max-len 256
# milliseconds; 0 disables
latency-monitor-threshold 100
# Fork-pause mitigation (OS level, not redis.conf):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# vm.overcommit_memory = 1 (sysctl) to avoid fork failures
#
# Avoid O(N) blocking: replace KEYS with SCAN in all clients
# redis-cli --scan --pattern 'user:*'
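To confirm the Transparent Huge Pages setting mentioned above, the kernel file can be read and its active mode extracted. On Linux the file lists all modes and marks the active one in square brackets. The helper name `active_thp_mode` is hypothetical, and the sketch only parses text, so it runs anywhere.

```python
def active_thp_mode(contents):
    """Return the bracketed (active) mode from the THP 'enabled' file contents.

    Example contents on Linux: "always madvise [never]".
    Returns None if no bracketed token is present.
    """
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


# Usage on a Linux host (the path is the one from the config comment above):
#   from pathlib import Path
#   mode = active_thp_mode(
#       Path("/sys/kernel/mm/transparent_hugepage/enabled").read_text())
print(active_thp_mode("always madvise [never]\n"))
```

Any result other than `never` on a Redis host suggests THP may still be amplifying fork-time latency.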
Common findings this catches
- O(N) commands → KEYS / full-collection reads block the loop.
- Big keys → single huge hash/list/zset stalls operations.
- Fork pauses → RDB/AOF rewrite spikes latest_fork_usec.
- THP enabled → fork-time latency amplified.
- Swapping → memory paged to disk, catastrophic latency.
- Blocked clients → long Lua or command starving others.
- Wrong layer → SLOWLOG clean but network/queue is the cause.
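As a first pass over SLOWLOG entries, a simple filter can surface the O(N) candidates named in the prompt. This sketch uses a hypothetical `is_on_suspect` helper and a command list taken from the prompt's examples. It flags candidates only: whether a command is actually a problem depends on the size of the collection it touches.

```python
# O(N) commands named in the prompt; extend for your own workload.
ON_COMMANDS = {"KEYS", "HGETALL", "SMEMBERS", "SORT", "ZUNIONSTORE"}


def is_on_suspect(command):
    """Return True if a logged command line looks like an O(N) candidate."""
    parts = command.split()
    if not parts:
        return False
    name = parts[0].upper()             # Redis command names are case-insensitive
    if name in ON_COMMANDS:
        return True
    # LRANGE <key> 0 -1 reads the entire list.
    if name == "LRANGE" and parts[2:4] == ["0", "-1"]:
        return True
    return False


logged = ["GET session:42", "keys user:*", "LRANGE queue 0 -1", "LRANGE queue 0 9"]
print([cmd for cmd in logged if is_on_suspect(cmd)])
```

If this filter finds nothing while clients still report high latency, that points toward the "wrong layer" finding: queueing, network, or host-level causes rather than slow commands.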
When to escalate
- Recurring fork pauses on large datasets: persistence and capacity review.
- Host-level noisy neighbor (high intrinsic latency): infra/placement team.
- Sustained overload: scale out (Cluster) or shed load with SRE.
Related prompts
- Redis Connection Pool Tuning Prompt: Tune Redis client connection pools: pool sizing, timeouts, maxclients, TCP keepalive, and avoiding connection exhaustion and leaks.
- Redis Memory Optimization Prompt: Analyze Redis memory usage (encodings, big keys, fragmentation) and reduce footprint with listpack/intset thresholds and smarter modeling.
- Redis Pipelining and Batching Optimization Prompt: Optimize Redis throughput with pipelining and batching: cut round-trip latency, size batches safely, and avoid blocking the event loop.