DevOps AI ToolKit

Redis SLOWLOG and Latency Analysis Prompt

Diagnose Redis latency using SLOWLOG, LATENCY monitoring, and latency-monitor-threshold to find slow commands and blocking sources.

Target user: SREs troubleshooting Redis performance
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior SRE who has root-caused Redis latency incidents using its built-in observability.

I will provide:
- The symptom (p99 spikes, timeouts, periodic stalls)
- My workload and command mix
- Any SLOWLOG / LATENCY / INFO output I have

Your job:

1. **Read the SLOWLOG**: `SLOWLOG GET` shows commands exceeding `slowlog-log-slower-than` (microseconds). Note this measures execution time only, NOT network or queue time. Look for O(N) commands (KEYS, big HGETALL/SMEMBERS/LRANGE 0 -1, large SORT, ZUNIONSTORE) on big collections.
2. **Enable latency monitoring**: set `latency-monitor-threshold <ms>` (0 = off), then `LATENCY LATEST`, `LATENCY HISTORY <event>`, and `LATENCY RESET`. Events include command, fork, expire-cycle, aof-write, etc.
3. **Use LATENCY DOCTOR / LATENCY GRAPH** for a human-readable analysis and spike visualization.
4. **Check intrinsic latency**: `redis-cli --intrinsic-latency <sec>`, run on the Redis host itself, measures the runtime/OS floor independent of workload (detects noisy-neighbor/CPU issues).
5. **Correlate with INFO**: blocked_clients, instantaneous_ops_per_sec, mem_fragmentation_ratio, latest_fork_usec, rdb/aof rewrite in progress, evicted_keys.
6. **Identify common causes**: O(N) commands, big keys (redis-cli --bigkeys / --memkeys / MEMORY USAGE), fork pauses from RDB/AOF rewrite, swap, THP, network, single-threaded blocking by Lua/long commands.
7. **Remediate**: replace KEYS with SCAN, avoid returning whole large collections, split big keys, tune save/AOF, disable Transparent Huge Pages, pin CPU.
8. **Set thresholds and alerts** so regressions surface early.

Mark DESTRUCTIVE: SLOWLOG RESET (loses evidence mid-incident), CONFIG changes during an incident without a record, DEBUG SLEEP (blocks the server), FLUSHALL.

---

Symptom: [DESCRIBE]
Workload/command mix: [DESCRIBE]
SLOWLOG/LATENCY/INFO output: [PASTE]

Why this prompt works

Redis executes commands on a single thread, so one O(N) command or a fork pause stalls every client. This prompt drives a disciplined diagnosis using SLOWLOG, the LATENCY subsystem, intrinsic-latency measurement, and INFO correlation. It also warns that SLOWLOG captures execution time only, so a clean SLOWLOG does not rule out network or queueing delay and you do not chase the wrong layer.

How to use it

  1. Describe the symptom precisely (p99 spikes? periodic stalls? timeouts?).
  2. Give the command mix so O(N) offenders are considered.
  3. Paste SLOWLOG/LATENCY/INFO output for concrete analysis.
  4. Export evidence before resetting any counters.
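
Step 4 can be sketched as a small shell helper that snapshots the read-only diagnostics into a directory before anything is reset. This is a minimal sketch, not a definitive tool: the function name `export_redis_evidence`, the `REDIS_CLI` override variable, and the output file names are all assumptions introduced here for illustration.

```shell
# Sketch: save Redis diagnostics to files before any SLOWLOG/LATENCY reset.
# REDIS_CLI (hypothetical override) lets you add flags, e.g. "redis-cli -h host -p 6379".
export_redis_evidence() {
    out_dir="${1:-redis-evidence-$(date +%Y%m%dT%H%M%S)}"
    cli="${REDIS_CLI:-redis-cli}"
    mkdir -p "$out_dir" || return 1
    # Read-only commands only: nothing here resets or changes server state.
    # $cli is left unquoted on purpose so extra flags split into separate words.
    $cli SLOWLOG GET 128 > "$out_dir/slowlog.txt"
    $cli LATENCY LATEST  > "$out_dir/latency-latest.txt"
    $cli LATENCY DOCTOR  > "$out_dir/latency-doctor.txt"
    $cli INFO            > "$out_dir/info.txt"
    # Print the directory so callers can attach it to the incident record.
    echo "$out_dir"
}
```

Calling `export_redis_evidence /tmp/incident-evidence` (path is illustrative) writes four text files that can be pasted into the prompt or kept with the incident notes.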

Useful commands

# Slow command log (threshold in microseconds)
redis-cli CONFIG SET slowlog-log-slower-than 10000     # log > 10ms
redis-cli SLOWLOG GET 25
redis-cli SLOWLOG LEN
# redis-cli SLOWLOG RESET   # only AFTER exporting

# Latency monitoring subsystem (threshold in ms)
redis-cli CONFIG SET latency-monitor-threshold 100
redis-cli LATENCY LATEST
redis-cli LATENCY HISTORY command
redis-cli LATENCY DOCTOR
# redis-cli LATENCY RESET   # only AFTER exporting

# Runtime/OS latency floor, independent of workload (run on the Redis host)
redis-cli --intrinsic-latency 5

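# Find big keys and hot spots
redis-cli --bigkeys
# latest_fork_usec lives in the stats section; blocked_clients in the clients section
redis-cli INFO stats | grep -E 'ops_per_sec|evicted_keys|latest_fork_usec'
redis-cli INFO clients | grep blocked_clients
redis-cli INFO persistence | grep -E 'bgsave_in_progress|rewrite_in_progress'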
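
Because the fields worth correlating are spread across several INFO sections, it can be simpler to filter the full INFO output once. The following is a rough sketch under that assumption; the function name `info_latency_fields` is hypothetical, and the field list is taken from the prompt's INFO checklist.

```shell
# Sketch: keep only the INFO fields named in the prompt's checklist.
# Reads INFO output on stdin. INFO lines end in CRLF, so strip the \r first.
info_latency_fields() {
    tr -d '\r' | awk -F: '
        $1 == "blocked_clients" ||
        $1 == "instantaneous_ops_per_sec" ||
        $1 == "mem_fragmentation_ratio" ||
        $1 == "latest_fork_usec" ||
        $1 == "rdb_bgsave_in_progress" ||
        $1 == "aof_rewrite_in_progress" ||
        $1 == "evicted_keys" { print $1 "=" $2 }
    '
}
```

Typical use would be `redis-cli INFO | info_latency_fields`, which prints one `name=value` line per matching field in the order the server reports them.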

Example config

# redis.conf: observability + latency-friendly defaults
slowlog-log-slower-than 10000      # microseconds (10ms)
slowlog-max-len 256                # entries kept
latency-monitor-threshold 100      # ms; 0 disables

# Fork-related OS tuning (host level, not redis.conf):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled   (as root; reduces fork-time latency)
#   vm.overcommit_memory = 1   (sysctl) to avoid fork failures
#
# Avoid O(N) blocking: replace KEYS with SCAN in all clients
#   redis-cli --scan --pattern 'user:*'
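
Before changing the THP setting it helps to confirm what the host is currently using. The sysfs file lists all modes and marks the active one in square brackets, so a minimal sketch of a check could look like this. The function name `thp_mode` and its optional path argument are assumptions added for illustration (the argument exists only so the function can be tried against a copy of the file).

```shell
# Sketch: print the active Transparent Huge Pages mode.
# The sysfs file looks like "always madvise [never]"; the bracketed word is active.
thp_mode() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # Print only the text between the square brackets.
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$thp_file"
}
```

If `thp_mode` prints `always`, the host still has THP enabled and the `echo never` line above applies. The overcommit setting can be read the same way with `sysctl -n vm.overcommit_memory`.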

Common findings this catches

  • O(N) commands → KEYS / full-collection reads block the loop.
  • Big keys → single huge hash/list/zset stalls operations.
  • Fork pauses → RDB/AOF rewrite spikes latest_fork_usec.
  • THP enabled → fork-time latency amplified.
  • Swapping → memory paged to disk, catastrophic latency.
  • Stalled clients → long Lua script or slow command starving others (INFO blocked_clients only counts clients waiting on blocking calls such as BLPOP).
  • Wrong layer → SLOWLOG clean but network/queue is the cause.
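
As a first pass on the O(N) finding, a list of commands (for example, copied out of SLOWLOG entries or an application's command mix) can be screened against the command names the prompt calls out. This is only a sketch: the function name `flag_on_candidates` and the one-command-per-line input format are assumptions, and a match is a candidate to inspect, not proof of a problem, since the real cost depends on collection size.

```shell
# Sketch: flag command lines whose command name is on the O(N) watch list.
# Assumed input: one command with its arguments per line on stdin.
flag_on_candidates() {
    awk '
        BEGIN {
            # Watch list taken from the commands named in the prompt.
            n = split("KEYS HGETALL SMEMBERS LRANGE SORT ZUNIONSTORE", names, " ")
            for (i = 1; i <= n; i++) watch[names[i]] = 1
        }
        # Compare case-insensitively on the first word of each line.
        (toupper($1) in watch) { print "O(N) candidate: " $0 }
    '
}
```

For instance, piping a file of recent commands through `flag_on_candidates` leaves plain `GET`/`SET` traffic out and keeps lines such as `KEYS user:*` for closer review with `MEMORY USAGE` or `--bigkeys`.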

When to escalate

  • Recurring fork pauses on large datasets — persistence and capacity review.
  • Host-level noisy neighbor (high intrinsic latency) — infra/placement team.
  • Sustained overload — scale out (Cluster) or shed load with SRE.
