DevOps AI ToolKit

Redis Persistence RDB/AOF Config Prompt

Configure Redis durability — RDB snapshots vs AOF, appendfsync policy, and hybrid persistence — balancing data safety against latency.

Target user: SREs configuring Redis durability
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior SRE and Redis expert who configures persistence for the right durability/performance tradeoff.

I will provide:
- The data's role (rebuildable cache vs source-of-truth)
- Acceptable data-loss window (RPO)
- Current `CONFIG GET save`, `appendonly`, `appendfsync`

Your job:

1. **Decide if persistence is needed at all**: a pure cache can run with `save ""` and `appendonly no` — faster, simpler, no fork stalls.
2. **RDB snapshots**:
   - Point-in-time dump via `save <sec> <changes>` rules (e.g. `save 900 1`), or manual `BGSAVE`.
   - Compact, fast restart, good for backups; but you lose everything since the last snapshot on crash.
   - `BGSAVE` forks — watch COW memory and latency spikes on big datasets.
3. **AOF (append-only file)**:
   - Logs every write; replay on restart. Set `appendonly yes`.
   - `appendfsync`: `always` (safest, slowest — fsync per write), `everysec` (default, ~1s max loss, good balance), `no` (OS decides, fastest, riskiest).
   - AOF rewrite (`BGREWRITEAOF`) compacts the log; `auto-aof-rewrite-percentage`/`-min-size` control it.
4. **Hybrid**: with `appendonly yes`, `aof-use-rdb-preamble yes` makes each AOF rewrite start the file with an RDB-format snapshot and append later writes in AOF format, giving AOF durability with faster loads on restart.
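5. **Corruption handling**: `aof-load-truncated yes` lets Redis load an AOF whose tail was cut off; `redis-check-aof --fix` repairs an AOF by truncating it at the first invalid entry (everything after that point is discarded); `redis-check-rdb` only validates an RDB file and cannot repair it.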
6. **Replica offload**: run `BGSAVE`/backups on a replica to spare the primary's fork latency.
7. **Verify**: `INFO persistence` → `rdb_last_bgsave_status`, `aof_last_bgrewrite_status`, `aof_last_write_status`, `rdb_changes_since_last_save`.

Mark DESTRUCTIVE: `CONFIG SET appendonly no` (stops AOF logging, so later writes lose AOF durability), `DEBUG RELOAD`/`DEBUG LOADAOF`, `FLUSHALL` before a snapshot, and `KEYS *`/`DEBUG` on prod. Deleting `dump.rdb`/`appendonly.aof` files loses data.

---

Data role: [DESCRIBE]
RPO (acceptable loss): [DESCRIBE]
Current persistence config: [PASTE]

Why this prompt works

Persistence is a durability-vs-latency dial: the wrong setting either loses data on a crash or stalls the event loop with fsyncs and forks. This prompt forces you to name the data's role and RPO first, then maps them to concrete save rules, appendonly, and appendfsync values. It also reminds you to offload routine snapshots to a replica so the primary is spared that fork cost.
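As a rough illustration of that mapping, the sketch below turns a data role and an RPO into redis.conf directives. The function name `recommend_persistence`, the role labels `cache` and `source-of-truth`, and the one-second threshold are assumptions made for this example, not Redis guidance.

```shell
# Hypothetical helper: map a data role and an RPO (whole seconds) to
# redis.conf directives. Role labels and the threshold are illustrative.
recommend_persistence() {
  role=$1
  rpo=$2   # must be a non-negative integer
  case "$role" in
    cache)
      # Rebuildable data: disable both RDB snapshots and AOF.
      printf 'save ""\nappendonly no\n'
      ;;
    source-of-truth)
      # Sub-second RPO calls for fsync on every write; otherwise once per second.
      if [ "$rpo" -lt 1 ]; then
        fsync=always
      else
        fsync=everysec
      fi
      printf 'appendonly yes\nappendfsync %s\naof-use-rdb-preamble yes\n' "$fsync"
      ;;
    *)
      echo "unknown role: $role" >&2
      return 1
      ;;
  esac
}
```

For instance, `recommend_persistence source-of-truth 5` prints the AOF directives with `appendfsync everysec`. Treat the output as a starting point: even `always` is only as durable as the disk's own fsync behavior.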

How to use it

  1. Classify the data — rebuildable cache or source of truth.
  2. State the RPO — how many seconds of writes you can lose.
  3. Paste the current persistence config and the output of INFO persistence.
  4. Note dataset size — big datasets make fork latency the dominant concern.

Useful commands

# Current persistence settings and health
redis-cli CONFIG GET save
redis-cli CONFIG GET appendonly
redis-cli CONFIG GET appendfsync
redis-cli INFO persistence | grep -E 'rdb_last_bgsave_status|aof_last_bgrewrite_status|aof_last_write_status|rdb_changes_since_last_save'

# Trigger snapshot / rewrite (prefer a replica)
redis-cli BGSAVE
redis-cli BGREWRITEAOF
redis-cli LASTSAVE

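# Enable AOF at runtime, then write the change back to redis.conf
# (CONFIG REWRITE requires the server to have been started with a config file)
redis-cli CONFIG SET appendonly yes
redis-cli CONFIG SET appendfsync everysec
redis-cli CONFIG REWRITE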

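# Check and repair tooling (offline)
# --fix truncates the AOF at the first invalid entry; on Redis 7+ pass the
# manifest file under appendonlydir/ instead of appendonly.aof
redis-check-aof --fix appendonly.aof
# redis-check-rdb only validates the file; it does not repair it
redis-check-rdb dump.rdb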
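
The status fields queried above can also be checked mechanically. The sketch below reads `INFO persistence` output on stdin and reports any of the three status fields that is not `ok`; the function name `check_persistence_health` and the `FAIL` message format are hypothetical.

```shell
# Hypothetical helper: read INFO persistence output on stdin, print one
# line per status field that is not "ok", and exit non-zero if any is found.
check_persistence_health() {
  # INFO output uses CRLF line endings, so strip carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "rdb_last_bgsave_status" ||
    $1 == "aof_last_bgrewrite_status" ||
    $1 == "aof_last_write_status" {
      if ($2 != "ok") { printf "FAIL %s=%s\n", $1, $2; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

A possible invocation is `redis-cli INFO persistence | check_persistence_health || echo "persistence unhealthy"`, which suits a cron job or a monitoring probe.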

Example config

# redis.conf — durable data store with hybrid persistence
save 900 1
save 300 10
save 60 10000

appendonly yes
appendfsync everysec
aof-use-rdb-preamble yes
aof-load-truncated yes
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
no-appendfsync-on-rewrite no
dir /var/lib/redis

Common findings this catches

  • Cache running full RDB+AOF → needless fork stalls; disable persistence.
  • RDB-only on a source-of-truth → unacceptable crash loss window.
  • appendfsync always on slow disk → throughput collapse.
  • AOF write errors on full disk → writes rejected; monitor aof_last_write_status.
  • Snapshots on the primary → periodic latency spikes; move to a replica.
  • No AOF rewrite thresholds → AOF grows unbounded.
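
A few of these findings can be detected from the config alone. The following sketch, built around the hypothetical function `audit_persistence`, reads redis.conf-style lines on stdin and flags three of them for a given role. The role labels and message texts are assumptions for illustration, and Redis defaults for directives missing from the input are not modeled.

```shell
# Hypothetical helper: flag persistence misconfigurations for a data role.
# Usage: audit_persistence <cache|source-of-truth> < redis.conf
audit_persistence() {
  role=$1
  awk -v role="$role" '
    NF == 0 || $1 ~ /^#/ { next }            # skip blank lines and comments
    { cfg[$1] = $2 }                         # remember the first value of each directive
    $1 == "save" && $2 != "\"\"" { rdb = 1 } # any non-empty save rule enables RDB
    END {
      aof = (cfg["appendonly"] == "yes")
      if (role == "cache" && (rdb || aof))
        print "cache with persistence enabled: needless fork stalls"
      if (role == "source-of-truth" && !aof)
        print "source of truth without AOF: crash loses writes since last snapshot"
      # A percentage of 0 disables automatic AOF rewrites.
      if (aof && cfg["auto-aof-rewrite-percentage"] == "0")
        print "automatic AOF rewrite disabled: AOF grows unbounded"
    }
  '
}
```

Running `audit_persistence cache < redis.conf` against the example config above would report the first finding, since that config enables both RDB and AOF.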

When to escalate

  • Strict zero-loss requirements — needs replication + WAIT/quorum design.
  • Fork latency unacceptable even on a replica — capacity or architecture review.
  • Recurring AOF/RDB corruption — investigate disk/hardware.
