AI for Kubernetes & Helm · By James Joyner IV · 9 min read

Kubernetes Error Guide: 'etcdserver: mvcc: database space exceeded' Read-Only API

Fix 'etcdserver: mvcc: database space exceeded' in Kubernetes: clear the NOSPACE alarm with compaction and defrag, and tune quota-backend-bytes to stop recurrence.

  • #kubernetes-helm
  • #troubleshooting
  • #errors
  • #etcd

Exact Error Message

When etcd’s backend database reaches its quota, etcd raises a NOSPACE alarm and rejects every write. The kube-apiserver passes the failure straight through to clients:

rpc error: code = Unknown desc = etcdserver: mvcc: database space exceeded

You will see it on any mutating request — create, update, delete:

$ kubectl apply -f deploy.yaml
Error from server: error when creating "deploy.yaml": etcdserver: mvcc: database space exceeded

And etcd itself logs the alarm that put the cluster into a read-only state:

W0628 09:14:02.118203 etcdserver: database space exceeded, retry after compaction
W0628 09:14:02.118277 alarm NOSPACE raised by peer 8e9e05c52164694d

The cluster is now read-only: reads succeed, all writes fail, until the alarm is cleared.

What the Error Means

etcd stores every key revision in a multi-version concurrency control (MVCC) backend (a bbolt file). Each update creates a new revision rather than overwriting in place, so the on-disk size grows continuously until old revisions are compacted away and the freed pages are defragmented back to the OS.

The backend has a hard size limit, --quota-backend-bytes (default 2 GiB, commonly raised to 8 GiB). When the database file crosses this quota, etcd raises the NOSPACE alarm and refuses writes to protect itself from running out of space mid-transaction and corrupting the store. Critically, the database stays large even after compaction until you defrag, because compaction frees internal pages but does not shrink the file. So the alarm persists until you both compact and defrag, then explicitly disarm the alarm.

This is why the cluster does not self-heal: clearing space requires three deliberate steps, and the alarm latches until you disarm it.

Common Causes

  • No automatic compaction: --auto-compaction-retention is unset or zero, so revisions accumulate forever.
  • Quota too small for the workload — default 2 GiB on a busy cluster with many objects/events fills quickly.
  • High write churn — frequent updates to ConfigMaps, leases, events, or CRDs spawn revisions fast.
  • Large objects — oversized ConfigMaps/Secrets or chatty operators bloat the keyspace.
  • Compaction without defrag — DB was compacted but never defragmented, so the file never shrank below quota.
  • Event flooding — a crash-looping component emitting thousands of events per minute.
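
To see which of these causes applies, it can help to count keys by resource type. The following is a minimal sketch, assuming the default /registry/<segment>/... key layout that Kubernetes uses in etcd; count_keys_by_resource is a hypothetical helper name, not an etcdctl feature. For resources in named API groups, the segment it reports is the group rather than the resource. Key count is only a proxy for size, so treat the result as a hint about where to look.

```shell
# Count etcd keys per top-level /registry segment (hypothetical helper).
# Reads key names on stdin, one per line; blank lines are skipped.
count_keys_by_resource() {
  awk -F/ '$2 == "registry" && $3 != "" { n[$3]++ }
           END { for (r in n) printf "%d %s\n", n[r], r }' |
    sort -k1,1nr -k2,2
}

# Usage on a live member (assumes etcdctl already has endpoints and certs configured):
#   etcdctl get /registry --prefix --keys-only | count_keys_by_resource | head
```

A very large count for events, leases, or a single custom resource usually points at event flooding or a chatty operator.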

How to Reproduce the Error

On a test cluster, set a tiny quota and generate write churn until the alarm fires:

# Start etcd with a very small backend quota (16 MiB)
etcd --quota-backend-bytes=16777216 &
sleep 2   # give etcd a moment to start listening

# Churn one key until a write is rejected; each put adds a new revision
for i in $(seq 1 100000); do
  etcdctl put /load/key "$(head -c 256 /dev/urandom | base64)" >/dev/null || break
done
# Error: etcdserver: mvcc: database space exceeded

# The alarm is now raised
etcdctl alarm list
# memberID:13803658152347827386 alarm:NOSPACE

The alarm remains even after you stop writing — it does not clear on its own.

Diagnostic Commands

# Is an alarm active? This confirms NOSPACE vs a different write failure
etcdctl alarm list

# Current DB size per member (add -w json if your etcdctl's table omits dbSizeInUse)
etcdctl endpoint status --cluster -w table

# Find the current revision (input for a targeted compaction)
etcdctl endpoint status -w json | grep -o '"revision":[0-9]*'

# DB size and quota from etcd metrics
curl -s http://127.0.0.1:2381/metrics | grep -E 'etcd_mvcc_db_total_size_in_bytes|etcd_server_quota_backend_bytes'

# etcd logs around the alarm and any compaction activity
journalctl -u etcd --no-pager | grep -iE 'nospace|compact|defrag|quota'

Compare etcd_mvcc_db_total_size_in_bytes against etcd_server_quota_backend_bytes: when the former approaches the latter, the alarm is imminent. dbSize vs dbSizeInUse in endpoint status shows how much is reclaimable by defrag.
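
The comparison of the two metrics can be reduced to one number. This is a small sketch, assuming the metrics are served in Prometheus text format as in the curl command above; quota_usage_pct is a hypothetical helper name. It reads the metrics on stdin and prints the database size as a whole-number percentage of the quota.

```shell
# Print backend usage as a percentage of the quota (hypothetical helper).
# Reads Prometheus text-format metrics on stdin; fails if the quota metric is missing.
quota_usage_pct() {
  awk '$1 == "etcd_mvcc_db_total_size_in_bytes" { size = $2 }
       $1 == "etcd_server_quota_backend_bytes"  { quota = $2 }
       END {
         if (quota <= 0) { print "quota metric not found" > "/dev/stderr"; exit 1 }
         printf "%d\n", (size * 100) / quota
       }'
}

# Usage (the metrics URL is the one used above and may differ on your cluster):
#   curl -s http://127.0.0.1:2381/metrics | quota_usage_pct
```

The result is truncated, not rounded, so a value in the 80s or 90s means the alarm is close.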

Step-by-Step Resolution

1. Confirm the alarm. Run etcdctl alarm list; a NOSPACE entry confirms the diagnosis (not a generic timeout).

2. Compact the keyspace. Discard old revisions up to the current revision:

rev=$(etcdctl endpoint status -w json | grep -o '"revision":[0-9]*' | head -1 | cut -d: -f2)
etcdctl compact "$rev"

This frees internal pages but does not shrink the file yet.

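3. Defragment each member. This rewrites the bbolt file and returns space to the OS. Defragmentation blocks the member while it runs, so do it one member at a time, leader last, and confirm health before moving on. etcdctl defrag --cluster covers every member from a single command but gives you no pause between members, so target each endpoint explicitly:

etcdctl --endpoints="$ENDPOINT" defrag            # ENDPOINT: one member's client URL (placeholder)
etcdctl --endpoints="$ENDPOINT" endpoint health   # confirm before the next member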

Verify dbSize in endpoint status has dropped well below the quota.
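
The one-member-at-a-time rule can be sketched as a small loop. This is an illustration under stated assumptions, not a hardened tool: defrag_rolling and the ETCDCTL override are hypothetical names introduced here, and the caller is expected to pass endpoints in the desired order, with the leader last.

```shell
# Defragment members one at a time, stopping at the first failure (hypothetical helper).
# Arguments: member client URLs in processing order; put the leader last.
# ETCDCTL is an assumed override so the loop can be dry-run with a stub such as echo.
defrag_rolling() {
  for ep in "$@"; do
    echo "defragmenting $ep"
    "${ETCDCTL:-etcdctl}" --endpoints="$ep" defrag || return 1
    "${ETCDCTL:-etcdctl}" --endpoints="$ep" endpoint health || return 1
  done
}

# Dry run, printing the commands instead of executing them:
#   ETCDCTL=echo defrag_rolling https://follower-a:2379 https://follower-b:2379 https://leader:2379
```

Stopping on the first error keeps a failed defrag on one member from being followed by a defrag on the next, which is the situation that threatens quorum.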

4. Disarm the alarm. Even after defrag the cluster stays read-only until you clear the latched alarm:

etcdctl alarm disarm
etcdctl alarm list   # should now be empty

Writes resume immediately. Test with a harmless kubectl create configmap.

5. Enable auto-compaction so it does not recur. Set --auto-compaction-mode=periodic and --auto-compaction-retention=8h (or revision mode) on every member.

6. Raise the quota if genuinely needed. Increase --quota-backend-bytes, up to etcd’s recommended maximum of 8 GiB (8589934592 bytes). Do not go beyond that without testing: very large backends slow recovery and increase GC pauses.
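
Steps 5 and 6 come down to three etcd flags, and the quota flag takes a value in bytes. Here is a minimal sketch that prints them; gib_to_bytes and etcd_space_flags are hypothetical helper names, and where the flags go (systemd unit, static pod manifest, or elsewhere) depends on how your etcd is deployed.

```shell
# Convert a size in GiB to the byte value --quota-backend-bytes expects.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the flags for periodic auto-compaction plus an explicit quota.
# Arguments: retention (e.g. 8h) and quota in GiB (e.g. 8).
etcd_space_flags() {
  printf '%s\n' \
    "--auto-compaction-mode=periodic" \
    "--auto-compaction-retention=$1" \
    "--quota-backend-bytes=$(gib_to_bytes "$2")"
}

# Usage:
#   etcd_space_flags 8h 8
```

Apply the same values on every member so that the alarm threshold and compaction behaviour are consistent across the cluster.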

Prevention and Best Practices

  • Always set --auto-compaction-retention (e.g. 8h periodic). Compaction alone is the biggest preventive lever.
  • Schedule periodic defrag during low-traffic windows so the file size tracks dbSizeInUse.
  • Alert when etcd_mvcc_db_total_size_in_bytes exceeds ~80% of etcd_server_quota_backend_bytes — well before the alarm.
  • Keep the quota at a tested value (commonly 8 GiB); bigger is not better.
  • Reduce churn: fix crash-looping controllers, cap event retention, and avoid storing large blobs in etcd.
  • Treat NOSPACE as a runbook item — compact, defrag, disarm — and automate the alert-to-runbook path. See more in our Kubernetes & Helm guides.

Frequently Asked Questions

Why is my cluster read-only after this error? etcd deliberately switches to read-only on NOSPACE to avoid running out of disk mid-write and corrupting the database. Reads keep working; only mutations are blocked until you clear the alarm.

I compacted and defragged but writes still fail — why? The alarm is latched. Even after the file shrinks, you must run etcdctl alarm disarm to release it. Forgetting this step is the most common reason the cluster stays read-only.

Does compaction delete my Kubernetes objects? No. Compaction only discards historical revisions of keys, not the current values. Your live ConfigMaps, Pods, and Secrets are untouched; you only lose the ability to do a time-travel read to an old revision.

Should I just keep raising --quota-backend-bytes? No. A larger backend delays the problem but increases memory use, GC pauses, and recovery time. Fix the root cause with auto-compaction and regular defrag; raise the quota only to a tested ceiling around 8 GiB.

How often should I defrag? Whenever dbSize is significantly larger than dbSizeInUse — typically after large deletes or on a periodic schedule (daily/weekly) during quiet hours. Always one member at a time to preserve quorum.
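
One way to make "significantly larger" concrete is to compute the reclaimable share of the file, (dbSize - dbSizeInUse) / dbSize, and defrag only above a threshold. The sketch below assumes you have already read both values in bytes from endpoint status; should_defrag is a hypothetical helper, and its 30% default is an illustrative threshold, not an etcd recommendation.

```shell
# Succeed (exit 0) when the reclaimable share of the file is at least min_pct.
# Arguments: dbSize, dbSizeInUse (both in bytes), optional min_pct (default 30, illustrative).
should_defrag() {
  db_size=$1; in_use=$2; min_pct=${3:-30}
  [ "$db_size" -gt 0 ] || return 1
  [ $(( (db_size - in_use) * 100 / db_size )) -ge "$min_pct" ]
}

# Usage:
#   should_defrag "$DB_SIZE" "$DB_SIZE_IN_USE" && echo "defrag worthwhile"
```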
