Kubernetes Error Guide: 'etcdserver: mvcc: database space exceeded' Read-Only API
Fix 'etcdserver: mvcc: database space exceeded' in Kubernetes: clear the NOSPACE alarm with compaction and defrag, and tune quota-backend-bytes to stop recurrence.
Exact Error Message
When etcd’s backend database reaches its quota, etcd raises a NOSPACE alarm and rejects every write. The kube-apiserver passes the failure straight through to clients:
rpc error: code = Unknown desc = etcdserver: mvcc: database space exceeded
You will see it on any mutating request — create, update, delete:
$ kubectl apply -f deploy.yaml
Error from server: error when creating "deploy.yaml": etcdserver: mvcc: database space exceeded
And etcd itself logs the alarm that put the cluster into a read-only state:
W0628 09:14:02.118203 etcdserver: database space exceeded, retry after compaction
W0628 09:14:02.118277 alarm NOSPACE raised by peer 8e9e05c52164694d
The cluster is now read-only: reads succeed, all writes fail, until the alarm is cleared.
What the Error Means
etcd stores every key revision in a multi-version concurrency control (MVCC) backend (a bbolt file). Each update creates a new revision rather than overwriting in place, so the on-disk size grows continuously until old revisions are compacted away and the freed pages are defragmented back to the OS.
The backend has a hard size limit, --quota-backend-bytes (default 2 GiB, commonly raised to 8 GiB). When the database file crosses this quota, etcd raises the NOSPACE alarm and refuses writes to protect itself from running out of space mid-transaction and corrupting the store. Critically, the database stays large even after compaction until you defrag, because compaction frees internal pages but does not shrink the file. So the alarm persists until you both compact and defrag, then explicitly disarm the alarm.
This is why the cluster does not self-heal: clearing space requires three deliberate steps, and the alarm latches until you disarm it.
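To make the quota arithmetic concrete, here is a minimal shell sketch. The helper names gib_to_bytes and headroom_bytes are illustrative and not part of etcd or etcdctl, and the subtraction is a simplified model of how much room remains, not a reproduction of etcd's internal check.

```shell
#!/bin/sh
# Convert whole GiB to bytes, the unit that --quota-backend-bytes expects.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# headroom_bytes DB_SIZE QUOTA
# Print how many bytes remain before the database file reaches the quota.
# A value of zero or less means the NOSPACE alarm should be expected.
headroom_bytes() {
  echo $(( $2 - $1 ))
}
```

For example, `gib_to_bytes 8` prints 8589934592, which could be passed as `--quota-backend-bytes="$(gib_to_bytes 8)"`.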
Common Causes
- No automatic compaction: --auto-compaction-retention is unset or zero, so revisions accumulate forever.
- Quota too small for the workload: default 2 GiB on a busy cluster with many objects/events fills quickly.
- High write churn: frequent updates to ConfigMaps, leases, events, or CRDs spawn revisions fast.
- Large objects: oversized ConfigMaps/Secrets or chatty operators bloat the keyspace.
- Compaction without defrag: DB was compacted but never defragmented, so the file never shrank below quota.
- Event flooding: a crash-looping component emitting thousands of events per minute.
How to Reproduce the Error
On a test cluster, set a tiny quota and generate write churn until the alarm fires:
# Start etcd with a very small backend quota, then generate revisions
etcd --quota-backend-bytes=16777216 & # 16 MiB
# Churn keys until the backend exceeds quota
for i in $(seq 1 100000); do
etcdctl put /load/key "$(head -c 256 /dev/urandom | base64)" >/dev/null
done
Error: etcdserver: mvcc: database space exceeded
etcdctl alarm list
memberID:13803658152347827386 alarm:NOSPACE
The alarm remains even after you stop writing — it does not clear on its own.
Diagnostic Commands
# Is an alarm active? This confirms NOSPACE vs a different write failure
etcdctl alarm list
# Current DB size vs quota — dbSize and dbSizeInUse per member
etcdctl endpoint status --cluster -w table
# Find the current revision (input for a targeted compaction)
etcdctl endpoint status -w json | grep -o '"revision":[0-9]*'
# DB size and quota from etcd metrics
curl -s http://127.0.0.1:2381/metrics | grep -E 'etcd_mvcc_db_total_size_in_bytes|etcd_server_quota_backend_bytes'
# etcd logs around the alarm and any compaction activity
journalctl -u etcd --no-pager | grep -iE 'nospace|compact|defrag|quota'
Compare etcd_mvcc_db_total_size_in_bytes against etcd_server_quota_backend_bytes: when the former approaches the latter, the alarm is imminent. dbSize vs dbSizeInUse in endpoint status shows how much is reclaimable by defrag.
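The comparison between the two metrics can be scripted. The sketch below assumes the metrics arrive in Prometheus text format on stdin, as the curl command above returns them; the function name db_quota_percent is made up for this example.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print the backend
# database size as an integer percentage of the configured quota.
# Prints "unknown" and returns non-zero if the quota metric is missing.
db_quota_percent() {
  awk '
    $1 == "etcd_mvcc_db_total_size_in_bytes" { size = $2 }
    $1 == "etcd_server_quota_backend_bytes"  { quota = $2 }
    END {
      if (quota <= 0) { print "unknown"; exit 1 }
      printf "%d\n", (size * 100) / quota
    }'
}
```

A possible invocation is `curl -s http://127.0.0.1:2381/metrics | db_quota_percent`. The percentage is truncated, not rounded, so treat the result as a floor.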
Step-by-Step Resolution
1. Confirm the alarm. Run etcdctl alarm list; a NOSPACE entry confirms the diagnosis (not a generic timeout).
2. Compact the keyspace. Discard old revisions up to the current revision:
rev=$(etcdctl endpoint status -w json | grep -o '"revision":[0-9]*' | head -1 | cut -d: -f2)
etcdctl compact "$rev"
This frees internal pages but does not shrink the file yet.
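3. Defragment each member. This rewrites the bbolt file and returns space to the OS. A member blocks while it defragments, so do one member at a time, leader last. Running etcdctl defrag --cluster covers every member in one invocation but does not let you pick the order; to control it, target each member's endpoint in turn:
etcdctl --endpoints=<member-endpoint> defrag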
Verify dbSize in endpoint status has dropped well below the quota.
4. Disarm the alarm. Even after defrag the cluster stays read-only until you clear the latched alarm:
etcdctl alarm disarm
etcdctl alarm list # should now be empty
Writes resume immediately. Test with a harmless kubectl create configmap.
5. Enable auto-compaction so it does not recur. Set --auto-compaction-mode=periodic and --auto-compaction-retention=8h (or revision mode) on every member.
6. Raise the quota if genuinely needed. Increase --quota-backend-bytes, for example to 8 GiB, which is also etcd’s recommended maximum. Do not exceed it without testing: very large backends slow recovery and increase GC pauses.
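Steps 2 to 4 above can be strung together as a single function. This is a sketch under stated assumptions, not a hardened runbook: the members file (one "ENDPOINT IS_LEADER" pair per line, with IS_LEADER set to true or false) is a hypothetical format you would prepare yourself from endpoint status, the function names are invented for illustration, and etcdctl is assumed to already carry whatever TLS settings your cluster needs.

```shell
#!/bin/sh
# ETCDCTL can point at a stub for a dry run; it defaults to the real binary.
ETCDCTL="${ETCDCTL:-etcdctl}"

# Extract the first "revision" value from `endpoint status -w json` on stdin.
current_revision() {
  grep -o '"revision":[0-9]*' | head -n 1 | cut -d: -f2
}

# Reorder "ENDPOINT IS_LEADER" lines on stdin so that leaders are printed last.
leader_last() {
  awk '$2 == "true" { leaders[++n] = $1; next }
       NF >= 2      { print $1 }
       END          { for (i = 1; i <= n; i++) print leaders[i] }'
}

# clear_nospace MEMBERS_FILE
# Compact to the current revision, defragment followers before the leader,
# then disarm the alarm. Stops at the first failing step.
clear_nospace() {
  members_file=$1
  rev=$("$ETCDCTL" endpoint status -w json | current_revision)
  [ -n "$rev" ] || { echo "could not determine revision" >&2; return 1; }
  "$ETCDCTL" compact "$rev" || return 1
  for ep in $(leader_last < "$members_file"); do
    "$ETCDCTL" --endpoints="$ep" defrag </dev/null || return 1
  done
  "$ETCDCTL" alarm disarm
}
```

Stopping at the first failure is deliberate: disarming the alarm before the file has actually shrunk would let writes resume against a database that is still at quota.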
Prevention and Best Practices
- Always set --auto-compaction-retention (e.g. 8h periodic). Compaction alone is the biggest preventive lever.
- Schedule periodic defrag during low-traffic windows so the file size tracks dbSizeInUse.
- Alert when etcd_mvcc_db_total_size_in_bytes exceeds ~80% of etcd_server_quota_backend_bytes, well before the alarm.
- Keep the quota at a tested value (commonly 8 GiB); bigger is not better.
- Reduce churn: fix crash-looping controllers, cap event retention, and avoid storing large blobs in etcd.
- Treat NOSPACE as a runbook item (compact, defrag, disarm) and automate the alert-to-runbook path.
Related Errors
- etcd request timed out — slow backend that may accompany a bloated DB.
- etcdserver: leader changed — election instability worsened by GC pauses from a large DB.
- context deadline exceeded — apiserver storage calls failing under etcd pressure.
Frequently Asked Questions
Why is my cluster read-only after this error? etcd deliberately switches to read-only on NOSPACE to avoid running out of disk mid-write and corrupting the database. Reads keep working; only mutations are blocked until you clear the alarm.
I compacted and defragged but writes still fail — why? The alarm is latched. Even after the file shrinks, you must run etcdctl alarm disarm to release it. Forgetting this step is the most common reason the cluster stays read-only.
Does compaction delete my Kubernetes objects? No. Compaction only discards historical revisions of keys, not the current values. Your live ConfigMaps, Pods, and Secrets are untouched; you only lose the ability to do a time-travel read to an old revision.
Should I just keep raising --quota-backend-bytes? No. A larger backend delays the problem but increases memory use, GC pauses, and recovery time. Fix the root cause with auto-compaction and regular defrag; raise the quota only to a tested ceiling around 8 GiB.
How often should I defrag? Whenever dbSize is significantly larger than dbSizeInUse — typically after large deletes or on a periodic schedule (daily/weekly) during quiet hours. Always one member at a time to preserve quorum.
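The "significantly larger" test can be turned into a simple check. In this sketch the function names and the default threshold of 30 percent are arbitrary choices for illustration, not etcd recommendations; the two inputs are the dbSize and dbSizeInUse values, in bytes, reported by endpoint status.

```shell
#!/bin/sh
# reclaimable_percent DB_SIZE DB_SIZE_IN_USE
# Print the share of the file (integer percent) that defrag could return.
reclaimable_percent() {
  db_size=$1
  in_use=$2
  [ "$db_size" -gt 0 ] || { echo 0; return; }
  echo $(( (db_size - in_use) * 100 / db_size ))
}

# should_defrag DB_SIZE DB_SIZE_IN_USE [THRESHOLD_PERCENT]
# Succeed (exit 0) when the reclaimable share meets the threshold.
should_defrag() {
  [ "$(reclaimable_percent "$1" "$2")" -ge "${3:-30}" ]
}
```

For instance, a 1000-byte file with 400 bytes in use is 60 percent reclaimable, so `should_defrag 1000 400` succeeds, while `should_defrag 1000 900` does not.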