Btrfs Balance & Snapshot Management Prompt
Manage Btrfs filesystems — data/metadata balance, snapshot lifecycle, qgroups, subvolume management, and recovering from chunk allocation issues.
- Target user: Linux sysadmins running Btrfs in production (SUSE, Synology, openSUSE, Fedora)
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux sysadmin with deep Btrfs experience: multi-device pools, subvolumes, snapshots, send/receive replication. You know that `btrfs filesystem df` and `df` disagree (often dramatically) and that "no space left" with terabytes free is a Btrfs classic.

I will provide:
- The symptom (`ENOSPC` despite free space, slow balance, snapshot cleanup not freeing space, subvolume can't be deleted, qgroup limit hit)
- Output of `btrfs filesystem df /<mount>`, `btrfs filesystem usage /<mount>`, `btrfs subvolume list /<mount>`
- For multi-device: `btrfs device usage /<mount>`
- Recent dmesg lines mentioning btrfs
- The Btrfs profile (single, raid1, raid10, raid5/6)

Your job:
1. **Decode the "no space" pattern**:
   - **Standard df vs btrfs df disagreement**: `df` reports a single estimate for the whole filesystem; `btrfs filesystem df` breaks allocated chunks down into data, metadata and system. ENOSPC happens when METADATA or SYSTEM chunks are full and no unallocated space is left to create new ones, even if DATA has terabytes free.
   - **Metadata exhaustion** is the #1 cause of unexpected ENOSPC. Solution: a filtered balance that compacts partly empty data chunks (`-dusage=` with a low value), returning space to the unallocated pool so metadata can grow.
   - **Unallocated space** = pool space not yet committed to a chunk; visible in `btrfs filesystem usage`.
2. **Balance strategy**:
   - **Full balance**: `btrfs balance start /<mount>`. Slow, I/O heavy. Avoid in production.
   - **Filtered balance**: `btrfs balance start -dusage=50 /<mount>` rebalances only data chunks less than 50% full. Much faster.
   - **Metadata balance**: `btrfs balance start -musage=50 /<mount>` for metadata chunks only.
   - **Combined**: `btrfs balance start -dusage=50 -musage=50 /<mount>`
   - **Status check**: `btrfs balance status /<mount>`; **cancel** with `btrfs balance cancel /<mount>`
3. **For snapshot management**:
   - Snapshots are subvolumes; the same operations work on them.
   - Deleting a snapshot does NOT immediately free space: Btrfs queues the cleanup; run `btrfs subvolume sync /<mount>` to wait for it to finish.
   - Many snapshots slow down metadata-heavy operations (balance, qgroup accounting, snapshot deletion); cap at hundreds, not thousands, per subvolume.
   - **Snapshot-based backup** with `btrfs send | btrfs receive` for incremental replication (sent snapshots must be read-only).
4. **For qgroups (quota)**:
   - Enable with `btrfs quota enable /<mount>`.
   - Qgroups track usage per subvolume; useful for multi-tenant setups but EXPENSIVE in metadata.
   - A **qgroup rescan** triggered by snapshot operations can take hours on large filesystems.
   - Disable if you don't need quotas (`btrfs quota disable /<mount>`).
5. **For multi-device profiles**:
   - **Single**: no redundancy.
   - **RAID1**: two copies of every chunk on two different devices, regardless of how many devices are in the pool; chunk-level mirroring, not whole-device mirroring like md RAID1.
   - **RAID10**: mirrored and striped.
   - **RAID5/6**: known write-hole bugs; not recommended for production.
   - The profile is set separately for each data type (data, metadata).
6. **For dead member recovery**:
   - `btrfs device replace start /dev/old /dev/new /<mount>`: online replace.
   - `btrfs device remove /dev/<dev> /<mount>`: slow, but works while mounted.
   - **Mount with the `degraded` option** to bring the filesystem up with a missing device for recovery.
7. **For "can't delete subvolume"**:
   - The subvolume contains nested subvolumes (for example, snapshots stored inside it); list and remove them first.
   - `btrfs subvolume show <path>` shows snapshot relationships.

Mark DESTRUCTIVE clearly: full balance on production (slow, I/O storm), `btrfs subvolume delete -c` (commits the deletion immediately), removing a device from a RAID0/single profile.

---
Symptom: [DESCRIBE]

`btrfs filesystem df /<mount>`:
```
[PASTE]
```

`btrfs filesystem usage /<mount>`:
```
[PASTE]
```

`btrfs subvolume list /<mount>`:
```
[PASTE]
```

Recent dmesg (btrfs):
```
[PASTE]
```

Profile / topology: [single / raid1 / raid10 / multi-device + devices]
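The inputs listed in the prompt can be collected in one pass and pasted as-is. A minimal POSIX `sh` sketch, assuming it runs as root against a mounted Btrfs filesystem with a readable `dmesg`; the function name `collect_btrfs_report` and the 50-line dmesg cap are hypothetical choices, not part of any tool.

```shell
# Hypothetical helper: print every input the prompt asks for as labeled
# sections. Run as root; the argument is the Btrfs mount point.
collect_btrfs_report() {
    mnt=$1
    for cmd in "filesystem df" "filesystem usage" "subvolume list" "device usage"; do
        echo "### btrfs $cmd $mnt"
        # $cmd is left unquoted on purpose so it splits into two words
        btrfs $cmd "$mnt" 2>&1
        echo
    done
    echo "### dmesg (btrfs lines, last 50)"
    dmesg 2>&1 | grep -i btrfs | tail -n 50
}
```

Example: `collect_btrfs_report /mnt/pool > btrfs-report.txt`, then paste each section into the matching placeholder.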
Why this prompt works
Btrfs’s space accounting is famously confusing: `df` says 80% full, `btrfs filesystem df` says metadata is 99% used (the actual ENOSPC cause), and `btrfs filesystem usage` shows no unallocated space left for a new metadata chunk. This prompt forces Btrfs-aware accounting before anyone chases “out of space.”
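To read the metadata figure programmatically rather than by eye, a small parser over `btrfs filesystem df -b` (raw byte counts) can print used as a percentage of total. A sketch assuming the usual `Metadata, <PROFILE>: total=<bytes>, used=<bytes>` line layout; `metadata_used_pct` is a hypothetical helper name.

```shell
# Hypothetical helper: read `btrfs filesystem df -b <mnt>` output on stdin and
# print metadata "used" as a whole-number percentage of metadata "total".
metadata_used_pct() {
    awk -F'[=,]' '/^Metadata,/ {
        # Fields look like: "Metadata" " DUP: total" "<total>" " used" "<used>"
        for (i = 1; i < NF; i++) {
            if ($i ~ /total$/) total = $(i + 1)
            if ($i ~ /used$/)  used  = $(i + 1)
        }
        if (total > 0) printf "%d\n", used * 100 / total
    }'
}
```

For example: `btrfs filesystem df -b /mnt/pool | metadata_used_pct`. A high value alone is not a diagnosis; it only causes ENOSPC when unallocated space has also run out.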
How to use it
- Always use `btrfs filesystem df` and `btrfs filesystem usage`, not just `df -h`. The standard tools mislead.
- Identify whether data or metadata is exhausted. The fix differs.
- Prefer filtered balance over full balance: it reclaims most of the same space for a fraction of the I/O.
- Monitor snapshot counts. Thousands of snapshots slow down balance, qgroup accounting and snapshot deletion.
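One common way to apply filtered balance is in increasing steps: first chunks that are completely empty (`-dusage=0`), then progressively fuller ones, stopping at the first failure. A sketch; the step list `0 5 10 25 50` is an illustrative assumption rather than a recommended constant, and `stepped_balance` is a hypothetical name.

```shell
# Hypothetical helper: reclaim partly empty data chunks in cheap-to-expensive
# order. Each pass only relocates data chunks below the given usage percent.
stepped_balance() {
    mnt=$1
    for pct in 0 5 10 25 50; do
        echo "balance -dusage=$pct on $mnt"
        if ! btrfs balance start -dusage="$pct" "$mnt"; then
            echo "balance at -dusage=$pct failed; stopping" >&2
            return 1
        fi
    done
}
```

Cheap passes go first because each one only moves chunks below its threshold, so the early passes relocate little data while often returning whole chunks to the unallocated pool.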
Useful commands
# Inventory
btrfs filesystem df /mountpoint # data vs metadata vs system
btrfs filesystem usage /mountpoint # detailed; includes unallocated
btrfs filesystem show # all Btrfs filesystems + their devices
btrfs subvolume list /mountpoint
btrfs subvolume show /mountpoint/subvol
btrfs device usage /mountpoint # per-device
# Detect metadata exhaustion
btrfs filesystem df /mountpoint | grep -i metadata
# Metadata-full: "used" is close to "total" AND btrfs filesystem usage shows little or no Unallocated
# Balance — filtered (preferred)
btrfs balance start -dusage=50 /mountpoint # data chunks <50% full
btrfs balance start -musage=50 /mountpoint # metadata chunks <50% full
btrfs balance start -dusage=50 -musage=50 /mountpoint # both
btrfs balance status /mountpoint
btrfs balance cancel /mountpoint
btrfs balance pause /mountpoint
btrfs balance resume /mountpoint
# Snapshot management
btrfs subvolume snapshot /src /dst
btrfs subvolume snapshot -r /src /dst # read-only snapshot
btrfs subvolume delete /dst
btrfs subvolume sync /mountpoint # wait for queued deletes to complete
# Send / receive (replication); sent snapshots must be read-only (-r)
btrfs send /snap-now | btrfs receive /backup/
btrfs send -p /snap-prev /snap-now | btrfs receive /backup/ # incremental; /snap-prev must already exist in /backup/
# Qgroups
btrfs quota enable /mountpoint
btrfs qgroup show /mountpoint
btrfs quota disable /mountpoint # if not needed (saves overhead)
# Multi-device
btrfs device add /dev/<new> /mountpoint
btrfs device replace start /dev/<old> /dev/<new> /mountpoint
btrfs device replace status /mountpoint
btrfs device remove /dev/<dev> /mountpoint
# Scrub (verify checksums)
btrfs scrub start /mountpoint
btrfs scrub status /mountpoint
btrfs scrub cancel /mountpoint
# Defragmentation (WARNING: unshares extents with snapshots/reflinks; can increase space usage)
btrfs filesystem defragment -r /mountpoint # online, recursive
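# Recovery (run against UNMOUNTED devices)
btrfs rescue chunk-recover /dev/<dev>
btrfs rescue zero-log /dev/<dev> # last resort: only if mount fails replaying the log; drops the last fsync'd writes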
btrfs rescue super-recover /dev/<dev>
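The send/receive lines above can be wrapped so each run sends incrementally against the newest older snapshot. A sketch assuming read-only snapshots in one directory whose names sort chronologically (for example `YYYYMMDD`), and that the chosen parent already exists on the receiving side; `replicate_latest` and its argument order are hypothetical.

```shell
# Hypothetical helper: replicate_latest SNAPDIR NEW DEST
# Sends SNAPDIR/NEW to DEST, incrementally against the snapshot that sorts
# immediately before NEW (if any). Names must sort chronologically.
replicate_latest() {
    snapdir=$1 new=$2 dest=$3
    parent=""
    for s in "$snapdir"/*; do
        [ "${s##*/}" = "$new" ] && break   # stop at the snapshot being sent
        [ -d "$s" ] && parent=$s           # remember the newest older one
    done
    if [ -n "$parent" ]; then
        btrfs send -p "$parent" "$snapdir/$new" | btrfs receive "$dest"
    else
        btrfs send "$snapdir/$new" | btrfs receive "$dest"
    fi
}
```

If the parent is missing under the destination, `btrfs receive` cannot apply the incremental stream, and a full (non-`-p`) send is needed first.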
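ENOSPC recovery workflow
"No space" error despite df free
│
├── btrfs filesystem df → metadata "used" close to "total"?
│   ├── Yes → metadata exhausted; check "Unallocated" in btrfs filesystem usage
│   │   ├── Unallocated ≈ 0 → metadata cannot grow; free data chunks first
│   │   │   └── btrfs balance start -dusage=5 /<mount>   (raise gradually if needed)
│   │   └── Unallocated well above 0 → new metadata chunks can still be allocated; check qgroup limits and dmesg
│   └── No → continue
│
├── btrfs filesystem usage → "Unallocated" ≈ 0?
│   ├── Yes → data "Used" well below data "Size"?
│   │   ├── Yes → free space is stranded in partly empty data chunks
│   │   │   └── btrfs balance start -dusage=50 /<mount>
│   │   └── No → truly out of space; expand pool or delete data/snapshots
│   └── No → chunks can still be allocated; check qgroup limits and dmesg
│
└── Many snapshots accumulated?
    └── Remove old; btrfs subvolume sync /<mount>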
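The triage can also be written as a function over five byte counts, which could be read from `btrfs filesystem usage -b /<mount>`: metadata used and size, data used and size, and unallocated. It suggests a filtered data balance only when unallocated space is nearly gone and data chunks still have slack, since only then does balancing return chunks that metadata can use. A sketch: the 1 GiB and 90% thresholds are illustrative assumptions, not Btrfs constants, and `enospc_triage` is a hypothetical name.

```shell
# Hypothetical helper:
#   enospc_triage META_USED META_SIZE DATA_USED DATA_SIZE UNALLOCATED   (bytes)
# Prints a suggested next step for an ENOSPC report.
enospc_triage() {
    awk -v mu="$1" -v ms="$2" -v du="$3" -v ds="$4" -v un="$5" 'BEGIN {
        gib = 1024 * 1024 * 1024
        low_unalloc = (un < gib)                   # assumption: < 1 GiB counts as "~0"
        meta_full   = (ms > 0 && mu >= 0.9 * ms)   # assumption: >= 90% used is "full"
        data_slack  = (ds > 0 && du < 0.9 * ds)    # assumption: < 90% used leaves slack
        if (!low_unalloc)
            print "chunks can still be allocated: check qgroup limits and dmesg"
        else if (meta_full && data_slack)
            print "metadata exhausted: btrfs balance start -dusage=5 <mount>, raise gradually"
        else if (data_slack)
            print "space stranded in data chunks: btrfs balance start -dusage=50 <mount>"
        else
            print "truly out of space: expand the pool or delete data/snapshots"
    }'
}
```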
Common findings this catches
- `btrfs filesystem df` shows metadata 99% used and `btrfs filesystem usage` shows almost no Unallocated → free data chunks so metadata can grow: `btrfs balance start -dusage=5 /<mount>`, raising the value gradually.
- Hundreds of snapshots from Snapper → clean up; configure `NUMBER_LIMIT_IMPORTANT` and `NUMBER_LIMIT` in `/etc/snapper/configs/<name>`.
- Qgroups enabled, sluggish metadata → if not used, disable: `btrfs quota disable /<mount>`.
- Multi-device RAID1 missing a device → mount with `-o degraded`; replace the device or rebalance to restore redundancy.
- `btrfs balance` stuck for hours → unfiltered balance on a large FS; cancel and restart with filters.
- Subvolume can't be deleted → it contains nested subvolumes (for example, snapshots stored inside it); list and clean them up: `btrfs subvolume list -s /<mount>`.
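For the snapshot-count findings, a quick count can flag when a filesystem drifts from hundreds toward thousands of snapshots. A sketch that counts the `ID ...` lines printed by `btrfs subvolume list -s` (one per snapshot) and warns above a limit; the default limit of 500 and the name `check_snapshot_count` are assumptions.

```shell
# Hypothetical helper: count snapshot lines from `btrfs subvolume list -s <mnt>`
# on stdin; warn and return 1 when the count exceeds the limit (default 500).
check_snapshot_count() {
    limit=${1:-500}
    count=$(grep -c '^ID ' || true)   # grep -c still prints 0 when nothing matches
    if [ "$count" -gt "$limit" ]; then
        echo "WARNING: $count snapshots (limit $limit); prune old ones" >&2
        return 1
    fi
    echo "$count snapshots"
}
```

For example: `btrfs subvolume list -s /mnt/pool | check_snapshot_count 300`.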
Snapshot retention pattern
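# Daily snapshot, retain 14 days (bash + GNU date; snapshot names are YYYYMMDD)
SNAPDIR=/data/.snapshots
DATE=$(date +%Y%m%d)
CUTOFF=$(date -d '14 days ago' +%Y%m%d)
btrfs subvolume snapshot -r /data "$SNAPDIR/$DATE"
# Cleanup older than 14 days, judged by name: a snapshot's own mtime does not
# reliably reflect when it was taken, so find -mtime can delete the wrong ones
for snap in "$SNAPDIR"/*; do
  name=${snap##*/}
  [[ $name =~ ^[0-9]{8}$ && $name < $CUTOFF ]] && btrfs subvolume delete "$snap"
done
btrfs subvolume sync /data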
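Because the snapshot names here encode the date, expiry can also be checked by name as a dry run that deletes nothing. A sketch assuming snapshot names are `YYYYMMDD` dates and GNU `date` for the cutoff; `list_expired` is a hypothetical name.

```shell
# Hypothetical dry run: read snapshot names (one per line) on stdin and print
# the YYYYMMDD ones older than the cutoff, i.e. what a cleanup would delete.
list_expired() {
    cutoff=$1
    awk -v cutoff="$cutoff" 'length($0) == 8 && /^[0-9]+$/ && $0 < cutoff'
}
```

For example: `ls /data/.snapshots | list_expired "$(date -d '14 days ago' +%Y%m%d)"`.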
When to escalate
- Btrfs RAID5/6 data corruption: known upstream bugs; engage the upstream Btrfs developers (linux-btrfs mailing list); move to a different RAID layer for production.
- `btrfs check --repair` recommended by online forums: strongly resist; use `btrfs rescue` subcommands or restore from backup.
- Cross-version compatibility issues (e.g., features written by a newer kernel): boot into a matching kernel version for recovery.