
Btrfs Balance & Snapshot Management Prompt

Manage Btrfs filesystems — data/metadata balance, snapshot lifecycle, qgroups, subvolume management, and recovering from chunk allocation issues.

Target user
Linux sysadmins running Btrfs in production (SUSE, Synology, openSUSE, Fedora)
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior Linux sysadmin with deep Btrfs experience — multi-device pools, subvolumes, snapshots, send/receive replication. You know that `btrfs filesystem df` and `df` disagree (often dramatically) and that "no space left" with terabytes free is a Btrfs classic.

I will provide:
- The symptom (`ENOSPC` despite free space, slow balance, snapshot cleanup not freeing space, subvolume can't be deleted, qgroup limit hit)
- Output of `btrfs filesystem df /<mount>`, `btrfs filesystem usage /<mount>`, `btrfs subvolume list /<mount>`
- For multi-device: `btrfs device usage /<mount>`
- Recent dmesg lines mentioning btrfs
- The Btrfs profile (single, raid1, raid10, raid5/6)

Your job:

1. **Decode the "no space" pattern**:
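   - **Standard df vs btrfs df disagreement**: `df` reports a single free-space estimate for the whole filesystem; `btrfs filesystem df` breaks the allocated chunks down into data, metadata, and system. ENOSPC happens when METADATA or SYSTEM chunks are full and no unallocated space is left to create a new one, even if DATA chunks have terabytes free.
   - **Metadata exhaustion** is the #1 cause of unexpected ENOSPC. Solution: return space to the unallocated pool so a new metadata chunk can be allocated, usually by compacting mostly-empty data chunks with a low `-dusage` filter (or by temporarily adding a device if balance itself fails with ENOSPC).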
   - **Unallocated space** = pool space not yet committed to a chunk; visible in `btrfs filesystem usage`.
2. **Balance strategy**:
   - **Full balance**: `btrfs balance start /<mount>` (recent btrfs-progs print a warning and wait unless `--full-balance` is given). Slow, I/O heavy. Avoid in production.
   - **Filtered balance**: `btrfs balance start -dusage=50 /<mount>` rebalances only data chunks <50% full. Much faster.
   - **Metadata balance**: `btrfs balance start -musage=50 /<mount>` rebalances only metadata chunks <50% full
   - **Combined**: `btrfs balance start -dusage=50 -musage=50 /<mount>`
   - **Status check**: `btrfs balance status /<mount>`; **cancel** with `btrfs balance cancel /<mount>`
3. **For snapshot management**:
   - Snapshots are subvolumes; same operations work on them
   - Deleting a snapshot does NOT immediately free space; Btrfs queues the delete for a background cleaner. Run `btrfs subvolume sync /<mount>` to wait for cleanup to finish
   - Many snapshots slow metadata operations (linear scaling); cap at hundreds, not thousands per subvolume
   - **Snapshot-based backup** with `btrfs send | btrfs receive` for incremental replication
4. **For qgroups (quota)**:
   - Enable with `btrfs quota enable /<mount>`
   - Qgroups track usage per subvolume; useful for multi-tenant but EXPENSIVE in metadata
   - **Qgroup rescan** triggered by snapshot operations can take hours on large filesystems
   - Disable if you don't need quotas (`btrfs quota disable /<mount>`)
5. **For multi-device profiles**:
   - **Single**: no redundancy
   - **RAID1**: two copies of every chunk, each on a different device, regardless of how many devices are in the pool; unlike md RAID1, it does not mirror whole devices block for block
   - **RAID10**: duplicate + stripe
   - **RAID5/6**: known to have write-hole bugs; not recommended for production
   - Profiles for data and metadata are configurable separately
6. **For dead member recovery**:
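   - `btrfs device replace start <devid|/dev/old> /dev/new /<mount>`: online replace; use the devid from `btrfs filesystem show` when the old device is no longer readable
   - `btrfs device remove <dev|missing> /<mount>`: slow, but works while mounted; `missing` drops a device that is already gone
   - **Mounting with the `degraded` option**: allows mounting with a missing device for recovery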
7. **For "can't delete subvolume"**:
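   - Subvolume contains nested subvolumes (including snapshots stored inside it), or is the default subvolume: list nested ones with `btrfs subvolume list -o <path>` and remove them first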
   - `btrfs subvolume show <path>` shows snapshot relationships

Mark DESTRUCTIVE clearly: full balance on production (slow, I/O storm), `btrfs subvolume delete` (irreversible; `-c` only waits for the transaction commit), removing a device from a RAID0/single profile (no redundancy to fall back on if relocation fails).

---

Symptom: [DESCRIBE]
`btrfs filesystem df /<mount>`:
```
[PASTE]
```
`btrfs filesystem usage /<mount>`:
```
[PASTE]
```
`btrfs subvolume list /<mount>`:
```
[PASTE]
```
Recent dmesg (btrfs):
```
[PASTE]
```
Profile / topology: [single / raid1 / raid10 / multi-device + devices]

Why this prompt works

Btrfs’s space accounting is famously confusing: df says 80% full, btrfs filesystem df says metadata is 99% used, and btrfs filesystem usage shows no unallocated space left for a new metadata chunk, which is the actual ENOSPC cause. This prompt forces the Btrfs-aware accounting before chasing “out of space.”
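
As a rough illustration of that accounting, the sketch below turns `btrfs filesystem df -b` output (raw bytes) into a used percentage per block-group type. The function name `btrfs_df_percent` is hypothetical, and the parser assumes the `Type, profile: total=N, used=N` line layout; check it against the output of your btrfs-progs version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `btrfs filesystem df -b <mount>` output on stdin
# and prints "<type> <used-percent>" for every line carrying total= and used=.
btrfs_df_percent() {
  awk '
    /total=/ && /used=/ {
      type = $1; sub(/,$/, "", type)            # "Metadata," -> "Metadata"
      total = 0; used = 0
      for (i = 1; i <= NF; i++) {
        field = $i; sub(/,$/, "", field)         # drop a trailing comma
        if (field ~ /^total=/) total = substr(field, 7) + 0
        if (field ~ /^used=/)  used  = substr(field, 6) + 0
      }
      # Integer percentage, truncated; skip lines with a zero total
      if (total > 0) printf "%s %d\n", type, used * 100 / total
    }'
}
```

Typical use would be `btrfs filesystem df -b /mountpoint | btrfs_df_percent`. A high Metadata percentage matters mostly when `btrfs filesystem usage` also shows little or no unallocated space.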

How to use it

  1. Always use btrfs filesystem df and usage, not just df -h. The standard tools mislead.
  2. Identify whether data or metadata is exhausted. Fix differs.
  3. Prefer filtered balance over full balance: it usually reclaims the same space for a fraction of the I/O.
  4. Monitor snapshot counts. Thousands degrade everything.
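
Step 4 can be automated with a small counter. This sketch assumes one snapshot per line, as printed by `btrfs subvolume list -s`; the function name `check_snapshot_count` and the default threshold of 500 are illustrative choices, not values taken from the Btrfs tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `btrfs subvolume list -s <mount>` output on stdin
# (one snapshot per line) and warns when the count exceeds a threshold.
check_snapshot_count() {
  local limit=${1:-500}               # illustrative default; tune per system
  local count
  count=$(grep -c 'path ' || true)    # count lines that name a subvolume path
  if [ "$count" -gt "$limit" ]; then
    echo "WARN: $count snapshots (limit $limit)"
    return 1
  fi
  echo "OK: $count snapshots (limit $limit)"
}
```

For example, `btrfs subvolume list -s /mountpoint | check_snapshot_count 300` could run from cron or a monitoring check; the non-zero return code makes it easy to alert on.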

Useful commands

# Inventory
btrfs filesystem df /mountpoint           # data vs metadata vs system
btrfs filesystem usage /mountpoint        # detailed; includes unallocated
btrfs filesystem show                     # all Btrfs filesystems + member devices
btrfs subvolume list /mountpoint
btrfs subvolume show /mountpoint/subvol
btrfs device usage /mountpoint            # per-device

# Detect metadata exhaustion
btrfs filesystem df / | grep -i metadata
# If used= is close to total= and `btrfs filesystem usage` shows little or no
# unallocated space, metadata is effectively full

# Balance — filtered (preferred)
btrfs balance start -dusage=50 /mountpoint                  # data chunks <50% full
btrfs balance start -musage=50 /mountpoint                  # metadata chunks <50% full
btrfs balance start -dusage=50 -musage=50 /mountpoint       # both
btrfs balance status /mountpoint
btrfs balance cancel /mountpoint
btrfs balance pause /mountpoint
btrfs balance resume /mountpoint

# Snapshot management
btrfs subvolume snapshot /src /dst
btrfs subvolume snapshot -r /src /dst       # read-only snapshot
btrfs subvolume delete /dst
btrfs subvolume sync /mountpoint             # wait for queued deletes to complete

# Send / receive (replication)
btrfs send /snap-now | btrfs receive /backup/
btrfs send -p /snap-prev /snap-now | btrfs receive /backup/   # incremental

# Qgroups
btrfs quota enable /mountpoint
btrfs qgroup show /mountpoint
btrfs quota disable /mountpoint              # if not needed (saves overhead)

# Multi-device
btrfs device add /dev/<new> /mountpoint
btrfs device replace start /dev/<old> /dev/<new> /mountpoint
btrfs device replace status /mountpoint
btrfs device remove /dev/<dev> /mountpoint

# Scrub (verify checksums)
btrfs scrub start /mountpoint
btrfs scrub status /mountpoint
btrfs scrub cancel /mountpoint

# Defragmentation
btrfs filesystem defragment -r /mountpoint    # online, recursive; unshares extents held by snapshots, so usage can grow

# Recovery (run against an unmounted filesystem)
btrfs rescue chunk-recover /dev/<dev>
btrfs rescue zero-log /dev/<dev>              # last resort; only when mount fails during log replay
btrfs rescue super-recover /dev/<dev>
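
For the incremental `btrfs send -p` form above, something has to choose the parent snapshot. A minimal sketch, assuming snapshot names sort chronologically (for example `YYYYMMDD`); the function name `pick_parent` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads snapshot names on stdin and prints the name that
# sorts immediately before the given one, or nothing if there is no
# predecessor (in which case a full send is needed).
pick_parent() {
  local current=$1
  sort | awk -v cur="$current" '
    $0 == cur { if (prev != "") print prev; exit }
    { prev = $0 }'
}
```

A caller might run `parent=$(ls /backup-src/.snapshots | pick_parent "$new")` (path illustrative) and fall back to a plain `btrfs send` when `$parent` is empty. Both snapshots need to be read-only, and the parent must already exist on the receiving side.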

ENOSPC recovery workflow

"No space" error despite df free

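├── btrfs filesystem df → metadata used close to total?
│   ├── Yes → metadata chunks are full; a new one needs unallocated space
│   └── No → continue

├── btrfs filesystem usage → "Device unallocated" near 0?
│   ├── Yes, and data total is well above data used → chunks are allocated but mostly empty; compact them
│   │   └── btrfs balance start -dusage=10 /<mount>   (raise the threshold stepwise if little is reclaimed)
│   ├── Yes, and data chunks are full too → truly out of space; expand pool or delete data/snapshots
│   └── No → chunk allocation is not the limit; check qgroup limits and dmesg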

└── Many snapshots accumulated?
    └── Remove old; btrfs subvolume sync /<mount>
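
The filtered-balance step in the workflow above is often run with gradually increasing thresholds, so that the cheapest relocations happen first. A sketch under stated assumptions: the wrapper name `stepwise_balance`, the `BTRFS` override variable, and the default threshold list are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run filtered data balances with increasing usage
# thresholds, stopping at the first failure.
# Set BTRFS=echo for a dry run that only prints the commands.
stepwise_balance() {
  local mount=$1; shift
  local btrfs=${BTRFS:-btrfs}
  local pct
  # Illustrative default thresholds when none are given
  [ "$#" -gt 0 ] || set -- 0 5 10 25 50
  for pct in "$@"; do
    "$btrfs" balance start "-dusage=$pct" "$mount" || return 1
  done
}
```

Running `BTRFS=echo stepwise_balance /mountpoint` first shows what would be executed; `-dusage=0` only drops completely empty chunks, so it is cheap and frees unallocated space for the later passes.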

Common findings this catches

  • btrfs filesystem df shows metadata 99% and btrfs filesystem usage shows no unallocated space → compact data chunks to free space for metadata: btrfs balance start -dusage=10 /<mount>.
  • Hundreds of snapshots from Snapper → cleanup; configure NUMBER_LIMIT_IMPORTANT and NUMBER_LIMIT in /etc/snapper/configs/<name>.
  • Qgroups enabled, sluggish metadata → if not used, disable: btrfs quota disable /<mount>.
  • Multi-device RAID1 missing a device → mount -o degraded; balance/replace to recover redundancy.
  • btrfs balance stuck for hours → unfiltered balance on large FS; cancel, restart with filters.
  • Subvolume can’t be deleted → it contains nested subvolumes or snapshots; list and clean: btrfs subvolume list -o <path>.
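
The missing-device finding can be spotted from `btrfs filesystem show` output. This sketch assumes the tool marks affected filesystems with a `*** Some devices missing` line under a `Label:` header; the function name `degraded_filesystems` is hypothetical, and the exact wording should be verified against your btrfs-progs version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `btrfs filesystem show` output on stdin and
# prints the "Label: ... uuid: ..." header of every filesystem that is
# flagged as having missing devices.
degraded_filesystems() {
  awk '
    /^Label:/               { label = $0 }   # remember the current header
    /Some devices missing/  { print label }  # report it when flagged
  '
}
```

For example, `btrfs filesystem show | degraded_filesystems` prints nothing on a healthy system.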

Snapshot retention pattern

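# Daily snapshot, retain 14
DATE=$(date +%Y%m%d)
btrfs subvolume snapshot -r /data "/data/.snapshots/$DATE"

# Cleanup older than 14 days. Compare the date in the snapshot name instead of
# using find -mtime: a snapshot's directory mtime comes from the source
# subvolume, not from the time the snapshot was taken.
CUTOFF=$(date -d '14 days ago' +%Y%m%d)    # GNU date syntax
for snap in /data/.snapshots/[0-9]*; do
  [ -d "$snap" ] || continue
  if [ "$(basename "$snap")" -lt "$CUTOFF" ]; then
    btrfs subvolume delete "$snap"
  fi
done
btrfs subvolume sync /data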
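
Before letting a retention job delete anything, it can help to preview which snapshots it would pick. A dry-run sketch, assuming snapshots are named `YYYYMMDD` as in the pattern above; the function name `expired_snapshots` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads snapshot names on stdin and prints those that are
# 8-digit dates strictly older than the cutoff given as $1.
# Nothing is deleted; names that are not 8 digits are ignored.
expired_snapshots() {
  local cutoff=$1
  awk -v cutoff="$cutoff" '/^[0-9]+$/ && length($0) == 8 && $0 < cutoff'
}
```

With GNU date, `ls /data/.snapshots | expired_snapshots "$(date -d '14 days ago' +%Y%m%d)"` lists the candidates for deletion.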

When to escalate

  • Btrfs RAID5/6 data corruption: known upstream bugs; report to the linux-btrfs mailing list; move to a different RAID layer for production.
  • btrfs check --repair recommended by online forums — strongly resist; use btrfs rescue subcommands or restore from backup.
  • Cross-version compatibility issues (e.g., features written by newer kernel) — boot into matching kernel version for recovery.
