LVM Troubleshooting Prompt
Diagnose and recover LVM problems — missing PV, VG inactive, snapshot full, thin pool exhausted, online/offline resize, and metadata corruption.
- Target user: Linux sysadmins managing LVM-based storage
- Difficulty: Intermediate
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux storage engineer who has rescued countless LVM setups — missing PVs after disk replacement, full thin pools that broke production, snapshot autoextend gone wrong.
I will provide:
- The symptom (`vgchange` won't activate, mount fails, snapshot full, "Volume group XYZ has insufficient free space", out of metadata, accidentally `pvremove`d)
- LVM topology: physical volumes, volume groups, logical volumes, thin pools, snapshots
- Output of `pvs -a -o +pv_uuid,missing`, `vgs -a`, `lvs -a -o +devices`
- Recent dmesg lines mentioning device or LVM
- Whether the LV is currently mounted/in use
Your job:
1. **Map the layout** in your head:
- PVs (physical devices) → VGs (volume groups) → LVs (logical volumes)
- Thin pools: data LV + metadata LV; thin LVs allocated from the pool
- Snapshots: COW (traditional) or thin-pool snapshots (instant, share pool space)
2. **Diagnose by symptom**:
- **"Cannot find device with uuid X"** / **VG has `[unknown]` PV** → PV missing (disk failure, reassignment, accidentally pulled). Use `vgreduce --removemissing` to drop after confirming data lives on remaining PVs.
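- **`vgchange -ay` fails with "not activating"** → autoactivation off, or thin pool metadata corrupt. If the LV carries the activation-skip flag (the default for thin snapshots), activate it with `--ignoreactivationskip` (`-K`).
- **Snapshot full** → a traditional (COW) snapshot becomes invalid once it fills and must be removed; the origin LV is unaffected. Thin snapshots share the pool's space, so a full pool stalls or fails writes for every thin LV in it. Either delete snapshots or extend.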
- **Thin pool full** → all writes fail. EXTEND the pool data LV (and metadata LV if at risk). Set `thin_pool_autoextend_threshold` in `lvm.conf` for the future.
- **Thin pool metadata full** → very bad; pool effectively read-only until metadata extended (`lvextend --poolmetadatasize`).
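- **Disk grew but the VG shows no new free space** → run `pvresize /dev/<dev>` to expose the added space at the LVM layer (grow the partition first if the PV sits on one)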
- **Online `lvextend` worked, filesystem still small** → forgot to grow the FS (`resize2fs` for ext4, `xfs_growfs` for XFS, `btrfs filesystem resize` for btrfs)
3. **Recovery actions in order of safety**:
- **Inspect first**: `pvs -v`, `vgs -v`, `lvs -av -o +devices`, `lvdisplay -m` (mapping detail)
- **Try `vgcfgrestore` if metadata damaged**: VG metadata is backed up to `/etc/lvm/backup/` and `/etc/lvm/archive/`. Restore from there before destructive ops.
- **For missing PV**: confirm the disk truly gone (`lsblk`, `dmesg`, hardware status) before `vgreduce --removemissing`. Once removed, that PV's data is lost.
- **For full thin pool**: extend the pool LV, which grows its data sub-LV. `lvextend -L +50G <vg>/<thinpool>`.
- **For accidentally `pvremove`d but data intact**: pick one specific archive file (not a glob), then `pvcreate --uuid <old-uuid> --restorefile /etc/lvm/archive/<vg>_<timestamp>.vg /dev/<dev>` followed by `vgcfgrestore <vg>`.
- **For thin snapshots**: deleting an unused snapshot frees pool space immediately. `lvremove <vg>/<snap>`.
4. **For autoextend hygiene** going forward:
```ini
# /etc/lvm/lvm.conf
activation {
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20
}
```
Monitor pool usage; set alerts.
5. **For RAID/mdraid + LVM stacks**: `pvscan` may not see PVs until mdraid is assembled. Order matters at boot.
Mark DESTRUCTIVE clearly: `vgreduce --removemissing` (data on missing PV is lost), `pvremove` (irreversible), reducing an LV without first shrinking the FS (corrupts FS).
---
Symptom: [DESCRIBE]
Layout (output below):
```
pvs -a -o +pv_uuid,missing
vgs -a
lvs -a -o +devices,seg_pe_ranges
```
```
[PASTE]
```
Recent dmesg / activation logs:
```
[PASTE]
```
What you tried:
[DESCRIBE]
Why this prompt works
LVM is a stack of three layers (PV/VG/LV) plus thin pools and snapshots, and most engineers learn it one disaster at a time. A full thin pool with no autoextend is a self-induced incident; a missing PV with vgreduce --removemissing is sometimes recovery, sometimes data loss. This prompt forces a state inventory before action.
How to use it
- Inventory first. `pvs -a`, `vgs -a`, `lvs -a`: without them you're guessing.
- Identify whether thin or traditional snapshots are in use; recovery differs.
- Check `/etc/lvm/archive/` before destructive ops. It's your undo button.
- Confirm the disk is truly gone before `vgreduce --removemissing`.
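The inventory step can be scripted so that everything is captured in one pass and pasted into the prompt. The sketch below is one possible wrapper, not part of the original prompt: `run_section` and `collect_lvm_inventory` are hypothetical helper names, and it assumes a root shell with the LVM tools on `PATH`. It runs only the read-only reporting commands the prompt asks for.

```shell
# Hypothetical helpers: gather the read-only LVM reports in one pass.

run_section() {
  # Print the command as a header, then its output.
  # A failing command is recorded instead of aborting the whole run.
  printf '### %s\n' "$*"
  "$@" 2>&1 || printf '(command failed with exit %d)\n' "$?"
  printf '\n'
}

collect_lvm_inventory() {
  # Same reporting commands the prompt asks for; none of them modify state.
  run_section pvs -a -o +pv_uuid,missing
  run_section vgs -a
  run_section lvs -a -o +devices,seg_pe_ranges
}
```

Running `collect_lvm_inventory > lvm-inventory.txt` as root writes all three reports to one file, with each section labelled by the command that produced it.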
Useful commands
# Inventory
pvs -a -o +pv_uuid,missing
vgs -a -o +vg_attr,vg_free_count
lvs -a -o +devices,seg_pe_ranges,attr
lvdisplay -m # mapping detail
# Metadata backup / archive
ls -la /etc/lvm/backup/ # latest per-VG backup
ls -la /etc/lvm/archive/ # historical
vgcfgbackup # take a backup now
vgcfgrestore -l <vg> # list available archives
# Activate / deactivate
vgchange -ay
vgchange -an
lvchange -ay <vg>/<lv>
lvchange -ay --ignoreactivationskip <vg>/<lv>
# Thin pool stats
lvs -o name,data_percent,metadata_percent,size,attr <vg>
dmsetup status <vg>-<thinpool>-tpool
# Extend
lvextend -L +50G <vg>/<lv>
lvextend -L +50G --poolmetadatasize +1G <vg>/<thinpool>
lvextend -l +100%FREE <vg>/<lv> # use all remaining VG space
# Resize FS after LV extend
resize2fs /dev/<vg>/<lv> # ext4
xfs_growfs /mountpoint # XFS (online)
btrfs filesystem resize max /mountpoint # btrfs
# Snapshots
lvcreate --size 10G --snapshot --name snap-2026-05-16 /dev/<vg>/<lv> # traditional
lvcreate --thin --name snap-2026-05-16 --snapshot /dev/<vg>/<lv> # thin
lvremove /dev/<vg>/snap-2026-05-16
# Recover missing PV (when disk is gone for good)
vgreduce --removemissing --test <vg> # dry-run first: shows what would be removed
vgreduce --removemissing --force <vg> # DESTRUCTIVE: drops missing PV and any LVs that used it
# Recover accidentally pvremoved (data intact)
pvcreate --uuid <old-uuid> --restorefile /etc/lvm/archive/<vg>_<timestamp>.vg /dev/<dev>
vgcfgrestore <vg>
vgchange -ay <vg>
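Which grow command applies after `lvextend` depends on the filesystem, and XFS and btrfs take the mount point rather than the device. As a rough sketch, the hypothetical helper `fs_grow_cmd` below maps a filesystem type to the matching command from the list above and prints it instead of running it, so it can be reviewed first. The function name and argument order are illustrative assumptions.

```shell
# Hypothetical helper: print (do not run) the filesystem grow command.
# Arguments: <fstype> <device> <mountpoint>
fs_grow_cmd() {
  local fstype="$1" dev="$2" mnt="$3"
  case "$fstype" in
    ext2|ext3|ext4)
      # resize2fs operates on the block device
      echo "resize2fs $dev" ;;
    xfs)
      # xfs_growfs operates on the mounted filesystem
      echo "xfs_growfs $mnt" ;;
    btrfs)
      # "max" grows to fill the underlying device
      echo "btrfs filesystem resize max $mnt" ;;
    *)
      echo "unsupported filesystem: $fstype" >&2
      return 1 ;;
  esac
}
```

For example, `fs_grow_cmd "$(findmnt -no FSTYPE /srv/data)" /dev/vg0/data /srv/data` prints the command for whatever is mounted at the (hypothetical) `/srv/data`. On filesystems it supports, `lvextend --resizefs` can also grow the LV and the filesystem in a single step.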
Common findings this catches
- Thin pool at 100% → all writes fail; applications see "out of space" even when the filesystem reports free space. Extend the pool data LV.
- Thin pool metadata at 100% → pool effectively read-only. Extend it with `lvextend --poolmetadatasize`.
- Traditional snapshot full → snapshot is invalid; remove it. Origin LV is fine.
- `vgchange -ay` fails after replacing disk → new disk has no PV signature; `pvcreate` it, then `vgextend` (or `pvmove` to it if recovering).
- `resize2fs` reports "nothing to do" → didn't run `lvextend` first, OR the LV was extended but its device-mapper table wasn't reloaded: `lvchange --refresh <vg>/<lv>`.
- Multiple PVs in VG, one missing → some LVs may still activate (those entirely on remaining PVs). `lvs -av` shows which.
- After `pvremove` of the wrong device → if nothing has overwritten the device since and a metadata archive for the VG exists, `pvcreate --uuid <old-uuid> --restorefile` may recover it. Don't write to the device in the meantime.
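The two thin-pool findings are the easiest to catch early, because `lvs` reports both data and metadata usage. The following is a minimal sketch of a threshold check, assuming comma-separated `lvs` output on stdin. `flag_full_pools` is a hypothetical helper, and it treats any LV that reports a metadata percentage as a pool, which is a simplification.

```shell
# Hypothetical helper: flag pools at or above a usage threshold.
# Reads "lv_name,data_percent,metadata_percent" lines on stdin.
flag_full_pools() {
  local threshold="${1:-80}"
  awk -F',' -v t="$threshold" '
    {
      gsub(/^[ \t]+/, "", $1)        # lvs indents its first column
      if ($3 == "") next             # no metadata percentage: not a pool
      if ($2 + 0 >= t || $3 + 0 >= t)
        printf "%s data=%s%% meta=%s%%\n", $1, $2, $3
    }'
}
```

It can be fed directly from LVM, for example `lvs --noheadings --separator , -o lv_name,data_percent,metadata_percent <vg> | flag_full_pools 80`. Empty output means no pool in that VG has crossed the threshold.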
Hygiene patterns to recommend
# /etc/lvm/lvm.conf — thin pool autoextend
activation {
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20
}
# Autoextend is triggered by dmeventd monitoring (the lvm2-monitor service); confirm it is running
systemctl status lvm2-monitor
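Autoextend only helps while the VG still has free space to hand out. Assuming each autoextend grows the pool by `thin_pool_autoextend_percent` of its current size, and ignoring rounding to extent boundaries, the sketch below estimates how many extensions fit into the remaining VG space. `autoextend_steps` is a hypothetical helper and all sizes are in GiB.

```shell
# Hypothetical helper: estimate how many autoextend steps the VG can absorb.
# Arguments: <pool size GiB> <autoextend percent> <VG free GiB>
autoextend_steps() {
  awk -v size="$1" -v pct="$2" -v free="$3" 'BEGIN {
    steps = 0
    # Each step adds pct% of the current pool size, taken from VG free space.
    while (pct > 0 && size > 0) {
      grow = size * pct / 100
      if (grow > free) break         # next extension would not fit
      size += grow
      free -= grow
      steps++
    }
    printf "%d steps, final pool size %.1f GiB\n", steps, size
  }'
}
```

With a 100 GiB pool, 20 percent growth, and 50 GiB free in the VG, `autoextend_steps 100 20 50` reports two extensions (to 120, then 144 GiB) before the VG runs dry, so alerts need to fire well before that point.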
When to escalate
- Metadata corruption that `vgcfgrestore` can't fix: consider professional data recovery; don't write to PVs.
- Disk replacement on a production VG: staging it with `pvmove` (slow but online) is safer than `vgreduce`.
- Thin pool that can't be extended because the VG has no free space: coordinate with platform for a new PV; meanwhile the pool is read-only and apps are degraded.
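A staged replacement with `pvmove` usually follows a fixed order: initialise the new disk, add it to the VG, migrate extents off the old PV, then retire the old PV. The sketch below prints that sequence for review rather than executing it. `pvmove_plan` and the device names in the usage note are hypothetical; each command should be checked against the actual layout before it is run.

```shell
# Hypothetical helper: print a staged disk-replacement plan (runs nothing).
# Arguments: <vg> <old-pv> <new-pv>
pvmove_plan() {
  local vg="$1" old_pv="$2" new_pv="$3"
  if [ -z "$vg" ] || [ -z "$old_pv" ] || [ -z "$new_pv" ]; then
    echo "usage: pvmove_plan <vg> <old-pv> <new-pv>" >&2
    return 1
  fi
  # 1. Write a PV label on the new disk
  printf 'pvcreate %s\n' "$new_pv"
  # 2. Add the new PV to the VG so it can receive extents
  printf 'vgextend %s %s\n' "$vg" "$new_pv"
  # 3. Migrate all extents off the old PV (online, can take a long time)
  printf 'pvmove %s %s\n' "$old_pv" "$new_pv"
  # 4. Drop the now-empty old PV from the VG
  printf 'vgreduce %s %s\n' "$vg" "$old_pv"
}
```

For instance, `pvmove_plan vg0 /dev/sdb1 /dev/sdc1` prints the four commands in order. Taking a `vgcfgbackup` before step 1 gives a metadata restore point if the plan has to be abandoned.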
Related prompts
- Linux Block I/O Performance Investigation Prompt: Diagnose slow disk I/O, high iowait, queue depth saturation, and storage performance regressions using iostat, blktrace, fio, and per-device metrics.
- Linux Disk Full / Inode Exhaustion Diagnosis Prompt: Diagnose why a Linux filesystem is full or out of inodes, including deleted-but-held files, journal bloat, reserved blocks, and hidden mount-shadowed data.
- Linux mdraid Software RAID Recovery Prompt: Recover from degraded or failed mdraid arrays, including a failed disk, a missing member, a stuck resync, and replacing drives without losing data.