You are a senior Linux storage engineer who has rescued countless LVM setups: missing PVs after disk replacement, full thin pools that broke production, snapshot autoextend gone wrong.

I will provide:
- The symptom (`vgchange` won't activate, mount fails, snapshot full, "Volume group XYZ has insufficient free space", out of metadata, accidentally `pvremove`d)
- LVM topology: physical volumes, volume groups, logical volumes, thin pools, snapshots
- Output of `pvs -a -o +pv_uuid,missing`, `vgs -a`, `lvs -a -o +devices`
- Recent dmesg lines mentioning the device or LVM
- Whether the LV is currently mounted/in use

Your job:

1. **Map the layout** in your head:
   - PVs (physical devices) → VGs (volume groups) → LVs (logical volumes)
   - Thin pools: data LV + metadata LV; thin LVs allocated from the pool
   - Snapshots: COW (traditional) or thin-pool snapshots (instant, share pool space)

2. **Diagnose by symptom**:
   - **"Cannot find device with uuid X"** / **VG has `[unknown]` PV** → PV missing (disk failure, reassignment, accidentally pulled). Use `vgreduce --removemissing` to drop it only after confirming the data lives on the remaining PVs.
   - **`vgchange -ay` fails with "not activating"** → autoactivation is off, the LV carries the activation-skip flag (the default for thin snapshots), or thin pool metadata is corrupt. For the skip flag, try `--ignoreactivationskip`.
   - **Snapshot full** → traditional snapshots become invalid when full. Thin snapshots consume pool space instead, so a "full" thin snapshot means a full pool and blocked writes. Either delete or extend.
   - **Thin pool full** → all writes fail. EXTEND the thin pool (and its metadata LV if at risk). Set `thin_pool_autoextend_threshold` in `lvm.conf` for the future.
   - **Thin pool metadata full** → very bad; pool effectively read-only until metadata is extended (`lvextend --poolmetadatasize`).
   - **Disk grew but the VG shows no new space** → run `pvresize` to unlock the new space at the LVM layer.
   - **Online `lvextend` worked, filesystem still small** → forgot to grow the FS (`resize2fs` for ext4, `xfs_growfs` for XFS, `btrfs filesystem resize` for btrfs).

3. **Recovery actions in order of safety**:
   - **Inspect first**: `pvs -v`, `vgs -v`, `lvs -av -o +devices`, `lvdisplay -m` (mapping detail)
   - **Try `vgcfgrestore` if metadata damaged**: VG metadata is backed up to `/etc/lvm/backup/` and `/etc/lvm/archive/`. Restore from there before destructive ops.
   - **For missing PV**: confirm the disk is truly gone (`lsblk`, `dmesg`, hardware status) before `vgreduce --removemissing`. Once removed, that PV's data is lost.
   - **For full thin pool**: extend the pool. `lvextend -L +50G <vg>/<thinpool>`.
   - **For accidentally `pvremove`d but data intact**: `pvcreate --uuid <old-uuid> --restorefile /etc/lvm/archive/<vg>_*.vg /dev/<dev>` then `vgcfgrestore <vg>`.
   - **For thin snapshots**: deleting an unused snapshot frees pool space immediately. `lvremove <vg>/<snap>`.

4. **For autoextend hygiene** going forward:
   ```ini
   # /etc/lvm/lvm.conf
   activation {
       thin_pool_autoextend_threshold = 80
       thin_pool_autoextend_percent = 20
   }
   ```
   Monitor pool usage; set alerts.

5. **For RAID/mdraid + LVM stacks**: `pvscan` may not see PVs until mdraid is assembled. Order matters at boot.

Mark DESTRUCTIVE clearly: `vgreduce --removemissing` (data on missing PV is lost), `pvremove` (irreversible), reducing an LV without first shrinking the FS (corrupts FS).

---

Symptom: [DESCRIBE]

Layout (output below):
```
pvs -a -o +pv_uuid,missing
vgs -a
lvs -a -o +devices,seg_pe_ranges
```
```
[PASTE]
```

Recent dmesg / activation logs:
```
[PASTE]
```

What you tried: [DESCRIBE]

Why this prompt works

LVM is a stack of three layers (PV/VG/LV) plus thin pools and snapshots, and most engineers learn it one disaster at a time. A full thin pool with no autoextend is a self-induced incident; a missing PV with vgreduce --removemissing is sometimes recovery, sometimes data loss. This prompt forces a state inventory before action.

How to use it

Inventory first. pvs -a, vgs -a, lvs -a — without them you’re guessing.
Identify whether thin or traditional snapshots are in use; recovery differs.
Check /etc/lvm/archive/ before destructive ops. It’s your undo button.
Confirm the disk is truly gone before vgreduce --removemissing.
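
The inventory step above can be captured in one pass before anything is changed. The following is a minimal sketch, not part of the original prompt: `collect_lvm_inventory` and its nested `save` helper are hypothetical names, and the set of commands is an assumption based on the inputs the prompt asks for. Every command it runs is read-only.

```shell
# collect_lvm_inventory OUTDIR
# Hypothetical helper: saves the read-only inventory the prompt asks for
# into OUTDIR, one file per command, so it can be pasted or attached later.
collect_lvm_inventory() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # save NAME CMD [ARGS...]: store stdout and stderr of one command.
    # A failing or missing command is recorded instead of aborting the run.
    save() {
        name=$1; shift
        if command -v "$1" >/dev/null 2>&1; then
            "$@" >"$outdir/$name.txt" 2>&1 ||
                echo "(exit status $?)" >>"$outdir/$name.txt"
        else
            echo "$1: not installed" >"$outdir/$name.txt"
        fi
    }

    save pvs   pvs -a -o +pv_uuid,missing
    save vgs   vgs -a
    save lvs   lvs -a -o +devices,seg_pe_ranges
    save lsblk lsblk -o NAME,TYPE,SIZE,MOUNTPOINT   # is the LV mounted?
    save dmesg dmesg                                # kernel log for device errors
}
```

Run it as root, for example `collect_lvm_inventory /root/lvm-inventory`, and keep the directory until the incident is closed; it doubles as a record of the state before any recovery step.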

Useful commands

# Inventory
pvs -a -o +pv_uuid,missing
vgs -a -o +vg_attr,vg_free_count
lvs -a -o +devices,seg_pe_ranges,attr
lvdisplay -m                    # mapping detail

# Metadata backup / archive
ls -la /etc/lvm/backup/         # latest per-VG backup
ls -la /etc/lvm/archive/        # historical
vgcfgbackup                     # take a backup now
vgcfgrestore -l <vg>            # list available archives

# Activate / deactivate
vgchange -ay
vgchange -an
lvchange -ay <vg>/<lv>
lvchange -ay --ignoreactivationskip <vg>/<lv>

# Thin pool stats
lvs -o name,data_percent,metadata_percent,size,attr <vg>
dmsetup status <vg>-<thinpool>-tpool

# Extend
lvextend -L +50G <vg>/<lv>
lvextend -L +50G --poolmetadatasize +1G <vg>/<thinpool>
lvextend -l +100%FREE <vg>/<lv>          # use all remaining VG space

# Resize FS after LV extend
resize2fs /dev/<vg>/<lv>                 # ext4
xfs_growfs /mountpoint                   # XFS (online)
btrfs filesystem resize max /mountpoint  # btrfs

# Snapshots
lvcreate --size 10G --snapshot --name snap-2026-05-16 /dev/<vg>/<lv>          # traditional
lvcreate --thin --name snap-2026-05-16 --snapshot /dev/<vg>/<lv>               # thin
lvremove /dev/<vg>/snap-2026-05-16

# Recover missing PV (when disk is gone for good)
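vgreduce --removemissing --test <vg>      # dry-run first
vgreduce --removemissing <vg>             # DESTRUCTIVE: drops the missing PV from the VG
vgreduce --removemissing --force <vg>     # DESTRUCTIVE: also removes LVs that had extents on the missing PV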

# Recover accidentally pvremoved (data intact)
pvcreate --uuid <old-uuid> --restorefile /etc/lvm/archive/<vg>_<timestamp>.vg /dev/<dev>
vgcfgrestore <vg>
vgchange -ay <vg>
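
Before running `pvcreate --restorefile`, it helps to know which archive files still describe the PV that was removed. A minimal sketch, assuming the archive files are the plain-text metadata LVM writes, in which each PV is recorded with a line of the form `id = "<uuid>"`; `find_archives_with_pv` is a hypothetical helper name.

```shell
# find_archives_with_pv ARCHIVE_DIR VG PV_UUID
# Hypothetical helper: prints the archive files for VG that mention PV_UUID.
# Output is in filename order, which normally follows the archive sequence
# number, so the last line is usually the most recent matching archive.
find_archives_with_pv() {
    dir=$1 vg=$2 uuid=$3
    # -l: print only names of matching files; -F: treat the pattern literally.
    grep -l -F -- "id = \"$uuid\"" "$dir/${vg}_"*.vg 2>/dev/null
}
```

Usage: `find_archives_with_pv /etc/lvm/archive <vg> <old-uuid>`. Review the chosen file by hand before restoring from it, since an archive taken before a later resize describes an older layout.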

Common findings this catches

Thin pool at 100% → writes that need new blocks fail; applications see “out of space” or I/O errors even though the filesystem reports free space. Extend the thin pool LV.
Thin pool metadata at 100% → pool effectively read-only. Extend --poolmetadatasize.
Traditional snapshot full → snapshot is invalid; remove it. Origin LV is fine.
vgchange -ay fails after replacing disk → new disk has no PV signature; pvcreate it, then vgextend (or pvmove to it if recovering).
resize2fs reports “nothing to do” → lvextend was never run, OR the LV was extended but the device-mapper table was not reloaded: run lvchange --refresh <vg>/<lv>, then confirm the new size with lsblk.
Multiple PVs in VG, one missing → some LVs may still activate (those entirely on remaining PVs). lvs -a -o +devices shows which PVs each LV uses; a p in the ninth lv_attr character marks a partial LV.
After pvremove of the wrong device → if the device has the old PV signature still readable, pvcreate --restorefile may recover. Don’t write to the device in the meantime.
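
The two thin pool findings above are easy to catch early with a small check on `lvs` output. This is a sketch under stated assumptions: `check_thin_pools` is a hypothetical helper, it expects `name:data_percent:metadata_percent` lines on stdin, and it treats any row with a metadata figure as a pool (plain LVs and thin LVs report none).

```shell
# check_thin_pools THRESHOLD
# Hypothetical helper: reads "name:data_percent:metadata_percent" lines on
# stdin and prints a warning for each pool at or above THRESHOLD percent.
check_thin_pools() {
    awk -F: -v limit="$1" '
        { gsub(/^[ \t]+/, "", $1) }      # lvs indents its first column
        $3 == "" { next }                # no metadata figure: not a pool
        $2 + 0 >= limit { printf "WARN %s data %s%%\n", $1, $2 }
        $3 + 0 >= limit { printf "WARN %s metadata %s%%\n", $1, $3 }
    '
}
```

Usage: `lvs --noheadings --separator : -o lv_name,data_percent,metadata_percent <vg> | check_thin_pools 80`. Wiring the output into an alerting system is left out, since that depends on the monitoring stack in use.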

# /etc/lvm/lvm.conf — thin pool autoextend
activation {
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent = 20
}

# Autoextend is carried out by dmeventd monitoring; lvm2-monitor keeps pools monitored
systemctl status lvm2-monitor
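
To make the two settings concrete: the threshold is the usage percentage that triggers an extension, and the percent is how much the pool grows relative to its current size. The sketch below is an idealized model of that arithmetic only; `autoextend_steps` is a hypothetical helper, sizes are in GiB, and it assumes the VG always has enough free space, which real pools do not guarantee.

```shell
# autoextend_steps SIZE_GIB THRESHOLD PERCENT STEPS
# Hypothetical helper: prints when each extension would trigger and the
# pool size afterwards, assuming unlimited free space in the VG.
autoextend_steps() {
    awk -v size="$1" -v thr="$2" -v pct="$3" -v n="$4" 'BEGIN {
        for (i = 1; i <= n; i++) {
            trigger = size * thr / 100          # usage that crosses the threshold
            size = size * (100 + pct) / 100     # pool grows by pct of its size
            printf "step %d: extend at %.1f GiB used, new size %.1f GiB\n",
                   i, trigger, size
        }
    }'
}
```

With the values shown above, `autoextend_steps 100 80 20 2` models a 100 GiB pool that extends to 120 GiB once 80 GiB is used, then to 144 GiB once 96 GiB is used. Each step needs that much free space in the VG, so autoextend postpones a full pool rather than preventing it.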

When to escalate

Metadata corruption that vgcfgrestore can’t fix — consider professional data recovery; don’t write to PVs.
Disk replacement on a production VG — staged with pvmove (slow but online) is safer than vgreduce.
Thin pool that can’t be extended because no free space in VG → coordinate with platform for a new PV; meanwhile writes that need new pool blocks fail and apps are degraded.
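
The staged disk replacement mentioned above usually follows a fixed order, which is worth writing down for review before anyone runs it. A minimal sketch: `print_pv_replacement_plan` is a hypothetical helper that only prints the commands, and it assumes the old PV is still readable, since `pvmove` copies extents off it.

```shell
# print_pv_replacement_plan VG OLD_PV NEW_PV
# Hypothetical helper: prints (does not run) the usual online replacement
# sequence so it can be reviewed or attached to a change ticket.
print_pv_replacement_plan() {
    vg=$1 old=$2 new=$3
    cat <<EOF
pvcreate $new
vgextend $vg $new
pvmove $old $new
vgreduce $vg $old
pvremove $old
EOF
}
```

For example, `print_pv_replacement_plan <vg> /dev/<old> /dev/<new>` lists the five steps in order. The last two lines are the destructive ones: run them only after `pvmove` has finished and `pvs` shows no allocated extents left on the old PV.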

Related prompts

Linux Block I/O Performance Investigation Prompt

Linux Disk Full / Inode Exhaustion Diagnosis Prompt

Linux mdraid Software RAID Recovery Prompt
