
LVM Troubleshooting Prompt

Diagnose and recover LVM problems — missing PV, VG inactive, snapshot full, thin pool exhausted, online/offline resize, and metadata corruption.

Target user: Linux sysadmins managing LVM-based storage
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior Linux storage engineer who has rescued countless LVM setups — missing PVs after disk replacement, full thin pools that broke production, snapshot autoextend gone wrong.

I will provide:
- The symptom (`vgchange` won't activate, mount fails, snapshot full, "Volume group XYZ has insufficient free space", out of metadata, accidentally `pvremove`d)
- LVM topology: physical volumes, volume groups, logical volumes, thin pools, snapshots
- Output of `pvs -a -o +pv_uuid,missing`, `vgs -a`, `lvs -a -o +devices,seg_pe_ranges`
- Recent dmesg lines mentioning device or LVM
- Whether the LV is currently mounted/in use

Your job:

1. **Map the layout** in your head:
   - PVs (physical devices) → VGs (volume groups) → LVs (logical volumes)
   - Thin pools: data LV + metadata LV; thin LVs allocated from the pool
   - Snapshots: COW (traditional) or thin-pool snapshots (instant, share pool space)
2. **Diagnose by symptom**:
   - **"Cannot find device with uuid X"** / **VG has `[unknown]` PV** → PV missing (disk failure, reassignment, accidentally pulled). Use `vgreduce --removemissing` to drop after confirming data lives on remaining PVs.
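   - **`vgchange -ay` fails with "not activating"** → the LV has the activation-skip flag set (the default for thin snapshots), activation is filtered by `volume_list` or `auto_activation_volume_list` in `lvm.conf`, or thin pool metadata is corrupt. `--ignoreactivationskip` (`-K`) overrides only the skip flag.
   - **Snapshot full** → traditional (COW) snapshots become invalid when full. Thin snapshots have no fixed size; they consume pool space, so they fail only when the pool itself fills. Either delete the snapshot or extend it (traditional) or the pool (thin).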
   - **Thin pool full** → all writes fail. EXTEND the pool data LV (and metadata LV if at risk). Set `thin_pool_autoextend_threshold` in `lvm.conf` for the future.
   - **Thin pool metadata full** → very bad; pool effectively read-only until metadata extended (`lvextend --poolmetadatasize`).
   - **`pvresize` after disk grew** → unlocks new space at the LVM layer
   - **Online `lvextend` worked, filesystem still small** → forgot to grow the FS (`resize2fs` for ext4, `xfs_growfs` for XFS, `btrfs filesystem resize` for btrfs)
3. **Recovery actions in order of safety**:
   - **Inspect first**: `pvs -v`, `vgs -v`, `lvs -av -o +devices`, `lvdisplay -m` (mapping detail)
   - **Try `vgcfgrestore` if metadata damaged**: VG metadata is backed up to `/etc/lvm/backup/` and `/etc/lvm/archive/`. Restore from there before destructive ops.
   - **For missing PV**: confirm the disk truly gone (`lsblk`, `dmesg`, hardware status) before `vgreduce --removemissing`. Once removed, that PV's data is lost.
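   - **For full thin pool**: extend the pool LV, which grows its data sub-LV. `lvextend -L +50G <vg>/<thinpool>`.
   - **For accidentally `pvremove`d but data intact**: `pvcreate --uuid <old-uuid> --restorefile /etc/lvm/archive/<vg>_<timestamp>.vg /dev/<dev>` then `vgcfgrestore <vg>`. Pass one specific archive file (list candidates with `vgcfgrestore -l <vg>`), not a glob.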
   - **For thin snapshots**: deleting an unused snapshot frees pool space immediately. `lvremove <vg>/<snap>`.
4. **For autoextend hygiene** going forward:
   ```ini
   # /etc/lvm/lvm.conf
   activation {
     thin_pool_autoextend_threshold = 80
     thin_pool_autoextend_percent = 20
   }
   ```
   Monitor pool usage; set alerts.
5. **For RAID/mdraid + LVM stacks**: `pvscan` may not see PVs until mdraid is assembled. Order matters at boot.

Mark DESTRUCTIVE clearly: `vgreduce --removemissing` (data on missing PV is lost), `pvremove` (wipes the PV label; recoverable only from a metadata archive while the data area is untouched), reducing an LV without first shrinking the FS (corrupts FS).

---

Symptom: [DESCRIBE]
Layout (output below):
```
pvs -a -o +pv_uuid,missing
vgs -a
lvs -a -o +devices,seg_pe_ranges
```
```
[PASTE]
```
Recent dmesg / activation logs:
```
[PASTE]
```
What you tried:
[DESCRIBE]

Why this prompt works

LVM is a stack of three layers (PV/VG/LV) plus thin pools and snapshots, and most engineers learn it one disaster at a time. A full thin pool with no autoextend is a self-induced incident; a missing PV with vgreduce --removemissing is sometimes recovery, sometimes data loss. This prompt forces a state inventory before action.

How to use it

  1. Inventory first. pvs -a, vgs -a, lvs -a — without them you’re guessing.
  2. Identify whether thin or traditional snapshots are in use; recovery differs.
  3. Check /etc/lvm/archive/ before destructive ops. It’s your undo button for LVM metadata (not for the data itself).
  4. Confirm the disk is truly gone before vgreduce --removemissing.
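The inventory step can be scripted so the output is captured in one pass and pasted into the prompt. The following is a minimal sketch, not part of the original prompt; the function names `run_section` and `collect_lvm_inventory` are hypothetical, and the script assumes it runs as root on a host with the LVM tools installed.

```shell
#!/bin/sh
# Print a header naming the command, then its output.
# A failing command is reported but does not stop the collection.
run_section() {
    printf '### %s\n' "$*"
    "$@" 2>&1 || printf '(command failed with exit status %s)\n' "$?"
}

# Run the three read-only inventory commands the prompt asks for.
# Nothing here modifies LVM state.
collect_lvm_inventory() {
    run_section pvs -a -o +pv_uuid,missing
    run_section vgs -a
    run_section lvs -a -o +devices,seg_pe_ranges
}
```

Usage, assuming the block above is saved and sourced: `collect_lvm_inventory > lvm-inventory.txt`, then paste the file under "Layout" in the prompt.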

Useful commands

# Inventory
pvs -a -o +pv_uuid,missing
vgs -a -o +vg_attr,vg_free_count
lvs -a -o +devices,seg_pe_ranges,attr
lvdisplay -m                    # mapping detail

# Metadata backup / archive
ls -la /etc/lvm/backup/         # latest per-VG backup
ls -la /etc/lvm/archive/        # historical
vgcfgbackup                     # take a backup now
vgcfgrestore -l <vg>            # list available archives

# Activate / deactivate
vgchange -ay
vgchange -an
lvchange -ay <vg>/<lv>
lvchange -ay --ignoreactivationskip <vg>/<lv>

# Thin pool stats
lvs -o name,data_percent,metadata_percent,size,attr <vg>
dmsetup status <vg>-<thinpool>-tpool

# Extend
lvextend -L +50G <vg>/<lv>
lvextend -L +50G --poolmetadatasize +1G <vg>/<thinpool>
lvextend -l +100%FREE <vg>/<lv>          # use all remaining VG space

# Resize FS after LV extend
resize2fs /dev/<vg>/<lv>                 # ext4
xfs_growfs /mountpoint                   # XFS (online)
btrfs filesystem resize max /mountpoint  # btrfs

# Snapshots
lvcreate --size 10G --snapshot --name snap-2026-05-16 /dev/<vg>/<lv>          # traditional
lvcreate --snapshot --name snap-2026-05-16 <vg>/<thinlv>                       # thin (origin is a thin LV; no --size)
lvremove /dev/<vg>/snap-2026-05-16

# Recover missing PV (when disk is gone for good)
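vgreduce --removemissing --test <vg>      # dry-run first
vgreduce --removemissing <vg>             # drops the missing PV if no LV uses it
vgreduce --removemissing --force <vg>     # DESTRUCTIVE: also removes LVs with extents on the missing PV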

# Recover accidentally pvremoved (data intact)
pvcreate --uuid <old-uuid> --restorefile /etc/lvm/archive/<vg>_<timestamp>.vg /dev/<dev>
vgcfgrestore <vg>
vgchange -ay <vg>
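
Picking the right grow command after `lvextend` is a common slip, so the mapping from filesystem type to command can be written down once. This is a rough sketch under stated assumptions: `grow_fs_command` is a hypothetical helper that only prints the command from the list above and does not run it, and the device and mount point are supplied by the caller.

```shell
#!/bin/sh
# Print the filesystem grow command for a given filesystem type.
# Arguments: <fstype> <device> <mountpoint>
# ext2/3/4 grow by device; XFS and btrfs grow by mount point.
grow_fs_command() {
    fstype="$1"; device="$2"; mountpoint="$3"
    case "$fstype" in
        ext2|ext3|ext4) printf 'resize2fs %s\n' "$device" ;;
        xfs)            printf 'xfs_growfs %s\n' "$mountpoint" ;;
        btrfs)          printf 'btrfs filesystem resize max %s\n' "$mountpoint" ;;
        *)              printf 'unsupported filesystem: %s\n' "$fstype" >&2
                        return 1 ;;
    esac
}
```

The filesystem type can be read with `lsblk -no FSTYPE /dev/<vg>/<lv>`. For filesystems that LVM's resize helper supports, `lvextend -r` (`--resizefs`) extends the LV and grows the filesystem in one step.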

Common findings this catches

  • Thin pool at 100% → all writes fail; applications see I/O errors or “out of space” even though the filesystem reports free space. Extend the pool data LV.
  • Thin pool metadata at 100% → pool effectively read-only. Extend --poolmetadatasize.
  • Traditional snapshot full → snapshot is invalid; remove it. Origin LV is fine.
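  • vgchange -ay fails after replacing disk → the VG still references the old PV’s UUID and the new disk has no PV signature. pvcreate the new disk and vgextend the VG, then drop the missing PV with vgreduce --removemissing; if the old disk is still readable, attach both and pvmove the extents across instead.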
  • resize2fs reports “nothing to do” → lvextend was not run first, OR the LV was extended but the device-mapper table was not reloaded: refresh it with lvchange --refresh <vg>/<lv>.
  • Multiple PVs in VG, one missing → some LVs may still activate (those entirely on remaining PVs). lvs -av shows which.
  • After pvremove of the wrong device → if the device has the old PV signature still readable, pvcreate --restorefile may recover. Don’t write to the device in the meantime.
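
Two of the findings above (a nearly full thin pool, and LVs touched by a missing PV) can be spotted by filtering `lvs` output. The filters below are a sketch, not a definitive check: `flag_full_pools` and `list_partial_lvs` are hypothetical names, both read whitespace-separated `lvs --noheadings` rows on stdin, and the second relies on the ninth character of `lv_attr` being `p` for a partial LV.

```shell
#!/bin/sh
# Input rows: "<lv_name> <data_percent> <metadata_percent>".
# Only thin pools report both percentages, so rows with fewer
# than three fields (plain LVs, thin LVs) are ignored.
flag_full_pools() {
    threshold="${1:-80}"
    awk -v t="$threshold" '
        NF == 3 && ($2 + 0 >= t || $3 + 0 >= t) {
            printf "%s data=%s%% metadata=%s%%\n", $1, $2, $3
        }'
}

# Input rows: "<lv_name> <lv_attr>".
# Prints LVs whose health flag (9th attr character) is "p" (partial),
# meaning at least one of their extents sits on a missing PV.
list_partial_lvs() {
    awk 'NF >= 2 && substr($2, 9, 1) == "p" { print $1 }'
}
```

Possible usage: `lvs --noheadings -o lv_name,data_percent,metadata_percent <vg> | flag_full_pools 80` and `lvs -a --noheadings -o lv_name,lv_attr <vg> | list_partial_lvs`. Running under `LC_ALL=C` avoids locale-specific decimal separators in the percentages.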

Hygiene patterns to recommend

# /etc/lvm/lvm.conf — thin pool autoextend
activation {
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent = 20
}

# Autoextend is triggered by dmeventd monitoring (lvm2-monitor service), not lvmpolld
systemctl status lvm2-monitor
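
Setting the threshold only helps if it is actually below 100, since a value of 100 disables automatic extension. A small check along these lines can confirm the effective setting; it is a sketch that assumes the input is a `key=value` line such as the one `lvmconfig` prints, and `autoextend_enabled` is a hypothetical name.

```shell
#!/bin/sh
# Reads configuration text on stdin and succeeds (exit 0) only when
# thin_pool_autoextend_threshold is set to a value below 100.
# Commented-out lines are ignored because the match is anchored
# to optional whitespace followed by the key.
autoextend_enabled() {
    value=$(sed -n 's/^[[:space:]]*thin_pool_autoextend_threshold[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$value" ] && [ "$value" -lt 100 ]
}
```

Possible usage: `lvmconfig activation/thin_pool_autoextend_threshold | autoextend_enabled && echo "autoextend on"`. Autoextend also needs free extents in the VG to draw from, so this check is necessary but not sufficient.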

When to escalate

  • Metadata corruption that vgcfgrestore can’t fix — consider professional data recovery; don’t write to PVs.
  • Disk replacement on a production VG — staged with pvmove (slow but online) is safer than vgreduce.
  • Thin pool that can’t be extended because no free space in VG — coordinate with platform for new PV; meanwhile the pool is read-only and apps are degraded.
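
For the staged disk replacement mentioned above, it can help to write the sequence out and review it before running anything. The helper below is an illustrative sketch: `print_replace_plan` is a hypothetical name, it only prints commands, and it assumes the old PV is still attached and readable so that `pvmove` can copy extents off it.

```shell
#!/bin/sh
# Print the online replacement sequence for one PV.
# Arguments: <vg> <old_pv> <new_pv>
# Order matters: the new PV must be in the VG before pvmove,
# and the old PV must be empty before vgreduce.
print_replace_plan() {
    vg="$1"; old_pv="$2"; new_pv="$3"
    printf 'pvcreate %s\n' "$new_pv"
    printf 'vgextend %s %s\n' "$vg" "$new_pv"
    printf 'pvmove %s %s\n' "$old_pv" "$new_pv"
    printf 'vgreduce %s %s\n' "$vg" "$old_pv"
}
```

For example, `print_replace_plan vg0 /dev/sdb1 /dev/sdc1` prints the four commands in order for review; the device names here are placeholders.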
