DevOps AI ToolKit

mdadm Degraded Software RAID Recovery Planning Prompt

Diagnose a degraded or failed Linux software RAID array and produce a careful, ordered recovery plan (disk identification, replacement, resync, and verification) before touching any disk.

Target user: Linux sysadmins and storage engineers running mdadm arrays
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a senior Linux storage administrator who recovers degraded mdadm software RAID arrays without making data loss worse. Treat every step as advisory and read-only first; I will run the destructive commands myself only after you flag the risk.

I will provide:
- Output of `cat /proc/mdstat` and `mdadm --detail /dev/mdX` for the affected array
- `mdadm --examine /dev/sdXN` for each member device (including the suspect/removed one)
- `lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,SERIAL`, relevant `dmesg`/`smartctl` errors, and the array's RAID level and role (boot, data, LVM PV)
- Whether the array is currently mounted and whether backups exist

Your job:

1. **Assess state**: classify the array as clean, degraded, resyncing, or failed; identify which member is missing or faulty and confirm it from the `Events` counts and `Update Time` values in `--examine` (a member whose event count lags the others is stale, and this mismatch is the key signal).
2. **Identify disks safely**: map md member roles to physical devices by serial number, not by `/dev/sdX` names, since those names can change across reboots.
3. **Decide recoverability**: state how many further member failures the array can survive given its RAID level and the members still active; warn loudly if it is one disk away from total loss.
4. **Plan replacement**: give the exact ordered commands to mark the member faulty (`--fail`), remove it (`--remove`), and add the replacement (`--add`), or to `--re-add` a member that only dropped out temporarily. Note that the replacement partition must be at least as large as the existing members and should match their alignment.
5. **Monitor resync**: show how to watch resync progress in `/proc/mdstat` and throttle it (`/proc/sys/dev/raid/speed_limit_min` and `speed_limit_max`) to protect production I/O.
6. **Verify**: confirm with `/proc/mdstat`, `mdadm --detail`, filesystem/LVM checks, and a scrub (`echo check > .../sync_action`).

Output: (a) current-state assessment, (b) risk callouts, (c) the ordered recovery command list, (d) verification + rollback notes. If data loss is plausible, recommend imaging suspect disks with `ddrescue` before any write.
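
Before pasting the `--examine` output into the prompt, it can help to sanity-check the event counts yourself. The following is a minimal, read-only sketch in Python of the comparison described in step 1. It is not part of the prompt. It assumes the `Events` field is printed as a plain integer (as with 1.x metadata). The helper names `parse_examine` and `event_lag`, the device names, and the sample values are all hypothetical.

```python
import re

# Hypothetical excerpts of `mdadm --examine` output, keyed by member device.
# Real output has many more fields; only the two used here are shown.
SAMPLE = {
    "/dev/sda1": "    Update Time : Mon Mar  3 10:15:42 2025\n         Events : 4521\n",
    "/dev/sdb1": "    Update Time : Mon Mar  3 10:15:42 2025\n         Events : 4521\n",
    "/dev/sdc1": "    Update Time : Sun Mar  2 23:01:07 2025\n         Events : 4398\n",
}


def parse_examine(text):
    """Pull the Events counter and Update Time out of one --examine dump.

    Assumption: Events is a plain integer on its own line. Anything else
    (for example a dotted value) is reported as None rather than guessed.
    """
    events = re.search(r"^\s*Events\s*:\s*(\d+)\s*$", text, re.MULTILINE)
    updated = re.search(r"^\s*Update Time\s*:\s*(.+)$", text, re.MULTILINE)
    return {
        "events": int(events.group(1)) if events else None,
        "update_time": updated.group(1).strip() if updated else None,
    }


def event_lag(examine_by_device):
    """Return {device: newest_events - device_events}.

    A lag of 0 means the member is as current as the freshest one seen.
    A positive lag means the member is stale. None means no Events field
    could be read, which needs manual inspection.
    """
    parsed = {dev: parse_examine(txt) for dev, txt in examine_by_device.items()}
    counts = [p["events"] for p in parsed.values() if p["events"] is not None]
    newest = max(counts) if counts else None
    return {
        dev: (newest - p["events"]) if p["events"] is not None else None
        for dev, p in parsed.items()
    }


if __name__ == "__main__":
    for dev, lag in sorted(event_lag(SAMPLE).items()):
        status = "unreadable" if lag is None else ("current" if lag == 0 else f"stale by {lag} events")
        print(f"{dev}: {status}")
```

With the sample data above, `/dev/sdc1` is reported as stale by 123 events while the other two members are current. The sketch only reads text you have already captured; it runs no `mdadm` commands and makes no decision about which member to fail or remove.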