Managing Software RAID with mdadm: Building, Monitoring, and Recovering
Software RAID with mdadm is rock-solid when you understand it. Here's how to build arrays, monitor health, and recover from a failed disk without losing data.
- #linux
- #raid
- #mdadm
- #storage
- #disks
- #recovery
Hardware RAID controllers are great until the controller dies and you discover your array is in a proprietary on-disk format that only a compatible card from the same vendor can read. Linux software RAID via mdadm has none of that lock-in: the metadata is open, the arrays are portable between machines, and it’s been battle-tested for two decades. I’ve recovered more data from mdadm arrays than from any hardware controller. Here’s how to build, watch, and rescue them.
Choosing a RAID level
Quick reality check before you create anything:
- RAID 1 (mirror): two or more disks holding identical copies; full redundancy, simple, and the array survives as long as one member is left. My default for OS/boot and small critical volumes.
- RAID 10 (mirror + stripe): speed and redundancy. Classic RAID 10 needs 4+ disks (mdadm’s raid10 driver will also run on 2 or 3). My default for databases.
- RAID 5: one disk’s worth of parity, spread across all members; survives one failure, but rebuilds are slow and stressful on large modern drives. Acceptable for archival, risky for big arrays.
- RAID 6: two disks’ worth of parity; survives two failures, and the sane choice over RAID 5 for large-capacity arrays.
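A quick way to compare the levels is usable capacity. The sketch below is mine, not an mdadm feature: usable_tb is a hypothetical helper that assumes equal-size disks given in whole terabytes, treats RAID 10 as mdadm's default two-copy layout, rounds down, and ignores metadata overhead.

```shell
#!/usr/bin/env bash
# usable_tb LEVEL DISKS SIZE_TB: rough usable capacity of an array of equal-size disks.
# Integer arithmetic only (results round down); metadata overhead is ignored.
usable_tb() {
  local level=$1 disks=$2 size=$3
  case "$level" in
    1)  echo "$size" ;;                    # every member holds a full copy
    10) echo $(( disks * size / 2 )) ;;    # assumes the default 2 copies of each block
    5)  echo $(( (disks - 1) * size )) ;;  # one disk's worth of parity
    6)  echo $(( (disks - 2) * size )) ;;  # two disks' worth of parity
    *)  echo "unsupported level: $level" >&2; return 1 ;;
  esac
}
```

For example, usable_tb 6 6 8 (six 8 TB disks in RAID 6) prints 32, while the same disks in RAID 10 give 24.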
RAID is not a backup. It protects against disk failure, not rm -rf, corruption, or fire. Keep backups regardless.
Building an array
Say you have /dev/sdb and /dev/sdc for a mirror:
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
Watch the initial sync:
cat /proc/mdstat
/proc/mdstat is the heartbeat of software RAID — it shows every array, its disks, sync progress, and which members are missing. You’ll learn to read it at a glance.
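To get a feel for the format, here is a small parser that condenses /proc/mdstat to one line per array. It is a sketch: mdstat_summary is a hypothetical helper, and it assumes the usual layout where the member map (such as [UU]) sits on a line after each mdX header.

```shell
#!/usr/bin/env bash
# mdstat_summary [FILE]: print "array level member-map" for each md array.
# Reads /proc/mdstat by default; pass a saved copy to try it offline.
# Arrays without a member map (RAID 0, linear) print "-" in the last column.
mdstat_summary() {
  awk '
    function emit_pending() { if (pending) print name, level, "-"; pending = 0 }
    /^md[0-9]+ : / {
      emit_pending()
      name = $1; level = "?"
      # The level is the first "raidN" or "linear" word on the header line.
      for (i = 3; i <= NF; i++)
        if ($i ~ /^(raid[0-9]+|linear)$/) { level = $i; break }
      pending = 1
      next
    }
    # First [U_]-style map after the header: U = member up, _ = member missing.
    pending && match($0, /\[[U_]+]/) {
      print name, level, substr($0, RSTART, RLENGTH)
      pending = 0
    }
    END { emit_pending() }
  ' "${1:-/proc/mdstat}"
}
```

On a box with a healthy mirror this prints something like md0 raid1 [UU].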
Then make a filesystem and mount it:
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /data
sudo mount /dev/md0 /data
Make the array reassemble on boot
This is the step people forget, and then the array doesn’t come up after a reboot. Save the array definition to the config and rebuild the initramfs:
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf # Debian/Ubuntu; RHEL family uses /etc/mdadm.conf
sudo update-initramfs -u # Debian/Ubuntu
# or: sudo dracut -f # RHEL family
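Add a /etc/fstab entry by filesystem UUID (get it from blkid /dev/md0; this is not the array UUID that mdadm --detail shows), not by /dev/md0, since array device numbers can shift: an array missing from mdadm.conf often comes back as /dev/md127.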
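For the fstab step, a small helper can look up the filesystem UUID and print a ready-to-paste line. It is a sketch: md_fstab_line is a hypothetical name, ext4 is the assumed default type, and the nofail option (boot carries on without the mount if the array fails to assemble) is my choice rather than a requirement.

```shell
#!/usr/bin/env bash
# md_fstab_line DEVICE MOUNTPOINT [FSTYPE]: print an fstab entry keyed on the
# filesystem UUID instead of the md device name. Run as root so blkid probes
# the device rather than relying on its cache.
md_fstab_line() {
  local dev=$1 mnt=$2 fstype=${3:-ext4} uuid
  uuid=$(blkid -s UUID -o value "$dev") || return 1
  [ -n "$uuid" ] || { echo "no filesystem UUID on $dev" >&2; return 1; }
  # nofail: keep booting (without this mount) if the array does not assemble
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mnt" "$fstype"
}
```

Review the printed line, then from a root shell append it with md_fstab_line /dev/md0 /data >> /etc/fstab and run mount -a, which surfaces a typo now instead of at the next boot.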
Monitoring: catch a failure before it’s a disaster
The whole point of RAID is surviving a disk failure — but only if you notice the first failure before the second one kills you. Set up email alerts:
# In /etc/mdadm/mdadm.conf
MAILADDR you@example.com
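mdadm’s monitor mode (mdadm --monitor, normally started as a system service at boot) emails on Fail, FailSpare, DegradedArray, SparesMissing, and TestMessage events, provided the host has a working mail transfer agent. Test it: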
sudo mdadm --monitor --scan --test --oneshot
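Email is not the only outlet. mdadm.conf also accepts a PROGRAM line naming a program that mdadm --monitor runs for each event, passing the event name, the md device, and sometimes the related component device. The handler below is a sketch: the path /usr/local/sbin/md-event and the severity buckets are my assumptions; it tags each event and hands it to syslog via logger, where existing log shipping can pick it up.

```shell
#!/usr/bin/env bash
# Hypothetical handler for mdadm's PROGRAM hook. In mdadm.conf:
#   PROGRAM /usr/local/sbin/md-event
# mdadm --monitor runs it with: EVENT MD_DEVICE [COMPONENT_DEVICE]
md_event() {
  local event=$1 array=$2 component=${3:-} severity=INFO
  case "$event" in
    Fail|FailSpare|DegradedArray|DeviceDisappeared) severity=CRITICAL ;;
    SparesMissing) severity=WARNING ;;
  esac
  echo "$severity $event on $array${component:+ ($component)}"
}

# When executed directly with arguments (as mdadm does), log to syslog with tag "md-event".
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -ge 2 ]; then
  md_event "$@" | logger -t md-event
fi
```

Re-running the --test --oneshot command above should then produce a TestMessage line in the system log as well as the email.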
For health at a glance:
sudo mdadm --detail /dev/md0 # state, per-disk status, event count
cat /proc/mdstat
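A healthy mirror shows [UU]. A degraded one shows [U_]: that underscore is the alarm, and its position marks the slot of the missing member. Also watch the disks themselves with SMART (smartctl -a /dev/sdb for a one-off look, smartd for scheduled checks); RAID protects against a dead disk, but a disk throwing read errors is often on its way out, and latent bad sectors on a surviving member are exactly what bite you during a rebuild.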
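The underscore check is easy to automate. This sketch (md_degraded is a hypothetical name) scans /proc/mdstat, prints any array whose member map contains an underscore, and exits 1 in that case, so a cron job or monitoring agent can alert on the exit status. Treat it as a backstop for the mdadm alerts, not a replacement.

```shell
#!/usr/bin/env bash
# md_degraded [FILE]: print "array map" for every md array with a missing member.
# Reads /proc/mdstat by default. Exit status: 1 if any array is degraded, else 0.
md_degraded() {
  awk '
    /^md[0-9]+ : / { name = $1; next }
    # The first [U_]-style map after a header belongs to that array.
    name != "" && match($0, /\[[U_]+]/) {
      map = substr($0, RSTART, RLENGTH)
      if (index(map, "_")) { print name, map; bad = 1 }
      name = ""
    }
    END { exit (bad ? 1 : 0) }
  ' "${1:-/proc/mdstat}"
}
```

From cron, something like md_degraded || logger -t raid "degraded array on $(hostname)" is enough to get it into your log pipeline.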
Recovering from a failed disk
Here’s the part that matters at 2am. A disk failed and the array is degraded but still serving data. The drill:
1. Identify the failed member:
sudo mdadm --detail /dev/md0
cat /proc/mdstat # the [U_] tells you which slot is down
2. Mark it failed and remove it (if not already):
sudo mdadm /dev/md0 --fail /dev/sdc
sudo mdadm /dev/md0 --remove /dev/sdc
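3. Physically replace the drive, then add the new one (it must be at least as large as the old member; check its name with lsblk, since it may not come back as /dev/sdc):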
sudo mdadm /dev/md0 --add /dev/sdc
The array immediately starts rebuilding onto the new disk. Watch it:
cat /proc/mdstat # shows recovery percentage and ETA
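Step 3 assumes you know which physical drive is /dev/sdc, and kernel names say nothing about bays. Before pulling anything, match the name to a serial number: lsblk -d -o NAME,MODEL,SERIAL shows it at a glance, and the sketch below (disk_ids is a hypothetical helper) does the same through the /dev/disk/by-id symlinks that udev creates. If the failed disk has vanished entirely, run it for the healthy members instead and pull the drive whose serial is not on the list.

```shell
#!/usr/bin/env bash
# disk_ids DEVICE [BY_ID_DIR]: print the /dev/disk/by-id names that resolve to DEVICE.
# Those names usually embed the model and serial number printed on the drive label.
disk_ids() {
  local dev link dir=${2:-/dev/disk/by-id}
  dev=$(readlink -f "$1")
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    # Whole-disk links resolve to the device itself; "-partN" links resolve to partitions.
    if [ "$(readlink -f "$link")" = "$dev" ]; then
      basename "$link"
    fi
  done
  return 0
}
```

Typical use is disk_ids /dev/sdc, which on SATA systems prints an ata-MODEL_SERIAL style name (plus a wwn- alias on many drives).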
Do not reboot or remove a second disk during a RAID 5/6 rebuild — that window is exactly when arrays die for good. For RAID 5 on large drives, rebuilds can take many hours; plan for it.
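For scripts, or for a quick look at how long the rebuild window will stay open, this sketch pulls the progress figures out of /proc/mdstat. md_progress is a hypothetical name, and it assumes the usual "recovery = 2.1% ... finish=77.2min" layout; sudo mdadm --wait /dev/md0 is the built-in way to simply block until the rebuild ends.

```shell
#!/usr/bin/env bash
# md_progress [FILE]: print "array action percent eta" for each array that is
# resyncing, recovering, reshaping, or checking. Reads /proc/mdstat by default.
md_progress() {
  awk '
    /^md[0-9]+ : / { name = $1; next }
    {
      for (i = 1; i < NF; i++)
        if ($i ~ /^(recovery|resync|reshape|check)$/ && $(i + 1) == "=") {
          eta = "?"
          for (j = i + 2; j <= NF; j++)
            if ($j ~ /^finish=/) eta = substr($j, 8)   # strip "finish="
          print name, $i, $(i + 2), eta
          break
        }
    }
  ' "${1:-/proc/mdstat}"
}
```

To hold a maintenance script until nothing is rebuilding: while md_progress | grep -q .; do sleep 60; done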
Hot spares: automate the swap
For arrays you can’t babysit, add a hot spare. The md driver automatically pulls it in when a member fails and starts the rebuild without you:
sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 \
/dev/sdb /dev/sdc /dev/sdd /dev/sde
Now a single failure triggers an automatic rebuild onto /dev/sde, buying you time to replace the dead disk on your own schedule.
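It is worth confirming the spare is really there, because a spare already consumed by an earlier failure protects nothing. mdadm --detail reports a Spare Devices count, and the sketch below (md_spares is a hypothetical helper) extracts it for use in a check. Relatedly, an ARRAY line in mdadm.conf can carry spares=1, which makes mdadm --monitor report SparesMissing if it finds fewer spares than that; mdadm --detail --scan typically includes it for arrays that had a spare when you ran it.

```shell
#!/usr/bin/env bash
# md_spares [FILE...]: print the "Spare Devices" count from mdadm --detail output.
# Reads stdin when no file is given; exits 1 if the field is missing.
md_spares() {
  awk -F' : ' '
    $1 ~ /^ *Spare Devices *$/ { print $2 + 0; found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$@"
}
```

For example: [ "$(sudo mdadm --detail /dev/md0 | md_spares)" -ge 1 ] || echo "md0 has no hot spare"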
Growing and reshaping
mdadm can grow arrays online — add a disk and expand:
sudo mdadm --add /dev/md0 /dev/sdf
sudo mdadm --grow /dev/md0 --raid-devices=4
sudo mdadm --wait /dev/md0 # block until the reshape finishes and the new size is available
sudo resize2fs /dev/md0 # then grow the filesystem
Reshaping is powerful but slow and risky on a live array — have backups and ideally do it during a maintenance window.
A recovery checklist worth saving
- cat /proc/mdstat: what’s degraded?
- mdadm --detail /dev/mdX: which member, what state?
- --fail then --remove the bad disk.
- Replace hardware, --add the new disk.
- Watch /proc/mdstat to 100% before relaxing.
- Confirm [UU] and re-test monitoring alerts.
Where AI helps
mdadm --detail and /proc/mdstat output is terse and easy to misread when you’re stressed and a customer’s data is on the line. Pasting it into a model and asking “which physical disk failed, is the array still serving data, and what’s the exact safe recovery sequence” turns cryptic status into a clear next step. I keep a few Linux admin prompts for exactly these storage-recovery moments.
Software RAID has saved my data more times than I can count, but only because I treated monitoring as non-negotiable and rehearsed the recovery before I needed it. Build the array, wire up the alerts, and practice a disk swap on a test box once — so the real one is muscle memory.
Generated commands and configs are assistive, not authoritative. Always verify against your own systems before applying changes in production.