How to replace a failed disk with Linux software RAID (mdadm)
Recently one of the disks died in a RAID1 array managed by the excellent Linux software RAID (mdadm). Here is a quick overview of the steps to replace the failed disk with a new one. Replace the X in /dev/mdX with the number of your RAID device, and replace /dev/sddY with the actual device and partition you are using. In other words, don’t copy and paste these commands verbatim. All commands should be executed as root or with proper sudo rights.
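To find the real names to substitute, you can look at /proc/mdstat, where a failed member is flagged with (F). Below is a minimal sketch that extracts those entries, assuming the usual mdstat layout of "mdN : active raid1 sdd1[2](F) sdc1[0]". The function name list_failed_members is made up for this example.

```shell
# Print "<array> <member>" for every member flagged (F) in mdstat-format
# text read from standard input. Hypothetical helper, not part of mdadm.
list_failed_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(F\)/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip the "[2](F)" suffix
                print "/dev/" $1, "/dev/" dev
            }
        }
    }'
}

# Usage on a live system:
#   list_failed_members < /proc/mdstat
```

Double-check the result against mdadm --detail before failing or removing anything.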
Step 1 – mark broken disk as failed
mdadm --fail /dev/mdX /dev/sddY
Step 2 – remove broken disk from array
mdadm --remove /dev/mdX /dev/sddY
Step 3 – create partition on the new disk
You can partition the new disk manually using your favourite tool (fdisk, gparted, sfdisk, etc.). But it’s probably faster and less error prone to copy the partition table from an existing RAID disk to the new disk. Here’s an example of how to do that using sfdisk, where /dev/sdc is the healthy disk and /dev/sdd is the new one. Requirement: the disk must be smaller than 2TB and must use an MBR partition table rather than GPT.
sfdisk -d /dev/sdc | sfdisk /dev/sdd
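The size requirement can be checked before copying the partition table. This is a minimal sketch, assuming 512-byte logical sectors, in which case an MBR partition table can address 2^32 sectors (2 TiB). The function name fits_mbr is made up for this example.

```shell
# Succeed if a disk of the given size (in bytes) fits within what an MBR
# partition table can address. Assumption: 512-byte logical sectors.
fits_mbr() {
    mbr_size_bytes=$1
    mbr_limit_bytes=$((4294967296 * 512))   # 2^32 sectors * 512 bytes
    [ "$mbr_size_bytes" -le "$mbr_limit_bytes" ]
}

# Usage (blockdev --getsize64 prints the device size in bytes):
#   fits_mbr "$(blockdev --getsize64 /dev/sdd)" && echo "MBR is fine"
```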
Make sure that all the partitions that are part of the RAID set have the proper ‘fd’ ID (Linux RAID autodetect). An example that sets the ID of partitions 1, 2 and 3 on the new disk to ‘fd’:
for partition in 1 2 3; do sfdisk --change-id /dev/sdd $partition fd; done
Step 4 – add new disk to RAID array
mdadm --add /dev/mdX /dev/sddY
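Since the commands above must not be pasted with the X and Y placeholders still in them, it can help to print the sequence with your real names filled in and review it first. This is a minimal sketch. The function name print_replace_plan is made up for this example, and it assumes the new partition gets the same name as the failed one, as in this article. Nothing is executed.

```shell
# Print the fail/remove/add commands for review; nothing is executed.
# $1 = RAID array, $2 = member partition. Hypothetical helper.
print_replace_plan() {
    for plan_arg in "$1" "$2"; do
        case $plan_arg in
            ''|/dev/mdX|*Y)
                echo "placeholder or empty device name: '$plan_arg'" >&2
                return 1
                ;;
        esac
    done
    printf 'mdadm --fail %s %s\n' "$1" "$2"
    printf 'mdadm --remove %s %s\n' "$1" "$2"
    printf 'mdadm --add %s %s\n' "$1" "$2"
}

# Usage:
#   print_replace_plan /dev/md0 /dev/sdd1
```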
Step 5 – check status of RAID array
mdadm --detail /dev/mdX
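If you only care about the overall state, you can filter it out of the report. This is a minimal sketch, assuming the report contains the usual "State : ..." line. The function name array_state is made up for this example.

```shell
# Print the value of the "State :" line from mdadm --detail output read on
# standard input, without surrounding whitespace. Hypothetical helper.
array_state() {
    sed -n 's/^ *State : *\(.*[^ ]\) *$/\1/p'
}

# Usage:
#   mdadm --detail /dev/mdX | array_state
```

While the new disk is being synced, the state typically includes the words degraded and recovering.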
Step 6 – increase speed of the rebuild
By default mdadm will rebuild the RAID array in the background. If you want to speed up the resync process you can change the values of:
/proc/sys/dev/raid/speed_limit_max
/proc/sys/dev/raid/speed_limit_min
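For example, to set the minimum resync speed to about 150MB/s (the values are in KiB/s), run as root:
echo 150000 > /proc/sys/dev/raid/speed_limit_min
From a non-root shell, prefixing echo with sudo does not work because the redirection is still performed by your own shell. Use tee instead:
echo 150000 | sudo tee /proc/sys/dev/raid/speed_limit_min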
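A changed limit stays in effect until it is changed again or the machine reboots, so it can be handy to remember the previous value and restore it after the rebuild. This is a minimal sketch. The function name set_speed_limit is made up for this example, and it must run as root to write to /proc.

```shell
# Write a new value to a tunable file and print the previous value so that
# it can be restored later. $1 = file, $2 = new value (KiB/s).
# Hypothetical helper.
set_speed_limit() {
    limit_file=$1
    limit_new=$2
    limit_old=$(cat "$limit_file") || return 1
    echo "$limit_new" > "$limit_file" || return 1
    echo "$limit_old"
}

# Usage (as root):
#   old=$(set_speed_limit /proc/sys/dev/raid/speed_limit_min 150000)
#   ... wait for the resync to finish ...
#   set_speed_limit /proc/sys/dev/raid/speed_limit_min "$old" > /dev/null
```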
You can watch the progress of the resync in a terminal with the following command:
watch -n 1 cat /proc/mdstat
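For scripts or logging you may only want the percentage and the estimated time left. This is a minimal sketch, assuming the usual progress line in /proc/mdstat ("recovery = 12.6% (...) finish=127.5min speed=..."). The function name resync_progress is made up for this example.

```shell
# Print "<percent> <time left>" for each recovery/resync progress line in
# mdstat-format text read from standard input. Hypothetical helper.
resync_progress() {
    awk '/(recovery|resync) *=/ {
        pct = ""; eta = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%$/) { pct = $i; sub(/^=/, "", pct) }
            if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
        }
        if (pct != "") print pct, eta
    }'
}

# Usage:
#   resync_progress < /proc/mdstat
```

It prints nothing when no rebuild is running.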
Finally, for your next RAID setup also have a look at Partitionable RAID.