1. Check RAID Status
Use the following command to inspect the current state of the array:
sudo mdadm --detail /dev/md0
In a degraded RAID 1, the output will include lines like:
State : clean, degraded
Active Devices : 1
Failed Devices : 1
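For a quicker glance, /proc/mdstat reports the same condition; a degraded two-disk mirror shows [2/1] in the status line, with an underscore marking the missing member ([U_] or [_U], depending on which slot failed):
cat /proc/mdstat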

2. Mark and Remove the Faulty Disk
If the faulty disk is still present but failing, run:
sudo mdadm --fail /dev/md0 /dev/nvme1n1p1
sudo mdadm --remove /dev/md0 /dev/nvme1n1p1
If the disk has already been physically removed, you can skip this step.
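If a stale device node lingers after the disk itself is gone, mdadm also accepts the keyword detached, which removes any member whose underlying device no longer responds:
sudo mdadm /dev/md0 --remove detached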
3. Replace the Faulty Disk
Power down the server (if required), physically remove the faulty disk, and install a new one. Boot back into the system.
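If it is unclear which physical drive corresponds to /dev/nvme1n1, match serial numbers before pulling anything (this assumes smartmontools is installed; ls -l /dev/disk/by-id is an alternative):
sudo smartctl -i /dev/nvme1n1 | grep -i serial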
4. Partition the New Disk
RAID 1 mirrors the data inside its member partitions; it does not replicate the partition table itself, so the new disk must be given the same layout as the healthy one.
Clone the partition table from the working disk:
sudo sgdisk -R=/dev/nvme1n1 /dev/nvme0n1
sudo sgdisk -G /dev/nvme1n1
-R=/dev/nvme1n1 /dev/nvme0n1: Replicates the partition table from the source disk (/dev/nvme0n1, the final argument) onto the target (/dev/nvme1n1). Note that the target comes first; reversing the order would overwrite the healthy disk's partition table.
-G: Randomizes the disk GUID and all partition GUIDs so the clone does not conflict with the original.
Check the layout:
lsblk
Make sure /dev/nvme1n1p1 exists and matches the partition size of /dev/nvme0n1p1.
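For a side-by-side size comparison of the two disks, something like:
lsblk -o NAME,SIZE,TYPE /dev/nvme0n1 /dev/nvme1n1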
5. Add the New Disk to the RAID Array
Once partitioned, add the new partition to the array:
sudo mdadm --add /dev/md0 /dev/nvme1n1p1
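The rebuild starts automatically once the partition is added. To confirm the new member was accepted, one option is:
sudo mdadm --detail /dev/md0 | grep -E 'State|Rebuild Status'
During the rebuild, the state typically reads clean, degraded, recovering and a Rebuild Status line reports the percentage complete.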
6. Monitor the Rebuild Process
Use this command to check the rebuild progress:
cat /proc/mdstat
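To keep it refreshing automatically:
watch -n 5 cat /proc/mdstat
A recovering RAID 1 shows a progress line similar to the following (the figures here are illustrative, not from this array):
[=>...................]  recovery =  8.1% (79168448/976630336) finish=74.9min speed=199544K/sec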

7. Confirm Rebuild Completion
When the rebuild is done, confirm the RAID status:
sudo mdadm --detail /dev/md0
Expected output:
State : clean
Active Devices : 2
Failed Devices : 0
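In /proc/mdstat the slot map should now read [2/2] [UU], meaning both mirrors are active:
grep -A 2 md0 /proc/mdstat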

Optional: Update mdadm Config for Boot Persistence
Run the following to record the array in mdadm's configuration so it is assembled correctly on future boots. Note that this overwrites /etc/mdadm/mdadm.conf; if your existing file contains other settings (such as MAILADDR), append with tee -a or merge the ARRAY line by hand:
sudo mdadm --detail --scan | sudo tee /etc/mdadm/mdadm.conf
Then rebuild the initramfs so the updated config is available at boot (update-initramfs applies to Debian and Ubuntu):
sudo update-initramfs -u
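To double-check that the array line was written:
grep '^ARRAY' /etc/mdadm/mdadm.conf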
With these steps, you’ve successfully replaced a failed RAID 1 disk and restored redundancy.


