First I got an email telling me a fail event had been detected on my RAID device. Then, 12 minutes later, a second email, this time from SMART monitoring, complaining that one of my hard drives could not be opened. There is a difference between a device that fails to open and a device that has failed.
So I checked both reports from the terminal. First, the SMART monitoring report:
#smartctl -a /dev/questionable-device
Then the status of the RAID array:
#mdadm --detail /dev/md0
I have a degraded array: one device is marked removed and faulty. The device is /dev/sdd, the 1TB hard drive, the oldest of the four I'm using.
I opened the black box, pulled the SATA cables, examined them for anything out of the ordinary, and put them back again. Then I opened the terminal to check whether I still had access to the hard drives.
#smartctl -a /dev/all-the-drives
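To confirm every drive answers again, a quick pass with smartctl's health flag is enough. The loop below is a minimal sketch; the device names /dev/sda through /dev/sdd are my assumption for a four-drive box. It prints each command instead of running it, so you can review the list before piping it to a root shell:

```shell
#!/bin/sh
# Sketch: print a quick SMART health-check command for each drive.
# Device names sda..sdd are assumptions for a four-drive machine;
# pipe the output to `sudo sh` to actually run the checks.
for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    echo "smartctl -H $dev"
done
```

`smartctl -H` reports only the overall health verdict, which is all you need for a quick "can I still talk to this drive" check; `-a` gives the full attribute dump when a drive looks suspicious.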
I added the faulty drive back to the RAID array with:
#mdadm --manage /dev/md0 --add /dev/faulty-drive
I checked the status of the RAID array.
donato@desktop:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sdd sdb1 sdc1
1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
[=====>...............] recovery = 26.4% (257972512/976630272) finish=104.2min speed=114919K/sec
bitmap: 6/8 pages [24KB], 65536KB chunk
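A handy way to keep an eye on the rebuild without staring at the raw output is to pull the recovery percentage out of /proc/mdstat. The snippet below is a small sketch; the sample text is embedded so it runs anywhere, and on a live system you would read /proc/mdstat instead:

```shell
#!/bin/sh
# Sketch: extract the recovery percentage from /proc/mdstat-style output.
# The embedded sample mirrors the output above; on a live system use:
#   mdstat=$(cat /proc/mdstat)
mdstat='md0 : active raid5 sdd sdb1 sdc1
      [=====>...............]  recovery = 26.4% (257972512/976630272) finish=104.2min speed=114919K/sec'

# Pull out the "NN.N%" token from the recovery line.
pct=$(printf '%s\n' "$mdstat" | sed -n 's/.*recovery = \([0-9.]*%\).*/\1/p')
echo "recovery: $pct"
```

Wrapped in `watch -n 60`, this gives a one-line progress readout refreshed every minute.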
I guess this time it’s for real. The RAID array is rebuilding itself. Back to my monitoring station then.