An Unfortunate Series Of Events

First I got an email reporting a Fail event on my RAID device. Twelve minutes later a second email arrived, this time from SMART monitoring, complaining that one of my hard drives could not be opened. There is a difference between a device that fails to open and a device that has actually failed.

So I checked both reports from the terminal. First, the SMART monitoring report.

#smartctl -a /dev/questionable-device
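
The difference between "could not be opened" and "has failed" also shows up in the exit status of smartctl, which is a bitmask. The sketch below decodes only the two bits relevant here, as described in the smartctl man page: bit 1 (value 2) for a device that could not be opened, and bit 3 (value 8) for a SMART health check reporting a failing disk. The function name classify_smartctl_status is made up for illustration.

```shell
# Hypothetical helper: turn a smartctl exit status into a short label.
# Only bits 1 and 3 are decoded; every other non-zero status is "other".
classify_smartctl_status() {
  status=$1
  if [ $(( status & 2 )) -ne 0 ]; then
    echo "open-failed"    # bit 1: the device could not be opened
  elif [ $(( status & 8 )) -ne 0 ]; then
    echo "disk-failing"   # bit 3: SMART health check reports a failing disk
  elif [ "$status" -eq 0 ]; then
    echo "ok"
  else
    echo "other"
  fi
}

# Possible use (needs root and a real device name):
#   smartctl -H /dev/sdX; classify_smartctl_status $?
```

An "open-failed" result points at cabling, power or the controller as much as at the drive itself, which is why checking the cables comes before replacing anything.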

I verified the status of the RAID array.

#mdadm --detail /dev/md0

I have a degraded array: one device is marked faulty and has been removed. The device is /dev/sdd, the 1TB hard drive, which is the oldest of the 4 I’m using.
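
A degraded array can also be spotted from a script. In /proc/mdstat each array has a member map such as [UUU], and a missing member shows up as an underscore, for example [UU_]. The following is a minimal sketch, assuming array names of the form md0, md1 and so on. The function name degraded_arrays is hypothetical, and it reads mdstat-style text on standard input.

```shell
# Hypothetical helper: print "<array> <member map>" for every array whose
# member map contains an underscore, meaning a member is missing.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }          # remember the current array name
    match($0, /\[[U_]+\]/) {             # member map, e.g. [UU_]
      map = substr($0, RSTART, RLENGTH)
      if (index(map, "_") > 0) print name, map
    }
  '
}

# Possible use on a live system:
#   degraded_arrays < /proc/mdstat
```

No output means every array listed has all of its members.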

I opened the black box, pulled the SATA cables, examined them for anything out of the ordinary, and put them back again. Then I opened the terminal to check whether I had access to the hard drives.

#smartctl -a /dev/all-the-drives

I added the faulty drive back to the RAID array with:

#mdadm --manage --add /dev/md0 /dev/faulty-drive

I checked the status of the RAID array.

#cat /proc/mdstat

donato@desktop:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sdd[3] sdb1[0] sdc1[1]
1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
[=====>...............] recovery = 26.4% (257972512/976630272) finish=104.2min speed=114919K/sec
bitmap: 6/8 pages [24KB], 65536KB chunk

unused devices:
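
The finish estimate in that output can be checked by hand: the remaining blocks divided by the current speed give the remaining time. Here that is (976630272 - 257972512) / 114919 ≈ 6254 seconds, or about 104.2 minutes, which matches finish=104.2min. The sketch below does the same arithmetic on mdstat-style text from standard input, assuming the block counts are in 1K units and the speed is in K/sec as printed. The function name rebuild_eta_minutes is made up for illustration.

```shell
# Hypothetical helper: estimate the remaining rebuild time in minutes from a
# "recovery = ..." or "resync = ..." progress line.
rebuild_eta_minutes() {
  awk '
    /(recovery|resync) *=/ {
      # Pull out "(done/total)" and split it into the two block counts.
      if (!match($0, /\([0-9]+\/[0-9]+\)/)) next
      split(substr($0, RSTART + 1, RLENGTH - 2), blocks, "/")
      # Pull the number out of "speed=<n>K/sec".
      if (!match($0, /speed=[0-9]+K\/sec/)) next
      speed = substr($0, RSTART + 6, RLENGTH - 11) + 0
      if (speed > 0) printf "%.1f\n", (blocks[2] - blocks[1]) / speed / 60
    }
  '
}

# Possible use on a live system:
#   rebuild_eta_minutes < /proc/mdstat
```

The speed changes while the rebuild runs, so the result is only as good as the most recent reading.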

I guess this time it’s for real. The RAID array is rebuilding itself. Back to my monitoring station then.

