double disk failure on crock

So crock's raid looks like it has had two disks fail from the same mirror, which is quite horrible. /proc/mdstat now shows:
Personalities : [raid1] 
md2 : active raid1 sdc2[1] sdb2[0]
      477901504 blocks [2/2] [UU]
      
md1 : active raid1 sdd2[2](F) sda2[0]
      477901504 blocks [2/1] [U_]
      
md0 : active raid1 sdd1[4](F) sdc1[2] sdb1[1] sda1[5](F)
      10482304 blocks [4/2] [_UU_]
      
unused devices: <none>
So sda1 has failed from md0, and sda2 should probably also fail from md1, but that wouldn't leave any working disks in md1 at all! I'm going to go to mpt and grab a spare disk there, then see about replacing sdd :-/ I may end up having to do an offline recovery or something, though :( I'm going to power off crock until I get there and try to recover in rescue mode.
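
For the record, replacing a failed member with mdadm generally looks something like the commands below. This is only a sketch based on the mdstat output above, not necessarily the exact steps I'll end up running; in particular, the new disk showing up as /dev/sde is an assumption.

mdadm /dev/md0 --remove /dev/sda1      # drop the members already marked (F)
mdadm /dev/md0 --remove /dev/sdd1
mdadm /dev/md1 --remove /dev/sdd2
sfdisk -d /dev/sdb | sfdisk /dev/sde   # copy the partition layout onto the new disk (assumed /dev/sde)
mdadm /dev/md0 --add /dev/sde1         # start rebuilding onto the new disk
mdadm /dev/md1 --add /dev/sde2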

Update at 20:27: After booting up in single-user mode, I added a new drive to the raid and it was able to fully recover! At least according to the raid rebuild; we still have to see the results of the fscks on the users' filesystems. Before booting up multi-user again, I'm rebuilding the raid onto a second new drive as well, so when we boot it multi-user it will be completely set. At the current estimate, that will be done in 42 minutes. -Nick
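
The 42-minute estimate presumably comes from the finish= field that /proc/mdstat reports during a resync. Roughly, keeping an eye on a rebuild looks like this (array and device names as above):

watch -n 30 cat /proc/mdstat   # recovery line shows percent done, speed, and finish= ETA
mdadm --detail /dev/md1        # per-array view: State, Rebuild Status, and member roles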

Update at 21:29: The raid has completely finished rebuilding with 4 good drives, and guests are starting up. Let us know if you have any data loss! We will also be giving a free month to all users on crock. Thanks, Nick

Update at 23:14: crock crashed because I didn't fix the ipv6 multicast problem correctly, but now it is working. I think I'm done here; I'm not being careful enough about this anymore. Somehow ipv6 is also working without my having set the ports to be multicast routers.
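
For context, the "multicast routers" bit is about the Linux bridge's multicast snooping: IPv6 neighbor discovery rides on multicast, so if snooping filters it and no port is treated as a multicast-router port, guests can lose IPv6. A rough sketch of the relevant knobs, with placeholder names (br0/eth0) rather than crock's actual config:

echo 2 > /sys/class/net/br0/brif/eth0/multicast_router   # 0=never, 1=auto (default), 2=always treat this port as a multicast-router port
echo 0 > /sys/class/net/br0/bridge/multicast_snooping    # or turn snooping off so all multicast gets flooded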

1 Comment

There has been talk in the past about moving VPSs off of crock and onto more stable hardware/OS. Is there anything in the works with respect to this?

