rebooting halter to replace disks

Halter has two failed disks. They are on different mirrors, so there is no data loss, but it really needs to be taken care of. Because halter doesn't have an AHCI SATA chipset, it can't hot-plug drives, so we need to reboot it to detect the new disks. We will check on any VPSes that don't boot up by themselves, but if you still have problems, email us. Thanks! -Nick

we are attempting to fix this with the next kernel revision.  If nothing else, we'll be cycling that hardware out fairly soon anyhow.

edit at 1:01 :  guest domains are going down now.

edit at 2:09:   drives are replaced, rebuilding.   We will bring customers online in approx. an hour.  

Personalities : [raid1]
md1 : active raid1 sdc2[2] sdb2[1]
      477901504 blocks [2/1] [_U]
      [==>..................]  recovery = 13.4% (64153152/477901504) finish=65.6min speed=105075K/sec
md2 : active raid1 sda2[2] sdd2[1]
      477901504 blocks [2/1] [_U]
      [==>..................]  recovery = 11.2% (53639488/477901504) finish=75.6min speed=93412K/sec

this is one of the old servers that uses two raid1 arrays striped together with LVM rather than a single raid10. raid10 rebuilds /much/ faster under load than the striped raid1 setup we used on halter and all older servers. Considering that the thing lost two disks in as many weeks, we will keep it down until it's done rebuilding. (the rebuild would take 10x-15x as long if we did it while customers were online)
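
For anyone wondering where the finish= number in the mdstat output comes from: as far as we can tell it is just the blocks left to copy divided by the current speed. Here is a rough sketch of that arithmetic. The function name is made up for illustration, and it assumes mdstat counts 1K blocks and reports speed in K/sec. The estimate moves around whenever the speed does, so treat it as a guess and not a promise.

```python
def rebuild_eta_minutes(done_blocks, total_blocks, speed_kb_per_sec):
    """Estimate minutes left in a rebuild (hypothetical helper, not part of mdadm).

    Assumes blocks are 1K units and speed is in K/sec, as /proc/mdstat reports them.
    """
    if speed_kb_per_sec <= 0:
        raise ValueError("speed must be positive")
    remaining_blocks = total_blocks - done_blocks
    seconds_left = remaining_blocks / speed_kb_per_sec
    return seconds_left / 60.0


# Numbers copied from the md1 recovery line in the 2:09 output.
eta = rebuild_eta_minutes(64153152, 477901504, 105075)
print(f"md1: about {eta:.1f} minutes left at the current speed")
```

Plugging in the md1 numbers gives roughly 65.6 minutes, which lines up with what the kernel printed.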

edit at 2:27:  ignore my approx an hour comment.  current mdstat output:

Personalities : [raid1]
md1 : active raid1 sdc2[2] sdb2[1]
      477901504 blocks [2/1] [_U]
      [======>..............]  recovery = 34.2% (163703680/477901504) finish=93.9min speed=55756K/sec
md2 : active raid1 sda2[2] sdd2[1]
      477901504 blocks [2/1] [_U]
      [======>..............]  recovery = 31.0% (148422272/477901504) finish=98.2min speed=55877K/sec
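
Since we are holding customers down until both mirrors are whole again, here is a small sketch of how one could check mdstat text for arrays that are not ready yet. In the status field, [2/1] means two devices are expected and one is active, and an underscore in [_U] marks the missing one. The function name and regexes are our own illustration and not anything from mdadm, the sample is the 2:27 output with the wrapped lines rejoined, and the sketch only looks for "recovery" lines (a resync would need the pattern extended).

```python
import re

# Sample text: the 2:27 mdstat output with wrapped lines rejoined.
SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sdc2[2] sdb2[1]
      477901504 blocks [2/1] [_U]
      [======>..............]  recovery = 34.2% (163703680/477901504) finish=93.9min speed=55756K/sec
md2 : active raid1 sda2[2] sdd2[1]
      477901504 blocks [2/1] [_U]
      [======>..............]  recovery = 31.0% (148422272/477901504) finish=98.2min speed=55877K/sec
"""

ARRAY_RE = re.compile(r"^(md\d+) :", re.MULTILINE)      # start of an array entry
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")  # e.g. [2/1] [_U]
RECOVERY_RE = re.compile(r"recovery\s*=\s*([\d.]+)%")   # e.g. recovery = 34.2%


def arrays_not_ready(mdstat_text):
    """Map each degraded or rebuilding array to its recovery percent (or None)."""
    not_ready = {}
    # Splitting on a pattern with one group yields [preamble, name, body, name, body, ...].
    parts = ARRAY_RE.split(mdstat_text)
    for name, body in zip(parts[1::2], parts[2::2]):
        status = STATUS_RE.search(body)
        recovery = RECOVERY_RE.search(body)
        degraded = status is not None and "_" in status.group(3)
        if degraded or recovery:
            not_ready[name] = float(recovery.group(1)) if recovery else None
    return not_ready


if __name__ == "__main__":
    for name, percent in arrays_not_ready(SAMPLE).items():
        print(name, "still rebuilding" if percent is not None else "degraded", percent)
```

An empty result would mean every array shows all of its members up and nothing is recovering, which is the point at which we would start bringing guests back.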

edit at 4:43:  the disks are finally done rebuilding and customers are coming back up.  

edit at 4:48:  halter is back, all users on halter are back.  

we need to give all of you credit.

About this Entry

This page contains a single entry by nick published on June 14, 2011 12:13 AM.
