July 2011 Archives

crock hung; rebooted

Coming back up now. We will follow up after vacation.

It does seem like crock has been having more than its fair share of issues, so I will be digging deeper.

So I went home and got another drive. I'm back at the co-lo and about to replace sda in coins. There should be no downtime, but expect degraded performance.

Every 2.0s: cat /proc/mdstat                            Sat Jul 30 17:50:28 2011

Personalities : [raid1] [raid10]
md1 : active raid10 sdf2[4] sde2[2] sdb2[1] sda2[5](F) sdc2[6](F) sdd2[3]
      1048578048 blocks 256K chunks 2 near-copies [4/3] [_UUU]
      [>....................]  recovery =  0.1% (582400/524289024) finish=512.7min speed=17020K/sec
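The `finish=` figure in that output is roughly the remaining 1K blocks divided by the current `speed=` (also in K/sec). Here is a minimal Python sketch that pulls those fields out of a recovery line and redoes the arithmetic. The regex and the name `estimate_finish_minutes` are illustrative only, and the kernel works from a recent average rate, so its estimate won't match this one to the decimal.

```python
import re

# Matches the progress part of a /proc/mdstat recovery/resync line, e.g.
# "recovery =  0.1% (582400/524289024) finish=512.7min speed=17020K/sec"
PROGRESS_RE = re.compile(
    r"(?P<action>recovery|resync|reshape|check)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def estimate_finish_minutes(line: str) -> tuple[float, float]:
    """Return (kernel_estimate_min, recomputed_min) for one progress line.

    done/total are counted in 1K blocks and speed is in K/sec, so
    (total - done) / speed is the remaining time in seconds.
    """
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError("no recovery progress found in line")
    done, total = int(m["done"]), int(m["total"])
    speed = int(m["speed"])
    remaining_sec = (total - done) / speed
    return float(m["finish"]), remaining_sec / 60.0


if __name__ == "__main__":
    line = ("[>....................]  recovery =  0.1% (582400/524289024) "
            "finish=512.7min speed=17020K/sec")
    kernel, ours = estimate_finish_minutes(line)
    print(f"kernel says {kernel} min, recomputed {ours:.1f} min")
```

For the line above this gives about 512.8 minutes, compared with the kernel's 512.7. In other words, the rebuild of this 500GB member at about 17MB/s is a roughly eight-and-a-half-hour job.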

Unless something goes very wrong, all you should notice is some degraded performance. It's raid10, which seems to rebuild faster than our stripes of two raid1 sets, so hopefully it won't be as bad as usual.
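The `2 near-copies` in these status lines refers to md's raid10 "near" layout, where the copies of each chunk go to consecutive devices at the same offset. The small Python sketch below follows that placement rule as described in the md(4) man page. `near_layout` is an illustrative name, and the sketch ignores the "far" and "offset" layouts.

```python
def near_layout(chunk: int, raid_disks: int, near_copies: int) -> list[tuple[int, int]]:
    """Return (device_slot, row) for every copy of a logical chunk.

    In the near layout the copies of one chunk are written to consecutive
    device slots, wrapping onto the next row when they run off the end.
    """
    placements = []
    for copy in range(near_copies):
        position = chunk * near_copies + copy  # position in the flattened stripe
        placements.append((position % raid_disks, position // raid_disks))
    return placements


if __name__ == "__main__":
    # Four devices with two near copies, as in the arrays in this post.
    for chunk in range(4):
        print(chunk, near_layout(chunk, raid_disks=4, near_copies=2))
```

With four devices and two copies, slots 0 and 1 always hold the same chunks, as do slots 2 and 3. Each device therefore carries the same data it would in a stripe over two raid1 pairs, and a failed slot 0 is rebuilt entirely from slot 1.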

Edit 15:50: the raid is rebuilding.

md1 : active raid10 sde2[4] sdd2[5](F) sdc2[2] sdb2[1] sda2[0]
      955802624 blocks 256K chunks 2 near-copies [4/3] [UUU_]
      [>....................]  recovery =  0.1% (555264/477901312) finish=315.1min speed=25239K/sec
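In each status line, `[4/3]` means four devices are configured and three are currently in sync. The `U`/`_` string lists the slots in role order, with `_` marking a slot that is missing or still rebuilding. The number in brackets after each member name (such as `sdd2[5]`) is md's internal descriptor number, not the slot, so a newly added or failed disk can show a number of 4 or more in a four-device array. Below is a minimal Python sketch for picking out the degraded slots. `degraded_slots` is an illustrative name.

```python
import re

# Matches the "[n/m] [UU_U]" part of an mdstat status line.
STATUS_RE = re.compile(r"\[(?P<total>\d+)/(?P<working>\d+)\]\s+\[(?P<mask>[U_]+)\]")


def degraded_slots(status_line: str) -> list[int]:
    """Return the slot numbers marked '_' in an mdstat status line."""
    m = STATUS_RE.search(status_line)
    if m is None:
        raise ValueError("no [n/m] [U_] status found")
    mask = m["mask"]
    # Sanity check: the mask has one character per configured slot.
    if len(mask) != int(m["total"]):
        raise ValueError("mask length does not match device count")
    return [slot for slot, state in enumerate(mask) if state == "_"]


if __name__ == "__main__":
    print(degraded_slots("955802624 blocks 256K chunks 2 near-copies [4/3] [UUU_]"))
```

For the array above this reports slot 3, the slot that `sde2[4]` is being rebuilt into.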

I note coins also has a bad disk. I will be replacing that one as soon as I can run back to the office and get another spare.
Coins is one of the servers new enough to have AHCI SATA controllers, so no reboot is required, but it's old enough that I was still buying 1TB drives (four of them) and short-stroking them. This should dramatically improve sequential performance, such as an unloaded raid rebuild, but I do not think it will have as dramatic an effect on random access. Really, I should benchmark under controlled conditions.
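Because AHCI supports hotplug, a member disk can be swapped without a reboot. The Python sketch below builds, but does not run, one plausible command sequence for such a swap. It is not necessarily what was done on coins. `hot_swap_plan` and its parameters are hypothetical, and the SCSI host number in the example is a placeholder. The sketch assumes the old disk carries only this one md member, that the new disk comes back under the same name, and that the partition table is one `sfdisk -d` can dump.

```python
import re


def hot_swap_plan(array: str, failed_part: str, healthy_disk: str,
                  scsi_host: int) -> list[str]:
    """Return shell commands for hot-swapping one md member (dry run only).

    Assumes the failed disk holds a single member of `array` and that the
    replacement shows up under the same /dev name after the rescan.
    """
    m = re.fullmatch(r"/dev/(sd[a-z]+)(\d+)", failed_part)
    if m is None:
        raise ValueError(f"unexpected partition name: {failed_part}")
    disk = m.group(1)
    return [
        # Mark the member failed (harmless if md already did) and drop it.
        f"mdadm {array} --fail {failed_part} --remove {failed_part}",
        # Ask the kernel to detach the old disk before it is pulled.
        f"echo 1 > /sys/block/{disk}/device/delete",
        # After inserting the new disk, rescan the port if it isn't seen.
        f'echo "- - -" > /sys/class/scsi_host/host{scsi_host}/scan',
        # Copy the partition layout from a healthy member.
        f"sfdisk -d {healthy_disk} | sfdisk /dev/{disk}",
        # Add the new partition back; md starts the rebuild by itself.
        f"mdadm {array} --add {failed_part}",
    ]


if __name__ == "__main__":
    for cmd in hot_swap_plan("/dev/md1", "/dev/sda2", "/dev/sdb", scsi_host=0):
        print(cmd)
```

Printing the plan rather than executing it leaves room to check `/proc/mdstat` between steps. On many AHCI setups the kernel notices the new disk without help, which makes the rescan line unnecessary.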

md1 : active raid10 sde2[4] sdb2[1] sda2[0] sdc2[5](F) sdd2[3]
      1048578048 blocks 256K chunks 2 near-copies [4/3] [UU_U]
      [>....................]  recovery =  0.1% (932864/524289024) finish=592.5min speed=14720K/sec