nick: July 2012 Archives

lozenges crash

lozenges has now crashed for the second time tonight, this time while trying to start up the VPSes, so I booted it up in single-user mode. The RAID is trying to resync; hopefully it will be able to boot up again once the resync is done. I'm also looking for clues to a bad disk.

Update at 22:38: the RAID finished resyncing in single-user mode, and guests are booting up again. I hope it sticks this time; if not, I'm guessing lozenges has some sort of hardware failure.

double disk failure on crock

So crock's RAID looks like it has had two disks fail from the same mirror, which is quite horrible. /proc/mdstat now shows:
Personalities : [raid1] 
md2 : active raid1 sdc2[1] sdb2[0]
      477901504 blocks [2/2] [UU]
md1 : active raid1 sdd2[2](F) sda2[0]
      477901504 blocks [2/1] [U_]
md0 : active raid1 sdd1[4](F) sdc1[2] sdb1[1] sda1[5](F)
      10482304 blocks [4/2] [_UU_]
unused devices: <none>
So sda1 has failed out of md0, and sda2 will probably fail out of md1 as well, but that wouldn't leave any working disks in md1 at all! I'm going to go to MPT and grab a spare disk there, then see about replacing sdd :-/ I may end up having to do an offline recovery or something, though :( I'm going to power off crock until I get there and try to recover in rescue mode.
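For what it's worth, the degraded arrays can be picked out of output like the above mechanically. A small sketch, assuming the /proc/mdstat layout shown here (an mdX line followed by a "blocks" status line, where each `_` in the `[U_]` bitmap marks a missing member); it reads mdstat-style text on stdin:

```shell
# List md arrays that have missing members, given /proc/mdstat-style
# input on stdin, e.g.:  awk -f degraded.awk < /proc/mdstat
awk '$1 ~ /^md/ { name = $1 }
     /blocks/ && $NF ~ /_/ { print name }'
```

Run against the mdstat above it would print md1 and md0, the two arrays running short.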

Update at 20:27: After booting up in single-user mode, I added a new drive to the RAID and it was able to fully recover! At least according to the RAID rebuild; we still have to see the results of fscks on the users' filesystems. Before booting multiuser again, I'm rebuilding the RAID onto a second new drive as well, so when we bring it up multiuser it will be completely set. At the current estimate, that will be done in 42 minutes. -Nick
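The drive swap itself boils down to a few mdadm commands. A hedged sketch, not the exact commands used that night: the device names (/dev/sdd*) and the sfdisk partition-table copy are assumptions for illustration, and the script only echoes its plan unless RUN=1:

```shell
#!/bin/sh
# Sketch of replacing a failed RAID1 member. Dry-run by default:
# echoes each step unless invoked with RUN=1.
run() { if [ "$RUN" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

run mdadm /dev/md0 --remove /dev/sdd1    # drop the failed member
run mdadm /dev/md1 --remove /dev/sdd2
# copy the partition table from a good disk onto the replacement (assumption)
run sh -c 'sfdisk -d /dev/sda | sfdisk /dev/sdd'
run mdadm /dev/md0 --add /dev/sdd1       # re-add; the kernel starts rebuilding
run mdadm /dev/md1 --add /dev/sdd2
```

The rebuild progress and time estimate then show up in /proc/mdstat.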

Update at 21:29: The raid has completely finished rebuilding with 4 good drives, and guests are starting up. Let us know if you have any data loss! We will also be giving a free month to all users on crock. Thanks, Nick

Update at 23:14: crock crashed again because I didn't fix the IPv6 multicast problem correctly, but now it is working. I think I'm done here; I'm not being careful enough about this anymore. Somehow IPv6 is also working without the switch ports having been set to be multicast routers.

network problem at rippleweb

There just seems to have been a network outage with Rippleweb, our provider at the Herakles data center in Sacramento. I don't have any information about what the problem was yet, but I was able to reach Rippleweb on the phone and they said they will tell us what happened after it is fixed.

network downtime

We just had a short network downtime when I restarted quagga after upgrading the package and it didn't start up properly. I'm still not sure why, but I reinstalled the old package and it started fine. All the customers we moved from SVTIX to Market Post Tower are also now routed through the router at Market Post Tower, so when we have another provider at Market Post Tower those customers will have a more direct route. We still need to install the newer version of quagga (it fixes a BGP security flaw), so I will try upgrading it again tomorrow at midnight PDT. We also need to get the outgoing prefix list correct, which caused a problem before. If the upgrade causes more downtime, hopefully I will be able to figure out what's wrong tomorrow. Thanks.
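For reference, an outgoing prefix list in quagga's bgpd looks roughly like the fragment below. This is only a sketch of the shape of such a config, not our actual one: the prefix, AS numbers, and neighbor address are RFC documentation placeholders, and the list name is made up.

```
! Announce only our own aggregate upstream; everything else falls
! through to the explicit deny. (Placeholder values throughout.)
ip prefix-list ANNOUNCE-OUT seq 5 permit 192.0.2.0/24
ip prefix-list ANNOUNCE-OUT seq 10 deny 0.0.0.0/0 le 32
!
router bgp 64496
 neighbor 198.51.100.1 remote-as 64497
 neighbor 198.51.100.1 prefix-list ANNOUNCE-OUT out
```

Getting the permit entries to exactly match what we should announce is the part that caused trouble before.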

About this Archive

This page is an archive of recent entries written by nick in July 2012.

nick: June 2012 is the previous archive.

nick: August 2012 is the next archive.
