luke: April 2012 Archives

update on rehnquist

Well, it's down again, so I don't know what the heck is going on. I'm going to swap to new hardware this evening (this will involve a graceful shutdown).

Note, until then, all new provisioning is on hold.

Taking it down for reboot now.

Ugh. That took way longer than it should have, but it's done now. It's back. Sorry. I need to test my netboot rescue images sometime when it's not an emergency (and I probably should have a backup rescue USB key on me). None of that would have been required if I had remembered to put the new driver in the initrd before swapping cards.

rehnquist rebooted again

Sorry, I should not have waited to replace that SATA card. I'm bringing the new one down right now.

rehnquist crash.


sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040

I rebooted it; it will be returning shortly. Unless that error means something rather different than I think, I will be shutting down soon to replace the sata_mv card (a used PCI-X card; I should not have used it).
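When watching for repeats of this error, it can help to pull the PCI device address and the cause value out of the log line mechanically. The following is a minimal sketch: the function name and the regular expression are illustrative assumptions, and it only lists which bits are set in the cause value. It does not attempt to say what each bit means, which would need the sata_mv driver source or the controller documentation.

```python
import re

# The log line quoted in the post above.
LINE = "sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040"

# Hypothetical pattern: driver name, PCI address (domain:bus:device.function),
# then the hex cause value. Written to match the line above, nothing more.
PATTERN = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCI ERROR; PCI IRQ cause=(?P<cause>0x[0-9a-fA-F]+)"
)


def parse_pci_error(line):
    """Return driver, device, cause and the set bit positions, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    cause = int(match.group("cause"), 16)
    # Bit positions (0 = least significant) that are set in the cause value.
    bits = [b for b in range(cause.bit_length()) if (cause >> b) & 1]
    return {
        "driver": match.group("driver"),
        "device": match.group("device"),
        "cause": cause,
        "bits": bits,
    }


if __name__ == "__main__":
    print(parse_pci_error(LINE))
```

For the line above this reports device 0000:03:06.0 with bits 6, 28 and 29 set.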

The sata_mv card in question is the older Marvell Supermicro 8-port SATA card:

which I only used because it was all I could find; the store was out of what I use on some of the other rebuilt MCP55 servers like burger:

Update: rehnquist crashed again this morning, and I rebooted it. -Nick 11:36

We're going to be replacing sphinx with manticore. It should be a matter of moving the cables; if we don't screw it up, it should be a matter of seconds. Worst case, 5 minutes of downtime and we roll back. It's the same quagga config; it's just better hardware.

And we're back. It was about 5 minutes, but spread out, which makes it worse: around 30 seconds at about 17:17, then around a minute at 17:23, then around two minutes at about 17:53.

We screwed up the VLAN config. We use a quagga software router, and the VLANs are written in /etc/network/interfaces, while everything else is in quagga. Since we haven't rebooted the router in... a long time[1], this meant we had an error in our interfaces file that we hadn't noticed. We rolled back, figured out the problem, fixed it, and rolled forward.
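Because /etc/network/interfaces is mostly only read at boot, a mistake in it can sit unnoticed for months. One cheap guard is a consistency check run before moving the config to new hardware. The sketch below is only an illustration of that idea, not the check that was actually used here: the function name, the interface names (eth0.10, eth0.20) and the addresses are all hypothetical, and it catches just one class of mistake, an interface that is marked `auto` or `allow-hotplug` but has no matching `iface` stanza.

```python
# Hypothetical interfaces file with a typo in the second VLAN stanza.
SAMPLE = """\
auto lo
iface lo inet loopback

# VLAN 10 (hypothetical)
auto eth0.10
iface eth0.10 inet static
    address 192.0.2.1
    netmask 255.255.255.0

# VLAN 20: the iface name below has a typo
auto eth0.20
iface eth0.200 inet static
    address 198.51.100.1
    netmask 255.255.255.0
"""


def find_missing_stanzas(text):
    """Return interfaces brought up automatically that have no iface stanza."""
    wanted = []
    defined = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] in ("auto", "allow-hotplug"):
            wanted.extend(words[1:])
        elif words[0] == "iface" and len(words) > 1:
            defined.add(words[1])
    return [name for name in wanted if name not in defined]


if __name__ == "__main__":
    print(find_missing_stanzas(SAMPLE))
```

On the sample above it flags eth0.20, since the stanza that was meant to define it is named eth0.200 instead.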

Anyhow, we're back online with a quagga box with a rather more powerful CPU (an E3-1220: the full-power 3.1GHz quad-core version, not the dual-core low-power version I've been talking about using as a utility server), and we're keeping the old quagga server around just in case something horrible happens.

[1] root@sphinx:~# uptime
 17:39:21 up 190 days, 17:47,  5 users,  load average: 0.00, 0.00, 0.00

Also note, the MAC address of the router changed, so people who had statically routed to the link-local address fe80::230:48ff:febc:a19a were broken until just now, when Nick bound it to the new router. Don't use that as the default gateway, please.
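The reason a hardware swap changes this address: an autoconfigured link-local address is normally built from the interface's MAC using the modified EUI-64 scheme (flip the universal/local bit of the first byte, insert ff:fe in the middle, prefix with fe80::/64). The sketch below shows that derivation. The MAC used, 00:30:48:bc:a1:9a, is an assumption obtained by reversing the address quoted above, not something read off the old router, and the function name is illustrative.

```python
import ipaddress


def mac_to_link_local(mac):
    """Derive the modified EUI-64 link-local IPv6 address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-byte MAC address")
    # Flip the universal/local bit of the first byte.
    octets[0] ^= 0x02
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix followed by the 64-bit interface identifier.
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + interface_id)


if __name__ == "__main__":
    # Assumed MAC, inferred from the address in the post.
    print(mac_to_link_local("00:30:48:bc:a1:9a"))  # fe80::230:48ff:febc:a19a
```

A different network card gives a different MAC and therefore a different link-local address, which is why routes pinned to the old one stopped working after the swap.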

About this Archive

This page is an archive of recent entries written by luke in April 2012.
