network outage at SVTIX again

I think it lasted about 20 minutes. I believe it was an upstream problem (though this is not confirmed at the moment, and boy, won't I be embarrassed if it turns out to be my router crashing). This upstream was pretty good for the first year or so, but they've been getting less reliable. I guess we need to quit talking about it, build our own BGP router, and get a secondary upstream.

I've been holding back on this project just because Nick and I don't have a lot of experience running BGP. We're announcing some swamp space that one of our customers has as a start, but even that hasn't been without hiccups. Generally speaking, I would think that leaving BGP to my upstream would result in a more reliable system, but of late that hasn't been the case, so I suppose we need to roll our BGP systems into production.


Edit: here is what our upstream had to say:


"EGIHosting.com - Support" writes:

Hi Luke,
There was an emergency maint that was done on the BACKBONE fiber going to SJ DC.  Everything is done and back up normal now.


--

They've been blaming most of the recent outages on XO, which apparently provides them with point-to-point links to whoever they actually buy bandwidth from.

I need to find out who can get me transit at SVTIX without going through those (apparently fragile) XO lines.

2 Comments

Watch out with the fiber - it's really easy to think you have more redundancy than you do.

A datacenter provider I've worked with (nameless, of course) thought they had two fully redundant fiber circuits (self-healing SONET, etc.). Instead, they had one, because the two fiber providers had swapped pairs to create the loop, each building out half, and hadn't told them. They were very surprised when both providers did maintenance the same night, each of which was only supposed to impact one side of a redundant loop...

I've also seen cases where provider B claimed to be on totally different infrastructure than A, but actually used A for point-to-point transport.

(I'm not saying not to pay for redundancy, just to make sure you actually get it... it's harder than you would think, telcos like to play games, etc)


We have a setup there @ svtix, and use EGI as one of our uplinks. We do have occasional hiccups with EGI but for the most part they're pretty solid and their support has been pretty responsive for us.

We have a second link up to Cogent using BGP - they have transit in the building and it has been as reliable as EGI. (That is to say, a few small outages here and there.) The plus side is that the two links have been unreliable at different times, so we've stayed up. :-)

I believe NTT has a lot of transit in the building but it is probably significantly more expensive. IIRC we got Cogent for high single digits per megabit.

Feel free to contact me via the email address in my account (my username at gmail.com) if you'd like to chat about stuff at svtix - we've been there for going on three years now.

About this Entry

This page contains a single entry by luke published on February 18, 2011 12:59 AM.
