I am replacing Table tonight

From the email I just sent to table users:

Your VPS is on a piece of hardware that we call table; it's a 6-core
Socket C32 Opteron with 16GiB of RAM. Ever since the power outage at he.net
Fremont, it has rebooted once every 24-48 hours. On a hunch, we replaced
the power supply, but this just delayed the expected reboot by another 12
hours; just long enough to let us think we had solved the problem.

Anyhow, there's not much more I can do with the server online, so
here is what I'm going to do. I'm pulling one of my spare 8-core dual-socket
Nehalem Xeon boxes out of the spares pool and putting all the spare RAM I
can in it (I have 2GiB modules coming out my ears), so this server only has
12GiB of RAM, but as there is 5GiB free on table, leaving about 11GiB in
use, this should work just fine.
This new server has considerably more horsepower than the old one.

The plan tonight is to gracefully shut down table, then swap the
drives into this new server, then take table out for better testing.
Assuming that the problem is something other than the hard drives or the
data on the hard drives, this should solve our problem.

Now, I know the reliability of table has been completely unacceptable,
so in addition to the 1/4 month of credit everyone on prgmr.com servers
at he.net Fremont is getting, all table users will get another free month.
I understand this doesn't make up for the problem, but consider it an
apology.


edit: table is back up; I'm bringing up the xendomains as we speak.

edit: all domains are back up. Complain loudly if yours is still broken.

About this Entry

This page contains a single entry by luke published on May 13, 2011 6:23 PM.