Horn (and the VMs on it) will be rebooted shortly

we've got to replace a disk. 


update: it is coming back up now, restoring domains. those of you running Debian might need another reboot (I've had problems with Debian save/restore).

update:  it looks like all domains but one came back up successfully (with a save/restore, no reboot) 

we'll be rebuilding the RAID now, so expect disk performance to suck for a while. 

Update 2010-03-11 02:28 PST:

wow, disk performance is sucking a lot more than we expected. I am wondering if the remaining disk also has problems. Either way, we've gotta wait for the RAID to finish rebuilding, and that will take on the order of another 24 hours, according to my best estimate. All users of horn will receive a month's worth of credit.

6 Comments

so, uh, looking at another box, if nobody is using it (e.g. if I don't bring up any Xen domains) it looks like a rebuild takes around 3-6 hours with healthy disks. So whatdya think? Next time I have a rebuild, should I take the system down for 3-6 hours, do the rebuild, then bring everyone back up?

The other option is to limit the RAID rebuild speed. That will slow you down less while it runs, but the rebuild will take longer (on the order of a week).
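To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. The array size and speeds are made-up example values, not horn's real figures, and the `rebuild_hours` helper is hypothetical. The commented-out last line assumes Linux md software RAID, where the rebuild cap lives in `/proc/sys/dev/raid/speed_limit_max` (in KB/s); the post does not say which RAID setup horn uses.

```shell
#!/bin/sh
# Rough rebuild-time estimate: hours = array size / rebuild speed.
# size_gb and speed_kb_s are hypothetical inputs, not measured on horn.
rebuild_hours() {
    size_gb=$1
    speed_kb_s=$2
    # 1 GB is taken as 1024 * 1024 KB; the result is rounded to one decimal.
    awk -v s="$size_gb" -v k="$speed_kb_s" \
        'BEGIN { printf "%.1f\n", s * 1024 * 1024 / k / 3600 }'
}

# Example: a 500 GB mirror, unthrottled at 40000 KB/s versus capped at 1000 KB/s.
rebuild_hours 500 40000
rebuild_hours 500 1000

# On Linux md (an assumption about the host) the cap would be set like this:
#   echo 1000 > /proc/sys/dev/raid/speed_limit_max
```

With those example values the unthrottled rebuild comes out at a few hours and the capped one at roughly six days, which is the same shape as the two options above.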

That's why I get weird values from ping :o
64 bytes from nuq04s01-in-f99.1e100.net (74.125.19.99): icmp_seq=1 ttl=57 time=0.000 ms
64 bytes from nuq04s01-in-f99.1e100.net (74.125.19.99): icmp_seq=2 ttl=57 time=0.000 ms

3-6 hours of downtime is not that bad during the night.

wow, that is weird. and the reboot should not have caused that. uname -a? are you using debian/ubuntu? are you seeing 'time went backwards' weirdness in dmesg?
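If you want to check for this inside your own domU, something like the following works as a quick sketch. The `count_time_backwards` helper is a hypothetical name; it just counts matching lines on stdin, so you can feed it `dmesg` output or a saved kernel log.

```shell
#!/bin/sh
# Count kernel log lines on stdin that report the clock jumping backwards.
# The match is case-insensitive, since the kernel prints "Time went backwards".
count_time_backwards() {
    grep -ci 'time went backwards'
}

# Typical use inside the guest (reading dmesg may need root):
#   dmesg | count_time_backwards
#   uname -a
```

A nonzero count together with the `uname -a` output is the useful thing to post back here.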

Yes, I'm using 64-bit Debian, and I also see all the lines you mentioned:

mail:~# ping google.com
PING google.com (74.125.19.105) 56(84) bytes of data.
64 bytes from nuq04s01-in-f105.1e100.net (74.125.19.105): icmp_seq=1 ttl=57 time=0.000 ms

Mar 13 03:17:29 me kernel: [5200136.971506] __ratelimit: 457 messages suppressed
Mar 13 03:17:29 me kernel: [5200136.971506] clocksource/0: Time went backwards: ret=fd03dd4ef54c delta=-5892430425574250 shadow=fd03d439a92a offset=915cabb

Linux mail 2.6.26-2-xen-amd64 #1 SMP Thu Nov 5 04:27:12 UTC 2009 x86_64 GNU/Linux

The connection is very unstable; I'm not able to ping or do anything for long periods :/

Eno: a reboot should fix your problem. shutdown -r now.

hey, so I'd also like feedback about the performance now that the disk rebuild is done. One guy says his I/O is still unacceptable, but from what I can see he is using much more I/O than anyone else on the box; I think it's just his replication system catching up. (I did give him a refund, as he wanted to leave because of the problem. From talking to him, his use case was I/O bound, meaning a VPS was a poor match for his needs anyhow; if anyone else needs a refund for similar reasons, mail support.)
