stables and birds going down for update and move

they are in one of our supermicro 2 in 1u units

15 Comments

hm. something is up with the order of our init scripts. on stables, ssh is killed before the xendomains stop runs, but on birds, it is killed after (much more convenient, I think.) I will need to document this later.

birds is good and down. stables is taking its time timing out domains that don't 'xm save' - nothing I can do, as it killed my ssh and the console getty. I am going to give it a bit longer, as we're still waiting for hydra as well.

I must be on one of these. Just sent an email to support.

thanks for the update... Guess I should have reloaded before I posted my comment. Sorry!

birds is coming back up.

birds is up. I'm rebuilding its raid on a fresh disk, so IO will suck more than usual for a while.

stables is finally up. uh, yeah. I'm going to have to investigate how I screwed that one up. the xen domains are being restored as we speak.

ok, everyone should be back up now. Looks like I owe some SLA credits. sorry. I'll deal with that later. Please let me know if you are still not up.

sachalayatan.net/com hosted on sachalayatan2.xen.prgmr.com is down for about 15 hrs now. No response from support! :(

Today, my slice appears to be running (can log into stables, and even ssh to my domain hosted thereon), but web and eMail services seem to be down on my hosted domains. Apache and Dovecot appear to be up and running (CentOS), but not answering to requests for service. I know those are my responsibility to keep running, but something changed as a result of your upgrades last night, and offhand not sure what that might be. Wondered if you might have any idea as to what might cause this, given the changes recently made? I did reboot my slice, but that didn’t seem to change anything.

I have a deadline i need to meet today, so haven’t the time to look into the matter any further right now. It’ll probably be tomorrow before i can get to troubleshooting this, but thought i’d mention it in case you had any ideas off the top of your head.

I made a horrible mistake and networking is broken on stables. it should be back shortly

ok, that seemed to do it. the chain INPUT firewall rules (to protect the dom0) were applied to the chain FORWARD policy (and forward should only have antispoof rules.) the real problem is that I didn't test things well enough before I left (at least in part because we didn't give ourselves enough time, so we were doing at 2am what we planned on doing at 2pm.)
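
for anyone wondering why that breaks guests while the dom0 itself stays reachable, here is a rough toy model in python. it is not real iptables code, and the addresses, the interface name, and both rule sets are made up for illustration (the actual rules on stables are not shown in this thread.) the idea is just that packets addressed to the dom0 go through INPUT, while packets passing through to a guest go through FORWARD, so a "drop everything except ssh" rule set on FORWARD would cut off web and mail to the guests.

```python
# Toy model of netfilter chain selection. Not real iptables code.
# Every address, interface name, and rule below is a made-up assumption.

DOM0_IP = "192.0.2.1"     # hypothetical dom0 address
GUEST_IP = "192.0.2.10"   # hypothetical guest (domU) address
GUEST_VIF = "vif1.0"      # hypothetical guest interface name


def chain_for(packet):
    """Packets addressed to the host hit INPUT; everything routed through hits FORWARD."""
    return "INPUT" if packet["dst"] == DOM0_IP else "FORWARD"


def verdict(packet, chain):
    """First matching rule wins; otherwise the chain's default policy applies."""
    for matches, action in chain["rules"]:
        if matches(packet):
            return action
    return chain["policy"]


# Assumed dom0 protection: allow ssh, drop everything else.
input_chain = {
    "rules": [(lambda p: p["dport"] == 22, "ACCEPT")],
    "policy": "DROP",
}

# Assumed antispoof: drop guest traffic whose source is not the guest's own address.
forward_antispoof = {
    "rules": [(lambda p: p["in_if"] == GUEST_VIF and p["src"] != GUEST_IP, "DROP")],
    "policy": "ACCEPT",
}

# The mistake described above: the INPUT rule set ends up on FORWARD.
forward_broken = input_chain

web = {"in_if": "eth0", "src": "198.51.100.7", "dst": GUEST_IP, "dport": 80}
ssh = dict(web, dport=22)
spoofed = {"in_if": GUEST_VIF, "src": "203.0.113.9", "dst": "198.51.100.7", "dport": 80}

for name, pkt in [("web", web), ("ssh", ssh), ("spoofed", spoofed)]:
    # All three are traffic to or from a guest, so they traverse FORWARD.
    assert chain_for(pkt) == "FORWARD"
    print(name, verdict(pkt, forward_antispoof), verdict(pkt, forward_broken))
```

in this model, web traffic to the guest is accepted under the antispoof-only FORWARD chain and dropped under the broken one, while ssh gets through either way. whether the real rules allowed ssh is an assumption of the sketch.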

That fixed it. Thanks!

A firewall problem had crossed my mind, but i knew i hadn’t changed *my* settings/rules, so the solution eluded me. I hadn’t considered it could be the firewall rules of dom0, but that certainly makes sense in retrospect. Glad you got it fixed. Many thanks.

my vps was slow and commands like htop wouldn't run properly, so I rebooted, then had lots of messages like
[3356869.625208] clocksource/0: Time went backwards: ret=d8aa6af4a49 delta=-24471617390611268 shadow=d8a774c474e offset=2f63c283
[3356869.625208] __ratelimit: 326 messages suppressed
[3356869.625208] clocksource/0: Time went backwards: ret=d8bd1914a59 delta=-24471612376189748 shadow=d8ba1531622 offset=303ea347
[3356869.625208] __ratelimit: 203 messages suppressed
[3356869.625208] clocksource/0: Time went backwards: ret=d8d018d0b88 delta=-24471607276194309 shadow=d8ccb595eb0 offset=36341d19
[3356869.625209] __ratelimit: 218 messages suppressed
[3356869.625209] clocksource/0: Time went backwards: ret=d8e255ff251 delta=-24471602380202812 shadow=d8df55ff06a offset=3000774a
[3356869.625209] __ratelimit: 242 messages suppressed
[3356869.625209] clocksource/0: Time went backwards: ret=d8f4ec967ee delta=-24471597390458271 shadow=d8f1f66a42b offset=2f633d55
[3356869.625209] __ratelimit: 269 messages suppressed
[3356869.625209] clocksource/0: Time went backwards: ret=d90792f0999 delta=-24471592384187380 shadow=d90496d7d38 offset=2fc209c8
[3356869.625210] __ratelimit: 229 messages suppressed

in the prgmr console and the date was set to jan 10th

a hard shutdown fixed that though
