hang/unclean reboot of coat

Not sure what the problem is;  will troubleshoot more after sleep.

Note: the reboot didn't help. I'm booting into a non-Xen kernel and poking around. I will upgrade the kernel and see if that helps.


14:10 <@prgmrcom> see, it's not packet loss.  It's... something slowing down
                  coat so much that logins on the serial console time out.
14:11 <@prgmrcom> fortunately I ssh'd in before that... but it's not doing me a
                  whole hell of a lot of good.
14:11 <@prgmrcom> typing on that link is also really slow.  Top isn't coming up.
14:12 <@prgmrcom> I mean, I type 'top\n'  and it types t....o....p....   and
                  then it sits there. 
14:12 <@prgmrcom> Cpu(s):  2.8%us,  3.5%sy,  0.0%ni, 32.3%id,  5.5%wa,  0.0%hi,
                  55.6%si,  0.3%st
14:13 <@prgmrcom> top - 06:12:23 up 36 min,  2 users,  load average: 42.76,
                  38.79, 29.70
14:13 <@prgmrcom> huh.
14:13 <@prgmrcom> the load is terrible, but 32% id isn't terrible. 
14:13 <@prgmrcom> Cpu(s):100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,
                  0.0%si,  0.0%st
14:13 <@prgmrcom> oh
14:13 <@prgmrcom> that's more what I expect
14:13 <@prgmrcom>  3300 root      17   0  230m  25m 1664 S 99.9  2.5   0:21.62
                  xend              
14:13 <@prgmrcom> hm?
14:14 <@prgmrcom> weird


So, uh, I found one domain using a /lot/ of soft interrupts. I have disabled that domain and we are back (except for that user, whom I have emailed).
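
For anyone who wants to do the same kind of poking: one rough way to see which interrupt source is climbing fastest is to take two snapshots of /proc/interrupts and diff them. The sketch below is just that, a sketch; the function names (parse_interrupts, busiest, watch) are made up for illustration, and it assumes the usual Linux layout of a CPU header line followed by "label: per-CPU counts, description" rows. Mapping a busy row back to a particular guest is left to whoever is reading the output.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Take at most one count per CPU; some rows (e.g. ERR) have fewer.
        while (len(counts) < ncpu and len(counts) < len(fields)
               and fields[len(counts)].isdigit()):
            counts.append(int(fields[len(counts)]))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest(before, after, top=5):
    """Return the rows whose counts grew most, as (delta, label, description)."""
    deltas = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, description))[0]
        deltas.append((count - previous, label, description))
    deltas.sort(reverse=True)
    return deltas[:top]


def watch(interval=5.0, path="/proc/interrupts"):
    """Print the fastest-growing interrupt rows over one sampling interval."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    for delta, label, description in busiest(first, second):
        print(f"{delta:>12}  {label:>6}  {description}")
```

Calling watch() on the affected host prints the top few rows by growth over the interval. It only reports raw counts, so it says which source is noisy, not why.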

15:29 < LowRadio> taking a long time to boot, i guess everyone else vps is booting also
15:29 <@prgmrcom> fuck.   I feel really uncomfortable, 'cause I don't really understand what was
                  going on.  I mean, /proc/interrupts was incremetning for that guy's vm... but
                  that much?  I don't know.  
15:29 < LowRadio> luke I would'nt know
15:30 <@prgmrcom> but yeah, disable that vm and the problem goes away and the system is as it
                  should be (which is kinda slow;  it's an old box, it's rebuilding a RAID, and
                  everyone is booting at once, so yeah.)
15:31 < LowRadio> that is odd that vm would effect the whole system
15:31 <@prgmrcom> not odd... very bad.  
15:31 <@prgmrcom> but I don't really understand how interupts work.


About this Entry

This page contains a single entry by luke published on December 13, 2012 5:39 AM.
