We needed to reboot whetstone this morning because of the following kernel error:
BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ> [<ffffffff8025894a>] softlockup_tick+0xce/0xe0
 [<ffffffff8020df6c>] timer_interrupt+0x3a8/0x402
 [<ffffffff80258c34>] handle_IRQ_event+0x4e/0x96
 [<ffffffff80258d20>] __do_IRQ+0xa4/0x105
 [<ffffffff8020bd6c>] do_IRQ+0x44/0x4d
 [<ffffffff80351f4c>] evtchn_do_upcall+0x19e/0x256
 [<ffffffff80209d8e>] do_hypervisor_callback+0x1e/0x2c
 <EOI> [<ffffffff8035d93e>] show_rd_sect+0x0/0x68
 [<ffffffff802ee0bc>] __read_lock_failed+0x8/0x14
 [<ffffffff803494de>] get_device+0x17/0x20
 [<ffffffff804024cd>] .text.lock.spinlock+0x53/0x8a
 [<ffffffff8035d965>] show_rd_sect+0x27/0x68
 [<ffffffff802be588>] sysfs_read_file+0xa5/0x12c
 [<ffffffff8028031c>] vfs_read+0xcb/0x171
 [<ffffffff802806fb>] sys_read+0x45/0x6e
 [<ffffffff802097b2>] tracesys+0xab/0xb5

We have seen this before on some of our other dom0s, so we're planning to eventually upgrade any that show this problem to Xen 4. The downtime lasted 6 hours; users on whetstone will get a free month.


We'll be getting some debug help on this one shortly.

Mainly during early attempts (think RHEL/CentOS 5.0 and 5.1, default Xen) at doing 32-on-64 virtualization.

I think our fix at the time was "don't do that".

To be fair, 32-on-64 wasn't supported until a few revs in, if I remember right.
I'm certain it wasn't supported on RHEL 5.0 (I mean, you could do it, but it was considered unstable/unsupported). It's since become much more, uh, mainstream.

Yeah, I know it wasn't supported, but we were trying it anyway as a (failed) experiment.

I think it's become pretty mainstream for small stuff, though anything "serious" I'm dealing with these days is 100% 64-bit.

(But 32-bit makes an awful lot more sense if you're using a 256MB domU as a backup MX/DNS or similar.)

