luke: August 2010 Archives

horn froze up this morning. It was down for about 3 hours before I rebooted it. Unfortunately, I'm an asshole and didn't remember that horn and chariot are both in the same chassis, so by yanking power to horn, I also yanked power to chariot. So chariot also got an unclean reboot (though, as it wasn't frozen up, total downtime there was more like 15-30 minutes, depending on what order your domain is started in).

Everyone should be back up at this point.

BUG: soft lockup detected on CPU#0!

Call Trace:
  [] softlockup_tick+0xce/0xe0
 [] timer_interrupt+0x3a8/0x402
 [] handle_IRQ_event+0x4e/0x96
 [] __do_IRQ+0xa4/0x105
 [] do_IRQ+0x44/0x4d
 [] evtchn_do_upcall+0x19e/0x256
 [] do_hypervisor_callback+0x1e/0x2c
  [] show_rd_sect+0x0/0x68
 [] __read_lock_failed+0x8/0x14
 [] get_device+0x17/0x20
 [] .text.lock.spinlock+0x53/0x8a
 [] show_rd_sect+0x27/0x68
 [] sysfs_read_file+0xa5/0x12c
 [] vfs_read+0xcb/0x171
 [] sys_read+0x45/0x6e
 [] tracesys+0xab/0xb5

I will be tracking my debugging process here. (As of this moment, the server has been rebooted, and all domains should be back within 10 minutes or so.)

Everyone ought to be back up now; please complain to support@ if you still have issues.

Edit: we're now having an 'infinite retry' disk error:

SCSI device sda: drive cache: write back
ata1.00: limiting speed to UDMA/16
ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x800 action 0x2
ata1.00: (irq_stat 0x40000008)
ata1.00: tag 0 cmd 0x60 Emask 0x41 stat 0x41 err 0x4 (internal error)
SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
ata1.00: limiting speed to PIO4
ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x800 action 0x2
ata1.00: (irq_stat 0x40000008)
ata1.00: tag 0 cmd 0x60 Emask 0x41 stat 0x41 err 0x4 (internal error)
end_request: I/O error, dev sda, sector 603497953
SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
ata1.00: limiting speed to PIO3
ata1.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
ata1.00: (irq_stat 0x40000001)
ata1.00: tag 0 cmd 0x24 Emask 0x41 stat 0x41 err 0x4 (internal error)
SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back

Which is weird, as I'd bet money that's an 'enterprise grade' drive that ought to fail straight out rather than looping like that. I'm heading down now.
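
For anyone wanting to spot this pattern automatically, here is a minimal sketch (not something that was run on this server) that scans kernel log text for repeated libata "limiting speed to" messages on one port, which is the signature of the retry loop above. The function names, the threshold of 3, and the embedded sample are assumptions for illustration; the sample lines are copied from the log above.

```python
import re

# Matches libata speed-down messages such as "ata1.00: limiting speed to PIO4".
SPEED_DOWN = re.compile(r"^(ata\d+\.\d+): limiting speed to (\S+)")
# Matches block-layer errors such as "end_request: I/O error, dev sda, sector 123".
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize(log_text):
    """Return (speed_downs, io_errors) found in kernel log text.

    speed_downs maps an ATA port to the transfer modes it was dropped to,
    in the order seen; io_errors is a list of (device, sector) tuples.
    """
    speed_downs = {}
    io_errors = []
    for line in log_text.splitlines():
        line = line.strip()
        m = SPEED_DOWN.match(line)
        if m:
            speed_downs.setdefault(m.group(1), []).append(m.group(2))
            continue
        m = IO_ERROR.search(line)
        if m:
            io_errors.append((m.group(1), int(m.group(2))))
    return speed_downs, io_errors


def looks_like_retry_loop(speed_downs, threshold=3):
    """Ports that were slowed down at least `threshold` times (hypothetical cutoff)."""
    return [port for port, modes in speed_downs.items() if len(modes) >= threshold]


# Sample lines taken from the log in this post.
SAMPLE = """\
ata1.00: limiting speed to UDMA/16
ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x800 action 0x2
ata1.00: limiting speed to PIO4
end_request: I/O error, dev sda, sector 603497953
ata1.00: limiting speed to PIO3
"""

if __name__ == "__main__":
    downs, errors = summarize(SAMPLE)
    print("speed-downs:", downs)
    print("I/O errors:", errors)
    print("suspect ports:", looks_like_retry_loop(downs))
```

Feeding it the output of `dmesg` instead of the sample would work the same way, assuming the kernel prints these messages in the format shown above.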

About this Archive

This page is an archive of recent entries written by luke in August 2010.
