nick: August 2010 Archives

crock reboot again

Crock hit the "soft lockup detected on CPU#0!" problem, so I rebooted it. Hopefully this is fixed in Xen 4, so we can upgrade the dom0s that have this problem.

rebooting cattle and girdle

Cattle and girdle both have failed disks in their RAID mirrors. We will need to reboot them this afternoon (in the next few hours) to replace the disks, because SATA hot swap is broken in the sata_nv driver for the NVIDIA chipset. If all goes well, everything will be back up by 5 pm Pacific time and the RAID will be rebuilding, so I/O will be slow for some time.
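While a mirror rebuilds, Linux md reports progress in /proc/mdstat. Here's a rough sketch that pulls the numbers out of one recovery line. The sample line is made up for illustration, the format is assumed from typical md output and may differ between kernels, and the block counts are assumed to be 1 KiB units. `parse_recovery` is a hypothetical helper, not something we actually run.

```python
import re

# Hypothetical /proc/mdstat progress line; the exact format is an assumption
# and may vary between kernel versions.
SAMPLE = ("      [==>..................]  recovery = 12.6% "
          "(37043392/292945152) finish=127.5min speed=33440K/sec")

# Only "recovery" and "resync" progress lines are handled here.
RECOVERY_RE = re.compile(
    r"(?P<action>recovery|resync)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_recovery(line):
    """Return rebuild progress from one mdstat line, or None if it has none."""
    m = RECOVERY_RE.search(line)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "percent": float(m.group("pct")),
        "done_kib": int(m.group("done")),    # assumed 1 KiB blocks
        "total_kib": int(m.group("total")),  # assumed 1 KiB blocks
        "finish_min": float(m.group("finish")),
        "speed_kib_s": int(m.group("speed")),
    }


if __name__ == "__main__":
    print(parse_recovery(SAMPLE))
```

A line with no progress field (e.g. `md0 : active raid1 sdb1[1] sda1[0]`) just returns `None`.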

Edit: there was a problem; we brought the wrong disk. Cattle and girdle were built back when we were using consumer-grade drives like morons, and we sized the RAID for 1.5 TB disks... so the 1 TB 'enterprise' disks won't work without a /whole lot/ of work. We ended up hitting Fry's and buying two more 2 TB 'consumer-grade' disks that we'll just short-stroke down to 1.5 TB, 'cause they didn't have any 1.5 TB disks faster than 6000 rpm... the last disks lasted north of a year; if we can get another year out of these new disks, I'll be happy.
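Short-stroking here just means partitioning the first 1.5 TB of each 2 TB disk and leaving the rest unused, so the new mirror member matches the old one. The start of a spinning disk is also typically its fastest region. Below is a minimal sketch of the arithmetic, assuming decimal terabytes (the way drive vendors count) and 512-byte sectors. `short_stroke_plan` is a hypothetical helper. In practice you'd size the partition from the surviving mirror member's actual size, not a nominal 1.5 TB.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def short_stroke_plan(target_bytes, disk_bytes, sector_bytes=SECTOR_BYTES):
    """Return (partition_sectors, fraction_of_disk_used).

    partition_sectors is rounded up so the partition is never smaller
    than target_bytes.
    """
    if target_bytes > disk_bytes:
        raise ValueError("target size is larger than the disk")
    sectors = -(-target_bytes // sector_bytes)  # integer ceiling division
    return sectors, target_bytes / disk_bytes


if __name__ == "__main__":
    # 1.5 TB partition on a 2 TB disk, decimal units assumed
    print(short_stroke_plan(1_500_000_000_000, 2_000_000_000_000))
```

With those assumptions, `short_stroke_plan(1_500_000_000_000, 2_000_000_000_000)` returns `(2929687500, 0.75)`: about 2.93 billion sectors, or three quarters of the disk.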

Anyhow, cattle is coming back up as we speak; once it's up, we will shut down girdle and replace that drive, too.

possible dish disk failure

While the VPSes on dish were all restarting after the unclean reboot earlier today, one of the disks in the RAID started logging a lot of SATA link errors (excerpt below) and the load average climbed very high. After an hour, the SATA link errors stopped, the Linux RAID driver started rebuilding the mirror, and the load is back to normal. We are going to run more SMART tests on the drive and may need to replace it later this week; hopefully we can also find out what was wrong with the SATA link. There should be no data loss, because the other disk in the mirror is still working well.

SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x800 action 0x2
ata1.00: (irq_stat 0x40000008)
ata1.00: tag 0 cmd 0x60 Emask 0x49 stat 0x41 err 0x40 (internal error)
SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
sda: Write Protect is off
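When reading messages like these, it helps to split out the hex fields and sanity-check the reported size. The sketch below is a hypothetical helper. It lists which bit positions are set in each `NAME 0xVALUE` field; mapping positions to flag names needs the libata headers for the running kernel, which we don't reproduce here. It also confirms that 1953525168 sectors of 512 bytes works out to the 1000205 MB the kernel printed, taking MB as 10^6 bytes.

```python
import re

# The dish log excerpt, copied verbatim.
LOG = """\
SCSI device sda: drive cache: write back
ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x800 action 0x2
ata1.00: (irq_stat 0x40000008)
ata1.00: tag 0 cmd 0x60 Emask 0x49 stat 0x41 err 0x40 (internal error)
SCSI device sda: 1953525168 512-byte hdwr sectors (1000205 MB)
sda: Write Protect is off
"""

HEX_FIELD = re.compile(r"(\w+) (0x[0-9a-fA-F]+)")
SIZE_LINE = re.compile(r"(\d+) (\d+)-byte hdwr sectors \((\d+) MB\)")


def hex_fields(line):
    """Map each 'NAME 0xVALUE' pair in a message to its integer value."""
    return {name: int(value, 16) for name, value in HEX_FIELD.findall(line)}


def set_bits(value):
    """Bit positions (0 = least significant) that are set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def capacity_check(line):
    """Return (total_bytes, reported_mb, computed_mb) for a 'hdwr sectors' line."""
    m = SIZE_LINE.search(line)
    if m is None:
        return None
    sectors, sector_bytes, reported_mb = (int(g) for g in m.groups())
    total = sectors * sector_bytes
    return total, reported_mb, round(total / 10**6)  # MB taken as 10^6 bytes


if __name__ == "__main__":
    for line in LOG.splitlines():
        for name, value in hex_fields(line).items():
            print(f"{name}={value:#x} bits={set_bits(value)}")
        size = capacity_check(line)
        if size is not None:
            print("bytes=%d reported=%d MB computed=%d MB" % size)
```

For the size line, `capacity_check` returns `(1000204886016, 1000205, 1000205)`, so the kernel's figure is just the sector count times 512, in decimal megabytes.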

About this Archive

This page is an archive of recent entries written by nick in August 2010.
