reboot of black due to apparent disk problem

so yeah, uh,

16:43 <+nb> INFO: task blkback.16.xvda:12340 blocked for more than 120 seconds.
16:44 <+nb> ata3.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0
16:44 <+nb> ata3.00: irq_stat 0x40000008
16:44 <+nb> ata3.00: cmd 60/e8:08:e9:91:e5/00:00:08:00:00/40 tag 1 ncq 118784 in
16:44 <+nb>          res 41/40:00:04:92:e5/00:00:08:00:00/40 Emask 0x409 (media
            error) <F>
16:44 <+nb> ata3.00: status: { DRDY ERR }
16:44 <+nb> [Mon Jul 22 16:49:59 2013]ata3.00: error: { UNC }
16:44 <+nb> Jul 22 09:01:51 black kernel: ata3.00: exception Emask 0x0 SAct 0x7
            SErr 0x0 action 0x0
16:44 <+nb> SCSI device sdc: 3907029168 512-byte hdwr sectors (2000399 MB)
16:44 <+nb> sdc: Write Protect is off
16:44 <+nb> SCSI device sdc: drive cache: write back


16:51 < prgmrcom> nb
16:51 < prgmrcom> oh no
16:52 < prgmrcom> gonna reboot it
16:53 < prgmrcom> fuuuck.  and I paid for the expensive disks that aren't
                  supposed to do that.  I'm pissed.




but yeah. the upshot is that one of our disks went bad... in exactly the way you'd expect a disk half as expensive to go bad. Not a good morning.



SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     12687         -
# 2  Short offline       Completed without error       00%     12663         -
# 3  Short offline       Completed without error       00%     12641         -
# 4  Conveyance offline  Completed: read failure       10%     12618         188061143
# 5  Short offline       Aborted by host               10%     12617         -
# 6  Short offline       Completed without error       00%     12595         -
# 7  Short offline       Completed: read failure       10%     12570         188061143
# 8  Short offline       Completed without error       00%     12545         -
# 9  Short offline       Completed without error       00%     12523         -
#10  Short offline       Completed without error       00%     12499         -
#11  Extended offline    Completed without error       00%     12487         -
#12  Short offline       Completed without error       00%     12475         -
#13  Short offline       Completed without error       00%     12457         -
#14  Short offline       Completed without error       00%     12357         -
#15  Short offline       Completed without error       00%     12334         -
#16  Extended offline    Completed without error       00%     12320         -
#17  Short offline       Completed without error       00%     12310         -
#18  Short offline       Completed without error       00%     12287         -
#19  Short offline       Completed without error       00%     12264         -
#20  Short offline       Completed without error       00%     12241         -
#21  Short offline       Completed without error       00%     12221         -



so yeah, I thought I remembered SMART errors on sdc (which was the problem disk in this case). my plan was to leave the drive in until I bought a replacement, which was clearly a mistake. yanking the drive and heading to central right now.
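
The two read failures in that self-test log are easy to lose among the "Completed without error" rows. Here is a minimal sketch of a checker that would flag them, assuming the log is the text printed by something like `smartctl -l selftest` and that rows keep the column layout shown above. The names `SelfTest`, `parse_selftest_log`, and `failed_tests` are made up for illustration; this is not what we actually run.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


# One row of the self-test log: "# N  Description  Status  NN%  hours  LBA".
# Assumption: description and status are separated by two or more spaces,
# as in the log pasted above.
ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Parse self-test rows, skipping headers and anything that doesn't match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        num, desc, status, remaining, hours, lba = m.groups()
        rows.append(
            SelfTest(
                int(num),
                desc,
                status,
                int(remaining),
                int(hours),
                None if lba == "-" else int(lba),
            )
        )
    return rows


def failed_tests(rows: List[SelfTest]) -> List[SelfTest]:
    """Return rows that report a failure or record a first-error LBA."""
    return [
        r for r in rows
        if r.first_error_lba is not None or "failure" in r.status.lower()
    ]


# A few rows copied from the log above.
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 3  Short offline       Completed without error       00%     12641         -
# 4  Conveyance offline  Completed: read failure       10%     12618         188061143
# 5  Short offline       Aborted by host               10%     12617         -
# 7  Short offline       Completed: read failure       10%     12570         188061143
"""

if __name__ == "__main__":
    for test in failed_tests(parse_selftest_log(SAMPLE)):
        print(f"test #{test.num} ({test.description}): {test.status}, "
              f"LBA {test.first_error_lba} at {test.lifetime_hours}h")
```

Run from cron against each drive and mailed out on any hit, something like this would have made it harder to sit on a drive that had already failed a self-test twice at the same LBA.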

About this Entry

This page contains a single entry by luke published on July 22, 2013 10:02 AM.
