nick: December 2011 Archives

taney ethernet going up and down

Taney's ethernet link seems to be going up and down:
[13275237.374946] eth0: port 1(peth0) entering disabled state
Dec 27 07:11:36 taney kernel: [13275240.495729] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 27 07:11:36 taney kernel: [13275240.496284] eth0: port 1(peth0) entering forwarding state
Dec 27 07:12:17 taney kernel: [13275281.374172] eth0: port 1(peth0) entering disabled state
Dec 27 07:12:20 taney kernel: [13275284.454919] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 27 07:12:20 taney kernel: [13275284.455487] eth0: port 1(peth0) entering forwarding state
The log on the switch just says something similar, with no errors counted on either end.
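
To put a number on the flapping, here is a minimal Python sketch that pairs each "entering disabled state" line with the next "entering forwarding state" line for the same port and records how long the link was down. The function name `link_flaps` and the regular expression are illustrative assumptions, not a tool that was actually run on taney. The sample lines are copied from the log above.

```python
import re

# Captures the bracketed kernel uptime stamp, the bridge port name, and the new state.
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\S+: port \d+\((\w+)\) entering (\w+) state"
)

def link_flaps(lines, port="peth0"):
    """Return (down_at, up_at) uptime pairs for each disabled -> forwarding cycle."""
    flaps = []
    down_at = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group(2) != port:
            continue  # not a bridge state change for the port we care about
        stamp = float(match.group(1))
        state = match.group(3)
        if state == "disabled":
            down_at = stamp
        elif state == "forwarding" and down_at is not None:
            flaps.append((down_at, stamp))
            down_at = None
    return flaps

sample = [
    "[13275237.374946] eth0: port 1(peth0) entering disabled state",
    "Dec 27 07:11:36 taney kernel: [13275240.495729] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
    "Dec 27 07:11:36 taney kernel: [13275240.496284] eth0: port 1(peth0) entering forwarding state",
    "Dec 27 07:12:17 taney kernel: [13275281.374172] eth0: port 1(peth0) entering disabled state",
    "Dec 27 07:12:20 taney kernel: [13275284.454919] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
    "Dec 27 07:12:20 taney kernel: [13275284.455487] eth0: port 1(peth0) entering forwarding state",
]

for down_at, up_at in link_flaps(sample):
    print(f"link down for {up_at - down_at:.2f} s")
```

On the excerpt above this finds two down/up cycles, each lasting roughly three seconds.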

Luke is going to Market Post Tower now to try replacing the ethernet cable.
Update: Luke plugged the ethernet into eth1 instead of eth0 and it seems to be fixed now. -Nick

rutledge raid rebuild

Last night we replaced a hard drive in rutledge, and the raid was rebuilding normally until the disk froze completely. I rebooted the machine and am now letting the raid rebuild in single-user mode. When it's done, I will update the blog here. Email support@prgmr.com if you have any questions. Thanks!

Update 20111214 8:36AM PST: Luke here. The dang thing rebuilt, then rebuilt again. I'm suspecting a bad drive. SMART on the thing hangs, and it reports drive errors (all of which have to do with SMART). So I don't have really solid evidence that the drive is bad, but no SMART, if you ask me, is enough reason to trash the drive anyhow.

Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 37 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 be 4f c2 00 00      00:00:46.516  SMART WRITE LOG
  b0 d5 20 bf 4f c2 00 00      00:00:46.016  SMART READ LOG
  b0 d6 01 be 4f c2 00 00      00:00:46.006  SMART WRITE LOG
  b0 d5 01 bf 4f c2 00 00      00:00:45.517  SMART READ LOG
  b0 d6 01 be 4f c2 00 00      00:00:45.507  SMART WRITE LOG

Edit at 20111214 10:30AM PST:
The thing rebuilt successfully and was rebooted about an hour ago. SMART tests now look good (at least the short and conveyance tests; the long test still has 10% to go, and I'll update when that's done).

I'm no longer at all sure it was a disk problem; I've seen errors like this when a rebuild was running too fast. There's a /sys/ entry that lets you limit rebuild speed, and we need to tweak that down next time. (It used to limit itself to something reasonable.)
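
As a rough sketch of what tweaking that down could look like: Linux md exposes a per-array rebuild ceiling as `sync_speed_max` under `/sys/block/<array>/md/`, in KiB/s. That this is the entry meant here is an assumption, and the array name `md0` and the 50000 figure used below are placeholders, not values from rutledge. The helper names are made up for illustration.

```python
from pathlib import Path

def sync_speed_path(md_device, which="max", sysfs_root="/sys"):
    """Build the path to an md array's sync_speed_min or sync_speed_max entry."""
    if which not in ("min", "max"):
        raise ValueError("which must be 'min' or 'max'")
    return Path(sysfs_root) / "block" / md_device / "md" / f"sync_speed_{which}"

def read_sync_speed(md_device, which="max", sysfs_root="/sys"):
    """Return the current limit exactly as the kernel reports it (a string)."""
    return sync_speed_path(md_device, which, sysfs_root).read_text().strip()

def set_sync_speed_max(md_device, kib_per_sec, sysfs_root="/sys"):
    """Cap the rebuild speed of one array; needs root when pointed at the real /sys."""
    if kib_per_sec <= 0:
        raise ValueError("kib_per_sec must be positive")
    sync_speed_path(md_device, "max", sysfs_root).write_text(f"{kib_per_sec}\n")
```

Something like `set_sync_speed_max("md0", 50000)` would then cap that one array at about 50 MB/s. The `sysfs_root` parameter only exists so the helpers can be tried against a scratch directory before touching a live box.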

I'll update again when the SMART error clears; for now, the machine is up, and I don't expect any more reboots.

disk replacement on birds

Birds has a bad disk that needs replacing, and it's been making everything slow. People have also been complaining about guests not being able to start after shutting down, so hopefully this will be fixed once the raid mirror is synced with a new disk and running at full speed again. If there are still problems, we will work on moving people off of birds to a newer system. Thanks!

About this Archive

This page is an archive of recent entries written by nick in December 2011.
