September 2014 Archives

As you may know, girdle and cattle are the only boxes that are not in CoreSite Santa Clara or CoreSite San Jose, and the only boxes I don't have 24x7 physical access to. They are in Sacramento.

So yeah. My pager went off. Now, usually when Sacramento goes offline, it's just a network issue, so I go and ping the hosts, they come up while I am pinging, and I go about my business.

An hour or so later, my provider sent me an email:

"The reboot device that you are connected to did reset when we updated our configuration. On boot, it looks like there are some errors on the box. I have enabled the KVM so that you can take a look at the console."

I get that sick feeling in my stomach, you know, the one you get when you realize that you just ignored an important page.

So, I log in to girdle... it is mostly okay, but the xen packages got screwed up during the last upgrade. srn unfucked them, then spent some time applying her network security patches, which screwed things up again because the box had an old version of the xen networking scripts, but she is figuring it out, and girdle should be back up by the time this blog posts. (Note: people were up on girdle with broken networking for a bit, but should be okay now.)

Cattle, on the other hand, somehow managed to completely screw itself in the power outage. No ssh, no nothing. I go and hit it from the girdle serial console, and I'm in single-user mode. While I'm trying to figure out what's going on, I reboot it into a non-xen kernel with console=xvc0... which, of course, locks me out.

The upshot? Girdle is back for tonight; cattle is down until my provider wakes up.

The lessons here are:

1. I need to make sure I have physical access to my stuff.  

I have been talking about moving out forever now; I need to finish migrating everyone to new servers in CoreSite with new IPs.

2. There is a reason for read-only Friday. As far as I can tell, my provider spent all day unfucking his broken rebooter, then left me on a KVM, went home, turned off his pager, and went to sleep. This is why you don't start things when you are tired or when you are planning on leaving: sometimes things take longer than planned, so plan for that by giving yourself time.


3. The first time hardware burns you, okay, fine, maybe you were using it wrong. But the second or third time a rebooter causes an outage? Defenestrate that shit.

Edit: we're back. Turns out my provider was still at the datacenter, just, you know, doing shit rather than answering his phone. He rebooted cattle, srn fixed it, and everyone should be back or coming back.

cattle / girdle down

UPDATE 2014-09-27 02:52 -0700 PDT: cattle is coming back up. girdle will not need to be rebooted.

UPDATE 2014-09-27 02:11 -0700 PDT: Outbound network was broken on girdle because upgrading xen did not overwrite the older standard script versions, which broke our customized xen scripts. It should be fixed now.

UPDATE 2014-09-27 01:22 -0700 PDT: The link to our remote rebooter is giving a 404 and we have been unable to contact our provider. In the meantime we have decided to bring up the guests on girdle. Unfortunately this means guests on girdle will need to be rebooted again, hopefully in the near future.
Our provider for these two boxes "updated their configuration" and reset our machines unexpectedly. They are both physically in the same box, so it is not possible to externally reboot them separately. girdle should be OK, but cattle needs to be rebooted into a different configuration, so it will be at least a little while before both are back up.
Here is a link to the xen security problem response process:

upgrade your bash

I've been busy. On the upside? Our ansible setup mostly works now. We got hit with the bash upgrade; fortunately I noticed a few hours after the embargo was up, and I think I had everything patched within a few hours:

and... yeah, then the next day?

I've upgraded all the infrastructure twice now.  

Now, you need to do the same thing. At a minimum:

'yum upgrade bash' 

or on Debian,

'apt-get update && apt-get install --only-upgrade bash'
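Either way, it's worth confirming what you're actually running afterwards. One quick way (nothing specific to the setup described here):

```shell
# Print the installed bash version; in a string like
# "GNU bash, version 4.3.30(1)-release", the third number (30)
# is the upstream patch level.
bash --version | head -1
bash -c 'echo "$BASH_VERSION"'
```

Note that distro packages sometimes backport fixes without bumping the upstream patch level, so the exploit tests below are the real check.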

If you are on something crusty and ancient like etch, you might need to build/patch bash yourself. The following worked for me:
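A rough sketch of that kind of build (not the exact script used here; the GNU mirror URLs and configure flags are assumptions, and etch shipped bash 3.1):

```shell
# Sketch: build etch's bash 3.1 from GNU sources with all upstream patches.
# Mirror URLs and configure flags are assumed, not taken from the original.
VERSION=3.1
BASE=http://ftp.gnu.org/gnu/bash
wget $BASE/bash-$VERSION.tar.gz
tar xzf bash-$VERSION.tar.gz
cd bash-$VERSION
# bash31-018 is the patch that closes the first hole (CVE-2014-6271)
for i in $(seq -f "%03g" 1 18); do
    wget $BASE/bash-$VERSION-patches/bash31-$i
    patch -p0 < bash31-$i
done
./configure --prefix=/usr && make && make install
```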

How do you test?  

Try the following:

lsc@before-patch:~$ env x='() { :;}; echo vulnerable' bash -c echo
vulnerable

Obviously, the above host is vulnerable. After patching, it will look something like this:

lsc@after-patch:~$ env x='() { :;}; echo vulnerable' bash -c echo
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `x'

Edit: IMPORTANT: that only covers you for the first hole. Note the

    for i in $(seq -f "%03g" 1 18); do

in that file. Change the 18 to a 19 to get the latest patch:

    for i in $(seq -f "%03g" 1 19); do

like that, and re-run it.

Test for the new patch:

lsc@host:~$ export X="() { (a)=>\\"
lsc@host:~$ bash -c 'echo date'
bash: X: line 1: syntax error near unexpected token `='
bash: X: line 1: `'
bash: error importing function definition for `X'
lsc@host:~$ cat echo
Fri Sep 26 21:38:49 UTC 2014

That means it's vulnerable. So I re-compile like I said, adding in patch 19, re-install, and:

lsc@host:~$ rm echo
lsc@host:~$ export X="() { (a)=>\\"
lsc@host:~$ bash -c 'echo date'
bash: X: line 1: syntax error near unexpected token `='
bash: X: line 1: `'
bash: error importing function definition for `X'
lsc@host:~$ cat echo
cat: echo: No such file or directory
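For convenience, the two manual checks above can be rolled into one quick script. This is a hypothetical helper, not tooling from the setup described here; run it from a scratch directory, since the second test creates a file named "echo" on vulnerable systems:

```shell
# Quick check for both Shellshock holes, combining the tests above.
workdir=$(mktemp -d)
cd "$workdir"

# CVE-2014-6271: a vulnerable bash runs the trailing command while
# importing the function definition in x, printing "vulnerable"
cve_6271=ok
if env x='() { :;}; echo vulnerable' bash -c echo 2>/dev/null | grep -q vulnerable; then
    cve_6271=VULNERABLE
fi

# CVE-2014-7169: a vulnerable bash mis-parses X so that "echo" is
# consumed as a redirection target, leaving behind a file named "echo"
cve_7169=ok
env X='() { (a)=>\' bash -c 'echo date' >/dev/null 2>&1
if [ -e echo ]; then
    cve_7169=VULNERABLE
fi

echo "CVE-2014-6271: $cve_6271"
echo "CVE-2014-7169: $cve_7169"
```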

About this Archive

This page is an archive of entries from September 2014 listed from newest to oldest.

July 2014 is the previous archive.

October 2014 is the next archive.
