April 2008 Archives

So, we were sitting around the prgmr.com world headquarters, when the subject of Catch-22 came up.  I found myself the only person in the room who'd actually enjoyed it, and had to defend my position relying on memory alone.

It went poorly.  But now, a couple of days later, I'm working with LVM mirroring, and I get:

# lvextend -r -v -L +1G lion_domU/cicero
Finding volume group lion_domU
Extending 2 mirror images.
Mirrors cannot be resized while active yet.

Okay, so I deactivate it.  No problem.

# lvchange -an lion_domU/cicero

And then try again.

# lvextend -r -v -L +1G lion_domU/cicero
Finding volume group lion_domU
Extending 2 mirror images.
Logical volume cicero must be activated before resizing filesystem

*sigh*  It's a good thing I can resize manually, otherwise I'd really be out of luck.
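
For the curious, resizing by hand might look something like the sketch below. It just splits apart the two steps that `lvextend -r` tries to do at once. Assumptions, loudly: the LV holds an ext2/ext3 filesystem directly (hence `e2fsck` and `resize2fs`), the volume can be taken offline, and `run` and `resize_mirrored_lv` are made-up helper names, not part of LVM. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: resize a mirrored LV by hand.
# ASSUMPTION: ext2/ext3 directly on the LV; "run" is a hypothetical helper.

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# usage: resize_mirrored_lv VG LV SIZE   (SIZE as for lvextend -L, e.g. +1G)
resize_mirrored_lv() {
    vg=$1; lv=$2; size=$3
    run lvchange -an "$vg/$lv" &&          # mirrors must be inactive to resize
    run lvextend -L "$size" "$vg/$lv" &&   # no -r: leave the filesystem alone
    run lvchange -ay "$vg/$lv" &&          # reactivate so the fs can be grown
    run e2fsck -f "/dev/$vg/$lv" &&        # resize2fs wants a freshly checked fs
    run resize2fs "/dev/$vg/$lv"           # grow the fs to fill the LV
}
```

Running `resize_mirrored_lv lion_domU cicero +1G` prints the five commands in order; setting `DRY_RUN=0` first would actually execute them.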

PayPal subscriptions appear to no longer work. The user thinks they signed up and gets a subscription number, but PayPal doesn't see it. Calling PayPal, of course, is no help whatsoever.

I'm setting up a FreeSide billing system as we speak, as that should give me more flexibility anyhow. The PayPal subscriptions are... incomplete.

mirroring LVM tested

Today we tested yesterday's entry. I'm not going over the md stuff, as there are good guides for md all over the place, but restoring the LVM mirroring is a bit tricky.

Once you reboot after removing the disk (as in our test), to make things work you must run:

vgreduce --removemissing vg_name


vgchange -ay vg_name

This will get you back up and running.

Then, once you have the disk replaced, to restore redundancy:

pvcreate /dev/newdiskpartition

vgextend vg_name /dev/newdiskpartition

lvconvert -m1 /dev/vg_name/lv_name 

Check status by typing 'lvs' and looking at the copy percent ('Copy%') column.
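
The whole sequence can be collected into a small script, roughly like the sketch below. Treat it as an illustration under assumptions: `run`, `recover_vg`, `remirror_lv` and `is_synced` are names invented here, the script only prints commands unless `DRY_RUN=0` is set, and `is_synced` assumes `lvs` reports a finished mirror as `100.00`.

```shell
#!/bin/sh
# Sketch: recover from a lost mirror leg, then restore redundancy.
# ASSUMPTION: all function names here are hypothetical helpers.

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 1: make the VG usable again after a disk has gone missing.
# Note the LVM2 flag is spelled --removemissing (one word).
recover_vg() {
    vg=$1
    run vgreduce --removemissing "$vg" &&
    run vgchange -ay "$vg"
}

# Step 2: once the replacement partition exists, rebuild the mirror.
remirror_lv() {
    vg=$1; lv=$2; newpv=$3
    run pvcreate "$newpv" &&
    run vgextend "$vg" "$newpv" &&
    run lvconvert -m1 "/dev/$vg/$lv"
}

# True when a copy-percent value (e.g. "  100.00") means fully synced.
# ASSUMPTION: lvs prints exactly 100.00 for a completed mirror.
is_synced() {
    pct=$(echo "$1" | tr -d ' ')
    [ "$pct" = "100.00" ]
}
```

To poll a real volume, the value could come from something like `lvs --noheadings -o copy_percent vg_name/lv_name`, passed to `is_synced` as its one argument.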

Let me describe how we're partitioning the current machines.  The goal is to have everything mirrored, so that we can survive an outage on our cheap and crappy SATA disks.  (I mean, they're Seagates and Maxtors.  Could be much worse.  But they're still very commodity.  You know how it goes.  Worse is better.)

Anyway, let's make a table.  All of the md devices are mirrored:

/dev/md0         /          10G
/dev/sd(1,0)     (swap)     (memsize)
/dev/md1         LVM PV     5G

The rest of each drive is an LVM PV.  The three PVs (one mirrored, two not) are then ganged into a VG.  (One of the first things that we do is create an LV to use for /var, sized the same as swap.  This is so that we can save the domUs at machine shutdown and restore them on reboot.  We might just make / bigger, though.)

When we create a domU, the backing store gets created something like this:

 # lvcreate -L 5G -m 1 vgname

Which puts each leg of the mirror on a separate disk, and the mirror log on /dev/md2.
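
To check where the legs and the log actually landed, something like this sketch may help. The `-n` name, the default size, and the helper names `run` and `create_domu_store` are illustrative assumptions; `lvs -a -o +devices` lists the hidden mirror sub-volumes together with the PVs they sit on. As written it only prints the commands.

```shell
#!/bin/sh
# Sketch: create a mirrored backing store and show which PVs it uses.
# ASSUMPTION: helper names and the 5G default are illustrative only.

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# usage: create_domu_store VG NAME [SIZE]
create_domu_store() {
    vg=$1; name=$2; size=${3:-5G}
    run lvcreate -L "$size" -m 1 -n "$name" "$vg" &&  # two legs plus a log
    run lvs -a -o +devices "$vg"                      # show legs, log and PVs
}
```

If a leg or the log ends up somewhere unwanted, `lvcreate` also accepts a list of PVs after the VG name to restrict where it allocates.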

This is fine and dandy.  One problem, though, is that GRUB isn't installed properly on both devices.  If drive 0 fails in a way that renders it invisible to the BIOS, the machine won't boot.  Here's the solution:

 grub> root (hd1,0)
 grub> install /boot/grub/stage1 (hd1) (hd1)1+15 p (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf

Unfortunately, if the failed drive 0 is still visible to the BIOS, GRUB won't load at all.  My understanding of GRUB isn't complete enough to specify a stage2 relative to the boot sector, although it seems like the sort of thing that should be possible.

the hardware of prgmr.com.

It's upgrade time at prgmr.com. Our new x86_64 server at the Fremont location, Lion, has an Intel Core 2 Quad Q6600, 8GB of brand-name unbuffered ECC RAM and 2x1TB 7200RPM SATA drives. The Xen kernel/Dom0 is x86_64, running the version of Xen 3.1 that comes with CentOS 5.1; 32-on-64 seems to work, so customers will have the choice of i386-PAE and x86_64.

We have a stack of Intel SR1530AHLX chassis/motherboard combos, and I'm considering renting out whole servers. E-mail me if interested.

We went down last weekend due to a bad hard drive; all customers in the accounting system got e-mails about it. If you are a customer and didn't get an e-mail, please complain loudly, as it means I don't have you in the accounting system.

The problem was resolved by Monday; if you are still having problems, again, complain loudly.  lsc@prgmr.com is a good address to complain at.

So, our servers are co-located in Sacramento, at rippleweb. I like them quite a lot for hosting 1U boxes; they are an especially good deal if you have high-density, high-power boxes, as they use 208V power and don't charge extra for the power-sucking older dual Xeons that draw 100-200 watts per U.

So late Friday night a drive failed, and the way I have things set up, mirroring is optional and not the default, which means most customers don't have it.

Compounding matters, I striped the swap for the Dom0 across both drives (I mirrored everything else) so the box went down with the bad drive.

We did, however, manage to get the bad drive to spin up (after sticking it in the freezer for a while) so we should be back in business sometime late tonight.


Also, as I pointed out, these servers are in Sacramento. I am in Sunnyvale. First, I log in to the remote KVM setup and see what I can do in the BIOS. Nothing; it can't even see the drive. So, I ask my friend Chris, the guy who is partnering with me on my Xen book venture, to take a look at it.

So, Chris drags the server back to his house. He plugs it into a new computer. No dice. He even swaps the circuit board with one from a spare drive of the same model. Nothing. (The spare drive, with the circuit board from the bad drive, works fine.) Clearly, we have a catastrophic failure in the drive itself. Violence wasn't helping, either. (Sometimes drives suffering from stiction can be cured with concussive force.) So, we think, why not try freezing it? People say freezing a drive can sometimes help if your bearings are failing, but it's never worked for me.

After several hours in the freezer, however, the drive spins up, and it stays up long enough for us to retrieve all the data. As Chris said, "freezing it. . . totally not myth."

bulk e-mail

So today I got an automated abuse report. (Incidentally, it came from http://junkemailfilter.com; they seem to be up on things. Marc Perkel answered my questions right quick, and the automated message was clear and had all the info I needed to track down the problem.) It turns out a free trial customer (who is no longer a customer) had a business of mailing 'opt-in' lists. He even provided documentation of the sign-up. (But the message looked a lot like spam to me, and it was blocked not because the receiver complained, but because it contained a link to a site that is blacklisted.) Because he had documentation of the double opt-in, I'm not taking action against this customer, aside from terminating my business relationship with him, but I am changing the prgmr.com AUP to disallow all bulk mail. (You can ask me for an exception if you want to run a mailing list.)