January 2011 Archives

replacing a bad disk in knife

It's the old NVIDIA MCP55 chipset, so swapping the disk involves a reboot.

edit: this is still in progress... see http://twitter.com/prgmrcom for the exact timeline.

edit: we are now done. Everyone on knife is back up. Expect degraded disk performance for the next day or so as the RAID rebuilds.

migration off boar

boar.prgmr.com had a bad disk, and now it is time for it to die. Only 6 customers are left on boar, plus the prgmr.com website itself. I'm going to move the web server VPS first, then the other domUs to coral.prgmr.com. This blog is on a different VPS and won't be affected. The disk allocations on boar are relatively big, so each VPS will take a few hours, but it should be done today. Each customer will be emailed before their migration starts, and I will update the blog here when the migration is complete.

edit: nick finished this some days ago. Boar is ready for the test lab, a donation, or scrap.

which can cause your resolver to take a long time to time out. I've pestered he.net support. If it's affecting you right now, remove nameserver  from your /etc/resolv.conf and use the backup.
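
The resolv.conf workaround can be sketched as a small text filter. This is only an illustration: the function name `drop_nameserver` is hypothetical, and the addresses are placeholders from the documentation range (192.0.2.0/24), not the actual he.net resolvers. Substitute the address that is giving you trouble.

```python
def drop_nameserver(resolv_conf_text, bad_address):
    """Return resolv.conf text without the 'nameserver' line for bad_address."""
    kept = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # A resolver entry looks like: "nameserver <address>"
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] == bad_address:
            continue  # skip the resolver that is timing out
        kept.append(line)
    return "\n".join(kept) + "\n"


# Placeholder addresses (documentation range), not real resolvers.
sample = "search example.com\nnameserver 192.0.2.1\nnameserver 192.0.2.2\n"
print(drop_nameserver(sample, "192.0.2.1"), end="")
```

The sketch only transforms text. Writing the result back to /etc/resolv.conf needs root, and it is worth keeping a copy of the original so the entry can be restored later.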

edit: he.net support got back to me. They said:

Hi Luke,

Thank you for bringing this to our attention.

One of our system administrators has resolved the issue.
Your customer should be able to use it to resolve.

Thank you for your patience,

hydra shutdown and boar reboot

We've moved everybody we can off hydra while it's still online. There is 1 customer we are going to have to do an offline recovery for, and then hydra will be shut down for good. We're also going to reboot boar to replace a disk, and we're planning to move people off of it soon as well.

We're done with hydra. We lost 3.5 megabytes of data from one customer's disk. The customer has been compensated with more disk and double the RAM at the same price.

hydra is having some problems; we're moving everyone off to a new server. It's the server we were going to add new users to, so ordering will be down a while longer.

council network outage

Council ran out of memory in the dom0. It didn't have any swap set up, so the network stopped working. From the serial console, taking the peth0 interface down and bringing it back up fixed the network. I then saw the memory error in dmesg and added swap space. Let us know at support@prgmr.com if there are any more problems. The downtime was about 2 hours.
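
A check for the missing-swap condition could look something like the sketch below. It reads the `SwapTotal` field that Linux exposes in /proc/meminfo. The function names and sample numbers are made up for illustration, and this is not the tooling actually used on council.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        # The line looks like: "SwapTotal:       2097148 kB"
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0  # field not present: treat as no swap


def has_swap(meminfo_text):
    """True if any swap space is configured."""
    return swap_total_kb(meminfo_text) > 0


def read_meminfo(path="/proc/meminfo"):
    """Read the live values on a Linux host."""
    with open(path) as f:
        return f.read()


# Illustrative sample of a dom0 with no swap configured.
no_swap = "MemTotal:       524288 kB\nSwapTotal:           0 kB\nSwapFree:            0 kB\n"
print(has_swap(no_swap))
```

On a real host, `has_swap(read_meminfo())` could be run from cron or a monitoring check to flag a dom0 that was set up without swap.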

mares ethernet driver

Mares was hitting the "peth0: too many iterations (6) in nv_nic_irq." problem, so I reloaded the forcedeth driver with a higher max_interrupt_work setting. I also need to make a wiki page documenting the steps for this (and for re-adding an interface to the bridge).
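
Until that wiki page exists, here is a rough sketch of how the symptom could be spotted in kernel log text. It matches the message quoted above and returns the interface name and iteration count. The function name `find_irq_overruns` is hypothetical. The sketch only detects the message. It does not reload the driver.

```python
import re

# Matches e.g. "peth0: too many iterations (6) in nv_nic_irq."
IRQ_OVERRUN = re.compile(r"(\S+): too many iterations \((\d+)\) in nv_nic_irq")


def find_irq_overruns(log_text):
    """Return a list of (interface, iterations) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = IRQ_OVERRUN.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


sample_log = "peth0: too many iterations (6) in nv_nic_irq.\neth1: link up\n"
print(find_irq_overruns(sample_log))
```

The text could come from the output of `dmesg` or from a saved kernel log file. A non-empty result suggests the driver may need reloading with a larger max_interrupt_work.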