luke: February 2013 Archives

About the new data center setup

As you may have heard, we're moving data centers.  

Right now, we have two 3.84 kW racks of prgmr.com/xen stuff at SVTIX, 250 Stockton in San Jose (we have two 1.9 kW racks of quarter-rack co-lo there, too).

We have two 3.8 kW racks in suite 1460 at 55 S. Market, rented through egihosting. (They also rent us a 1 gigabit port with a 200 Mbps commit, and a gigabit connection from 55 S. Market to 250 Stockton.)

We have one 3.3 kW rack in suite 1435 at 55 S. Market, direct with CoreSite.

(We also have two servers with rippleweb in Sacramento.)

That's it.

So, now we're moving to CoreSite Santa Clara. We're getting four 5 kW racks there and moving out of everything but the one 3.3 kW rack at 55 S. Market (and the two …). This will actually decrease our costs slightly (not by a lot), and it will significantly increase our capacity.
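As a rough cross-check of the rack figures above, here is a minimal tally of power allocation before and after the move. It assumes each kW figure is that rack's power allocation, counts only the racks listed in this post (the Sacramento servers have no power figure), and the group labels are just descriptive names, not anything official.

```python
from __future__ import annotations

# Rack power allocations in watts, using the figures from this post.
# Integer watts keep the sums exact (no floating-point rounding).
BEFORE = {
    "svtix prgmr.com/xen": 2 * 3840,
    "svtix quarter-rack co-lo": 2 * 1900,
    "55 s. market suite 1460 (egihosting)": 2 * 3800,
    "55 s. market suite 1435 (coresite)": 3300,
}

AFTER = {
    "coresite santa clara": 4 * 5000,
    "55 s. market suite 1435 (coresite)": 3300,  # the one rack we keep
}


def total_watts(racks: dict[str, int]) -> int:
    """Sum the power allocation across every rack group."""
    return sum(racks.values())


if __name__ == "__main__":
    for label, racks in (("before", BEFORE), ("after", AFTER)):
        print(f"{label}: {total_watts(racks) / 1000:.2f} kW")
```

Run as a script, it prints 22.38 kW before and 23.30 kW after. Note that the "before" total includes the SVTIX quarter-rack co-lo racks alongside the prgmr.com/xen racks.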

This also will put us in Santa Clara, which means we can sell bandwidth to anyone on the SVP fiber ring.     

We have an upcoming project with unixsurplus.com that will likely result in inexpensive dedicated servers that, unlike the prgmr.com 'servers of opportunity' stuff, are actually available in a timely manner.

We need to be out of the old stuff by 2013-05-13. Emails will be going out this week to users who will be moved.


Note: the new switching infrastructure should be considerably better. I'm moving to switches with 10GbE uplinks, so the problems we've had in the past with the network falling over when we went over 500 Mbps will go away.

We've got a 10GbE Cogent port (5 Gbps commit) at CoreSite Santa Clara, 1GbE from CoreSite Santa Clara to 55 S. Market, and a 1GbE he.net port at 55 S. Market that we are keeping. (There is actually some trouble moving the Cogent port, but I'm sure that will be worked out before it becomes a huge deal.)


Rehnquist is back up

It was smartctl crashing the box.

04:31 <@prgmrcom> looks like it's the -d ata  (while using a marvell sas card)
                  that causes the panic
04:33 <@prgmrcom> -d marvell doesn't work either, but no -d at all seems to be
                  okay on the command line.  lets see if it crashes the box.
04:34 <@prgmrcom> while technically incorrect (-d ata, that is)  it shouldn't
                  crash the thing, and it didn't crash the thing before the
                  upgrade.
04:35 <@prgmrcom> okay, it didn't crash.  booting xen, I guess.
04:42 <@prgmrcom> replacing disk too
04:45 < srn_prgmr> nb: it looks like smartd.conf should be edited to remove
                   explicit references to "-d" before performing a yum upgrade
04:47 < Bugged> that seems failure-prone
04:48 < srn_prgmr> Well, that's what was causing the crash
04:48 < srn_prgmr> I haven't been able to find a related ticket
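srn_prgmr's suggestion, editing smartd.conf to drop explicit "-d" device-type directives before a yum upgrade, can be sketched as a small filter. This is a minimal sketch assuming the trigger is exactly the "-d TYPE" pair seen in the log; the helper names are hypothetical, and it does not handle backslash line continuations.

```python
def strip_device_type(line: str) -> str:
    """Drop every "-d TYPE" pair from one smartd.conf line (hypothetical helper).

    Blank and comment-only lines are returned untouched; a trailing
    "# comment" is kept verbatim after the cleaned directives.
    """
    body, hash_mark, comment = line.partition("#")
    tokens = body.split()
    if not tokens:
        return line  # blank or comment-only line: leave it alone
    kept = []
    it = iter(tokens)
    for tok in it:
        if tok == "-d":
            next(it, None)  # also skip the TYPE argument after "-d"
            continue
        kept.append(tok)
    cleaned = " ".join(kept)
    if hash_mark:
        cleaned += " #" + comment
    return cleaned


def scrub_smartd_conf(text: str) -> str:
    """Apply strip_device_type to every line of a smartd.conf body."""
    return "\n".join(strip_device_type(ln) for ln in text.splitlines())


if __name__ == "__main__":
    sample = "# monitored disks\n/dev/sda -d ata -a -m root\n/dev/sdb -a"
    print(scrub_smartd_conf(sample))
```

Keep a copy of the original file and review the result by hand: some controllers need an explicit -d type before smartd can talk to the disks at all, so blanket removal is not always right.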

Further breakage on rehnquist

Update:
<prgmrcom> [18:25:50] huh... is my system trying to use dmraid rather than mdadm?
<prgmrcom> [19:25:14] I am suspecting hardware.
<prgmrcom> [19:25:53] leaving
<ryk> [19:29:50] prgmrcom: the list got the email the 1st time
<E6Dev> [20:07:17] So is there any update on rehnquist then?
<prgmrcom> [21:14:58] ugh.
<prgmrcom> [21:15:02] I'm at the colo
<prgmrcom> [21:15:07] messing with rehnquist now.
<prgmrcom> [21:15:17] fucking spare hardware won't boot at all.
<prgmrcom> [21:15:35] so I'm reduced to in-field screwing around with hardware, always a bad idea
<prgmrcom> [21:15:44] always.

Starting puppet: [  OK  ]
Starting smartd: Unable to handle kernel paging request at 0000000000002e38 RIP:
 [<ffffffff880ddc75>] :libata:ata_find_dev+0x24/0x73
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/system/cpu/cpu0/topology/thread_siblings
CPU 7
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat bridge lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg k10temp i2c_nforce2 hwmon pcspkr serio_raw i2c_core amd64_edac_mod edac_mc e1000e tpm_tis tpm tpm_bios dm_snapshot dm_zero dm_mirror dm_log dm_mod sata_mv raid10 shpchp mvsas libsas libata scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4051, comm: smartd Not tainted 2.6.18-348.1.1.el5 #1
RIP: 0010:[<ffffffff880ddc75>]  [<ffffffff880ddc75>] :libata:ata_find_dev+0x24/0x73
RSP: 0018:ffff81032428fcb0  EFLAGS: 00010286
RAX: 00000000000023f0 RBX: 00007fff449760f0 RCX: ffff810626c02418
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810626c00028
RBP: ffff810627ffe000 R08: 0000000000000000 R09: ffff810332b708c0
R10: 0000000000000000 R11: 0000000000000000 R12: 00007fff449760f0
R13: 000000000000030d R14: ffff8106266b4680 R15: ffff81010b154858
FS:  00002b6cf1489b50(0000) GS:ffff810332acb440(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000002e38 CR3: 0000000325fc7000 CR4: 00000000000006e0
Process smartd (pid: 4051, threadinfo ffff81032428e000, task ffff810326e9e7e0)
Stack:  ffffffff880ddd7a 00007fff449760f0 ffffffff880e0100 0000000000000000
 00000000ffffffed 000000000000030d ffff810627ffe000 00007fff449760f0
 ffffffea0b154740 000000000000030d ffff810627ffe000 00007fff449760f0
Call Trace:
 [<ffffffff880ddd7a>] :libata:ata_scsi_find_dev+0x6/0x21
 [<ffffffff880e0100>] :libata:ata_scsi_ioctl+0x92/0x1b5
 [<ffffffff88085dd3>] :scsi_mod:scsi_ioctl+0x2cc/0x2f5
 [<ffffffff8014df26>] blkdev_driver_ioctl+0x5d/0x72
 [<ffffffff8014e577>] blkdev_ioctl+0x63c/0x697
 [<ffffffff800227f1>] __up_read+0x19/0x7f
 [<ffffffff800671cf>] do_page_fault+0x4cc/0x842
 [<ffffffff800e8e37>] block_ioctl+0x1b/0x1f
 [<ffffffff80042496>] do_ioctl+0x21/0x6b
 [<ffffffff800304e0>] vfs_ioctl+0x457/0x4b9
 [<ffffffff800baf21>] audit_syscall_entry+0x1a8/0x1d3
 [<ffffffff8004c89a>] sys_ioctl+0x59/0x78
 [<ffffffff8005d29e>] tracesys+0xd5/0xdf


Code: 48 3b 8a 38 2e 00 00 75 0b f6 42 18 01 b8 02 00 00 00 75 05
RIP  [<ffffffff880ddc75>] :libata:ata_find_dev+0x24/0x73
 RSP <ffff81032428fcb0>
CR2: 0000000000002e38
 <0>Kernel panic - not syncing: Fatal exception
 


 
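That 5-minute outage for some of you just now? That was me trying to add another VLAN to a trunk with

 switchport trunk allowed vlan <newvlan>

when you already have a bunch of allowed VLANs. Bad idea. Without the add keyword, that command replaces the trunk's entire allowed list with just <newvlan>, cutting off every other VLAN on the trunk. The safe form is

 switchport trunk allowed vlan add <newvlan>

Fortunately, as I hadn't saved the switch config, one switch reboot later (which brought back the saved config), we were back. I'm sorry.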
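To make the replace-versus-add difference concrete, here is a minimal Python model of how a trunk's allowed-VLAN set responds to the argument of switchport trunk allowed vlan. It is a simplified sketch, not switch code: the function names are hypothetical, and it ignores other forms such as all, none, and except.

```python
from __future__ import annotations


def parse_vlans(spec: str) -> set[int]:
    """Expand a VLAN list such as "10,20-22" into {10, 20, 21, 22}."""
    vlans: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            vlans.update(range(int(lo), int(hi) + 1))
        else:
            vlans.add(int(part))
    return vlans


def apply_allowed_vlan(current: set[int], args: str) -> set[int]:
    """Model "switchport trunk allowed vlan <args>" (hypothetical helper).

    "add LIST" and "remove LIST" edit the existing set; a bare LIST
    replaces it outright, which is what caused the outage.
    Returns a new set and leaves `current` unchanged.
    """
    words = args.split()
    if words[0] == "add":
        return current | parse_vlans(words[1])
    if words[0] == "remove":
        return current - parse_vlans(words[1])
    return parse_vlans(words[0])  # bare list: every other VLAN is dropped


if __name__ == "__main__":
    trunk = {10, 20, 30}
    print(sorted(apply_allowed_vlan(trunk, "40")))      # replace
    print(sorted(apply_allowed_vlan(trunk, "add 40")))  # append
```

In the model, as on the switch, only the add form keeps VLANs 10, 20, and 30 alive.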

About this Archive

This page is an archive of recent entries written by luke in February 2013.
