June 2012 Archives

rebuilding disk on chase

No downtime expected.  

[root@chase ~]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md1 : active raid10 sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[7](F) sda2[0]
      1433703936 blocks 256K chunks 2 near-copies [6/5] [U_UUUU]
      [>....................]  recovery =  0.2% (1116288/477901312) finish=460.9min speed=17237K/sec
md0 : active raid1 sdg1[1] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[6](F) sda1[0]
      10482304 blocks [6/6] [UUUUUU]
unused devices: <none>
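
For anyone wondering where the finish estimate comes from, here is a rough sketch of the arithmetic using the numbers from the md1 recovery line. It assumes mdstat's block counts are in 1K units and that the rebuild speed stays constant, which it won't exactly; the function names are made up for illustration.

```python
def rebuild_eta_minutes(done_blocks, total_blocks, speed_k_per_sec):
    """Minutes left if the rebuild keeps going at the current speed."""
    remaining = total_blocks - done_blocks
    return remaining / speed_k_per_sec / 60


def percent_done(done_blocks, total_blocks):
    """Share of the member device already rebuilt, as a percentage."""
    return 100 * done_blocks / total_blocks


# Figures copied from the recovery line of md1 above.
done, total, speed = 1116288, 477901312, 17237

print(f"{percent_done(done, total):.1f}% done, "
      f"about {rebuild_eta_minutes(done, total, speed):.1f} minutes left")
```

That lands within a fraction of a minute of the kernel's own finish=460.9min; the small difference is presumably down to how the kernel rounds.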

Ugh.   well Yay and ugh.   IPv6, apparently, uses multicast, not broadcast.   Multicast should be propagated to all ports on a bridge, but it's not.   Sometimes tweaking sys entries fixes it.   Sometimes quickly tweaking sys entries (I had a for loop with no sleep that set all ports to receive multicasts) crashes the box.  ugh.  sorry. 
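
A sketch of what a slower version of that loop could look like. This is a guess at the shape, not the loop that was actually run: it assumes the knob in question is the per-port `multicast_router` file under `/sys/class/net/<bridge>/brif/`, that writing `2` makes a port always receive multicast, and that a pause between writes helps at all, none of which has been confirmed here. The function name and its parameters are hypothetical.

```python
import time
from pathlib import Path


def set_ports_multicast_router(bridge, value="2", delay=1.0,
                               sysfs=Path("/sys/class/net")):
    """Write `value` to each bridge port's multicast_router file, pausing between writes."""
    changed = []
    for port in sorted((sysfs / bridge / "brif").iterdir()):
        knob = port / "multicast_router"
        if not knob.exists():   # kernel without this per-port setting
            continue
        knob.write_text(value)
        changed.append(port.name)
        time.sleep(delay)       # the pause the original loop lacked
    return changed
```

Called as root with something like `set_ports_multicast_router("xenbr0")`, it returns the names of the ports it touched. The `sysfs` argument exists only so the loop can be tried against a scratch directory first instead of a live dom0.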

(note, uh, it was pointed out that I said 'IPv6 uses multicast, not unicast' which doesn't really make sense.  I mean that in the places where IPv4 would use broadcast (FF:FF:FF:FF:FF:FF) ethernet frames to map IPs to mac addresses, IPv6 uses multicast ethernet frames (specifically 33:33:xx:xx:xx:xx for IPv6 neighbor discovery), which should be treated the same by a bridge but apparently isn't always.)
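
To make the mapping concrete, here is a small sketch of how a neighbor solicitation for a given IPv6 address ends up on a multicast MAC. It follows the solicited-node group from RFC 4291 and the 33:33 ethernet mapping from RFC 2464. The function names are made up, and the example address comes from the documentation prefix, not from any of our hosts.

```python
import ipaddress


def solicited_node_multicast(addr):
    """ff02::1:ffXX:XXXX built from the low 24 bits of addr (RFC 4291)."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group):
    """33:33 followed by the low 32 bits of the multicast group (RFC 2464)."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


addr = "2001:db8::1234:5678"   # documentation prefix, not a real host
group = solicited_node_multicast(addr)
print(group, multicast_mac(group))
```

So the frame that asks "who has 2001:db8::1234:5678" goes to 33:33:ff:34:56:78, and a bridge that doesn't forward that frame to the right port breaks neighbor discovery for that guest.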

downtime on table.

ugh.  so table threw a drive yesterday, and I thought I'd do the kernel upgrade before replacing the drive, but it is causing... problems.  More after I figure it out.

crock crash.

it's back now, but clearly screwy.  we need to get people off of this box. 

ip6_tables: (C) 2000-2006 Netfilter Core Team
Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
 [<ffffffff8845b76d>] :ipv6:fib6_del+0xfa/0x467
PGD 3a862067 PUD 3a863067 PMD 0
Oops: 0000 [1] SMP
 last sysfs file: /class/net/xenbr0/bridge/multicast_startup_query_interval
Modules linked in: ip6table_filter ip6_tables ebtable_broute ebtable_nat ebtable_filter ebtables netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_physdev bridge iptable_filter ip_tables ip6t_REJECT xt_tcpudp x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi ac lp floppy pcspkr i2c_nforce2 i2c_core sg forcedeth 8021q k10temp shpchp hwmon serial_core parport_pc parport ide_cd tpm_tis tpm tpm_bios cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage sata_nv libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-308.8.1.el5xen #1
[Thu Jun  7 16:43:52 2012]RIP: e030:[<ffffffff8845b76d>]  [<ffffffff8845b76d>] :ipv6:fib6_del+0xfa/0x467
RSP: e02b:ffffffff8079ddd0  EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffff88003e6587c0 RCX: ffff88003e6587e0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff88490580
RBP: ffffffff8079de60 R08: 00000001008fb353 R09: 0000000000000000
R10: ffffffff803b916c R11: ffffffff8079de60 R12: ffff88002d73e580
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00002af1991366e0(0000) GS:ffffffff80635000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff80756000, task ffffffff80503b80)
[Thu Jun  7 16:43:52 2012]Stack:  0000000000000000  ffff88002d73e580  ffffffff8079de60  00000000000000fe
 0000000000000000  0000000000000000  ffffffff8845b2c0  ffffffff8845bb0a
 00000001008fb353  ffffffff8079de60
Call Trace:
 <IRQ>  [<ffffffff8845b2c0>] :ipv6:fib6_age+0x0/0x65
 [<ffffffff8845bb0a>] :ipv6:fib6_clean_node+0x30/0x83
 [<ffffffff8845b3f1>] :ipv6:fib6_walk_continue+0x87/0xf2
  [<ffffffff8845b4ad>] :ipv6:fib6_walk+0x51/0x8d
 [<ffffffff8845b511>] :ipv6:fib6_clean_tree+0x28/0x2d
 [<ffffffff8845bada>] :ipv6:fib6_clean_node+0x0/0x83
[Thu Jun  7 16:43:52 2012] [<ffffffff8845b2c0>] :ipv6:fib6_age+0x0/0x65
 [<ffffffff8845b576>] :ipv6:fib6_clean_all+0x4c/0x72
 [<ffffffff8845b59c>] :ipv6:fib6_run_gc+0x0/0xd7
 [<ffffffff8845b627>] :ipv6:fib6_run_gc+0x8b/0xd7
 [<ffffffff80294680>] run_timer_softirq+0x191/0x242
 [<ffffffff80212eb8>] __do_softirq+0x8d/0x13b
 [<ffffffff8025fda4>] call_softirq+0x1c/0x278
 [<ffffffff8026db89>] do_softirq+0x31/0x90
 [<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[Thu Jun  7 16:43:52 2012] [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8026efc8>] raw_safe_halt+0x87/0xab
 [<ffffffff8026c573>] xen_idle+0x38/0x4a
 [<ffffffff8024ac05>] cpu_idle+0x97/0xba
 [<ffffffff80760b11>] start_kernel+0x21f/0x224
 [<ffffffff807601e5>] _sinittext+0x1e5/0x1eb

Code: 80 7a 28 02 75 17 4c 39 62 20 75 11 49 8b 04 24 48 85 c0 48
RIP  [<ffffffff8845b76d>] :ipv6:fib6_del+0xfa/0x467
[Thu Jun  7 16:43:52 2012] RSP <ffffffff8079ddd0>
CR2: 0000000000000028
 <0>Kernel panic - not syncing: Fatal exception
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.

reboot crock to fix ipv6

We are doing a clean reboot of crock, because there has been a problem with bridging ipv6. I suspect it is because the bridge setup scripts were mismatched with the kernel after the upgrade a while ago. They match now, so hopefully the reboot will fix the ipv6 problem on crock.

Update: crock is back up now, but the problem is still not fixed. 

routing change for mpt users

Over the next few days the vps users on servers at Market Post Tower will have their gateway ip addresses moved from the router manticore to the router gryphon. Currently those vlans are carried over EGI's network to SVTIX where manticore is, but when we move the gateway ips to gryphon at SVTIX, we can route the traffic over one vlan to manticore or other providers we connect to at MPT. This will also allow EGI to stop carrying our local traffic between SVTIX and MPT, and we won't take so many of their vlans so it will help them as well. If your vps is on the following servers, you will be affected for only a few seconds when we move the address to the new router:
mares, bull, cerberus, apples, council, bowl, branch, robe, jewel, seashell, hydra, beak, dao, waite, gladwynn, , sword, horn, chariot, halter, knife, cauldron, crock, coat, whetstone, dish, mantle, pearl, lozenges, chime, coins, chessboard, jay, rutledge, taney, marshall
We will also ping the addresses in the subnet before and after moving it, so it will help us if your vps responds to ping over ipv4 and ipv6. Thanks!

Update: This didn't go well with the first subnet I tried, so I put it back on manticore. I'm guessing the problem is with using ospf redistribute connected instead of passive ospf interfaces, but I need to experiment more. The connected route on manticore didn't seem to disappear properly when I removed the address, and the ospf route from gryphon didn't seem to get added to manticore's routing table either.