Date: Fri, 18 Mar 2011 16:43:59 +0100
From: Viktor Petersson <petersson@gmail.com>
To: freebsd-net@freebsd.org
Subject: Possible CARP bug?
Message-ID: <00612801-A0F4-4EDC-9BED-3364A86E4F9C@gmail.com>

Hey guys,

First, a big thanks to the developers for all the hard work. You guys rock!

Now to the issue. I've been using CARP on a few servers in the past without any issues. It usually works without any hiccups. Now I'm planning to move our company's infrastructure from physical hardware to a virtual environment over at CloudSigma (http://www.cloudsigma.com). Unfortunately I'm having some issues with getting CARP to work there. For the record, they're using Qemu as the virtualization platform.

Let me start by describing my setup in more detail.

I have two nodes: nas0 and nas1. Both nodes have two interfaces, one public and one private. I'm obviously using the private one for CARP. nas0 is using the IP 192.168.1.11 and nas1 is using the IP 192.168.1.12. The CARP interface is configured to use the IP 192.168.1.10. The internal network is using a dedicated VLAN. Only these two nodes are using this VLAN, to eliminate any possible conflicts.

I've also disabled all software firewalls, so we should be able to exclude that from the equation.

Both nodes are using FreeBSD 8.2, and both the internal and external interfaces are working (i.e., the two nodes can ping each other on the private interfaces).

In rc.conf on nas0, I have the following lines:

    cloned_interfaces="carp0"
    ifconfig_carp0="vhid 1 pass foobar 192.168.1.10/24"

On nas1 (which is the failover), the equivalent lines are:

    cloned_interfaces="carp0"
    ifconfig_carp0="vhid 1 advskew 100 pass foobar 192.168.1.10/24"

(note the advskew value on nas1)

To verify that CARP is enabled and configured, here's the sysctl output (same on both nodes):

    net.inet.carp.allow: 1
    net.inet.carp.preempt: 1
    net.inet.carp.log: 1
    net.inet.carp.arpbalance: 0
    net.inet.carp.suppress_preempt: 0

Normally, that should be it. nas0 should automatically become the master, and nas1 the backup/failover. Unfortunately that doesn't happen.
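
The expected election can be sketched as follows. This is only a minimal illustration, not the kernel's actual code: it assumes the documented rule that a node advertises every advbase + advskew/256 seconds and that the node advertising most frequently should end up as master. The CarpNode class and the expected_master function are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class CarpNode:
    """Hypothetical model of one CARP host (illustration only)."""
    name: str
    advbase: int = 1  # base advertisement interval, in seconds
    advskew: int = 0  # skew added to the interval, in 1/256 s units

    def interval(self) -> float:
        # Assumed rule: advertisement interval = advbase + advskew / 256 s
        return self.advbase + self.advskew / 256.0


def expected_master(nodes):
    # The node with the shortest interval advertises most frequently
    # and is therefore the one expected to become MASTER.
    return min(nodes, key=lambda n: n.interval())


# Values taken from the rc.conf lines above (advbase defaults to 1).
nas0 = CarpNode("nas0", advbase=1, advskew=0)
nas1 = CarpNode("nas1", advbase=1, advskew=100)

for node in (nas0, nas1):
    print(node.name, node.interval())
print("expected master:", expected_master([nas0, nas1]).name)
```

With these values nas0 advertises every 1.0 s and nas1 every 1.390625 s, so nas0 would be expected to hold MASTER.
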
Instead, this is what I get on the node with the lowest advskew value (nas0, but if I raise the advskew on nas0, the error moves to nas1):

    Mar 7 14:42:57 nas0 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
    Mar 7 14:42:57 nas0 kernel: carp0: 2 link states coalesced
    Mar 7 14:42:57 nas0 kernel: carp0: link state changed to DOWN

When checking the CARP interface status, I get the following on nas0:

    carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 192.168.1.10 netmask 0xffffff00
            carp: BACKUP vhid 1 advbase 1 advskew 0

and the following on nas1:

    carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 192.168.1.10 netmask 0xffffff00
            carp: BACKUP vhid 1 advbase 1 advskew 100

I've googled this error (carp0: 2 link states coalesced), and some of the forum posts mentioned that they've seen this with faulty NICs or switches. However, I've reached out to CloudSigma, and they've been very helpful and set up a replica of my setup (but on 8.1). Their head network guy was able to reproduce the same errors as I got, and he was also able to confirm that the packets were indeed sent and received on both nodes (using tcpdump). His conclusion was that this was likely a bug in CARP (or possibly a driver).

It is also worth mentioning that CARP does work under OpenBSD 4.3 and VRRP works under Linux.

Since it's also in their interest to get this working for us (as this is what is holding us back from moving), they've been kind enough to provide access to their CARP test nodes to any developer who wants to take a stab at it. I have the credentials and details; I don't want to post them here, but will provide them to anyone interested.

Regards,
Viktor Petersson
WireLoad