Date:      Fri, 18 Mar 2011 16:43:59 +0100
From:      Viktor Petersson <petersson@gmail.com>
To:        freebsd-net@freebsd.org
Subject:   Possible CARP bug?
Message-ID:  <00612801-A0F4-4EDC-9BED-3364A86E4F9C@gmail.com>

Hey guys,

First, a big thanks to the developers for all the hard work. You guys
rock!

Now to the issue. I've been using CARP on a few servers in the past
without any hiccups. Now I'm planning to move our company's
infrastructure from physical hardware to a virtual environment over at
CloudSigma (http://www.cloudsigma.com). Unfortunately I'm having some
issues with getting CARP to work there. For the record, they're using
Qemu as the virtualization platform.

Let me start by describing my setup in more detail.

I have two nodes: nas0 and nas1. Both these nodes have two interfaces,
one public and one private. I'm obviously using the private one for
CARP. nas0 is using the IP 192.168.1.11 and nas1 is using the IP
192.168.1.12. The CARP interface is configured to use the IP
192.168.1.10. The internal network is using a dedicated VLAN. Only these
two nodes are using this VLAN to eliminate any possible conflicts.

I've also disabled all software firewalls, so we should also be able to
exclude that from the equation.

Both nodes are using FreeBSD 8.2, and both the internal and external
interfaces are working (i.e. the two nodes can ping each other on the
private interfaces).

In rc.conf on nas0, I have the following lines:
	cloned_interfaces="carp0"
	ifconfig_carp0="vhid 1 pass foobar 192.168.1.10/24"

On nas1 (which is the failover), the equivalent lines are:
	cloned_interfaces="carp0"
	ifconfig_carp0="vhid 1 advskew 100 pass foobar 192.168.1.10/24"
(note the advskew value on nas1)
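
For context on why nas0 is expected to win the election: per carp(4),
advskew is measured in 1/256ths of a second and added to advbase, and
the host that advertises most frequently becomes master. Below is a
minimal sketch of that arithmetic in POSIX sh. The function name
carp_interval_us is made up for illustration, and the result is in
microseconds to avoid floating point.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): advertisement interval in
# microseconds, computed as advbase + advskew/256 seconds.
carp_interval_us() {
    advbase=$1
    advskew=$2
    echo $(( advbase * 1000000 + advskew * 1000000 / 256 ))
}

# advskew comes from the rc.conf lines (0 when not set); advbase 1 is
# the value ifconfig reports for both nodes.
nas0_us=$(carp_interval_us 1 0)      # 1000000
nas1_us=$(carp_interval_us 1 100)    # 1390625
echo "nas0: ${nas0_us} us, nas1: ${nas1_us} us"
```

With these settings nas0 would advertise every 1.000 s and nas1 roughly
every 1.391 s, so nas0 should hold MASTER.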

To verify that CARP is enabled and configured etc., here's the sysctl
output (same on both nodes):
	net.inet.carp.allow: 1
	net.inet.carp.preempt: 1
	net.inet.carp.log: 1
	net.inet.carp.arpbalance: 0
	net.inet.carp.suppress_preempt: 0

Normally, that should be it. nas0 should automatically become the
master, and nas1 the backup/failover. Unfortunately that doesn't happen.
Instead, I get this on the node with the lowest advskew value
(nas0, but if I raise the advskew on nas0, the error moves to nas1):
	Mar  7 14:42:57 nas0 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
	Mar  7 14:42:57 nas0 kernel: carp0: 2 link states coalesced
	Mar  7 14:42:57 nas0 kernel: carp0: link state changed to DOWN

When checking the CARP interface status, I get the following on nas0:
	carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
       	inet 192.168.1.10 netmask 0xffffff00
       	carp: BACKUP vhid 1 advbase 1 advskew 0

and the following on nas1:
	carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
       	inet 192.168.1.10 netmask 0xffffff00
       	carp: BACKUP vhid 1 advbase 1 advskew 100
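
The symptom can be checked mechanically: with a single vhid, exactly
one node should report MASTER, yet both report BACKUP here. Below is a
minimal sketch, assuming the "carp:" line format shown in the ifconfig
output above. The helper name carp_state is hypothetical, and it reads
ifconfig output on stdin so it can be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper: print the CARP state (MASTER, BACKUP or INIT)
# found on the first "carp:" line of ifconfig output read from stdin.
carp_state() {
    awk '$1 == "carp:" { print $2; exit }'
}

# Saved output from nas0, as quoted above.
nas0_output='carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 192.168.1.10 netmask 0xffffff00
	carp: BACKUP vhid 1 advbase 1 advskew 0'

state=$(printf '%s\n' "$nas0_output" | carp_state)
echo "nas0 carp0 state: $state"
```

On a live node the same function can be fed directly, e.g.
"ifconfig carp0 | carp_state".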

I've googled this error (carp0: 2 link states coalesced), and some of
the forum posts mention seeing it with faulty NICs or switches.
However, I've reached out to CloudSigma, and they've been very helpful
and set up a replica of my setup (but on 8.1). Their head network guy
was able to reproduce the same errors, and he was also able to confirm,
using tcpdump, that the CARP packets were indeed sent and received on
both nodes. His conclusion was that this is likely a bug in CARP (or
possibly in a driver).
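
For anyone wanting to repeat the tcpdump check, here is a rough sketch.
CARP advertisements are IP protocol 112 sent to the multicast address
224.0.0.18, so a capture filter on that protocol is enough. The wrapper
name watch_carp is hypothetical, the interface name is whatever the
private NIC is called on the node, and the exact options CloudSigma used
are not known.

```shell
#!/bin/sh
# CARP advertisements are IP protocol 112, sent to multicast 224.0.0.18.
CARP_FILTER='ip proto 112'

# Hypothetical wrapper; needs root. Pass the private interface name.
# -n skips name resolution, -e also prints the link-level (MAC) header.
watch_carp() {
    iface=$1
    # CARP_FILTER is left unquoted on purpose so it splits into words.
    tcpdump -n -e -i "$iface" $CARP_FILTER
}

echo "usage: watch_carp <private-interface>"
```

Running it on both nodes at once shows which host each advertisement
comes from.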

It is also worth mentioning that CARP does work under OpenBSD 4.3 and
VRRP works under Linux.

Since it's also in their interest to get this working for us (this is
what is holding us back from moving), they've been kind enough to
offer access to their CARP test nodes to any developer who wants to
take a stab at it. I have the credentials and details; I don't want to
post them here, but I will provide them to anyone interested.

Regards,
Viktor Petersson
WireLoad


