Date:      Fri, 10 Apr 2015 11:48:05 -0400
From:      J David <j.david.lists@gmail.com>
To:        freebsd-net@freebsd.org,  "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Re: Why is backup carp node seeing traffic?
Message-ID:  <CABXB=RTLoLter-w6Tvu80gw0b2iV5eAC3nJk85qZOKA2K=V_TA@mail.gmail.com>
In-Reply-To: <CABXB=RT3kuJ-4vbrDEtUXGbgRftEye0wa_d05yR5xqzHoe8ShA@mail.gmail.com>
References:  <CABXB=RT3kuJ-4vbrDEtUXGbgRftEye0wa_d05yR5xqzHoe8ShA@mail.gmail.com>

On Fri, Apr 10, 2015 at 7:03 AM, J David <j.david.lists@gmail.com> wrote:
> Why are packets showing up at the backup node?  Why is it
> intermittent?  Can anything be done to eliminate this issue?

To follow up on this, I think I have identified the problem behavior.

Here is a tcpdump of carp traffic seen by the web test client (on vlan4):

$ sudo tcpdump -e -i net0 -n carp and ether src 00:00:5e:00:01:04
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:20:36.659743 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:37.660173 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:38.660787 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:39.661177 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:40.661614 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:41.662149 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:42.662593 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36

This is exactly what one would expect to see: carp advertisements only
from the master node, every 1 second, like clockwork.

Now, here is a tcpdump of carp traffic seen by the web server (on
vlan2, where the problem is seen):

$ sudo tcpdump -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:22:24.946405 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:25.946910 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:26.947406 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:27.299658 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:22:27.947905 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:28.948407 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:29.948906 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:30.692661 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36

As before, the master is advertising once per second, like clockwork.
Yet every few seconds, the backup advertises itself anyway, using the
carp interface MAC to do it.  It seems certain that this causes the
switch to decide that MAC has now moved to the backup's switch port,
so it sends traffic there until the next packet arrives with this
source address from the master, switching it back.
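
The suspected switch behavior can be sketched in a few lines of
Python.  This is only an illustration under assumptions: it models a
plain learning switch that forwards to whichever port last sent a
frame from a given source MAC, and the port labels "master" and
"backup" are made up.  The timestamps are seconds past 15:22:00,
copied from the capture above.

```python
# Sketch of a learning switch's view of one source MAC (00:00:5e:00:01:02).
# Port labels are hypothetical; timestamps are seconds past 15:22:00,
# copied from the vlan2 capture above.
FRAMES = [
    (24.946405, "master"),
    (25.946910, "master"),
    (26.947406, "master"),
    (27.299658, "backup"),
    (27.947905, "master"),
    (28.948407, "master"),
    (29.948906, "master"),
    (30.692661, "backup"),
]


def misdirected_windows(frames):
    """Return closed (start, end) windows with the MAC learned on the backup port."""
    windows = []
    learned = None  # port the switch currently associates with the MAC
    since = None    # time at which the MAC last moved to that port
    for ts, port in frames:
        if port != learned:
            if learned == "backup":
                # The MAC just moved away from the backup port: close the window.
                windows.append((since, ts))
            learned, since = port, ts
    return windows


if __name__ == "__main__":
    for start, end in misdirected_windows(FRAMES):
        print(f"MAC on backup port from {start:.6f} to {end:.6f} "
              f"({end - start:.3f}s)")
```

For this capture the sketch reports one closed window of roughly 0.65
seconds.  The final backup frame opens a second window that ends
outside the capture, so it is not counted.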

tcpdump run on the backup shows that it is receiving the
advertisements from the master just fine, then acting like it didn't:

$ sudo tcpdump -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:27:22.099936 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:27:22.099938 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:27:22.495786 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36

The web server shows that the "leaking" advertisements from the backup
arrive about every 3.2 seconds:

$ sudo tcpdump -c 4 -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02 and host 192.168.80.252
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:34:33.547068 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:36.745077 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:39.943078 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:43.141084 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
4 packets captured
13788 packets received by filter
0 packets dropped by kernel

That is with advskew=0 on the master and advskew=50 on the backup.
Setting advskew=250 on the backup increases the interval to about 3.98
seconds:

$ sudo tcpdump -c 4 -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02 and host 192.168.80.252
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:36:05.363113 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:09.342124 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:13.321122 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:17.300134 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4
(0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2,
Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
4 packets captured
19658 packets received by filter
0 packets dropped by kernel

So, the carp backup leaks one packet every 3 seconds plus the advskew
interval (1s * advskew/256, since advskew is counted in 1/256ths of a
second), to within a few milliseconds.
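
As a sanity check on that arithmetic, here is a small Python sketch
that compares the captured intervals with the predicted one.  It
assumes the backup's timeout is 3 * advbase + advskew/256 seconds,
with advbase=1 (matching the "intvl 1s" in the captures) and the
1/256-second skew unit described in ifconfig(8).  The function names
are made up for illustration; the timestamps are copied from the two
captures above.

```python
from datetime import datetime

# Timestamps of the backup's advertisements, keyed by the backup's advskew.
CAPTURES = {
    50: ["15:34:33.547068", "15:34:36.745077",
         "15:34:39.943078", "15:34:43.141084"],
    250: ["15:36:05.363113", "15:36:09.342124",
          "15:36:13.321122", "15:36:17.300134"],
}


def expected_timeout(advbase, advskew):
    """Assumed backup timeout in seconds: 3 * advbase + advskew/256."""
    return 3 * advbase + advskew / 256


def intervals(stamps):
    """Gaps in seconds between consecutive HH:MM:SS.ffffff timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for advskew, stamps in CAPTURES.items():
        observed = intervals(stamps)
        mean = sum(observed) / len(observed)
        predicted = expected_timeout(1, advskew)
        print(f"advskew={advskew}: observed {mean:.6f}s, "
              f"predicted {predicted:.6f}s, "
              f"difference {1000 * (mean - predicted):.1f}ms")
```

Under those assumptions both captures land within about 3 ms of the
prediction, which is consistent with the backup timing out on the
master's advertisements rather than advertising on a schedule of its
own.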

And in fact the logs on the carp backup do indicate that it thinks
it's doing the world a favor.  They are full of:

Apr 10 15:44:53 fw2 kernel: carp2: link state changed to UP
Apr 10 15:44:53 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:44:53 fw2 kernel: carp2: link state changed to DOWN
Apr 10 15:44:57 fw2 kernel: carp2: link state changed to UP
Apr 10 15:44:57 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:44:57 fw2 kernel: carp2: link state changed to DOWN
Apr 10 15:45:00 fw2 kernel: carp2: link state changed to UP
Apr 10 15:45:00 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:45:00 fw2 kernel: carp2: link state changed to DOWN

(Unfortunately I was only looking at the master's logs before. :( )

Does this shed any light on why this might be happening?

Thanks!


