Date: Fri, 10 Apr 2015 11:48:05 -0400
From: J David <j.david.lists@gmail.com>
To: freebsd-net@freebsd.org, "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Re: Why is backup carp node seeing traffic?
Message-ID: <CABXB=RTLoLter-w6Tvu80gw0b2iV5eAC3nJk85qZOKA2K=V_TA@mail.gmail.com>
In-Reply-To: <CABXB=RT3kuJ-4vbrDEtUXGbgRftEye0wa_d05yR5xqzHoe8ShA@mail.gmail.com>
References: <CABXB=RT3kuJ-4vbrDEtUXGbgRftEye0wa_d05yR5xqzHoe8ShA@mail.gmail.com>

On Fri, Apr 10, 2015 at 7:03 AM, J David <j.david.lists@gmail.com> wrote:
> Why are packets showing up at the backup node? Why is it
> intermittent? Can anything be done to eliminate this issue?

To follow up on this, I think I have identified the problem behavior.

Here is a tcpdump of carp traffic seen by the web test client (on vlan4):

$ sudo tcpdump -e -i net0 -n carp and ether src 00:00:5e:00:01:04
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:20:36.659743 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:37.660173 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:38.660787 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:39.661177 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:40.661614 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:41.662149 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:42.662593 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36

This is exactly what one would expect to see: carp advertisements only
from the master node, every 1 second, like clockwork.

Now, here is a tcpdump of carp traffic seen by the web server (on
vlan2, where the problem is seen):

$ sudo tcpdump -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:22:24.946405 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:25.946910 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:26.947406 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:27.299658 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:22:27.947905 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:28.948407 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:29.948906 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:30.692661 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36

As before, the master is advertising once per second, like clockwork.
Yet every few seconds, the backup advertises itself anyway, using the
carp interface MAC to do it. It seems certain that this causes the
switch to decide that MAC has now moved to the backup's switch port,
so it sends traffic there until the next packet arrives with this
source address from the master, switching it back.

tcpdump run on the backup shows that it is receiving the
advertisements from the master just fine, then acting like it didn't:

$ sudo tcpdump -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:27:22.099936 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:27:22.099938 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:27:22.495786 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36

The web server shows that the "leaking" advertisements from the backup
happen every 3.2 seconds:

$ sudo tcpdump -c 4 -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02 and host 192.168.80.252
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:34:33.547068 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:36.745077 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:39.943078 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:43.141084 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
4 packets captured
13788 packets received by filter
0 packets dropped by kernel

That is with advskew=0 on the master and advskew=50 on the backup.
Setting advskew=250 on the backup increases the interval to 3.98
seconds:

$ sudo tcpdump -c 4 -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02 and host 192.168.80.252
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:36:05.363113 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:09.342124 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:13.321122 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:17.300134 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
4 packets captured
19658 packets received by filter
0 packets dropped by kernel

So, the carp backup leaks packets at the rate of exactly 3 seconds +
the advskew interval (1s * advskew/255).

And in fact the logs on the carp backup do indicate that it thinks
it's doing the world a favor.
It's full of:

Apr 10 15:44:53 fw2 kernel: carp2: link state changed to UP
Apr 10 15:44:53 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:44:53 fw2 kernel: carp2: link state changed to DOWN
Apr 10 15:44:57 fw2 kernel: carp2: link state changed to UP
Apr 10 15:44:57 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:44:57 fw2 kernel: carp2: link state changed to DOWN
Apr 10 15:45:00 fw2 kernel: carp2: link state changed to UP
Apr 10 15:45:00 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:45:00 fw2 kernel: carp2: link state changed to DOWN

(Unfortunately I was only looking at the master's logs before. :( )

Does this shed any light on why this might be happening?

Thanks!
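
For anyone who wants to check the interval arithmetic, here is a rough Python sketch that compares the gaps between the backup's leaked advertisements with the "3 seconds + advskew/255" estimate from this message. The timestamps are copied from the two "-c 4" captures above. The function names, the advbase parameter (set to 1 to match the "intvl 1s" in the captures), and the adjustable divisor are illustrative assumptions of mine, not anything taken from the carp code.

```python
from datetime import datetime

# Timestamps of the backup's advertisements, copied from the two
# "-c 4" captures in this message, keyed by the backup's advskew.
CAPTURES = {
    50: ["15:34:33.547068", "15:34:36.745077",
         "15:34:39.943078", "15:34:43.141084"],
    250: ["15:36:05.363113", "15:36:09.342124",
          "15:36:13.321122", "15:36:17.300134"],
}


def mean_interval(stamps):
    """Average gap in seconds between consecutive capture timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def predicted_interval(advskew, advbase=1, divisor=255):
    """Estimate from the message: 3 seconds plus 1s * advskew / 255.

    Scaling the 3 seconds by advbase and exposing the divisor are
    assumptions made for illustration only.
    """
    return 3 * advbase + advskew / divisor


for advskew, stamps in CAPTURES.items():
    observed = mean_interval(stamps)
    predicted = predicted_interval(advskew)
    print(f"advskew={advskew}: observed {observed:.3f}s, "
          f"predicted {predicted:.3f}s")
```

With these numbers the observed and estimated intervals agree to within about 2 ms for both advskew settings, which is consistent with the backup's timer expiring on schedule even though the master's advertisements are arriving.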