Date: Fri, 10 Apr 2015 11:48:05 -0400
Subject: Re: Why is backup carp node seeing traffic?
From: J David
To: freebsd-net@freebsd.org, "freebsd-questions@freebsd.org"

On Fri, Apr 10, 2015 at 7:03 AM, J David wrote:
> Why are packets showing up at the backup node?  Why is it
> intermittent?  Can anything be done to eliminate this issue?

To follow up on this, I think I have identified the problem behavior.

Here is a tcpdump of carp traffic seen by the web test client (on vlan4):

$ sudo tcpdump -e -i net0 -n carp and ether src 00:00:5e:00:01:04
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:20:36.659743 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:37.660173 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:38.660787 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:39.661177 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:40.661614 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:41.662149 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36
15:20:42.662593 00:00:5e:00:01:04 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0, authtype none, intvl 1s, length 36

This is exactly what one would expect to see: carp advertisements only from the master node, once per second, like clockwork.

Now, here is a tcpdump of carp traffic seen by the web server (on vlan2, where the problem is seen):

$ sudo tcpdump -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:22:24.946405 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:25.946910 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:26.947406 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:27.299658 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:22:27.947905 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:28.948407 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:29.948906 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:22:30.692661 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36

As before, the master (192.168.80.251) is advertising once per second, like clockwork. Yet every few seconds the backup (192.168.80.252) advertises itself anyway, using the carp interface MAC as the source address to do it.

This almost certainly makes the switch conclude that the MAC has moved to the backup's switch port, so the switch sends traffic for it there until the master's next packet with that source address moves it back.

tcpdump run on the backup shows that it is receiving the advertisements from the master just fine, then acting as if it hadn't:

$ sudo tcpdump -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:27:22.099936 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:27:22.099938 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 0, authtype none, intvl 1s, length 36
15:27:22.495786 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36

The web server shows that the "leaking" advertisements from the backup happen every 3.2 seconds:

$ sudo tcpdump -c 4 -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02 and host 192.168.80.252
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:34:33.547068 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:36.745077 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:39.943078 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
15:34:43.141084 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 240, authtype none, intvl 1s, length 36
4 packets captured
13788 packets received by filter
0 packets dropped by kernel

That is with advskew=0 on the master and advskew=50 on the backup. Setting advskew=250 on the backup increases the interval to 3.98 seconds:

$ sudo tcpdump -c 4 -e -n -i vlan2 carp and ether src 00:00:5e:00:01:02 and host 192.168.80.252
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
15:36:05.363113 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:09.342124 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:13.321122 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
15:36:17.300134 00:00:5e:00:01:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 192.168.80.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 250, authtype none, intvl 1s, length 36
4 packets captured
19658 packets received by filter
0 packets dropped by kernel

So the carp backup leaks one advertisement at an interval of almost exactly 3 seconds plus the advskew fraction of a second (1s * advskew/255).

And in fact the logs on the carp backup do indicate that it thinks it's doing the world a favor. They're full of:

Apr 10 15:44:53 fw2 kernel: carp2: link state changed to UP
Apr 10 15:44:53 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:44:53 fw2 kernel: carp2: link state changed to DOWN
Apr 10 15:44:57 fw2 kernel: carp2: link state changed to UP
Apr 10 15:44:57 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:44:57 fw2 kernel: carp2: link state changed to DOWN
Apr 10 15:45:00 fw2 kernel: carp2: link state changed to UP
Apr 10 15:45:00 fw2 kernel: carp2: MASTER -> BACKUP (more frequent advertisement received)
Apr 10 15:45:00 fw2 kernel: carp2: link state changed to DOWN

(Unfortunately I was only looking at the master's logs before. :( )

Does this shed any light on why this might be happening?

Thanks!
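For anyone who wants to check the timing claim, the intervals can be computed straight from the captured timestamps. This is a minimal Python sketch, not anything from carp itself: the timestamps are copied from the two "tcpdump -c 4" captures on the web server, and predicted_interval is a hypothetical helper that simply encodes the "3 seconds plus advskew/255" relationship estimated above, assuming advbase is 1 second as in these captures. The divisor is left as a parameter because the kernel's exact rounding is an assumption here; with either 255 or 256 the prediction falls within a few milliseconds of the measured gaps.

```python
from datetime import datetime

# Timestamps of the backup's (192.168.80.252) leaked advertisements,
# copied from the two "tcpdump -c 4" captures on the web server.
ADVSKEW_50 = ["15:34:33.547068", "15:34:36.745077",
              "15:34:39.943078", "15:34:43.141084"]
ADVSKEW_250 = ["15:36:05.363113", "15:36:09.342124",
               "15:36:13.321122", "15:36:17.300134"]


def intervals(stamps):
    """Gaps in seconds between consecutive HH:MM:SS.ffffff timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def predicted_interval(advskew, advbase=1, divisor=255):
    """Hypothetical model of the leak interval: 3 * advbase + advskew/divisor.

    advskew is treated as a fraction of one second, independent of advbase.
    divisor=255 follows the estimate in this message; the kernel's exact
    rounding is an assumption, so it is adjustable.
    """
    return 3 * advbase + advskew / divisor


if __name__ == "__main__":
    for skew, stamps in ((50, ADVSKEW_50), (250, ADVSKEW_250)):
        observed = [round(g, 3) for g in intervals(stamps)]
        print(f"advskew={skew}: observed {observed}, "
              f"predicted {predicted_interval(skew):.3f} s")
```

The gaps are very regular (they agree to within a few microseconds), which fits a fixed timer firing on the backup rather than random packet loss.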