From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 11:03:22 2015 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B542719E; Fri, 10 Apr 2015 11:03:22 +0000 (UTC) Received: from mail-ig0-x229.google.com (mail-ig0-x229.google.com [IPv6:2607:f8b0:4001:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7CDA922E; Fri, 10 Apr 2015 11:03:22 +0000 (UTC) Received: by iget9 with SMTP id t9so12340521ige.1; Fri, 10 Apr 2015 04:03:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=ipjR6BYI6i0KX5KleHVkXZOHvVUxcnK22SGyr/Cr6dQ=; b=ulyMyVOyzUlMJrcQnKSKDePYH7NrI1601VjRveC0jPQFYbSlbDUTzvCg5cCqEbI4IG qeUEJ1PIVnGGQQ3RzWRIe6/25uHh0se9bj9s6R7zJT25bDfzaCM8oLaTfGzZjmArfrzR Mmh4DsbpZ1LlxKSQ/PtehTi1JN5lwRfi5X9NQrAChcayPlviGgHB03cjqfHYPB1sfop4 RbQKXYbKfBSoMWpF3JOpH6jRIp7zxk2XrG4gKwWdoZsi7XkvFAEHQOK43vBkFLc8ARao t/E7J2tHRFNsYmEWQWMpA9wPCsoKyR4iPY/ORUO64VG7kmgu6gUWF88Wr9aIGA8B2ZpE Mm3w== MIME-Version: 1.0 X-Received: by 10.107.130.145 with SMTP id m17mr1451070ioi.89.1428663801815; Fri, 10 Apr 2015 04:03:21 -0700 (PDT) Sender: jdavidlists@gmail.com Received: by 10.36.67.139 with HTTP; Fri, 10 Apr 2015 04:03:21 -0700 (PDT) Date: Fri, 10 Apr 2015 07:03:21 -0400 X-Google-Sender-Auth: eJSSFAe4gFi88H-8Zncz_cSyl4M Message-ID: Subject: Why is backup carp node seeing traffic? From: J David To: freebsd-net@freebsd.org, "freebsd-questions@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2015 11:03:22 -0000 With a setup containing two carp/pf firewall/load-balancers in front of six web servers, some packets are popping up on the backup carp node. All eight machines are connected to a (fault tolerant) 10Gbit switch over dual ix connections in an LACP lagg. The carp/pf nodes use VLANs to split incoming and outgoing traffic. With some careful tcpdumping, we have identified that the problem is that every now and then (about 1 in 1000 connections), the first SYN-ACK packet from a web server is returned to the backup carp node instead of the primary. There, it gets dropped and there's a 3-second delay until the client host retransmits the SYN packet. Actually quite a bit more traffic than that winds up on the backup node, maybe as much as a few % of total traffic, but it seems like the initial SYN packet is the only one that gets dropped. This may be because pfsync hasn't created the state on the backup node yet? Here's what the tcpdump looks like from each machine, showing all the packets on the web server, all but one of the packets on the primary carp node, and the "missing" SYN-ACK packet on the backup carp node. Web server: 07:02:05.257968 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S], seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 371382342 ecr 0], length 0 07:02:05.258016 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.], seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0 07:02:08.254929 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S], seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 371385342 ecr 0], length 0 07:02:08.254943 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.], seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0 07:02:08.255332 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [.], ack 1, win 65535, length 0 07:02:08.255383 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [.], ack 1, win 65535, length 0 07:02:08.256155 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [P.], seq 1:28, ack 1, win 65535, length 27 Primary carp node: (client is identical) 07:02:05.258324 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S], seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 371382342 ecr 0], length 0 07:02:08.255287 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S], seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 371385342 ecr 0], length 0 07:02:08.255331 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.], seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0 07:02:08.255691 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [.], ack 1, win 65535, length 0 07:02:08.255766 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [.], ack 1, win 65535, length 0 07:02:08.256519 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [P.], seq 1:28, ack 1, win 65535, length 27 Backup carp node: 07:02:05.263024 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.], seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0 There are no carp failovers, interface write errors are <1/million and the error counters don't increase while the problem is happening. Switch says no errors, no LACP drops, nothing. The total traffic levels are low. This problem happens in waves, weirdly usually at low-usage times like the middle of the night. The primary carp node is ~98% idle. All the hardware checks out, and they are all physical machines, no virtualization or jails or anything going on. The logs are silent while this is happening. Six of the eight systems, including both carp nodes, are running 9.3. Two of the web servers are running 10.1. The problem does happen with all six web servers. The problem is definitely related to carp. If the default route is changed from the carp address to the master node's address on that same interface on some or all of the web servers, the problem will go away for only those servers. As soon as it's switched back, the problem returns. Similarly, running a flood ping against the carp address from a web server: $ sudo ping -c 100000 -f 192.168.80.254 PING 192.168.80.254 (192.168.80.254): 56 data bytes ......................................................... --- 192.168.80.254 ping statistics --- 100000 packets transmitted, 99944 packets received, 0.1% packet loss round-trip min/avg/max/stddev = 0.017/0.024/0.535/0.013 ms Produced exactly 56 stray ICMP echo request (100000 - 99944 = 56) on the backup server: $ sudo tcpdump -n -i vlan2 host 192.168.80.254 and icmp tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes 10:36:34.432811 IP 192.168.80.6 > 192.168.80.254: ICMP echo request, id 22737, seq 16720, length 64 10:36:34.443833 IP 192.168.80.6 > 192.168.80.254: ICMP echo request, id 22737, seq 16721, length 64 10:36:34.454830 IP 192.168.80.6 > 192.168.80.254: ICMP echo request, id 22737, seq 16722, length 64 10:36:34.465831 IP 192.168.80.6 > 192.168.80.254: ICMP echo request, id 22737, seq 16723, length 64 [ ... etc ... ] ^C 56 packets captured 1219 packets received by filter 0 packets dropped by kernel One thing of note is that the stray packets all appear to be about 11ms apart. That pattern is consistent across all 56 packets and is repeatable. Finally, here is the ifconfig on the carp nodes, from the carp interface all the way down to the wire. Master: carp2: flags=49 metric 0 mtu 1500 inet 192.168.80.254 netmask 0xfffffff0 inet6 fd00:0:0:80::254 prefixlen 64 nd6 options=21 carp: MASTER vhid 2 advbase 1 advskew 0 vlan2: flags=8943 metric 0 mtu 1500 options=303 ether 90:e2:ba:0d:0a:98 inet 192.168.80.251 netmask 0xfffffff0 broadcast 192.168.80.255 inet6 fe80::92e2:baff:fe0d:a98%vlan2 prefixlen 64 scopeid 0xa inet6 fd00:0:0:80::251 prefixlen 64 nd6 options=21 media: Ethernet autoselect status: active vlan: 2 parent interface: lagg0 lagg0: flags=8943 metric 0 mtu 1500 options=407bb ether 90:e2:ba:0d:0a:98 inet6 fe80::92e2:baff:fe0d:a98%lagg0 prefixlen 64 scopeid 0x8 nd6 options=29 media: Ethernet autoselect status: active laggproto lacp lagghash l2,l3,l4 laggport: ix1 flags=1c laggport: ix0 flags=1c ix0: flags=8943 metric 0 mtu 1500 options=407bb ether 90:e2:ba:0d:0a:98 inet6 fe80::92e2:baff:fe0d:a98%ix0 prefixlen 64 scopeid 0x1 nd6 options=29 media: Ethernet autoselect (10Gbase-Twinax ) status: active ix1: flags=8943 metric 0 mtu 1500 options=407bb ether 90:e2:ba:0d:0a:98 inet6 fe80::92e2:baff:fe0d:a99%ix1 prefixlen 64 scopeid 0x2 nd6 options=29 media: Ethernet autoselect (10Gbase-Twinax ) status: active Backup: carp2: flags=49 metric 0 mtu 1500 inet 192.168.80.254 netmask 0xfffffff0 inet6 fd00:0:0:80::254 prefixlen 64 nd6 options=21 carp: BACKUP vhid 2 advbase 1 advskew 150 vlan2: flags=8943 metric 0 mtu 1500 options=303 ether 90:e2:ba:0d:0a:7d inet 192.168.80.252 netmask 0xfffffff0 broadcast 192.168.80.255 inet6 fe80::92e2:baff:fe0d:a7d%vlan2 prefixlen 64 scopeid 0xa inet6 fd00:0:0:80::252 prefixlen 64 nd6 options=21 media: Ethernet autoselect status: active vlan: 2 parent interface: lagg0 lagg0: flags=8943 metric 0 mtu 1500 options=407bb ether 90:e2:ba:0d:0a:7d inet6 fe80::92e2:baff:fe0d:a7d%lagg0 prefixlen 64 scopeid 0x8 nd6 options=29 media: Ethernet autoselect status: active laggproto lacp lagghash l2,l3,l4 laggport: ix0 flags=1c laggport: ix1 flags=1c ix0: flags=8943 metric 0 mtu 1500 options=407bb ether 90:e2:ba:0d:0a:7d inet6 fe80::92e2:baff:fe0d:a7c%ix0 prefixlen 64 scopeid 0x1 nd6 options=29 media: Ethernet autoselect (10Gbase-Twinax ) status: active ix1: flags=8943 metric 0 mtu 1500 options=407bb ether 90:e2:ba:0d:0a:7d inet6 fe80::92e2:baff:fe0d:a7d%ix1 prefixlen 64 scopeid 0x2 nd6 options=29 media: Ethernet autoselect (10Gbase-Twinax ) status: active Why are packets showing up at the backup node? Why is it intermittent? Can anything be done to eliminate this issue? Thanks for any advice!