From owner-freebsd-net@FreeBSD.ORG Sun Jul 26 23:25:14 2009
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CF17E1065670
	for ; Sun, 26 Jul 2009 23:25:14 +0000 (UTC)
	(envelope-from mgrooms@shrew.net)
Received: from shrew.net (shrew.net [206.223.169.85])
	by mx1.freebsd.org (Postfix) with ESMTP id 9EFED8FC1B
	for ; Sun, 26 Jul 2009 23:25:14 +0000 (UTC)
	(envelope-from mgrooms@shrew.net)
Received: from localhost (unknown [206.223.169.82])
	by shrew.net (Postfix) with ESMTP id 6916379E31B;
	Sun, 26 Jul 2009 18:25:14 -0500 (CDT)
Received: from shrew.net ([206.223.169.85])
	by localhost (mx1.hub.org [206.223.169.82]) (amavisd-new, port 10024)
	with ESMTP id 26118-10; Sun, 26 Jul 2009 23:25:14 +0000 (UTC)
Received: from hole.shrew.net (cpe-66-25-161-129.austin.res.rr.com [66.25.161.129])
	by shrew.net (Postfix) with ESMTP id B97BA79E2DD;
	Sun, 26 Jul 2009 18:25:13 -0500 (CDT)
Received: from [10.22.200.30] (elon.shrew.net [10.22.200.30])
	by hole.shrew.net (8.14.3/8.14.3) with ESMTP id n6QNNQ1w016517
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Sun, 26 Jul 2009 18:23:26 -0500 (CDT)
	(envelope-from mgrooms@shrew.net)
Message-ID: <4A6CE5D6.3020709@shrew.net>
Date: Sun, 26 Jul 2009 18:25:10 -0500
From: Matthew Grooms
User-Agent: Thunderbird 2.0.0.22 (Windows/20090605)
MIME-Version: 1.0
To: Max Laier
References: <4A638E76.2060706@shrew.net> <4A63A4B3.6090500@modulus.org>
	<3D3254E2-4E45-4C67-84D2-DB05660D768F@shrew.net>
	<200907201318.08122.max@love2party.net>
In-Reply-To: <200907201318.08122.max@love2party.net>
Content-Type: multipart/mixed;
	boundary="------------030001080308090008010309"
Cc: freebsd-net@freebsd.org
Subject: Re: FreeBSD + carp on VMWare ESX
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 26 Jul 2009 23:25:15 -0000

This is a multi-part message in MIME format.
--------------030001080308090008010309
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Max Laier wrote:
>
> There is clearly something very wrong with how the vswitch works and it's not
> really FreeBSD's job to work around these issues. The patch you posted is
> rather intrusive and certainly not something we want in the tree. You should
> talk to VMWare's support to fix the obvious short-comings in the vswitch
> design.
>

I agree completely. We have a support contract with VMWare and I intend
to open a ticket. However, I think the likelihood of them providing a fix
for this is just about zero. Who knows, maybe they will surprise me.

> As for your patch - you want "IF_ADDR_[UN]LOCK(ifp);" around walking the
> address list. Don't forget to unlock before the return.
>

Thanks for the help. Here is an updated patch against 7.2 if anyone else
has a VMWare + FreeBSD + CARP problem child and needs a fix today ...

http://www.shrew.net/static/patches/esx-carp.diff

The IPv6 code path is untested. Also, the changes were placed under a
sysctl conditional so the following is required in /etc/sysctl.conf to
enable it at boot time ...
net.inet.carp.drop_echoed=1

Thanks again,

-Matthew

--------------030001080308090008010309
Content-Type: text/plain;
	name="esx-carp.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
	filename="esx-carp.diff"

Index: ip_carp.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_carp.c,v
retrieving revision 1.52.2.3
diff -u -r1.52.2.3 ip_carp.c
--- ip_carp.c	9 May 2009 00:35:38 -0000	1.52.2.3
+++ ip_carp.c	26 Jul 2009 22:46:19 -0000
@@ -143,6 +143,8 @@
     &carp_opts[CARPCTL_LOG], 0, "log bad carp packets");
 SYSCTL_INT(_net_inet_carp, CARPCTL_ARPBALANCE, arpbalance, CTLFLAG_RW,
     &carp_opts[CARPCTL_ARPBALANCE], 0, "balance arp responses");
+SYSCTL_INT(_net_inet_carp, CARPCTL_DROPECHOED, drop_echoed, CTLFLAG_RW,
+    &carp_opts[CARPCTL_DROPECHOED], 0, "drop packets echoed to sender");
 SYSCTL_INT(_net_inet_carp, OID_AUTO, suppress_preempt, CTLFLAG_RD,
     &carp_suppress_preempt, 0, "Preemption is suppressed");
 
@@ -552,6 +554,28 @@
 		return;
 	}
 
+	/*
+	 * verify that the source address is not valid
+	 * for the interface it was received on. this
+	 * tends to happen with VMWare ESX vSwitches.
+	 */
+	if (carp_opts[CARPCTL_DROPECHOED]) {
+		struct ifnet *ifp = m->m_pkthdr.rcvif;
+		struct ifaddr *ifa;
+		IF_ADDR_LOCK(ifp);
+		TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
+			struct in_addr in4;
+			in4 = ifatoia(ifa)->ia_addr.sin_addr;
+			if (ifa->ifa_addr->sa_family == AF_INET &&
+			    in4.s_addr == ip->ip_src.s_addr) {
+				IF_ADDR_UNLOCK(ifp);
+				m_freem(m);
+				return;
+			}
+		}
+		IF_ADDR_UNLOCK(ifp);
+	}
+
 	/* verify that the IP TTL is 255.  */
 	if (ip->ip_ttl != CARP_DFLTTL) {
 		carpstats.carps_badttl++;
@@ -644,6 +668,28 @@
 		return (IPPROTO_DONE);
 	}
 
+	/*
+	 * verify that the source address is not valid
+	 * for the interface it was received on. this
+	 * tends to happen with VMWare ESX vSwitches.
+	 */
+	if (carp_opts[CARPCTL_DROPECHOED]) {
+		struct ifnet *ifp = m->m_pkthdr.rcvif;
+		struct ifaddr *ifa;
+		IF_ADDR_LOCK(ifp);
+		TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
+			struct in6_addr in6;
+			in6 = ifatoia6(ifa)->ia_addr.sin6_addr;
+			if (ifa->ifa_addr->sa_family == AF_INET6 &&
+			    memcmp(&in6, &ip6->ip6_src, sizeof(in6)) == 0) {
+				IF_ADDR_UNLOCK(ifp);
+				m_freem(m);
+				return (IPPROTO_DONE);
+			}
+		}
+		IF_ADDR_UNLOCK(ifp);
+	}
+
 	/* verify that the IP TTL is 255 */
 	if (ip6->ip6_hlim != CARP_DFLTTL) {
 		carpstats.carps_badttl++;
Index: ip_carp.h
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_carp.h,v
retrieving revision 1.3
diff -u -r1.3 ip_carp.h
--- ip_carp.h	1 Dec 2006 18:37:41 -0000	1.3
+++ ip_carp.h	26 Jul 2009 22:46:19 -0000
@@ -140,7 +140,8 @@
 #define	CARPCTL_LOG		3	/* log bad packets */
 #define	CARPCTL_STATS		4	/* statistics (read-only) */
 #define	CARPCTL_ARPBALANCE	5	/* balance arp responses */
-#define	CARPCTL_MAXID		6
+#define	CARPCTL_DROPECHOED	6	/* drop packets echoed to the sender */
+#define	CARPCTL_MAXID		7
 
 #define CARPCTL_NAMES { \
 	{ 0, 0 }, \

--------------030001080308090008010309--
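
For anyone who wants to look at the check in the patch in isolation, here is a rough user-space sketch in C of the same idea: treat a packet as echoed if its source address matches an address configured on the receiving interface. This is not the kernel code. The struct local_addr type, the array standing in for if_addrlist, and the function names is_echoed_v4 and is_echoed_v6 are all hypothetical, and no locking is modeled. The sketch also tests the address family before reading the address, which is a choice made here and not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for one entry of an interface address list. */
struct local_addr {
	sa_family_t family;		/* AF_INET or AF_INET6 */
	union {
		struct in_addr	v4;
		struct in6_addr	v6;
	} addr;
};

/*
 * Return true if src equals one of the IPv4 addresses in addrs[],
 * meaning the packet looks like our own traffic echoed back to us.
 */
bool
is_echoed_v4(const struct local_addr *addrs, size_t n, struct in_addr src)
{
	assert(addrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Check the family first, then read the address. */
		if (addrs[i].family != AF_INET)
			continue;
		if (addrs[i].addr.v4.s_addr == src.s_addr)
			return (true);
	}
	return (false);
}

/* IPv6 variant of the same check; compares all 16 address bytes. */
bool
is_echoed_v6(const struct local_addr *addrs, size_t n,
    const struct in6_addr *src)
{
	assert(addrs != NULL || n == 0);
	assert(src != NULL);
	for (size_t i = 0; i < n; i++) {
		if (addrs[i].family != AF_INET6)
			continue;
		if (memcmp(&addrs[i].addr.v6, src, sizeof(*src)) == 0)
			return (true);
	}
	return (false);
}
```

A caller would drop the packet when either function returns true and otherwise continue with the normal checks, in the same way the patch frees the mbuf and returns before the TTL test.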