Subject: Re: svn commit: r292379 - in head/sys: netinet netinet6
To: Gleb Smirnoff
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
From: Steven Hartland
Message-ID: <56740CCF.2070101@multiplay.co.uk>
Date: Fri, 18 Dec 2015 13:40:31 +0000

On 17/12/2015 23:46, Gleb Smirnoff wrote:
> On Thu, Dec 17, 2015 at 11:26:52PM +0000, Steven Hartland wrote:
> S> You may have not read all the detail in the review so you might not have
> S> noticed that I identified that carp IPv6 NA was broken by r251584, which
> S> was committed 2 1/2 years ago.
> S> I'm guessing not many people use it for IPv6.
>
> My suggestion is to look at this regression separated from the lagg failover
> and fix it separately.

We could, but in this new code the fix was only a few characters; implemented
separately you'd need a good portion of the code from this change anyway, so
it made sense to just include it here IMO.

> S> > The "link aggregation" itself refers to an aggregation of links between
> S> > two logical devices. If you build lagg(4) interface on top of two ports
> S> > that are plugged into different switches, you are calling for trouble.
> S>
> S> While multiple switches complicate the matter, it's not the only issue, as
> S> you can reproduce this with a single switch and two nics in LAGG failover
> S> mode with a simple ifconfig down. At this point any traffic entering the
> S> switch for LAGG member will black-hole instead of being received by the
> S> other nic.
> S>
> S> It is much more common in networking now to have multiple physical switches
> S> configured as part of bigger logical devices using protocols such as MLAG,
> S> which is what we're using with Cisco's and Arista's, so not some cheapo
> S> network ;-)
>
> Right, you are confirming what I said above. Multiple physical devices, but
> still one logical on each side of lagg.

In our target environment this is correct.

> S> > Nevertheless, someone wants to give a kick to this initially broken
> S> > network design and run it somehow. And this "somehow" implies Layer2
> S> > upcalling into upper layers to do something, since there is no
> S> > established standard layer2 heartbeat packet. I have chatted with
> S> > networking gurus at my job, and they said, that they don't know
> S> > any decent network equipment that supports such setup. However, they
> S> > noticed that Windows is capable for such failover. I haven't yet
> S> > learned on how Windows solves the problem. Actually, those who
> S> > pushed committing 156226 should have done these investigations.
> S> > Probably Windows does exactly the same, sends gratuitous ARP or
> S> > its IPv6 analog. Or maybe does something better like sending
> S> > useless L2 datagram, but with a proper source hardware address.
> S> Actually our testing here showed both Windows and Linux worked as
> S> expected, and from my reading doing the GARP / UNA is actually expected
> S> in this situation, for this very reason.
>
> Is it possible for you to sniff the traffic and see what actually happens
> in there? My expectations are the same, but want to be sure.

Netops here did do that, which led them to conclude that the GARP/NA was
missing.

> S> I'd like to step back for a second and get your feedback on the changes
> S> that were reverted, which didn't have the DELAY in the callout. What were
> S> the issues as you saw them? So we don't spam people any more I've reopened
> S> the review so we can take this there: https://reviews.freebsd.org/D4111
>
> Before going into implementation, can we first settle on the protocol?
> Could be that GARP/NA is the only solution there, but let's be sure first.
>

I did try forcing traffic out from the backup interface using the console once
the primary was down, and unfortunately that didn't help.

net.link.lagg.failover_rx_all=1 helps in the converse test, but the only
thing we found that fixed it fully in a timely manner was GARP/NA.

In the tests you can clearly see the impact of ARP timeouts, as sometimes it
would converge quicker than others.

If you would like me to try something else, by all means LMK.

    Regards
    Steve
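
For reference, the GARP discussed in this thread is an ordinary ARP packet in
which the sender and target protocol addresses are both the host's own IPv4
address, sent to the Ethernet broadcast address so that switches and
neighbours relearn where the MAC address now lives. The sketch below only
builds such a frame in Python to show its layout; it is not the code from
D4111, and the function name build_gratuitous_arp, the example MAC and the
example address are hypothetical.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST_MAC = b"\xff" * 6


def build_gratuitous_arp(mac: bytes, ipv4: str) -> bytes:
    """Return an Ethernet frame carrying an ARP announcement for ipv4.

    Hypothetical helper for illustration only: it builds the bytes but
    does not send them.
    """
    if len(mac) != 6:
        raise ValueError("mac must be exactly 6 bytes")
    ip = socket.inet_aton(ipv4)

    # Ethernet header: broadcast destination, our MAC as source.
    eth_header = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)

    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request (announcement form)
        mac,          # sender hardware address
        ip,           # sender protocol address
        b"\x00" * 6,  # target hardware address: unused, zeroed
        ip,           # target protocol address == sender address
    )
    return eth_header + arp


if __name__ == "__main__":
    # Example values only: a locally administered MAC and a
    # documentation-range address.
    frame = build_gratuitous_arp(bytes.fromhex("020000000001"), "192.0.2.10")
    print(len(frame), frame.hex())
```

Actually transmitting the frame needs a raw link-layer socket or BPF device
and the appropriate privileges, which is deliberately left out here. The IPv6
counterpart mentioned above is an unsolicited Neighbor Advertisement, which
has a different packet format.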