From: Steven Hartland
To: Gleb Smirnoff
Cc: Ryan Stone, Kubilay Kocak, freebsd-net, Karl Pielorz
Subject: Re: lagg Interfaces - don't do Gratuitous ARP?
Date: Thu, 22 Sep 2016 20:35:50 +0100
List-Id: Networking and TCP/IP with FreeBSD
In-Reply-To: <20160922180359.GT1018@cell.glebi.us>
References: <20160921235703.GG1018@cell.glebi.us>
 <20160922025856.GH1018@cell.glebi.us>
 <348d534d-ef87-f90c-aa43-cc65c2f6283c@multiplay.co.uk>
 <20160922150940.GK1018@cell.glebi.us>
 <20160922154144.GO1018@cell.glebi.us>
 <0c678da4-bf72-5a81-aee1-d82a873661b7@multiplay.co.uk>
 <20160922160840.GP1018@cell.glebi.us>
 <80fd962a-fba3-d71e-a1cb-2b09181d3925@multiplay.co.uk>
 <20160922180359.GT1018@cell.glebi.us>

On 22/09/2016 19:03, Gleb Smirnoff wrote:
> On Thu, Sep 22, 2016 at 05:50:09PM +0100, Steven Hartland wrote:
> S> > S> We could but then what happens when it's IPv6 or $other protocol that
> S> > S> needs to know?
> S> > S> That would require lagg to be edited with all the special
> S> > S> cases instead of allowing the protocol to handle it the way it needs.
> S> >
> S> > You just said that "without GARP devices can and do ignore", didn't you?
> S> > Let's take this as truth, although I doubt it. So, if this is the truth,
> S> > that means that if you are running IPv6 only, the switches won't
> S> > reconfigure themselves due to lack of gratuitous ARP.
> S> Not sure I follow you: gratuitous ARP is required for IPv4 to work; for
> S> IPv6 you need an unsolicited neighbour advertisement.
> S> > Other protocols, of which PPPoE is a good example, simply don't have any
> S> > analogs of ARP or ND. So what would your switches do in that case? And
> S> > what other layers are you going to hack, if you are going to run PPPoE
> S> > service with lagg failover?
> S> Good question, surely that's a good reason to have each protocol handle
> S> it and not to teach LAGG about every possible protocol?
>
> No. It is not a good reason to have each protocol handle it. It is a
> demonstration that this must be handled by a lower protocol layer - the L2,
> which is the level where the problem exists.
>
> S> > In reality, a layer 2 device must forward layer 2 traffic, and must
> S> > reconfigure its forwarding table based on source addresses seen on ports.
> S> > And that's what all devices I've seen do. So what if we actually try
> S> > the approach I suggested? I can write the patch for you if you want.
> S> The main problem with LAGG in failover mode is ensuring the traffic is
> S> sent to the correct port.
> S>
> S> When you have the scenario where a switch stack believes MAC XYZ is
> S> accessible by port ABC then unless you tell it otherwise it will
> S> continue to believe that and hence send traffic to said port. I'm sure
> S> we'll agree that the standard for doing this for IPv4 is ARP and for
> S> IPv6 is NA.
>
> No, we don't agree on that.
> I assert that ARP is the standard for mapping an IPv4 address to a
> physical address, not to a port. Same for NA. The de-facto
> standard for a switch to believe that MAC XYZ is accessible by port ABC
> is looking at the source address of any packet on a port.
>
> S> When using LAGG and we lose the master port we need to correct the
> S> connected devices' view (both direct and remote) of the world such that
> S> traffic is now sent to a different physical port.
> S>
> S> Back in the day, when switches weren't so "smart", sending a correctly
> S> addressed packet from the new port would potentially help, but with
> S> smarter switches and stacking in the mix sticking to the "standards"
> S> helps maintain compatibility and hence functionality with things like LAGG.
> S>
> S> Having tested with a number of vendor switches (Cisco, Extreme and more
> S> recently Arista), only sending gratuitous ARP for IPv4 and unsolicited NA
> S> for IPv6 reliably resulted in rapid failover between LAGG ports.
> S>
> S> Other methods like sending correctly addressed output from the new port
> S> helped (we tested this with outbound pings from IPMI), but still resulted
> S> in noticeable recovery delay.
>
> This means that switches are "smart" and are violating standards. If you want
> to create a hack to deal with that, better keep this hack inside the module
> that is affected by "smart" switches, in the lagg driver. And not plow through
> all levels of the network stack to satisfy demands of standard violators.
>
> So, please send a self-made gratuitous ARP packet right from lagg(4). If the
> switches work as you describe, that would work regardless of the actual
> IPv4/IPv6/whatever configuration.
>
> S> > S> Overall, while the proposed change (https://reviews.freebsd.org/D4111)
> S> > S> does involve changes to multiple layers it still feels like the right
> S> > S> approach as it has the right layer dealing with the change instead of
> S> > S> hard-coded assumptions.
> S> >
> S> > Sorry, it doesn't feel like the right approach. :(
> S> Out of interest why has your opinion changed since your post here:
> S> https://lists.freebsd.org/pipermail/freebsd-net/2012-February/031340.html ?
>
> I'm sorry, I didn't look at D4111, expecting that it is exactly the patch
> that was backed out. I will review D4111.
>
They are similar in approach, but D4111 incorporates additional feedback.

Essentially it still follows your suggestion from 2012, which was:
> 1) Network protocols should register themselves on the ifnet_link_event
> EVENTHANDLER(9).
> 2) The inet4 should send gratuitous ARP on this event.
> 3) The inet6 should send NA.

Hence my confusion ;-)

Regards
Steve
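
For reference, here is a minimal sketch of what a gratuitous ARP announcement looks like on the wire. It is illustration only: it is not taken from D4111 or from the FreeBSD sources, and the function name, MAC address and IP address below are made up. It builds the request form of the frame (broadcast destination, sender and target protocol address both set to the host's own IPv4 address) and does not send anything.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806          # EtherType for ARP
BROADCAST_MAC = b"\xff" * 6     # Ethernet broadcast address


def gratuitous_arp(mac: bytes, ipv4: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    Hypothetical helper for illustration; it only constructs the bytes.
    """
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip = socket.inet_aton(ipv4)  # 4-byte network-order address

    # Ethernet header: destination, source, EtherType.
    eth_header = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)

    # ARP payload (28 bytes for Ethernet/IPv4).
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6,             # hardware address length
        4,             # protocol address length
        1,             # operation: request
        mac,           # sender hardware address
        ip,            # sender protocol address
        b"\x00" * 6,   # target hardware address (unused in a request)
        ip,            # target protocol address == sender: "gratuitous"
    )
    return eth_header + arp_payload


# Example values only (locally administered MAC, documentation IP range).
frame = gratuitous_arp(bytes.fromhex("020000000001"), "192.0.2.10")
print(len(frame), frame.hex())
```

The frame's Ethernet source address is the field a learning switch keys on in Gleb's argument, while the ARP payload is what carries the IPv4-to-MAC mapping; the sketch shows that a gratuitous ARP contains both.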