Subject: Re: lagg Interfaces - don't do Gratuitous ARP?
To: Gleb Smirnoff
Cc: Ryan Stone, Kubilay Kocak, freebsd-net, Karl Pielorz
From: Steven Hartland
Message-ID: <80fd962a-fba3-d71e-a1cb-2b09181d3925@multiplay.co.uk>
In-Reply-To: <20160922160840.GP1018@cell.glebi.us>
References: <6E574F1B61786E6032824A88@10.12.30.106>
 <2c62f5f0-3fb4-f513-2a8f-02de3a1d552f@FreeBSD.org>
 <20160921235703.GG1018@cell.glebi.us>
 <20160922025856.GH1018@cell.glebi.us>
 <348d534d-ef87-f90c-aa43-cc65c2f6283c@multiplay.co.uk>
 <20160922150940.GK1018@cell.glebi.us>
 <20160922154144.GO1018@cell.glebi.us>
 <0c678da4-bf72-5a81-aee1-d82a873661b7@multiplay.co.uk>
 <20160922160840.GP1018@cell.glebi.us>
Date: Thu, 22 Sep 2016 17:50:09 +0100

On 22/09/2016 17:08, Gleb Smirnoff wrote:
> On Thu, Sep 22, 2016 at 04:52:35PM +0100, Steven Hartland wrote:
> S> > S> > S> > Does lagg(4) hardware address change when it failovers?
> S> > S> > S> >
> S> > S> > S> It moves the address between interfaces which typically
> S> > S> > S> moves it between switches too.
> S> > S> >
> S> > S> > So, the address doesn't change, which means ARP cache doesn't need to
> S> > S> > change as well. If it moves between switches, all that needs to be done
> S> > S> > is to send whatever packet from proper hardware address to broadcast.
> S> > S> >
> S> > S> That would be nice but unfortunately in the wild that won't work as
> S> > S> without GARP devices can and do ignore :(
> S> >
> S> > You can create a fake gratuitous ARP packet, if you want. Switches must
> S> > not require IP addresses matching the reality in the packet.
> S> >
> S> > P.S. I always read GARP as Generic Attribute Registration Protocol.
> S> >
> S> We could but then what happens when it's IPv6 or $other protocol that
> S> needs to know? That would require lagg to be edited with all the special
> S> cases instead of allowing the protocol to handle it the way it needs.
>
> You just said that "without GARP devices can and do ignore", didn't you?
> Let's take this as truth, although I doubt. So, if this is the truth, that
> means that if you are running IPv6 only, the switches won't reconfigure
> themselves due to lack of gratuitous ARP.

Not sure I follow you: gratuitous ARP is required for IPv4 to work; for
IPv6 you need an unsolicited neighbour advertisement (NA).

> Other protocols, where PPPoE is good example simply doesn't have any
> analogs of ARP or ND. So what would your switches do in that case? And
> what other layers are you going to hack, if you are going to run PPPoE
> service with lagg failover?

Good question. Surely that's a good reason to have each protocol handle
it, rather than teaching LAGG about every possible protocol?

> In reality, a layer 2 device must forward layer 2 traffic, and must
> reconfigure its forwarding table based on source addresses seen on ports.
> And that's what all devices I've seen do.
> So what if we actually try the approach I suggested? I can write the
> patch for you if you want.

The main problem with LAGG in failover mode is ensuring the traffic is
sent to the correct port. When a switch stack believes MAC XYZ is
reachable via port ABC then, unless you tell it otherwise, it will
continue to believe that and hence keep sending traffic to said port.

I'm sure we'll agree that the standard for doing this for IPv4 is ARP and
for IPv6 is NA.

When using LAGG and we lose the master port, we need to correct the
connected devices' view of the world (both direct and remote) such that
traffic is now sent to a different physical port.

Back in the day, when switches weren't so "smart", sending a correctly
addressed packet from the new port would potentially help, but with
smarter switches and stacking in the mix, sticking to the "standards"
helps maintain compatibility and hence functionality with things like
LAGG.

Having tested with a number of vendor switches (Cisco, Extreme and more
recently Arista), only sending gratuitous ARP for IPv4 and unsolicited NA
for IPv6 reliably resulted in rapid failover between LAGG ports. Other
methods, like sending correctly addressed output from the new port,
helped (we tested this with outbound pings from IPMI) but still resulted
in a noticeable recovery delay.

> S> Overall, while the proposed change (https://reviews.freebsd.org/D4111)
> S> does involve changes to multiple layers it still feels like the right
> S> approach as it has the right layer dealing with the change instead of
> S> hard-coded assumptions.
>
> Sorry, it doesn't feel like the right approach. :(

Out of interest, why has your opinion changed since your post here:
https://lists.freebsd.org/pipermail/freebsd-net/2012-February/031340.html ?
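
For reference, the frame that a gratuitous ARP puts on the wire can be
sketched as below. This is only an illustration in Python, not the code
from D4111: the function name, MAC address and IP address are all
hypothetical, it uses the ARP request form (sender and target IP both set
to the announced address), and it only builds the bytes rather than
sending them out of any interface.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6   # Ethernet broadcast destination
ETHERTYPE_ARP = 0x0806        # EtherType for ARP


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    Hypothetical helper for illustration only. 'mac' is the address
    that has just moved to the new port, 'ip' is the IPv4 address
    being announced.
    """
    sha = bytes.fromhex(mac.replace(":", ""))
    if len(sha) != 6:
        raise ValueError("MAC address must be 6 bytes")
    spa = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source.
    # The source MAC is what a learning switch keys its table on.
    eth_header = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)

    # ARP payload: Ethernet/IPv4, opcode 1 (request),
    # sender IP == target IP is what makes it "gratuitous".
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        sha,          # sender hardware address
        spa,          # sender protocol address
        b"\x00" * 6,  # target hardware address (unset)
        spa,          # target protocol address (same as sender)
    )
    return eth_header + arp_payload


# Documentation-range IP and a locally administered MAC, both made up.
frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

The point of the sketch is that the frame updates two things at once: the
broadcast from the new port lets switches relearn where the MAC lives, and
the ARP payload lets IPv4 neighbours refresh their caches. An unsolicited
NA plays the equivalent role for IPv6.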