From: Steven Hartland
Date: Thu, 17 Dec 2015 23:26:52 +0000
Subject: Re: svn commit: r292379 - in head/sys: netinet netinet6
To: Gleb Smirnoff
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Message-ID: <567344BC.20501@multiplay.co.uk>
In-Reply-To: <20151217191630.GL42340@FreeBSD.org>
References: <201512162226.tBGMQSvs098886@repo.freebsd.org> <20151217003824.GG42340@FreeBSD.org> <5672C6AE.7070407@freebsd.org> <20151217191630.GL42340@FreeBSD.org>

On 17/12/2015 19:16, Gleb Smirnoff wrote:
> Steven,
>
> On Thu, Dec 17, 2015 at 02:29:02PM +0000, Steven Hartland wrote:
> S> I would definitely like to understand more about your concerns and
> S> learn from your knowledge in this area, so thanks for that offer, and
> S> while it does sound unforgiving I totally understand where you're
> S> coming from.
> S>
> S> Hopefully together we can bring this to a satisfactory conclusion as I
> S> would hate for both carp and lagg to stay as broken, 2 years is long
> S> enough :D
>
> Ok, let's get technical. CARP and LAGG were not broken for 2 years. They
> were working very well in the way they were designed to work. The setup
> in the bug 156226 was broken initially.

You may not have read all the detail in the review, so you might not
have noticed that I identified that carp IPv6 NA was broken by r251584,
which was committed 2 1/2 years ago. I'm guessing not many people use it
for IPv6.

> The "link aggregation" itself refers to an aggregation of links between
> two logical devices. If you build lagg(4) interface on top of two ports
> that are plugged into different switches, you are calling for trouble.

While multiple switches complicate the matter, that's not the only
issue: you can reproduce this with a single switch and two NICs in LAGG
failover mode with a simple ifconfig down. At this point any traffic
entering the switch for the LAGG member will be black-holed instead of
being received by the other NIC.

It is much more common in networking now to have multiple physical
switches configured as part of bigger logical devices using protocols
such as MLAG, which is what we're using with Ciscos and Aristas, so not
some cheapo network ;-)

> All comments in the 156226 from Eugene Grosbein are valid. I would not
> repeat them, but ask you to reread them in bugzilla. There was a good
> reason why for 2 years committers stayed away from this "bug" and related
> patch.

Yes, but let's not confuse the different types: we're talking
specifically about failover mode here, which has no special
configuration, hence it's reliant on the OS implementation only.

> Nevertheless, someone wants to give a kick to this initially broken
> network design and run it somehow.
> And this "somehow" implies Layer2
> upcalling into upper layers to do something, since there is no
> established standard layer2 heartbeat packet. I have chatted with
> networking gurus at my job, and they said, that they don't know
> any decent network equipment that supports such setup. However, they
> noticed that Windows is capable for such failover. I haven't yet
> learned on how Windows solves the problem. Actually, those who
> pushed committing 156226 should have done these investigations.
> Probably Windows does exactly the same, sends gratuitous ARP or
> its IPv6 analog. Or may be does something better like sending
> useless L2 datagram, but with a proper source hardware address.

Actually our testing here showed both Windows and Linux worked as
expected, and from my reading doing the GARP / UNA is actually expected
in this situation, for this very reason.

> Okay, what if we want same in FreeBSD as in Windows? Should we do the
> following list of evil things:
>
> - put DELAY in context of callout (or in context of any network processing)
> - introduce new notions of a link state, or new KPI for link handling
>   Note that link handling KPI was stable for over 10 years and satisfied
>   all the different types of interfaces we support
> - create new interface methods
> - call into address families supplying an ifnet that doesn't have this AF
>   instantiated, and then to fix immediate panic putting there a kludge
>   of "if (foo == NULL) return;"
> - etc...
>
> Sorry, I'm putting "etc" here, because I tire of the details. You would
> agree that the whole process of fixing the "bug" was overcoming the
> problems that the network stack is not designed for the things that you
> are willing to do. Won't you agree?

I am indeed trying to produce feature parity, to prevent the powers that
be throwing FreeBSD out as the only OS which fails to work as expected
in failover mode, even in the simple case as described above.
Yes, we could apply a userland workaround, but then everyone has to be
aware of the need for it and set it up, which doesn't sound like the
best solution.

> Or should we just write a tiny program, that would observe state of
> networking ports, and if a port changes state then send a tiny packet
> as a bpf(4) write?

This could be done, but it still means our lagg failover doesn't do what
people would expect.

I'd like to step back for a second and get your feedback on the changes
that were reverted, which didn't have the DELAY in the callout. What
were the issues as you saw them?

So we don't spam people any more I've reopened the review so we can take
this there:
https://reviews.freebsd.org/D4111

Apologies if these are very obvious to others, but clearly those
involved with this didn't spot them, so it would be really nice to learn
from this.

    Regards
    Steve
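
For anyone following the thread who hasn't looked at one, here is a rough
sketch of the kind of frame a gratuitous ARP (GARP) announcement amounts
to. It is only an illustration, not the code from the commit or the
review: the function name build_gratuitous_arp and the example MAC and IP
values are made up, and it only builds the bytes. Actually transmitting
them (for example through a bpf(4) write, as suggested above) is platform
specific and left out.

```python
import socket
import struct


def build_gratuitous_arp(mac: bytes, ipv4: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    Hypothetical helper for illustration only. The sender and target
    protocol addresses are both set to our own IP, which is what makes
    the ARP "gratuitous".
    """
    if len(mac) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    ip = socket.inet_aton(ipv4)

    # Ethernet header: broadcast destination, our MAC as source,
    # EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip          # sender hardware / protocol address
    arp += b"\x00" * 6 + ip  # target hardware (unset) / protocol address

    return eth + arp


if __name__ == "__main__":
    # Example values only: a locally administered MAC and a documentation IP.
    frame = build_gratuitous_arp(bytes.fromhex("020000000001"), "192.0.2.10")
    print(len(frame), frame.hex())
```

Because the source hardware address is that of the port now carrying the
traffic, a frame like this lets the switch relearn where the MAC lives
after a failover, which is the effect being discussed in this thread.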