From: Gleb Smirnoff
Date: Thu, 17 Dec 2015 22:16:30 +0300
To: Steven Hartland
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r292379 - in head/sys: netinet netinet6
Message-ID: <20151217191630.GL42340@FreeBSD.org>
In-Reply-To: <5672C6AE.7070407@freebsd.org>

  Steven,

On Thu, Dec 17, 2015 at 02:29:02PM +0000, Steven Hartland wrote:
S> I would definitely like to understand more about your concerns and learn
S> from your knowledge in this area, so thanks for that offer, and while it
S> does sound unforgiving I totally understand where you're coming from.
S>
S> Hopefully together we can bring this to a satisfactory conclusion as I
S> would hate for both carp and lagg to stay as broken, 2 years is long
S> enough :D

Ok, let's get technical.

CARP and LAGG were not broken for 2 years. They were working very well
in the way they were designed to work. The setup in bug 156226 was
broken from the start. "Link aggregation" itself refers to an
aggregation of links between two logical devices. If you build a
lagg(4) interface on top of two ports that are plugged into different
switches, you are asking for trouble. All comments in 156226 from
Eugene Grosbein are valid. I will not repeat them, but ask you to
reread them in bugzilla. There was a good reason why committers stayed
away from this "bug" and the related patch for 2 years.

Nevertheless, someone wants to give a kick to this initially broken
network design and run it somehow. And this "somehow" implies Layer 2
upcalling into upper layers to do something, since there is no
established standard Layer 2 heartbeat packet.

I have chatted with the networking gurus at my job, and they said that
they don't know of any decent network equipment that supports such a
setup. However, they noted that Windows is capable of such failover.
I haven't yet learned how Windows solves the problem. Actually, those
who pushed for committing 156226 should have done this investigation.
Probably Windows does exactly the same: it sends a gratuitous ARP or
its IPv6 analog. Or maybe it does something better, like sending
a useless L2 datagram, but with a proper source hardware address.
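
For illustration only, a gratuitous ARP of the kind mentioned above could
be assembled roughly as follows. This is a minimal sketch, not what
Windows or lagg(4) actually does: the function name build_gratuitous_arp
and the GARP_FRAME_LEN constant are hypothetical, and the code only
builds the 42-byte frame in memory. It assumes Ethernet and IPv4 and uses
the request form of the announcement (sender and target protocol address
both set to the announced address).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_LEN      6
#define GARP_FRAME_LEN 42   /* 14-byte Ethernet header + 28-byte ARP body */

/*
 * Hypothetical helper: fill buf with a gratuitous ARP request announcing
 * that IPv4 address ip (4 bytes, network order) lives at hardware
 * address mac.  Returns the frame length, or 0 if buf is too small.
 */
size_t
build_gratuitous_arp(uint8_t *buf, size_t buflen,
    const uint8_t mac[ETHER_LEN], const uint8_t ip[4])
{
	uint8_t *p = buf;

	if (buflen < GARP_FRAME_LEN)
		return (0);

	/* Ethernet header: broadcast destination, our source, type ARP. */
	memset(p, 0xff, ETHER_LEN);	p += ETHER_LEN;
	memcpy(p, mac, ETHER_LEN);	p += ETHER_LEN;
	*p++ = 0x08; *p++ = 0x06;	/* EtherType 0x0806 (ARP) */

	/* ARP body. */
	*p++ = 0x00; *p++ = 0x01;	/* hardware type: Ethernet */
	*p++ = 0x08; *p++ = 0x00;	/* protocol type: IPv4 */
	*p++ = ETHER_LEN;		/* hardware address length */
	*p++ = 4;			/* protocol address length */
	*p++ = 0x00; *p++ = 0x01;	/* operation: request */
	memcpy(p, mac, ETHER_LEN);	p += ETHER_LEN;	/* sender hw addr */
	memcpy(p, ip, 4);		p += 4;		/* sender IP */
	memset(p, 0, ETHER_LEN);	p += ETHER_LEN;	/* target hw addr */
	memcpy(p, ip, 4);		p += 4;		/* target IP == sender IP */

	return ((size_t)(p - buf));
}
```

The point of such a frame is only that switches along the path relearn
which port the source hardware address is now reachable through. How the
frame gets onto the wire is deliberately left out of the sketch.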

Okay, what if we want the same in FreeBSD as in Windows? Should we do
the following list of evil things:

- put DELAY in the context of a callout (or in the context of any
  network processing)
- introduce new notions of link state, or a new KPI for link handling.
  Note that the link handling KPI has been stable for over 10 years and
  has satisfied all the different types of interfaces we support
- create new interface methods
- call into address families supplying an ifnet that doesn't have
  this AF instantiated, and then, to fix the immediate panic, put
  in a kludge of "if (foo == NULL) return;"
- etc...

Sorry, I'm putting "etc" here because I am tired of the details. You
would agree that the whole process of fixing the "bug" was a matter of
overcoming the fact that the network stack is not designed for the
things that you want to do. Wouldn't you agree?

Or should we just write a tiny program that would observe the state of
networking ports, and if a port changes state, send a tiny packet
as a bpf(4) write?

-- 
Totus tuus, Glebius.