Date:      Fri, 18 Dec 2015 02:46:30 +0300
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Steven Hartland <steven@multiplay.co.uk>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r292379 - in head/sys: netinet netinet6
Message-ID:  <20151217234630.GX42340@FreeBSD.org>
In-Reply-To: <567344BC.20501@multiplay.co.uk>
References:  <201512162226.tBGMQSvs098886@repo.freebsd.org> <20151217003824.GG42340@FreeBSD.org> <5672C6AE.7070407@freebsd.org> <20151217191630.GL42340@FreeBSD.org> <567344BC.20501@multiplay.co.uk>

On Thu, Dec 17, 2015 at 11:26:52PM +0000, Steven Hartland wrote:
S> You may not have read all the detail in the review, so you might not have
S> noticed that I identified that carp IPv6 NA was broken by r251584, which
S> was committed 2 1/2 years ago. I'm guessing not many people use it for IPv6.

My suggestion is to look at this regression separately from the lagg failover
and fix it on its own.

S> > The "link aggregation" itself refers to an aggregation of links between
S> > two logical devices. If you build lagg(4) interface on top of two ports
S> > that are plugged into different switches, you are calling for trouble.
S> 
S> While multiple switches complicate the matter, it's not the only issue, as
S> you can reproduce this with a single switch and two NICs in LAGG failover
S> mode with a simple ifconfig <nic1> down. At this point any traffic entering
S> the switch destined for that LAGG member will black-hole instead of being
S> received by the other NIC.
S> 
S> It is much more common in networking now to have multiple physical switches
S> configured as part of a bigger logical device using protocols such as MLAG,
S> which is what we're using with Ciscos and Aristas, so not some cheapo
S> network ;-)

Right, you are confirming what I said above. Multiple physical devices, but
still one logical device on each side of the lagg.
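
For anyone wanting to reproduce the failure Steven describes, a rough sketch
of a minimal single-switch setup could look like this (the interface names
and the address are placeholders, not taken from his actual configuration):

  # build a failover lagg out of two ports plugged into the same switch
  ifconfig em0 up
  ifconfig em1 up
  ifconfig lagg0 create
  ifconfig lagg0 up laggproto failover laggport em0 laggport em1 \
      192.0.2.10/24

  # simulate loss of the active port; until the switch learns the lagg's
  # MAC address on the second port, frames it forwards towards the first
  # port are black-holed
  ifconfig em0 down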

S> > Nevertheless, someone wants to give a kick to this initially broken
S> > network design and run it somehow. And this "somehow" implies Layer2
S> > upcalling into upper layers to do something, since there is no
S> > established standard layer2 heartbeat packet. I have chatted with
S> > networking gurus at my job, and they said that they don't know of
S> > any decent network equipment that supports such a setup. However, they
S> > noticed that Windows is capable of such failover. I haven't yet
S> > learned how Windows solves the problem. Actually, those who
S> > pushed committing 156226 should have done these investigations.
S> > Probably Windows does exactly the same, sends gratuitous ARP or
S> > its IPv6 analog. Or maybe it does something better, like sending a
S> > useless L2 datagram, but with a proper source hardware address.
S> Actually our testing here showed both Windows and Linux worked as expected,
S> and from my reading, sending the GARP / unsolicited NA is actually the
S> expected behaviour in this situation, for this very reason.

Is it possible for you to sniff the traffic and see what actually happens
in there? My expectations are the same, but I want to be sure.
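
If you do get a capture, watching the surviving port with something like the
following should show whether the OS announces the move with a gratuitous ARP
(an ARP request sent to broadcast with the sender and target IP the same) or,
for IPv6, an unsolicited neighbor advertisement (ICMPv6 type 136). The
interface name is again a placeholder, and the ip6[40] match assumes packets
without IPv6 extension headers:

  # print link-level headers so the source MAC of each announcement is visible
  tcpdump -eni em1 'arp or (icmp6 and ip6[40] == 136)'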

S> I'd like to step back for a second and get your feedback on the changes
S> that were reverted, which didn't have the DELAY in the callout. What were
S> the issues as you saw them? So we don't spam people any more, I've reopened
S> the review so we can take this there: https://reviews.freebsd.org/D4111

Before going into implementation, can we first settle on the protocol?
Could be that GARP/NA is the only solution there, but let's be sure first.

-- 
Totus tuus, Glebius.


