FreeBSD Mail Archives

Date:      Fri, 18 Dec 2015 13:40:31 +0000
From:      Steven Hartland <steven@multiplay.co.uk>
To:        Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r292379 - in head/sys: netinet netinet6
Message-ID:  <56740CCF.2070101@multiplay.co.uk>
In-Reply-To: <20151217234630.GX42340@FreeBSD.org>
References:  <201512162226.tBGMQSvs098886@repo.freebsd.org> <20151217003824.GG42340@FreeBSD.org> <5672C6AE.7070407@freebsd.org> <20151217191630.GL42340@FreeBSD.org> <567344BC.20501@multiplay.co.uk> <20151217234630.GX42340@FreeBSD.org>

On 17/12/2015 23:46, Gleb Smirnoff wrote:
> On Thu, Dec 17, 2015 at 11:26:52PM +0000, Steven Hartland wrote:
> S> You may not have read all the detail in the review, so you might not have
> S> noticed that I identified that carp IPv6 NA was broken by r251584, which
> S> was committed 2 1/2 years ago. I'm guessing not many people use it for IPv6.
>
> My suggestion is to look at this regression separated from the lagg failover
> and fix it separately.
We could, but with this new code in place the fix was only a few characters.
Implemented separately, you'd need a good portion of the code from this change
anyway, so it made sense to just include it here IMO.
> S> > The "link aggregation" itself refers to an aggregation of links between
> S> > two logical devices. If you build lagg(4) interface on top of two ports
> S> > that are plugged into different switches, you are calling for trouble.
> S>
> S> While multiple switches complicate the matter, it's not the only issue, as
> S> you can reproduce this with a single switch and two NICs in LAGG failover
> S> mode with a simple ifconfig <nic1> down. At this point any traffic entering
> S> the switch for the LAGG member will black-hole instead of being received by
> S> the other NIC.
> S>
> S> It is much more common in networking now to have multiple physical switches
> S> configured as part of bigger logical devices using protocols such as MLAG,
> S> which is what we're using with Ciscos and Aristas, so not some cheapo
> S> network ;-)
>
> Right, you are confirming what I said above. Multiple physical devices, but
> still one logical on each side of lagg.
In our target environment this is correct.
> S> > Nevertheless, someone wants to give a kick to this initially broken
> S> > network design and run it somehow. And this "somehow" implies Layer2
> S> > upcalling into upper layers to do something, since there is no
> S> > established standard layer2 heartbeat packet. I have chatted with
> S> > networking gurus at my job, and they said, that they don't know
> S> > any decent network equipment that supports such setup. However, they
> S> > noticed that Windows is capable of such failover. I haven't yet
> S> > learned how Windows solves the problem. Actually, those who
> S> > pushed committing 156226 should have done these investigations.
> S> > Probably Windows does exactly the same, sends gratuitous ARP or
> S> > its IPv6 analog. Or maybe does something better like sending
> S> > useless L2 datagram, but with a proper source hardware address.
> S> Actually our testing here showed both Windows and Linux worked as expected,
> S> and from my reading doing the GARP / UNA is actually expected in this
> S> situation, for this very reason.
>
> Is it possible for you to sniff the traffic and see what actually happens
> in there? My expectations are the same, but want to be sure.
Netops here did do that, which led them to conclude that the missing GARP/NA
was the cause.
> S> I'd like to step back for a second and get your feedback on the changes that
> S> were reverted, which didn't have the DELAY in the callout. What were the
> S> issues as you saw them? So we don't spam people any more I've reopened the
> S> review so we can take this there: https://reviews.freebsd.org/D4111
>
> Before going into implementation, can we first settle on the protocol?
> Could be that GARP/NA is the only solution there, but let's be sure first.
>
I did try forcing traffic out from the backup interface using the console once
the primary was down, and unfortunately that didn't help.
net.link.lagg.failover_rx_all=1 helps in the converse test, but the only thing
we found that fixed it fully in a timely manner was GARP/NA.
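
For anyone following along who hasn't looked at one on the wire, here is a
rough sketch of what a gratuitous ARP frame contains. It is an illustration
only, not the kernel code from r292379 or D4111. The function name, MAC and
IP below are made up for the example, and it assumes the common "request"
form, where the sender and target IP are the same address and the frame is
sent to the Ethernet broadcast address.

```python
import socket
import struct


def build_gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Return a 42-byte Ethernet frame holding a gratuitous ARP request.

    Hypothetical helper for illustration: mac is the 6-byte hardware
    address now answering for ip (a 4-byte IPv4 address).
    """
    if len(mac) != 6 or len(ip) != 4:
        raise ValueError("expected a 6-byte MAC and a 4-byte IPv4 address")

    # Ethernet header: broadcast destination, our MAC as source, ARP ethertype.
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)

    # ARP header: Ethernet hardware type (1), IPv4 protocol type (0x0800),
    # address lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Sender MAC/IP, then target MAC (zero) and target IP. Sender IP equal
    # to target IP is what makes the request gratuitous.
    arp += mac + ip + b"\x00" * 6 + ip
    return eth + arp


# Example values only (locally administered MAC, documentation IP range).
frame = build_gratuitous_arp(bytes.fromhex("020000000001"),
                             socket.inet_aton("192.0.2.10"))
```

Switches learn the new port from the source MAC of that broadcast, and
neighbours refresh their ARP entry for the address, which is why it converges
without waiting for the ARP timeout.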

In the tests you can clearly see the impact of ARP timeouts, as sometimes it
would converge quicker than others.

If you would like me to try something else by all means LMK.

     Regards
     Steve

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56740CCF.2070101>
