Date:      Wed, 10 Jul 2019 13:25:59 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 239112] [LACP] Latency problem when reconfiguring LACP links
Message-ID:  <bug-239112-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239112

            Bug ID: 239112
           Summary: [LACP] Latency problem when reconfiguring LACP links
           Product: Base System
           Version: 11.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: nicolas.masse@stormshield.eu

Created attachment 205657
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205657&action=edit
proposed fix

Recently, my company and I came across a problem with LACP.
Since we migrated from FreeBSD 10.3 to FreeBSD 11.2, the LACP links sometimes
take several seconds (roughly 6 seconds) to get configured properly.
We observed this when re-creating the LACP links from scratch using the
following commands:
ifconfig lagg0 inet 192.168.29.131/24 delete
ifconfig igb1 down
ifconfig lagg0 -laggport igb1
ifconfig lagg0 ether 00:00:00:00:00:00
ifconfig igb1 mtu 1500
ifconfig igb1 media autoselect
ifconfig lagg0 laggproto lacp
ifconfig lagg0 laggport igb1
ifconfig lagg0 mtu 1500
ifconfig lagg0 ether 0:d:b4:e:ba:e1
ifconfig lagg0 inet 192.168.29.131/24
ifconfig igb1 up
ifconfig lagg0 up

After some research, we found that the problem comes from the commit
https://svnweb.freebsd.org/base?view=revision&revision=332834.
From what I understand, here is what happens:
- lacp_select is called. If we have not seen our peer yet, it does nothing.
- Since it does nothing, the LACP_SELECTED flag is not set. As a consequence,
the LACP_TIMER_WAIT_WHILE timer is not armed.
- Since this timer is not armed, we have to wait for the
LACP_TIMER_CURRENT_WHILE timer to fire instead, which adds extra latency that
we did not observe before (up to 6 seconds).

This extra latency is a problem for us: we have a lot of automated
regression tests, and they now take twice as long to run as before because
we have to wait to be sure that the link is created properly.
So I tried to solve this and came up with the following fix (see the
attached patch):
- In lacp_select, if we have not seen our peer yet, I still create the
aggregator if it does not exist yet and set LACP_SELECTED, but I do not fill
in the aggregator ID.
- In the next call to lacp_select, I test whether the aggregator ID is filled
in by checking the LACP_STATE_AGGREGATION flag. If the flag is not set, the
aggregator ID is filled in and the flag is set.
Since LACP_SELECTED is now set again in the first call to lacp_select,
LACP_TIMER_WAIT_WHILE is armed and triggered as it was before revision
332834.
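
To make the two behaviours easier to compare, here is a toy model in C of the
sequence described above. It is only a sketch of my understanding: it is not
the kernel code and not the attached patch. The struct toy_port, its boolean
fields, and the toy_* functions are hypothetical names invented for this
illustration; only lacp_select, LACP_SELECTED, LACP_TIMER_WAIT_WHILE and
LACP_STATE_AGGREGATION refer to things in the real code, and the model assumes
that arming the wait-while timer depends only on the port becoming selected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a LACP port; not the kernel's structure. */
struct toy_port {
    bool peer_seen;            /* have we heard from the peer yet? */
    bool selected;             /* models the LACP_SELECTED flag */
    bool aggregator_created;   /* an aggregator exists for this port */
    bool aggregator_id_filled; /* models the LACP_STATE_AGGREGATION check */
    bool wait_while_armed;     /* models LACP_TIMER_WAIT_WHILE being armed */
};

/* Behaviour as I understand it after r332834: no peer, nothing happens. */
void toy_select_current(struct toy_port *p)
{
    if (!p->peer_seen)
        return; /* port stays unselected, so the timer is never armed */
    p->aggregator_created = true;
    p->aggregator_id_filled = true;
    p->selected = true;
    p->wait_while_armed = true;
}

/* Idea of the proposed fix: select early, fill the aggregator ID later. */
void toy_select_patched(struct toy_port *p)
{
    p->aggregator_created = true; /* created on first call if missing */
    if (!p->selected) {
        p->selected = true;
        p->wait_while_armed = true; /* armed even without a peer */
    }
    if (p->peer_seen && !p->aggregator_id_filled)
        p->aggregator_id_filled = true; /* deferred until the peer is known */
}

/* Walks both variants through "first call without a peer". */
int toy_demo(void)
{
    struct toy_port cur = { 0 };
    struct toy_port fix = { 0 };

    toy_select_current(&cur);
    assert(!cur.selected && !cur.wait_while_armed);

    toy_select_patched(&fix);
    assert(fix.selected && fix.wait_while_armed && !fix.aggregator_id_filled);

    fix.peer_seen = true; /* peer shows up before the next call */
    toy_select_patched(&fix);
    assert(fix.aggregator_id_filled);
    return 0;
}
```

Calling toy_demo() from a small main() runs the sequence. In the model, the
first variant leaves the port unselected with no timer armed, while the second
arms the timer on the first call and only fills in the aggregator ID once the
peer has been seen.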

Early testing of this patch in our environment shows that the extra latency
is gone and things seem to work properly.

-- 
You are receiving this mail because:
You are the assignee for the bug.


