Date: Wed, 10 Jul 2019 13:25:59 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 239112] [LACP] Latency problem when reconfiguring LACP links
Message-ID: <bug-239112-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239112

Bug ID: 239112
Summary: [LACP] Latency problem when reconfiguring LACP links
Product: Base System
Version: 11.2-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: nicolas.masse@stormshield.eu

Created attachment 205657
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205657&action=edit
proposed fix

Recently, my company and I came across a problem with LACP: since we migrated from FreeBSD 10.3 to FreeBSD 11.2, the LACP links sometimes take several seconds (around 6 seconds) to get configured properly. We observed this when re-creating the LACP links from scratch with the following commands:

    ifconfig lagg0 inet 192.168.29.131/24 delete
    ifconfig igb1 down
    ifconfig lagg0 -laggport igb1
    ifconfig lagg0 ether 00:00:00:00:00:00
    ifconfig igb1 mtu 1500
    ifconfig igb1 media autoselect
    ifconfig lagg0 laggproto lacp
    ifconfig lagg0 laggport igb1
    ifconfig lagg0 mtu 1500
    ifconfig lagg0 ether 0:d:b4:e:ba:e1
    ifconfig lagg0 inet 192.168.29.131/24
    ifconfig igb1 up
    ifconfig lagg0 up

After some research, we found that the problem comes from this commit:
https://svnweb.freebsd.org/base?view=revision&revision=332834

From what I understand, here is what happens:
- lacp_select is called. If we don't see our peer yet, it does nothing.
- Since it does nothing, the LACP_SELECTED flag isn't set. As a consequence, the LACP_TIMER_WAIT_WHILE timer isn't armed.
- Since this timer isn't armed, we have to wait for the LACP_TIMER_CURRENT_WHILE timer to fire instead, which adds extra latency (up to 6 seconds) that we didn't observe before.

This extra latency is a problem for us because we have a lot of automated regression tests: they now take twice as long to run as before, since we have to wait to be sure that the link is created properly.
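The failure sequence above can be pictured with a toy model. This is a hypothetical sketch, not FreeBSD kernel code: the struct model_port type and the model_select() and model_arm_wait_while() functions are invented for illustration, and only the flag and timer names they stand in for come from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of the post-r332834 behaviour
 * described in the report. NOT the kernel's structures or functions.
 */
struct model_port {
	bool peer_seen;         /* partner information received yet? */
	bool selected;          /* stands in for LACP_SELECTED */
	bool wait_while_armed;  /* stands in for LACP_TIMER_WAIT_WHILE running */
};

/* As described: selection does nothing until the peer has been seen. */
static void
model_select(struct model_port *p)
{
	assert(p != NULL);
	if (!p->peer_seen)
		return;         /* LACP_SELECTED stays clear */
	p->selected = true;
}

/* The wait_while timer is only armed once the port is selected. */
static void
model_arm_wait_while(struct model_port *p)
{
	assert(p != NULL);
	if (p->selected && !p->wait_while_armed)
		p->wait_while_armed = true;
}
```

With peer_seen left false, calling model_select() and then model_arm_wait_while() leaves both selected and wait_while_armed false. That is the stall the report describes: in the real code the port only moves on once LACP_TIMER_CURRENT_WHILE fires.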
So I tried to see whether I could solve this, and came up with the following fix (see the attached patch):
- In lacp_select, if we haven't seen our peer yet, I still create the aggregator if it doesn't exist yet and set LACP_SELECTED, but I don't fill in the aggregator ID.
- In the next call to lacp_select, I check whether the aggregator ID has been filled in by testing the LACP_STATE_AGGREGATION flag. If this flag isn't set, the aggregator ID is filled in and the flag is set.

Since LACP_SELECTED is now set already in the first call to lacp_select, LACP_TIMER_WAIT_WHILE is armed and fires as it did before revision 332834. Early testing of this patch in our environment shows that the extra latency is gone and things seem to work properly.
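The proposed change can be sketched in the same kind of hypothetical toy model. Again this is not the attached patch or kernel code: all names are invented, and the aggr_id_filled field merely stands in for the LACP_STATE_AGGREGATION check. The model also assumes that the aggregator ID is filled in on a later call once the peer has been seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of the proposed fix; NOT FreeBSD kernel code.
 */
struct model_port {
	bool peer_seen;         /* partner information received yet? */
	bool aggr_created;      /* an aggregator exists for this port */
	bool aggr_id_filled;    /* stands in for LACP_STATE_AGGREGATION */
	bool selected;          /* stands in for LACP_SELECTED */
	bool wait_while_armed;  /* stands in for LACP_TIMER_WAIT_WHILE running */
};

static void
model_select_patched(struct model_port *p)
{
	assert(p != NULL);
	/* Create the aggregator even if the peer has not been seen yet. */
	if (!p->aggr_created)
		p->aggr_created = true;
	/* Mark the port selected right away so wait_while can be armed. */
	p->selected = true;
	/*
	 * Fill in the aggregator ID only once peer information is available
	 * and it has not been filled in yet (assumption of this model).
	 */
	if (p->peer_seen && !p->aggr_id_filled)
		p->aggr_id_filled = true;
}

/* The wait_while timer is armed once the port is selected. */
static void
model_arm_wait_while(struct model_port *p)
{
	assert(p != NULL);
	if (p->selected && !p->wait_while_armed)
		p->wait_while_armed = true;
}
```

In this model the first call, made before the peer is seen, already leads to the wait_while timer being armed, while the aggregator ID is only filled in by a later call once peer_seen becomes true.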