Date:      Mon, 23 Jun 2014 10:27:26 +0800
From:      Marcelo Araujo <araujobsdport@gmail.com>
To:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   [patch][lagg] - Set a better granularity and distribution on roundrobin protocol.
Message-ID:  <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>


[-- Attachment #1 --]
Hello guys,

I made some changes to the roundrobin protocol so that you can now use
sysctl(8) to get a better distribution of packets among the interfaces
that are part of the lagg(4) group.

My motivation for this change was interfaces that use TSO, for example
ixgbe(4). With them the performance is terrible: as we can't fill the TSO
buffer at once, the throughput drops significantly and we get many more
SACKs between the hosts.

So, with this patch we can set the number of packets that will be sent
on one interface before switching to the next interface.
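To make the policy concrete, here is a minimal user-space sketch in C of "send N packets on one interface, then move to the next". It assumes ports are numbered 0 to nports-1 in list order and that a single thread calls it; the names rr_burst_state and rr_burst_next are hypothetical. It models the intended distribution only and is not the kernel code from the patch below.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of "send rr_packets packets on one port,
 * then move to the next one". Ports are indexes 0 .. nports-1 in list
 * order. Single-threaded: no locking or atomics are modelled here.
 */
struct rr_burst_state {
	uint32_t cur_port;	/* index of the port currently in use */
	uint32_t sent;		/* packets already sent on cur_port */
};

/*
 * Return the port index to use for the next packet.
 * rr_packets == 0 keeps the classic behaviour: switch on every packet.
 */
uint32_t
rr_burst_next(struct rr_burst_state *st, uint32_t nports, uint32_t rr_packets)
{
	uint32_t burst = (rr_packets == 0) ? 1 : rr_packets;

	assert(nports > 0);
	if (st->cur_port >= nports) {	/* a port was removed: restart */
		st->cur_port = 0;
		st->sent = 0;
	}
	if (st->sent >= burst) {	/* burst used up: advance cyclically */
		st->cur_port = (st->cur_port + 1) % nports;
		st->sent = 0;
	}
	st->sent++;
	return st->cur_port;
}
```

With nports = 2 and rr_packets = 3, successive calls on a zeroed state return 0, 0, 0, 1, 1, 1, 0, ...; with rr_packets = 0 they change port on every packet, as roundrobin does today.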

In my testbed using ixgbe(4), performance improved considerably, as you
can see below:

1) Without patch:
------------------------------------------------------------
Client connecting to 192.168.1.2, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 32808 connected with 192.168.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   406 MBytes  3.40 Gbits/sec
[  3]  1.0- 2.0 sec   391 MBytes  3.28 Gbits/sec
[  3]  2.0- 3.0 sec   406 MBytes  3.41 Gbits/sec
[  3]  3.0- 4.0 sec   585 MBytes  4.91 Gbits/sec
[  3]  4.0- 5.0 sec   477 MBytes  4.00 Gbits/sec
[  3]  5.0- 6.0 sec   429 MBytes  3.60 Gbits/sec
[  3]  6.0- 7.0 sec   520 MBytes  4.36 Gbits/sec
[  3]  7.0- 8.0 sec   385 MBytes  3.23 Gbits/sec
[  3]  8.0- 9.0 sec   414 MBytes  3.48 Gbits/sec
[  3]  9.0-10.0 sec   515 MBytes  4.32 Gbits/sec
[  3]  0.0-10.0 sec  4.42 GBytes  3.80 Gbits/sec

2) With patch:
------------------------------------------------------------
Client connecting to 192.168.1.2, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 10526 connected with 192.168.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   694 MBytes  5.83 Gbits/sec
[  3]  1.0- 2.0 sec   999 MBytes  8.38 Gbits/sec
[  3]  2.0- 3.0 sec  1.17 GBytes  10.1 Gbits/sec
[  3]  3.0- 4.0 sec  1.34 GBytes  11.5 Gbits/sec
[  3]  4.0- 5.0 sec  1.15 GBytes  9.91 Gbits/sec
[  3]  5.0- 6.0 sec  1.19 GBytes  10.2 Gbits/sec
[  3]  6.0- 7.0 sec  1.08 GBytes  9.23 Gbits/sec
[  3]  7.0- 8.0 sec  1.10 GBytes  9.45 Gbits/sec
[  3]  8.0- 9.0 sec  1.27 GBytes  10.9 Gbits/sec
[  3]  9.0-10.0 sec  1.39 GBytes  12.0 Gbits/sec
[  3]  0.0-10.0 sec  11.3 GBytes  9.74 Gbits/sec
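The figures appear consistent with iperf reporting Transfer in binary units (1 MByte = 2^20 bytes, 1 GByte = 2^30 bytes) and Bandwidth in decimal units (1 Gbit/s = 10^9 bit/s); for example, 585 MBytes in one second works out to the 4.91 Gbits/sec shown above. Over the full 10-second runs:

```latex
\begin{aligned}
\text{without patch:}\quad & \frac{4.42 \times 2^{30}\ \text{B} \times 8\ \text{bit/B}}{10\ \text{s}} \approx 3.80 \times 10^{9}\ \text{bit/s} \\
\text{with patch:}\quad & \frac{11.3 \times 2^{30}\ \text{B} \times 8\ \text{bit/B}}{10\ \text{s}} \approx 9.71 \times 10^{9}\ \text{bit/s} \\
\text{ratio:}\quad & \frac{9.74}{3.80} \approx 2.56
\end{aligned}
```

The small gap between 9.71 and the reported 9.74 Gbits/sec is consistent with the 11.3 GBytes total being rounded; either way the average throughput is roughly 2.5 times higher with the patch.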

So, basically we have a sysctl(8) called "net.link.lagg.rr_packets" where
we can set the number of packets that will be sent on one interface before
roundrobin moves to the next interface. The default, 0, keeps the current
per-packet roundrobin behavior.
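An alternative worth weighing during review (a sketch only, not what the patch does): the existing code already draws a sequence number with atomic_fetchadd_32(&sc->sc_seq, 1), so dividing that number by rr_packets before taking the modulo yields the intended N-packets-per-interface pattern without per-port counters or an interface-name lookup. The C11 atomics and the names rr_seq and rr_stride_port below are hypothetical user-space stand-ins, assumed for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared sc->sc_seq counter. */
static _Atomic uint32_t rr_seq;

/*
 * Pick a port index in 0 .. nports-1 so that each port receives
 * `stride` consecutive packets before the next port is used.
 * stride == 0 falls back to classic per-packet roundrobin.
 */
uint32_t
rr_stride_port(uint32_t nports, uint32_t stride)
{
	uint32_t p;

	assert(nports > 0);
	if (stride == 0)
		stride = 1;
	/* atomic_fetch_add returns the value before the increment. */
	p = atomic_fetch_add(&rr_seq, 1);
	return (p / stride) % nports;
}
```

Called repeatedly with nports = 2 and stride = 3, it returns 0, 0, 0, 1, 1, 1, 0, ... Concurrent senders only contend on one atomic counter. One trade-off: when the 32-bit counter wraps, the burst in progress can be cut short unless stride * nports divides 2^32.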

Any comments and reviews are very much appreciated.

Best Regards,

-- 
Marcelo Araujo            (__)araujo@FreeBSD.org
\\\'',)http://www.FreeBSD.org   \/  \ ^
Power To Server.         .\. /_)

[-- Attachment #2 --]
Index: if_lagg.c
===================================================================
--- if_lagg.c	(revision 267666)
+++ if_lagg.c	(working copy)
@@ -189,6 +189,10 @@
 SYSCTL_INT(_net_link_lagg, OID_AUTO, default_flowid_shift, CTLFLAG_RW,
     &def_flowid_shift, 0,
     "Default setting for flowid shift for load sharing");
+static int lagg_rr_packets = 0; /* 0 keeps per-packet roundrobin */
+SYSCTL_INT(_net_link_lagg, OID_AUTO, rr_packets, CTLFLAG_RW,
+    &lagg_rr_packets, 0,
+    "Number of packets to send per interface before switching");
 
 static int
 lagg_modevent(module_t mod, int type, void *data)
@@ -1689,14 +1693,68 @@
 {
 	struct lagg_port *lp;
 	uint32_t p;
+	uint32_t pkt_sysctl_count;
+	int ifp_count = 1;
 
 	p = atomic_fetchadd_32(&sc->sc_seq, 1);
 	p %= sc->sc_count;
 	lp = SLIST_FIRST(&sc->sc_ports);
-	while (p--)
-		lp = SLIST_NEXT(lp, lp_entries);
 
 	/*
+	 * If there is no reference interface yet, we must
+	 * record it now.
+	 */
+	if (strlen(sc->sc_ref_ifp) == 0)
+		strncpy(sc->sc_ref_ifp, lp->lp_ifp->if_xname, sizeof(sc->sc_ref_ifp));
+
+	/*
+	 * If sc_ifp_count was not yet initialized, we must
+	 * initialize it now.
+	 */
+	if (sc->sc_ifp_count == 0)
+		sc->sc_ifp_count = 1;
+
+	/*
+	 * If the sysctl rr_packets is set to 0, we must use the
+	 * roundrobin as it is, or otherwise, we must apply the
+	 * granularity between the interfaces that are part of the group.
+	 */
+	if (!lagg_rr_packets) {
+		while (p--)
+			lp = SLIST_NEXT(lp, lp_entries);
+		goto send_mbuf;
+	} else {
+		pkt_sysctl_count = atomic_fetchadd_32(&sc->sc_pkt_count, 1);
+		if (pkt_sysctl_count >= (uint32_t)lagg_rr_packets) {
+			if (sc->sc_ifp_count <= sc->sc_count) {
+				while (ifp_count < sc->sc_ifp_count) {
+					lp = SLIST_NEXT(lp, lp_entries);
+					ifp_count++;
+				}
+				sc->sc_ifp_count++;
+				if (sc->sc_ifp_count > sc->sc_count)
+					sc->sc_ifp_count = 0;
+			}
+			strncpy(sc->sc_ref_ifp, lp->lp_ifp->if_xname, sizeof(sc->sc_ref_ifp));
+			sc->sc_pkt_count = 0;
+		}
+	}
+
+	/*
+	 * Find the port that matches the reference interface, so
+	 * packets stay on the same port until it is time to switch.
+	 */
+	lp = SLIST_FIRST(&sc->sc_ports);
+	while (lp != NULL) {
+		if (strcmp(lp->lp_ifp->if_xname, sc->sc_ref_ifp) == 0)
+			break;
+		else
+			lp = SLIST_NEXT(lp, lp_entries);
+	}
+	goto send_mbuf;
+
+send_mbuf:
+	/*
 	 * Check the port's link state. This will return the next active
 	 * port if the link is down or the port is NULL.
 	 */
Index: if_lagg.h
===================================================================
--- if_lagg.h	(revision 267666)
+++ if_lagg.h	(working copy)
@@ -232,6 +232,9 @@
 	struct sysctl_oid		*sc_oid;	/* sysctl tree oid */
 	int				use_flowid;	/* use M_FLOWID */
 	int				flowid_shift;	/* shift the flowid */
+	uint32_t			sc_pkt_count; /* packets sent on the current rr ifp */
+	int				sc_ifp_count; /* 1-based position of the next rr ifp */
+	char				sc_ref_ifp[IFNAMSIZ]; /* name of the current rr ifp */
 };
 
 struct lagg_port {
