Date: Mon, 23 Jun 2014 10:27:26 +0800
From: Marcelo Araujo <araujobsdport@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>
Subject: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol.
Message-ID: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>
[-- Attachment #1 --]

Hello guys,

I made some changes to the roundrobin protocol so that you can now use sysctl(8) to control how packets are distributed among the interfaces that are part of a lagg(4) group.

My motivation for this change was interfaces that use TSO, for example ixgbe(4). With them, roundrobin performance is terrible: since consecutive packets go out on different interfaces, we can't fill the TSO buffer at once, the throughput drops significantly, and we see many more SACKs between the hosts.

So, with this patch we can set the number of packets that will be sent on one interface before switching to the next one. In my testbed using ixgbe(4) I got much better performance, as you can see below (10-second average of 3.80 Gbits/sec without the patch versus 9.74 Gbits/sec with it):

1) Without patch:

------------------------------------------------------------
Client connecting to 192.168.1.2, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 32808 connected with 192.168.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   406 MBytes  3.40 Gbits/sec
[  3]  1.0- 2.0 sec   391 MBytes  3.28 Gbits/sec
[  3]  2.0- 3.0 sec   406 MBytes  3.41 Gbits/sec
[  3]  3.0- 4.0 sec   585 MBytes  4.91 Gbits/sec
[  3]  4.0- 5.0 sec   477 MBytes  4.00 Gbits/sec
[  3]  5.0- 6.0 sec   429 MBytes  3.60 Gbits/sec
[  3]  6.0- 7.0 sec   520 MBytes  4.36 Gbits/sec
[  3]  7.0- 8.0 sec   385 MBytes  3.23 Gbits/sec
[  3]  8.0- 9.0 sec   414 MBytes  3.48 Gbits/sec
[  3]  9.0-10.0 sec   515 MBytes  4.32 Gbits/sec
[  3]  0.0-10.0 sec  4.42 GBytes  3.80 Gbits/sec

2) With patch:

------------------------------------------------------------
Client connecting to 192.168.1.2, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 10526 connected with 192.168.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   694 MBytes  5.83 Gbits/sec
[  3]  1.0- 2.0 sec   999 MBytes  8.38 Gbits/sec
[  3]  2.0- 3.0 sec  1.17 GBytes  10.1 Gbits/sec
[  3]  3.0- 4.0 sec  1.34 GBytes  11.5 Gbits/sec
[  3]  4.0- 5.0 sec  1.15 GBytes  9.91 Gbits/sec
[  3]  5.0- 6.0 sec  1.19 GBytes  10.2 Gbits/sec
[  3]  6.0- 7.0 sec  1.08 GBytes  9.23 Gbits/sec
[  3]  7.0- 8.0 sec  1.10 GBytes  9.45 Gbits/sec
[  3]  8.0- 9.0 sec  1.27 GBytes  10.9 Gbits/sec
[  3]  9.0-10.0 sec  1.39 GBytes  12.0 Gbits/sec
[  3]  0.0-10.0 sec  11.3 GBytes  9.74 Gbits/sec

So, basically, there is a new sysctl(8) called "net.link.lagg.rr_packets" that sets the number of packets to send before roundrobin moves to the next interface. Its default of 0 keeps the current per-packet roundrobin behavior.

Any comments and reviews are very much appreciated.

Best Regards,
--
Marcelo Araujo            (__)
araujo@FreeBSD.org     \\\'',)
http://www.FreeBSD.org   \/  \ ^
Power To Server.         .\. /_)

[-- Attachment #2 --]
Index: if_lagg.c
===================================================================
--- if_lagg.c	(revision 267666)
+++ if_lagg.c	(working copy)
@@ -189,6 +189,10 @@
 SYSCTL_INT(_net_link_lagg, OID_AUTO, default_flowid_shift, CTLFLAG_RW,
     &def_flowid_shift, 0,
     "Default setting for flowid shift for load sharing");
+static int lagg_rr_packets = 0; /* Default value for using rr_packets */
+SYSCTL_INT(_net_link_lagg, OID_AUTO, rr_packets, CTLFLAG_RW,
+    &lagg_rr_packets, 0,
+    "How many packets to be send per interface");
 
 static int
 lagg_modevent(module_t mod, int type, void *data)
@@ -1689,14 +1693,68 @@
 {
 	struct lagg_port *lp;
 	uint32_t p;
+	uint32_t pkt_sysctl_count;
+	int ifp_count = 1;
 
 	p = atomic_fetchadd_32(&sc->sc_seq, 1);
 	p %= sc->sc_count;
 	lp = SLIST_FIRST(&sc->sc_ports);
-	while (p--)
-		lp = SLIST_NEXT(lp, lp_entries);
 
 	/*
+	 * If there is no reference for the IFP, we must
+	 * copy it now.
+	 */
+	if (strlen(sc->sc_ref_ifp) == 0)
+		strncpy(sc->sc_ref_ifp, lp->lp_ifp->if_xname, sizeof(sc->sc_ref_ifp));
+
+	/*
+	 * If ifp_count was not yet initialized, we must
+	 * initialize now.
+	 */
+	if (sc->sc_ifp_count == 0)
+		sc->sc_ifp_count = 1;
+
+	/*
+	 * If the sysctl rr_packets is set to 0, we must use the
+	 * roundrobin as it is, or otherwise, we must apply the
+	 * granularity between the interfaces that are part of the group.
+	 */
+	if (!lagg_rr_packets) {
+		while (p--)
+			lp = SLIST_NEXT(lp, lp_entries);
+		goto send_mbuf;
+	} else {
+		pkt_sysctl_count = atomic_fetchadd_32(&sc->sc_pkt_count, 1);
+		if (pkt_sysctl_count == lagg_rr_packets) {
+			if (sc->sc_ifp_count <= sc->sc_count) {
+				while (ifp_count < sc->sc_ifp_count) {
+					lp = SLIST_NEXT(lp, lp_entries);
+					ifp_count++;
+				}
+				sc->sc_ifp_count++;
+				if (sc->sc_ifp_count > sc->sc_count)
+					sc->sc_ifp_count = 0;
+			}
+			strncpy(sc->sc_ref_ifp, lp->lp_ifp->if_xname, sizeof(sc->sc_ref_ifp));
+			sc->sc_pkt_count = 0;
+		}
+	}
+
+	/*
+	 * Check if the current interface to be enqueue is not the
+	 * same used in the last round.
+	 */
+	lp = SLIST_FIRST(&sc->sc_ports);
+	for (;;) {
+		if (strcmp(lp->lp_ifp->if_xname, sc->sc_ref_ifp) == 0)
+			break;
+		else
+			lp = SLIST_NEXT(lp, lp_entries);
+	}
+	goto send_mbuf;
+
+send_mbuf:
+	/*
 	 * Check the port's link state. This will return the next active
 	 * port if the link is down or the port is NULL.
 	 */
Index: if_lagg.h
===================================================================
--- if_lagg.h	(revision 267666)
+++ if_lagg.h	(working copy)
@@ -232,6 +232,9 @@
 	struct sysctl_oid	*sc_oid;	/* sysctl tree oid */
 	int			use_flowid;	/* use M_FLOWID */
 	int			flowid_shift;	/* shift the flowid */
+	uint32_t		sc_pkt_count;	/* use for count packates per ifp */
+	int			sc_ifp_count;	/* counter reference of interfaces on rr */
+	char			sc_ref_ifp[IFNAMSIZ];	/* name of the ifp */
 };
 
 struct lagg_port {
