Date:      Mon, 23 Jun 2014 15:31:02 +0800
From:      Marcelo Araujo <araujobsdport@gmail.com>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol.
Message-ID:  <CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>
In-Reply-To: <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>
References:  <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com> <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>

Hello Adrian,


2014-06-23 12:16 GMT+08:00 Adrian Chadd <adrian@freebsd.org>:

> ...
>
> It's an interesting idea, but doing round robin like that may
> introduce out of order packets.
>

Actually, the round-robin implementation as it is already causes out-of-order
packets, but SACK can recover from it almost all the time.

In my tests with iperf, when a larger number of packets is sent through the
same interface before switching to the next one, I see fewer SACK requests,
and I believe that is why I can reach better throughput.

The test is very simple: "iperf -s" and "iperf -c <ip> -i 1 -t 10".

As an example:
1) Without changing the number of packets:
        43 SACK recovery episodes
        187 segment rexmits in SACK recovery episodes
        270776 byte rexmits in SACK recovery episodes
        172688 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
                0 input SACK chunks
                0 output SACKs

2) With 50 packets per interface:
        6 SACK recovery episodes
        16 segment rexmits in SACK recovery episodes
        23168 byte rexmits in SACK recovery episodes
        111626 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
                0 input SACK chunks
                0 output SACKs
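
The idea behind the patch is simply to keep a small counter next to the
round-robin sequence number and only move to the next port once that many
packets have gone out. A tiny user-space sketch of the selection logic (the
names and structure here are illustrative assumptions, not the actual patch):

#include <stdio.h>

#define NPORTS          4        /* interfaces in the lagg(4) group  */

static unsigned rr_packets = 50; /* like net.link.lagg.rr_packets    */
static unsigned seq;             /* current interface index          */
static unsigned count;           /* packets sent on that interface   */

/* Return the interface index the next packet should go out on. */
static unsigned
rr_select(void)
{
        unsigned port = seq % NPORTS;

        if (++count >= rr_packets) {    /* batch exhausted, move on */
                count = 0;
                seq++;
        }
        return (port);
}

int
main(void)
{
        /* With rr_packets = 50: packets 0-49 use port 0, 50-99 port 1, ... */
        for (unsigned pkt = 0; pkt < 200; pkt++) {
                unsigned port = rr_select();

                if (pkt % 50 == 0)
                        printf("packet %3u -> port %u\n", pkt, port);
        }
        return (0);
}

With rr_packets set to 1 the sketch degenerates into plain per-packet round
robin, so the existing behaviour stays available as a special case.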



>
> What's the actual problem you're seeing? Are the transmit queues
> filling up? Is the distribution with flowid/curcpu not good enough?
>

I have imported Scott's patch; I believe you are talking about r260070. I
didn't pay attention to the flowid/curcpu distribution and I can't tell you
whether it is the root cause or not, but in my case it didn't solve the poor
performance of round robin. With all the other lagg(4) protocols, the
throughput reaches the limit of the NIC.

It may be that the transmit queue isn't being filled up, or that it hangs for
some reason; that is something I still need to check.

My suspicion is about how ixgbe(4) triggers TSO: it seems the transmit queue
is not completely filled, which might delay the transmission, drop packets,
or perhaps drop the entire queue. Any tips on how to debug TSO would be very
welcome.


>
> Scott saw this happen at Netflix. He added a lagg twiddle to set which
> set of bits to care about in the flowid when picking an interface to
> choose. The ixgbe hashing was being done on the low x bits, where x is
> related to how many CPUs you have (2 CPUs? 1 bit. 8 CPUs? 3 bits.
> etc.) lagg was doing the same thing on the same low order set of bits.
> He modified lagg so you could pick some new starting point a few bits
> up in the flowid to pick a lagg interface with. That fixed the
> distribution issue and also kept the in-orderness of it all.
>
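
For reference, the flowid-shift selection Adrian describes boils down to
picking the port from higher-order bits of the hash instead of the low bits
the driver already consumed for CPU selection. A minimal sketch, with an
assumed shift of 3 bits and illustrative names (not the actual r260070 code):

#include <stdio.h>

/* Low-bits selection: reuses the same bits the NIC/driver may already have
 * consumed when picking a CPU from the RSS hash, so many flows can land on
 * the same lagg port. */
static unsigned
port_low_bits(unsigned flowid, unsigned nports)
{
        return (flowid % nports);
}

/* Shifted selection, as described above: start a few bits up in the flowid
 * (here 3 bits, as if 8 CPUs had been hashed on the low 3 bits). */
static unsigned
port_shifted(unsigned flowid, unsigned shift, unsigned nports)
{
        return ((flowid >> shift) % nports);
}

int
main(void)
{
        /* Hypothetical flowids whose low 3 bits are all zero: the plain
         * modulo maps every flow to port 0, the shifted one spreads them. */
        unsigned flowids[] = { 0x08, 0x10, 0x18, 0x20 };

        for (int i = 0; i < 4; i++)
                printf("flowid 0x%02x: low-bits port %u, shifted port %u\n",
                    flowids[i], port_low_bits(flowids[i], 2),
                    port_shifted(flowids[i], 3, 2));
        return (0);
}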

I thought Scott's patch was more focused on LACP; I didn't realize it would
help the other aggregation protocols. Anyway, for round robin, the results
with or without r260070 don't change much, at least in my environment.

Best Regards,


>
> 2c,
>
>
> -a
>
> On 22 June 2014 19:27, Marcelo Araujo <araujobsdport@gmail.com> wrote:
> > Hello guys,
> >
> > I made some changes to the roundrobin protocol so that you can now, via
> > sysctl(8), set a better packet distribution among the interfaces that are
> > part of the lagg(4) group.
> >
> > My motivation for this change was interfaces that use TSO, for example
> > ixgbe(4): the performance is terrible because we can't fill the TSO buffer
> > at once, the throughput drops considerably, and we get many more SACKs
> > between the hosts.
> >
> > So, with this patch we can set the number of packets that will be sent
> > before switching to the next interface.
> >
> > In my testbed using ixgbe(4), I saw very good performance, as you can see
> > below:
> >
> > 1) Without patch:
> > ------------------------------------------------------------
> > Client connecting to 192.168.1.2, TCP port 5001
> > TCP window size: 32.5 KByte (default)
> > ------------------------------------------------------------
> > [  3] local 192.168.1.1 port 32808 connected with 192.168.1.2 port 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0- 1.0 sec   406 MBytes  3.40 Gbits/sec
> > [  3]  1.0- 2.0 sec   391 MBytes  3.28 Gbits/sec
> > [  3]  2.0- 3.0 sec   406 MBytes  3.41 Gbits/sec
> > [  3]  3.0- 4.0 sec   585 MBytes  4.91 Gbits/sec
> > [  3]  4.0- 5.0 sec   477 MBytes  4.00 Gbits/sec
> > [  3]  5.0- 6.0 sec   429 MBytes  3.60 Gbits/sec
> > [  3]  6.0- 7.0 sec   520 MBytes  4.36 Gbits/sec
> > [  3]  7.0- 8.0 sec   385 MBytes  3.23 Gbits/sec
> > [  3]  8.0- 9.0 sec   414 MBytes  3.48 Gbits/sec
> > [  3]  9.0-10.0 sec   515 MBytes  4.32 Gbits/sec
> > [  3]  0.0-10.0 sec  4.42 GBytes  3.80 Gbits/sec
> >
> > 2) With patch:
> > ------------------------------------------------------------
> > Client connecting to 192.168.1.2, TCP port 5001
> > TCP window size: 32.5 KByte (default)
> > ------------------------------------------------------------
> > [  3] local 192.168.1.1 port 10526 connected with 192.168.1.2 port 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0- 1.0 sec   694 MBytes  5.83 Gbits/sec
> > [  3]  1.0- 2.0 sec   999 MBytes  8.38 Gbits/sec
> > [  3]  2.0- 3.0 sec  1.17 GBytes  10.1 Gbits/sec
> > [  3]  3.0- 4.0 sec  1.34 GBytes  11.5 Gbits/sec
> > [  3]  4.0- 5.0 sec  1.15 GBytes  9.91 Gbits/sec
> > [  3]  5.0- 6.0 sec  1.19 GBytes  10.2 Gbits/sec
> > [  3]  6.0- 7.0 sec  1.08 GBytes  9.23 Gbits/sec
> > [  3]  7.0- 8.0 sec  1.10 GBytes  9.45 Gbits/sec
> > [  3]  8.0- 9.0 sec  1.27 GBytes  10.9 Gbits/sec
> > [  3]  9.0-10.0 sec  1.39 GBytes  12.0 Gbits/sec
> > [  3]  0.0-10.0 sec  11.3 GBytes  9.74 Gbits/sec
> >
> > So, basically we have a sysctl(8) called "net.link.lagg.rr_packets" where
> > we can set the number of packets that will be sent before the roundrobin
> > protocol moves to the next interface.
> >
> > Any comments and reviews are very much appreciated.
> >
> > Best Regards,
> >
> > --
> > Marcelo Araujo            (__)
> > araujo@FreeBSD.org         \\\'',)
> > http://www.FreeBSD.org      \/  \ ^
> > Power To Server.         .\. /_)
> >
> > _______________________________________________
> > freebsd-net@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
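
For completeness, a knob such as the "net.link.lagg.rr_packets" sysctl
described above would normally be exposed through the stock sysctl(9) macros,
roughly like this (a sketch only; the variable name and default are
assumptions, not the actual patch):

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

SYSCTL_DECL(_net_link_lagg);            /* existing net.link.lagg node */

/* Hypothetical tunable: how many packets go out one port before the
 * round-robin scheduler advances to the next one. */
static int lagg_rr_packets = 1;
SYSCTL_INT(_net_link_lagg, OID_AUTO, rr_packets, CTLFLAG_RW,
    &lagg_rr_packets, 0,
    "Packets sent on one lagg port before round robin moves to the next");

At runtime it would then be tuned with something like
"sysctl net.link.lagg.rr_packets=50".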



-- 
Marcelo Araujo            (__)
araujo@FreeBSD.org         \\\'',)
http://www.FreeBSD.org      \/  \ ^
Power To Server.         .\. /_)


