Date: Sun, 10 Apr 2016 20:52:59 +0000
From: "Cui, Cheng" <Cheng.Cui@netapp.com>
To: Hans Petter Selasky <hps@selasky.org>
Cc: "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject: Re: question about trimming data "len" conditions in TSO in tcp_output.c
Message-ID: <D33034DF.FE6D%Cheng.Cui@netapp.com>
In-Reply-To: <D3300112.FE63%Cheng.Cui@netapp.com>
References: <C44A2900-40E0-41EF-83B1-6DD4B31DABD5@netapp.com>
 <563D1892.3050406@selasky.org>
 <E6DBCD15-A41C-49A4-88C5-FB79EC969716@netapp.com>
 <563D2C26.2070300@selasky.org>
 <D26BC410.B4AD%Cheng.Cui@netapp.com>
 <D3300112.FE63%Cheng.Cui@netapp.com>
Sorry, the patch I attached in the previous email was against the FreeBSD
10.2 release. The one attached here is against FreeBSD HEAD.

diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..43b0737 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -939,23 +939,15 @@ send:
 			 * emptied:
 			 */
 			max_len = (tp->t_maxseg - optlen);
-			if ((off + len) < sbavail(&so->so_snd)) {
+			if (len > (max_len << 1)) {
 				moff = len % max_len;
 				if (moff != 0) {
 					len -= moff;
 					sendalot = 1;
 				}
 			}
-
-			/*
-			 * In case there are too many small fragments
-			 * don't use TSO:
-			 */
-			if (len <= max_len) {
-				len = max_len;
-				sendalot = 1;
-				tso = 0;
-			}
+			KASSERT(len >= max_len,
+			    ("[%s:%d]: len < max_len", __func__, __LINE__));
 
 			/*
 			 * Send the FIN in a separate segment
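To make the behavioral difference concrete, here is a minimal userland
sketch of the old and new trimming logic. This is an illustration only:
the function names are made up, and the kernel code above also has to
respect the send window, the socket buffer, and FIN handling.

/*
 * Userland sketch of the TSO length trimming, illustration only.
 */
#include <stdio.h>

/*
 * Old logic: trim len to a multiple of max_len whenever more data
 * remains in the socket buffer, then fall back to a single non-TSO
 * segment of exactly one MSS whenever len <= max_len.
 */
static long
trim_old(long len, long max_len, int more_data, int *tso)
{

	if (more_data)		/* (off + len) < sbavail(&so->so_snd) */
		len -= len % max_len;
	if (len <= max_len) {	/* "too many small fragments" fallback */
		len = max_len;
		*tso = 0;
	}
	return (len);
}

/*
 * New logic: trim only when at least two full segments fit, so a
 * burst of up to two segments still goes out as one TSO send.
 */
static long
trim_new(long len, long max_len)
{

	if (len > (max_len << 1))
		len -= len % max_len;
	return (len);
}

int
main(void)
{
	long max_len = 1448;	/* MSS with no TCP options */
	long len = 2000;	/* between one and two full segments */
	long out;
	int tso;

	tso = 1;
	out = trim_old(len, max_len, 1, &tso);
	printf("old: send %ld bytes, tso=%d\n", out, tso);	/* 1448, tso=0 */

	tso = 1;
	out = trim_new(len, max_len);
	printf("new: send %ld bytes, tso=%d\n", out, tso);	/* 2000, tso=1 */
	return (0);
}

With max_len = 1448 and a 2000-byte burst, the old logic trims down to a
single 1448-byte segment, disables TSO, and loops again via sendalot; the
new logic hands all 2000 bytes to the NIC in one TSO send. That is where
the many single-MSS packets in the traces below come from. The KASSERT
should hold because this block is only reached when TSO is in use, which
requires len > tp->t_maxseg >= max_len, and trimming happens only when at
least two full segments fit.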
Thanks,
--Cheng Cui
NetApp Scale Out Networking

On 4/10/16, 4:44 PM, "Cui, Cheng" <Cheng.Cui@netapp.com> wrote:

>Hi Hans,
>
>I would continue this discussion with a different change. The change is
>below, and I also attached it as "change.patch" against the FreeBSD HEAD
>code-line.
>
>diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
>index 2043fc9..fa124f1 100644
>--- a/sys/netinet/tcp_output.c
>+++ b/sys/netinet/tcp_output.c
>@@ -938,25 +938,16 @@ send:
> 			 * fractional unless the send sockbuf can be
> 			 * emptied:
> 			 */
>-			max_len = (tp->t_maxseg - optlen);
>-			if ((off + len) < sbavail(&so->so_snd)) {
>+			max_len = (tp->t_maxopd - optlen);
>+			if (len > (max_len << 1)) {
> 				moff = len % max_len;
> 				if (moff != 0) {
> 					len -= moff;
> 					sendalot = 1;
> 				}
> 			}
>-
>-			/*
>-			 * In case there are too many small fragments
>-			 * don't use TSO:
>-			 */
>-			if (len <= max_len) {
>-				len = max_len;
>-				sendalot = 1;
>-				tso = 0;
>-			}
>-
>+			KASSERT(len > max_len,
>+			    ("[%s:%d]: len <= max_len", __func__, __LINE__));
> 			/*
> 			 * Send the FIN in a separate segment
> 			 * after the bulk sending is done.
>
>I think this change could save the additional loops that send single
>MSS-size packets, and thus some CPU cycles as well, since it reduces
>software sends and pushes more data to offloading sends.
>
>Here is my test. The iperf command I chose pushes 100 MBytes of data to
>the wire, with the default TCP sendspace set to 1MB and recvspace to 2MB.
>I tested this TCP connection's performance on a pair of 10Gbps FreeBSD
>10.2 nodes (s1 and r1) with a switch in between. Both nodes have TSO and
>delayed ACK enabled.
>
>root@s1:~ # ping -c 3 r1
>PING r1-link1 (10.1.2.3): 56 data bytes
>64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
>64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
>64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms
>
>--- r1-link1 ping statistics ---
>3 packets transmitted, 3 packets received, 0.0% packet loss
>round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms
>
>1M snd buffer/2M rcv buffer:
>sysctl -w net.inet.tcp.hostcache.expire=1
>sysctl -w net.inet.tcp.sendspace=1048576
>sysctl -w net.inet.tcp.recvspace=2097152
>
>iperf -s               <== iperf command@receiver
>iperf -c r1 -m -n 100M <== iperf command@sender
>
>root@s1:~ # iperf -c r1 -m -n 100M
>------------------------------------------------------------
>Client connecting to r1, TCP port 5001
>TCP window size: 1.00 MByte (default)
>------------------------------------------------------------
>[  3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
>[ ID] Interval       Transfer     Bandwidth
>[  3]  0.0- 0.3 sec   100 MBytes  2.69 Gbits/sec
>[  3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
>root@r1:~ # iperf -s
>------------------------------------------------------------
>Server listening on TCP port 5001
>TCP window size: 2.00 MByte (default)
>------------------------------------------------------------
>[  4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
>[ ID] Interval       Transfer     Bandwidth
>[  4]  0.0- 0.3 sec   100 MBytes  2.62 Gbits/sec
>
>Each test sent 100 MBytes of data, and I collected a packet trace from
>both nodes with tcpdump. I ran the test twice to confirm the result is
>reproducible.
>
>In the trace files from both nodes before my code change, I see a lot of
>single-MSS size packets; see the attached traces in "before_change.zip".
>For example, in one sender trace I count 43480 single-MSS size packets
>(tcp.len==1448) out of 57005 packets that contain data (tcp.len > 0).
>That's 76.2%.
>
>I then ran the same iperf test and gathered traces after the change, and
>found almost no single-MSS packets; see the attached traces in
>"after_change.zip". For example, in one sender trace I count zero
>single-MSS size packets (tcp.len==1448) out of 35729 data packets
>(tcp.len > 0).
>
>Comparing the receiver traces, I did not see significantly more
>fractional packets received after the change.
>
>I also ran tests with netperf, although I did not reach 95% confidence
>for every snd/rcv buffer size. Attached are my netperf results for
>different snd/rcv buffer sizes before and after the change
>(netperf_before_change.txt and netperf_after_change.txt), which also
>look good.
>
>The netperf command used (-i 10,3 -I 95,10 asks netperf to run up to 10
>iterations to reach a 95% confidence interval):
>netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s
>${LocalSndBuf} -S ${RemoteSndBuf}
>
>Thanks,
>--Cheng Cui
>NetApp Scale Out Networking