Date:      Sun, 10 Apr 2016 20:52:59 +0000
From:      "Cui, Cheng" <Cheng.Cui@netapp.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject:   Re: question about trimming data "len" conditions in TSO in tcp_output.c
Message-ID:  <D33034DF.FE6D%Cheng.Cui@netapp.com>
In-Reply-To: <D3300112.FE63%Cheng.Cui@netapp.com>
References:  <C44A2900-40E0-41EF-83B1-6DD4B31DABD5@netapp.com> <563D1892.3050406@selasky.org> <E6DBCD15-A41C-49A4-88C5-FB79EC969716@netapp.com> <563D2C26.2070300@selasky.org> <D26BC410.B4AD%Cheng.Cui@netapp.com> <D3300112.FE63%Cheng.Cui@netapp.com>


Sorry, the patch I attached in my previous email was against the FreeBSD 10.2
release. The one attached here is against FreeBSD HEAD.

diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..43b0737 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -939,23 +939,15 @@ send:
 			 * emptied:
 			 */
 			max_len = (tp->t_maxseg - optlen);
-			if ((off + len) < sbavail(&so->so_snd)) {
+			if (len > (max_len << 1)) {
 				moff = len % max_len;
 				if (moff != 0) {
 					len -= moff;
 					sendalot = 1;
 				}
 			}
-
-			/*
-			 * In case there are too many small fragments
-			 * don't use TSO:
-			 */
-			if (len <= max_len) {
-				len = max_len;
-				sendalot = 1;
-				tso = 0;
-			}
+			KASSERT(len >= max_len,
+			    ("[%s:%d]: len < max_len", __func__, __LINE__));
 
 			/*
 			 * Send the FIN in a separate segment
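
To make the effect of the changed condition concrete, here is a small user-space model of the two versions of this block. It is only a sketch: struct trim_result, trim_old() and trim_new() are hypothetical names, the kernel variables are modeled as plain integers, and sbavail(&so->so_snd) is replaced by an sb_avail argument. It assumes len > max_len on entry; the patch's KASSERT depends on the same precondition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of one pass through the TSO length-trimming code. */
struct trim_result {
	long len;      /* bytes to send in this pass */
	bool tso;      /* true if this pass still uses TSO */
	bool sendalot; /* true if another pass is needed for the rest */
};

/*
 * Model of the original code.  sb_avail stands in for
 * sbavail(&so->so_snd); off is the offset of len within the buffer.
 * Assumes len > max_len on entry.
 */
static struct trim_result
trim_old(long off, long len, long max_len, long sb_avail)
{
	struct trim_result r = { len, true, false };

	if ((off + r.len) < sb_avail) {
		long moff = r.len % max_len;
		if (moff != 0) {
			r.len -= moff;
			r.sendalot = true;
		}
	}
	/* At most one MSS left after trimming: send one MSS without TSO. */
	if (r.len <= max_len) {
		r.len = max_len;
		r.sendalot = true;
		r.tso = false;
	}
	return r;
}

/*
 * Model of the patched code: trim the fractional tail only when more
 * than two full segments are queued.  Assumes len > max_len on entry.
 */
static struct trim_result
trim_new(long len, long max_len)
{
	struct trim_result r = { len, true, false };

	if (r.len > (max_len << 1)) {
		long moff = r.len % max_len;
		if (moff != 0) {
			r.len -= moff;
			r.sendalot = true;
		}
	}
	/* Stands in for the KASSERT(len >= max_len) in the patch. */
	assert(r.len >= max_len);
	return r;
}
```

For example, with max_len = 1448 and len = 2000 (and more data still queued), the original code trims len to 1448 and clears tso, so one MSS goes out in software and the loop runs again; the patched code leaves len at 2000 and keeps TSO. With len = 5000, the patched code trims to 4344 (three full segments) and sets sendalot for the remaining 656 bytes.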



Thanks,
--Cheng Cui
NetApp Scale Out Networking




On 4/10/16, 4:44 PM, "Cui, Cheng" <Cheng.Cui@netapp.com> wrote:

>Hi Hans,
>
>I would like to continue this discussion with a different change. The
>change is shown here, and I have also attached the patch "change.patch"
>against the FreeBSD HEAD code line.
>
>diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
>index 2043fc9..fa124f1 100644
>--- a/sys/netinet/tcp_output.c
>+++ b/sys/netinet/tcp_output.c
>@@ -938,25 +938,16 @@ send:
>                         * fractional unless the send sockbuf can be
>                         * emptied:
>                         */
>-                       max_len = (tp->t_maxseg - optlen);
>-                       if ((off + len) < sbavail(&so->so_snd)) {
>+                       max_len = (tp->t_maxopd - optlen);
>+                       if (len > (max_len << 1)) {
>                                moff = len % max_len;
>                                if (moff != 0) {
>                                        len -= moff;
>                                        sendalot = 1;
>                                }
>                        }
>-
>-                       /*
>-                        * In case there are too many small fragments
>-                        * don't use TSO:
>-                        */
>-                       if (len <= max_len) {
>-                               len = max_len;
>-                               sendalot = 1;
>-                               tso = 0;
>-                       }
>-
>+                       KASSERT(len > max_len,
>+                           ("[%s:%d]: len <= max_len", __func__, __LINE__));
>                        /*
>                         * Send the FIN in a separate segment
>                         * after the bulk sending is done.
>
>I think this change could save the additional loop iterations that send
>single-MSS-size packets. It should also save some CPU cycles, because it
>reduces the number of software sends and pushes more data to TSO-offloaded
>sends.
>
>Here is my test. The iperf command I chose pushes 100 MBytes of data to the
>wire, with the default TCP sendspace set to 1 MB and recvspace set to 2 MB.
>I tested the performance of this TCP connection on a pair of 10 Gbps
>FreeBSD 10.2 nodes (s1 and r1) with a switch in between. Both nodes have
>TSO and delayed ACK enabled.
>
>root@s1:~ # ping -c 3 r1
>PING r1-link1 (10.1.2.3): 56 data bytes
>64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
>64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
>64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms
>
>--- r1-link1 ping statistics ---
>3 packets transmitted, 3 packets received, 0.0% packet loss
>round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms
>
>1M snd buffer/2M rcv buffer
>sysctl -w net.inet.tcp.hostcache.expire=1
>sysctl -w net.inet.tcp.sendspace=1048576
>sysctl -w net.inet.tcp.recvspace=2097152
>
>iperf -s                  <== iperf command@receiver
>iperf -c r1 -m -n 100M    <== iperf command@sender
>
>root@s1:~ # iperf -c r1 -m -n 100M
>------------------------------------------------------------
>Client connecting to r1, TCP port 5001
>TCP window size: 1.00 MByte (default)
>------------------------------------------------------------
>[  3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
>[ ID] Interval       Transfer     Bandwidth
>[  3]  0.0- 0.3 sec   100 MBytes  2.69 Gbits/sec
>[  3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>
>root@r1:~ # iperf -s
>------------------------------------------------------------
>Server listening on TCP port 5001
>TCP window size: 2.00 MByte (default)
>------------------------------------------------------------
>[  4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
>[ ID] Interval       Transfer     Bandwidth
>[  4]  0.0- 0.3 sec   100 MBytes  2.62 Gbits/sec
>
>Each test sent 100 MBytes of data, and I collected packet traces on both
>nodes with tcpdump. I ran the test twice to confirm that the result is
>reproducible.
>
>Before my code change, the trace files from both nodes show a lot of
>single-MSS-size packets; see the attached trace files in "before_change.zip".
>For example, in a sender trace file I see 43480 single-MSS-size packets
>(tcp.len==1448) out of 57005 packets that contain data (tcp.len > 0).
>That's about 76.3%.
>
>After the change, I ran the same iperf test and gathered trace files again.
>This time I did not find many single-MSS packets; see the attached trace
>files in "after_change.zip". For example, in a sender trace file I see zero
>single-MSS-size packets (tcp.len==1448) out of 35729 data packets
>(tcp.len > 0).
>
>Comparing the receiver traces, I did not see significantly more fractional
>packets received after the change.
>
>I also ran tests with netperf, although not every snd/rcv buffer size
>reached the requested 95% confidence level. Attached are my netperf results
>for different snd/rcv buffer sizes before and after the change
>(netperf_before_change.txt and netperf_after_change.txt), which also look
>good.
>
>netperf command used:
>netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s ${LocalSndBuf} -S ${RemoteSndBuf}
>
>
>Thanks,
>--Cheng Cui
>NetApp Scale Out Networking
>
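
The single-MSS percentages quoted above come from counting packets that match the filters tcp.len==1448 and tcp.len > 0. A minimal sketch of that count, assuming the per-packet TCP payload lengths have already been extracted from the trace into an array (single_mss_ratio() is a hypothetical helper, not part of tcpdump or any other tool):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given one TCP payload length per captured packet,
 * return the fraction of data-carrying packets (len > 0) whose payload
 * is exactly one MSS.  Returns 0.0 if no packet carries data.
 */
static double
single_mss_ratio(const unsigned *lens, size_t n, unsigned mss)
{
	size_t data = 0, single = 0;

	assert(lens != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (lens[i] == 0)
			continue;	/* e.g. pure ACKs carry no data */
		data++;
		if (lens[i] == mss)
			single++;
	}
	return data != 0 ? (double)single / (double)data : 0.0;
}
```

For the sender trace before the change, 43480 / 57005 is about 0.763, i.e. roughly 76% of the data packets were single-MSS.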




