Date:      Sun, 10 Apr 2016 20:44:22 +0000
From:      "Cui, Cheng" <Cheng.Cui@netapp.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject:   Re: question about trimning data "len" conditions in TSO in tcp_output.c
Message-ID:  <D3300112.FE63%Cheng.Cui@netapp.com>
In-Reply-To: <D26BC410.B4AD%Cheng.Cui@netapp.com>
References:  <C44A2900-40E0-41EF-83B1-6DD4B31DABD5@netapp.com> <563D1892.3050406@selasky.org> <E6DBCD15-A41C-49A4-88C5-FB79EC969716@netapp.com> <563D2C26.2070300@selasky.org> <D26BC410.B4AD%Cheng.Cui@netapp.com>


[-- Attachment #1 --]
Hi Hans,

I would like to continue this discussion with a different change. The
change is shown below, and I have also attached it as the patch
"change.patch" against the FreeBSD HEAD code-line.

diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..fa124f1 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -938,25 +938,16 @@ send:
                         * fractional unless the send sockbuf can be
                         * emptied:
                         */
-                       max_len = (tp->t_maxseg - optlen);
-                       if ((off + len) < sbavail(&so->so_snd)) {
+                       max_len = (tp->t_maxopd - optlen);
+                       if (len > (max_len << 1)) {
                                moff = len % max_len;
                                if (moff != 0) {
                                        len -= moff;
                                        sendalot = 1;
                                }
                        }
-
-                       /*
-                        * In case there are too many small fragments
-                        * don't use TSO:
-                        */
-                       if (len <= max_len) {
-                               len = max_len;
-                               sendalot = 1;
-                               tso = 0;
-                       }
-
+                       KASSERT(len > max_len,
+                           ("[%s:%d]: len <= max_len", __func__, __LINE__));
                        /*
                         * Send the FIN in a separate segment
                         * after the bulk sending is done.

I think this change could avoid the additional loop iterations that send
single MSS-size packets. It should therefore save some CPU cycles as
well, because it reduces software sends and pushes more data to
offloaded (TSO) sends.

Here is my test. The iperf command I chose pushes 100 MBytes of data to
the wire, with the default TCP sendspace set to 1MB and recvspace set to
2MB. I tested TCP connection performance on a pair of 10Gbps FreeBSD 10.2
nodes (s1 and r1) with a switch in between. Both nodes have TSO and
delayed ACK enabled.

root@s1:~ # ping -c 3 r1
PING r1-link1 (10.1.2.3): 56 data bytes
64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms

--- r1-link1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms

1M snd buffer/2M rcv buffer
sysctl -w net.inet.tcp.hostcache.expire=1
sysctl -w net.inet.tcp.sendspace=1048576
sysctl -w net.inet.tcp.recvspace=2097152

iperf -s                  <== iperf command@receiver
iperf -c r1 -m -n 100M    <== iperf command@sender

root@s1:~ # iperf -c r1 -m -n 100M
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 0.3 sec   100 MBytes  2.69 Gbits/sec
[  3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

root@r1:~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 2.00 MByte (default)
------------------------------------------------------------
[  4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 0.3 sec   100 MBytes  2.62 Gbits/sec

Each test sent 100 MBytes of data, and I collected packet traces on both
nodes with tcpdump. I ran the test twice to confirm that the result is
reproducible.

In the trace files of both nodes before my code change, I see a lot of
single-MSS size packets. See the attached trace files in
"before_change.zip". For example, in a sender trace file I see 43480
single-MSS size packets (tcp.len==1448) out of 57005 packets that contain
data (tcp.len > 0). That's about 76.3%.

After the change, I ran the same iperf test and gathered trace files
again. This time I found very few single-MSS packets. See the attached
trace files in "after_change.zip". For example, in a sender trace file I
see zero single-MSS size packets (tcp.len==1448) out of 35729 data
packets (tcp.len > 0).

Comparing the receiver traces, I did not see significantly more
fractional packets received after the change.

I also ran tests with netperf, although not every snd/rcv buffer size
reached the 95% confidence level. Attached are my netperf results for
different snd/rcv buffer sizes before and after the change
(netperf_before_change.txt and netperf_after_change.txt), which also
look good.

netperf command used:
netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s ${LocalSndBuf} -S ${RemoteSndBuf}


Thanks,
--Cheng Cui
NetApp Scale Out Networking


[-- Attachment #2 --]
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..fa124f1 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -938,25 +938,16 @@ send:
 			 * fractional unless the send sockbuf can be
 			 * emptied:
 			 */
-			max_len = (tp->t_maxseg - optlen);
-			if ((off + len) < sbavail(&so->so_snd)) {
+			max_len = (tp->t_maxopd - optlen);
+			if (len > (max_len << 1)) {
 				moff = len % max_len;
 				if (moff != 0) {
 					len -= moff;
 					sendalot = 1;
 				}
 			}
-
-			/*
-			 * In case there are too many small fragments
-			 * don't use TSO:
-			 */
-			if (len <= max_len) {
-				len = max_len;
-				sendalot = 1;
-				tso = 0;
-			}
-
+			KASSERT(len > max_len,
+			    ("[%s:%d]: len <= max_len", __func__, __LINE__));
 			/*
 			 * Send the FIN in a separate segment
 			 * after the bulk sending is done.

[-- Attachment #3 --]
Thu Apr  7 14:42:21 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 10.783%
!!!                       Local CPU util  : 6.277%
!!!                       Remote CPU util : 1.153%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

 65536  32768  32768    400.01     4670.31   4.87     5.48     2.747   3.091  
Thu Apr  7 15:49:02 MDT 2016



Thu Apr  7 15:49:12 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 7.347%
!!!                       Local CPU util  : 11.658%
!!!                       Remote CPU util : 1.524%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

131072  65536  65536    400.01     4742.45   4.99     5.53     2.759   3.064  
Thu Apr  7 16:55:52 MDT 2016



Thu Apr  7 16:56:02 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 10.212%
!!!                       Local CPU util  : 12.850%
!!!                       Remote CPU util : 0.874%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

262144 131072 131072    400.02     4881.49   5.42     5.53     2.915   2.981  
Thu Apr  7 18:02:42 MDT 2016



Thu Apr  7 18:02:52 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 36.686%
!!!                       Local CPU util  : 12.641%
!!!                       Remote CPU util : 12.322%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

524288 262144 262144    400.02     3734.29   5.01     5.03     3.678   3.671  
Thu Apr  7 19:09:33 MDT 2016



Thu Apr  7 19:09:43 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

1048576 524288 524288    400.02     2891.10   4.58     4.64     4.155   4.210  
Thu Apr  7 19:43:03 MDT 2016



Thu Apr  7 19:43:13 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 11.692%
!!!                       Local CPU util  : 10.800%
!!!                       Remote CPU util : 7.792%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

2097152 1048576 1048576    400.04     2984.77   4.78     4.80     4.201   4.221  
Thu Apr  7 20:49:54 MDT 2016



Thu Apr  7 20:50:05 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

4194304 2097152 2097152    400.34     2908.97   4.66     4.68     4.196   4.213  
Thu Apr  7 21:16:47 MDT 2016



Thu Apr  7 21:16:57 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

8388608 4194304 4194304    400.28     2922.54   3.82     4.69     3.431   4.205  
Thu Apr  7 21:57:01 MDT 2016




[-- Attachment #4 --]
Sat Apr  9 09:56:28 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 21.820%
!!!                       Local CPU util  : 10.174%
!!!                       Remote CPU util : 14.250%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

 65536  32768  32768    400.01     4537.24   5.17     5.28     3.039   3.085  
Sat Apr  9 11:03:08 MDT 2016



Sat Apr  9 11:03:19 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

131072  65536  65536    400.02     4628.43   5.70     5.42     3.231   3.071  
Sat Apr  9 11:49:59 MDT 2016



Sat Apr  9 11:50:09 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 3.551%
!!!                       Local CPU util  : 10.216%
!!!                       Remote CPU util : 2.961%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

262144 131072 131072    400.01     4551.61   5.30     5.46     3.057   3.148  
Sat Apr  9 12:56:49 MDT 2016



Sat Apr  9 12:56:59 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      : 25.918%
!!!                       Local CPU util  : 14.030%
!!!                       Remote CPU util : 11.608%

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

524288 262144 262144    400.02     4137.23   5.51     5.19     3.587   3.356  
Sat Apr  9 14:03:40 MDT 2016



Sat Apr  9 14:03:50 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

1048576 524288 524288    400.02     2952.14   4.75     4.73     4.216   4.196  
Sat Apr  9 14:43:50 MDT 2016



Sat Apr  9 14:44:01 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

2097152 1048576 1048576    400.03     3001.44   4.94     4.84     4.310   4.231  
Sat Apr  9 15:44:02 MDT 2016



Sat Apr  9 15:44:12 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

4194304 2097152 2097152    400.34     2948.57   4.79     4.79     4.259   4.262  
Sat Apr  9 16:10:54 MDT 2016



Sat Apr  9 16:11:04 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf.  : histogram : interval : dirty data : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

8388608 4194304 4194304    400.34     2940.28   4.28     4.70     3.811   4.194  
Sat Apr  9 17:04:30 MDT 2016



