Date: Sun, 10 Apr 2016 20:44:22 +0000
From: "Cui, Cheng" <Cheng.Cui@netapp.com>
To: Hans Petter Selasky <hps@selasky.org>
Cc: "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject: Re: question about trimning data "len" conditions in TSO in tcp_output.c
Message-ID: <D3300112.FE63%Cheng.Cui@netapp.com>
In-Reply-To: <D26BC410.B4AD%Cheng.Cui@netapp.com>
References: <C44A2900-40E0-41EF-83B1-6DD4B31DABD5@netapp.com> <563D1892.3050406@selasky.org> <E6DBCD15-A41C-49A4-88C5-FB79EC969716@netapp.com> <563D2C26.2070300@selasky.org> <D26BC410.B4AD%Cheng.Cui@netapp.com>
[-- Attachment #1 --]
Hi Hans,
I would like to continue this discussion with a different change. The
change is shown inline below, and I have also attached it as the patch
"change.patch" against the FreeBSD HEAD code line.
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..fa124f1 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -938,25 +938,16 @@ send:
* fractional unless the send sockbuf can be
* emptied:
*/
- max_len = (tp->t_maxseg - optlen);
- if ((off + len) < sbavail(&so->so_snd)) {
+ max_len = (tp->t_maxopd - optlen);
+ if (len > (max_len << 1)) {
moff = len % max_len;
if (moff != 0) {
len -= moff;
sendalot = 1;
}
}
-
- /*
- * In case there are too many small fragments
- * don't use TSO:
- */
- if (len <= max_len) {
- len = max_len;
- sendalot = 1;
- tso = 0;
- }
-
+ KASSERT(len > max_len,
+ ("[%s:%d]: len <= max_len", __func__, __LINE__));
/*
* Send the FIN in a separate segment
* after the bulk sending is done.
I think this change avoids the additional loop iterations that send single
MSS-sized packets. It should also save some CPU cycles, because it reduces
the number of software sends and pushes more data through offloaded (TSO)
sends.
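
To make the difference concrete, here is a minimal stand-alone sketch of the
two trimming rules from the hunk above. It is only a model under stated
assumptions: the struct and function names (trim_result, trim_before,
trim_after) are hypothetical, max_len is passed in as a plain parameter
instead of being derived from t_maxseg or t_maxopd, and nothing else in
tcp_output() is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of the trimming step for one pass through "send:". */
struct trim_result {
	uint32_t len;      /* bytes handed down in this pass */
	int      sendalot; /* 1 if another pass is needed for the remainder */
	int      tso;      /* 1 if this pass still uses TSO */
};

/* Model of the removed lines: trim only when the send buffer cannot be
 * emptied, then fall back to a single non-TSO segment if at most one
 * max_len-sized segment is left. */
static struct trim_result
trim_before(uint32_t off, uint32_t len, uint32_t max_len, uint32_t avail)
{
	struct trim_result r = { len, 0, 1 };

	assert(max_len > 0);
	if (off + len < avail) {
		uint32_t moff = len % max_len;
		if (moff != 0) {
			r.len -= moff;
			r.sendalot = 1;
		}
	}
	if (r.len <= max_len) {
		r.len = max_len;
		r.sendalot = 1;
		r.tso = 0;
	}
	return r;
}

/* Model of the added lines: trim to a multiple of max_len only when more
 * than two full segments' worth of data is queued for this pass. */
static struct trim_result
trim_after(uint32_t len, uint32_t max_len)
{
	struct trim_result r = { len, 0, 1 };

	assert(max_len > 0);
	if (len > (max_len << 1)) {
		uint32_t moff = len % max_len;
		if (moff != 0) {
			r.len -= moff;
			r.sendalot = 1;
		}
	}
	return r;
}
```

In this model, with max_len = 1448 and len = 2000 while more data is still
queued, trim_before() yields one 1448-byte non-TSO send plus another pass,
whereas trim_after() leaves the full 2000 bytes in a single TSO send.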
Here is my test. The iperf command I chose pushes 100 MBytes of data onto
the wire, with the default TCP sendspace set to 1 MB and recvspace set to
2 MB. I tested TCP connection performance on a pair of 10 Gbps FreeBSD 10.2
nodes (s1 and r1) with a switch in between. Both nodes have TSO and delayed
ACK enabled.
root@s1:~ # ping -c 3 r1
PING r1-link1 (10.1.2.3): 56 data bytes
64 bytes from 10.1.2.3: icmp_seq=0 ttl=64 time=0.045 ms
64 bytes from 10.1.2.3: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 10.1.2.3: icmp_seq=2 ttl=64 time=0.038 ms
--- r1-link1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.037/0.040/0.045/0.004 ms
1M snd buffer/2M rcv buffer
sysctl -w net.inet.tcp.hostcache.expire=1
sysctl -w net.inet.tcp.sendspace=1048576
sysctl -w net.inet.tcp.recvspace=2097152
iperf -s <== iperf command@receiver
iperf -c r1 -m -n 100M <== iperf command@sender
root@s1:~ # iperf -c r1 -m -n 100M
------------------------------------------------------------
Client connecting to r1, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[ 3] local 10.1.2.2 port 22491 connected with 10.1.2.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 0.3 sec 100 MBytes 2.69 Gbits/sec
[ 3] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
root@r1:~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 2.00 MByte (default)
------------------------------------------------------------
[ 4] local 10.1.2.3 port 5001 connected with 10.1.2.2 port 22491
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 0.3 sec 100 MBytes 2.62 Gbits/sec
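
As a side note on the "MSS size 1448 bytes (MTU 1500 bytes, ethernet)" line
in the sender output: that value is what one would expect if each segment
carries a 20-byte IPv4 header, a 20-byte TCP header, and 12 bytes of TCP
timestamp option. The sketch below only illustrates this arithmetic; the
header sizes are assumptions (no IP options, timestamps negotiated), and the
function name payload_per_segment is hypothetical.

```c
#include <assert.h>

enum {
	IPV4_HDR_LEN   = 20, /* assumed: IPv4 header without options */
	TCP_HDR_LEN    = 20, /* assumed: TCP header without options */
	TCP_TS_OPT_LEN = 12  /* assumed: timestamp option padded to 12 bytes */
};

/* Payload bytes that fit into one segment for a given MTU and TCP option
 * length, under the header-size assumptions above. */
static int
payload_per_segment(int mtu, int tcp_optlen)
{
	int payload = mtu - IPV4_HDR_LEN - TCP_HDR_LEN - tcp_optlen;

	assert(payload > 0);
	return payload;
}
```

With these assumptions, payload_per_segment(1500, TCP_TS_OPT_LEN) gives 1448,
which is the tcp.len value used below to identify single-MSS packets in the
traces.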
Each test sent 100 MBytes of data, and I collected packet traces on both
nodes with tcpdump. I ran the test twice to confirm that the result is
reproducible.
From the trace files of both nodes before my code change, I see a lot of
single-MSS-sized packets. See the attached trace files in
"before_change.zip". For example, in a sender trace file I see 43480
single-MSS-sized packets (tcp.len == 1448) out of 57005 packets that contain
data (tcp.len > 0). That's about 76.3%.
After the change, I ran the same iperf test and gathered trace files again.
I did not find many single-MSS packets this time. See the attached trace
files in "after_change.zip". For example, in a sender trace file I see zero
single-MSS-sized packets (tcp.len == 1448) out of 35729 data packets
(tcp.len > 0). Comparing the receiver traces, I did not see significantly
more fractional packets received after the change.
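
For anyone who wants to redo the arithmetic on their own traces, the two
ratios used above can be computed as in the sketch below. The function names
(share, relative_drop) are hypothetical helpers, not part of any tool used in
the test; the inputs are simply the packet counts read from the sender
traces.

```c
#include <assert.h>

/* Fraction of packets in a trace that match some filter,
 * e.g. tcp.len == 1448 out of all packets with tcp.len > 0. */
static double
share(long part, long total)
{
	assert(total > 0);
	return (double)part / (double)total;
}

/* Relative reduction in a count between the "before" and "after" runs. */
static double
relative_drop(long before, long after)
{
	assert(before > 0);
	return (double)(before - after) / (double)before;
}
```

With the sender-side numbers quoted above, share(43480, 57005) is about 0.763
and relative_drop(57005, 35729) is about 0.373, that is, roughly 37% fewer
data packets handed to the NIC for the same 100 MBytes.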
I also ran tests using netperf, although not every snd/rcv buffer size
reached the requested 95% confidence level. Attached are my netperf results
for different snd/rcv buffer sizes before and after the change
(netperf_before_change.txt and netperf_after_change.txt), which also look
good.
The netperf command used:
netperf -H s1 -t TCP_STREAM -C -c -l 400 -i 10,3 -I 95,10 -- -s ${LocalSndBuf} -S ${RemoteSndBuf}
Thanks,
--Cheng Cui
NetApp Scale Out Networking
[-- Attachment #2 --]
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index 2043fc9..fa124f1 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -938,25 +938,16 @@ send:
* fractional unless the send sockbuf can be
* emptied:
*/
- max_len = (tp->t_maxseg - optlen);
- if ((off + len) < sbavail(&so->so_snd)) {
+ max_len = (tp->t_maxopd - optlen);
+ if (len > (max_len << 1)) {
moff = len % max_len;
if (moff != 0) {
len -= moff;
sendalot = 1;
}
}
-
- /*
- * In case there are too many small fragments
- * don't use TSO:
- */
- if (len <= max_len) {
- len = max_len;
- sendalot = 1;
- tso = 0;
- }
-
+ KASSERT(len > max_len,
+ ("[%s:%d]: len <= max_len", __func__, __LINE__));
/*
* Send the FIN in a separate segment
* after the bulk sending is done.
[-- Attachment #3 --]
Thu Apr 7 14:42:21 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 10.783%
!!! Local CPU util : 6.277%
!!! Remote CPU util : 1.153%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
65536    32768    32768    400.01   4670.31     4.87    5.48    2.747   3.091
Thu Apr 7 15:49:02 MDT 2016
Thu Apr 7 15:49:12 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 7.347%
!!! Local CPU util : 11.658%
!!! Remote CPU util : 1.524%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
131072   65536    65536    400.01   4742.45     4.99    5.53    2.759   3.064
Thu Apr 7 16:55:52 MDT 2016
Thu Apr 7 16:56:02 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 10.212%
!!! Local CPU util : 12.850%
!!! Remote CPU util : 0.874%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
262144   131072   131072   400.02   4881.49     5.42    5.53    2.915   2.981
Thu Apr 7 18:02:42 MDT 2016
Thu Apr 7 18:02:52 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 36.686%
!!! Local CPU util : 12.641%
!!! Remote CPU util : 12.322%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
524288   262144   262144   400.02   3734.29     5.01    5.03    3.678   3.671
Thu Apr 7 19:09:33 MDT 2016
Thu Apr 7 19:09:43 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
1048576  524288   524288   400.02   2891.10     4.58    4.64    4.155   4.210
Thu Apr 7 19:43:03 MDT 2016
Thu Apr 7 19:43:13 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 11.692%
!!! Local CPU util : 10.800%
!!! Remote CPU util : 7.792%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
2097152  1048576  1048576  400.04   2984.77     4.78    4.80    4.201   4.221
Thu Apr 7 20:49:54 MDT 2016
Thu Apr 7 20:50:05 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
4194304  2097152  2097152  400.34   2908.97     4.66    4.68    4.196   4.213
Thu Apr 7 21:16:47 MDT 2016
Thu Apr 7 21:16:57 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to r1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
8388608  4194304  4194304  400.28   2922.54     3.82    4.69    3.431   4.205
Thu Apr 7 21:57:01 MDT 2016
[-- Attachment #4 --]
Sat Apr 9 09:56:28 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 21.820%
!!! Local CPU util : 10.174%
!!! Remote CPU util : 14.250%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
65536    32768    32768    400.01   4537.24     5.17    5.28    3.039   3.085
Sat Apr 9 11:03:08 MDT 2016
Sat Apr 9 11:03:19 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
131072   65536    65536    400.02   4628.43     5.70    5.42    3.231   3.071
Sat Apr 9 11:49:59 MDT 2016
Sat Apr 9 11:50:09 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 3.551%
!!! Local CPU util : 10.216%
!!! Remote CPU util : 2.961%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
262144   131072   131072   400.01   4551.61     5.30    5.46    3.057   3.148
Sat Apr 9 12:56:49 MDT 2016
Sat Apr 9 12:56:59 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput : 25.918%
!!! Local CPU util : 14.030%
!!! Remote CPU util : 11.608%
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
524288   262144   262144   400.02   4137.23     5.51    5.19    3.587   3.356
Sat Apr 9 14:03:40 MDT 2016
Sat Apr 9 14:03:50 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
1048576  524288   524288   400.02   2952.14     4.75    4.73    4.216   4.196
Sat Apr 9 14:43:50 MDT 2016
Sat Apr 9 14:44:01 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
2097152  1048576  1048576  400.03   3001.44     4.94    4.84    4.310   4.231
Sat Apr 9 15:44:02 MDT 2016
Sat Apr 9 15:44:12 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
4194304  2097152  2097152  400.34   2948.57     4.79    4.79    4.259   4.262
Sat Apr 9 16:10:54 MDT 2016
Sat Apr 9 16:11:04 MDT 2016
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to s1-link1 () port 0 AF_INET : +/-5.000% @ 95% conf. : histogram : interval : dirty data : demo
Recv     Send     Send                          Utilization     Service Demand
Socket   Socket   Message  Elapsed              Send    Recv    Send    Recv
Size     Size     Size     Time     Throughput  local   remote  local   remote
bytes    bytes    bytes    secs.    10^6bits/s  % C     % C     us/KB   us/KB
8388608  4194304  4194304  400.34   2940.28     4.28    4.70    3.811   4.194
Sat Apr 9 17:04:30 MDT 2016
