Date: Fri, 24 Jan 2014 00:06:49 -0500
From: J David <j.david.lists@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-net@freebsd.org
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <CABXB=RSebaWTD1LjQz__ZZ3EJwTpOMpxq0Q=bt4280dx+0auCw@mail.gmail.com>
In-Reply-To: <390483613.15499210.1390530437153.JavaMail.root@uoguelph.ca>
References: <CABXB=RToav++V38pOorVPWpgZSuYmL-x7e8oxd3ayJCmAtLn-g@mail.gmail.com>
 <390483613.15499210.1390530437153.JavaMail.root@uoguelph.ca>
On Thu, Jan 23, 2014 at 9:27 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Well, my TCP is pretty rusty, but...
> Since your stats didn't show any jumbo frames, each IP
> datagram needs to fit in the MTU of 1500 bytes. NFS hands an mbuf
> list of just over 64K (or 32K) to TCP in a single sosend(), then TCP
> will generate about 45 (or about 23 for 32K) TCP segments and put
> each in an IP datagram, then hand it to the network device driver
> for transmission.

This is *not* what happens with TSO/LRO. With TSO, TCP generates IP
datagrams of up to 64K, which are handed directly to the driver, which
in turn passes them straight to the hardware. Furthermore, in this
particular case (two virtual machines on the same host and bridge,
with both TSO and LRO enabled end-to-end), the packet is *never*
fragmented. The host takes the 64K packet off one guest's output ring
and puts it onto the other guest's input ring, intact. This is, as you
might expect, a *massive* performance win.

With TSO & LRO:

$ time iperf -c 172.20.20.162 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  5] local 172.20.20.169 port 60889 connected with 172.20.20.162 port 5001
[  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 44101
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  17.0 GBytes  14.6 Gbits/sec
[  4]  0.0-10.0 sec  17.4 GBytes  14.9 Gbits/sec

real    0m10.061s
user    0m0.229s
sys     0m7.711s

Without TSO & LRO:

$ time iperf -c 172.20.20.162 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.20.20.162, TCP port 5001
TCP window size: 1.26 MByte (default)
------------------------------------------------------------
[  5] local 172.20.20.169 port 22088 connected with 172.20.20.162 port 5001
[  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 48615
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   637 MBytes   534 Mbits/sec
[  4]  0.0-10.0 sec   767 MBytes   642 Mbits/sec

real    0m10.057s
user    0m0.231s
sys     0m3.935s

Look at the difference. In this bidirectional test, TSO is over 25x
faster while using less than 2x the CPU. This shows how essential
TSO/LRO is if you plan to move data at real-world speeds and still
have enough CPU left to operate on that data.

> I recall you saying you tried turning off TSO with no
> effect. You might also try turning off checksum offload. I doubt it will
> be where things are broken, but might be worth a try.

That was not me; that was someone else.

If there is a problem with NFS and TSO, the solution is *not* to
disable TSO. That is, at best, a workaround that produces much more
CPU load and much less throughput. The solution is to find the
problem and fix it.

More data to follow.

Thanks!
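
P.S. A quick back-of-the-envelope check of the numbers above, for anyone
following along. The 1448-byte MSS is my assumption (1500-byte MTU minus
20-byte IP and 20-byte TCP headers and a 12-byte timestamp option); the
rest is just the figures quoted in this message, plugged into a few
lines of Python:

# sanity-check of the segment counts and the TSO-vs-no-TSO ratios
mss = 1500 - 20 - 20 - 12      # assumed payload bytes per TCP segment
print(64 * 1024 / mss)         # ~45 segments for a 64K NFS write
print(32 * 1024 / mss)         # ~23 segments for a 32K write

tso_on = 14.6 + 14.9           # aggregate Gbits/sec, TSO & LRO on
tso_off = 0.534 + 0.642        # aggregate Gbits/sec, TSO & LRO off
print(tso_on / tso_off)        # ~25x the throughput

print(7.711 / 3.935)           # ~2x the sys CPU time, same 10-sec runs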