Date:      Fri, 24 Jan 2014 17:54:29 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID:  <659117348.16015750.1390604069888.JavaMail.root@uoguelph.ca>
In-Reply-To: <CABXB=RSebaWTD1LjQz__ZZ3EJwTpOMpxq0Q=bt4280dx+0auCw@mail.gmail.com>

J David wrote:
> On Thu, Jan 23, 2014 at 9:27 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> > Well, my TCP is pretty rusty, but...
> > Since your stats didn't show any jumbo frames, each IP datagram
> > needs to fit in the MTU of 1500 bytes. NFS hands an mbuf list of
> > just over 64K (or 32K) to TCP in a single sosend(), then TCP will
> > generate about 45 (or about 23 for 32K) TCP segments and put each
> > in an IP datagram, then hand it to the network device driver for
> > transmission.
> 
> This is *not* what happens with TSO/LRO.
> 
> With TSO, TCP generates IP datagrams of up to 64k which are passed
> directly to the driver, which passes them directly to the hardware.
> 
> Furthermore, in this unique case (two virtual machines on the same
> host and bridge with both TSO and LRO enabled end-to-end), the packet
> is *never* fragmented.  The host takes the 64k packet off of one
> guest's output ring and puts it onto the other guest's input ring,
> intact.
> 
> This is, as you might expect, a *massive* performance win.
> 
Ok, I mistakenly assumed that this driver emulated an ethernet. It
does not (at least w.r.t. MTU). It appears that it allows an MTU of
up to 64K (I had never heard of such a thing until now).

So, who knows what effect that has when an NFS RPC message is just
over 64K.

The largest jumbo mbuf cluster supported by the generic mbuf code is 16K
(or maybe 9K for 9.2). I have no idea if this matters or not.
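
For rough perspective, and purely as back-of-the-envelope arithmetic
(this assumes a 1460 byte MSS for a 1500 MTU and the standard 4K/16K
jumbo cluster sizes; the write RPC itself is a bit over 64K once the
RPC header is added):

$ echo $(( (65536 + 1459) / 1460 ))   # 64K of data -> about 45 TCP segments at 1500 MTU
45
$ echo $(( (32768 + 1459) / 1460 ))   # 32K of data -> about 23 segments
23
$ echo $(( 65536 / 16384 ))           # the same 64K spread over 16K jumbo clusters
4
$ echo $(( 65536 / 4096 ))            # or over page-size (4K) clusters
16

So even the largest jumbo cluster can't hold an entire 64K+ RPC message;
it always goes to TCP as an mbuf chain.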

I've cc'd glebius, since he's the last guy to make commits to the
virtio network driver. Maybe he can guess at what is going on.

rick

> With TSO & LRO:
> 
> $ time iperf -c 172.20.20.162  -d
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 172.20.20.162, TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> [  5] local 172.20.20.169 port 60889 connected with 172.20.20.162 port 5001
> [  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 44101
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec  17.0 GBytes  14.6 Gbits/sec
> [  4]  0.0-10.0 sec  17.4 GBytes  14.9 Gbits/sec
> 
> real 0m10.061s
> user 0m0.229s
> sys 0m7.711s
> 
> Without TSO & LRO:
> 
> $ time iperf -c 172.20.20.162  -d
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 1.00 MByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 172.20.20.162, TCP port 5001
> TCP window size: 1.26 MByte (default)
> ------------------------------------------------------------
> [  5] local 172.20.20.169 port 22088 connected with 172.20.20.162 port 5001
> [  4] local 172.20.20.169 port 5001 connected with 172.20.20.162 port 48615
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec   637 MBytes   534 Mbits/sec
> [  4]  0.0-10.0 sec   767 MBytes   642 Mbits/sec
> 
> real 0m10.057s
> user 0m0.231s
> sys 0m3.935s
> 
> Look at the difference.  In this bidirectional test, TSO is over 25x
> faster using not even 2x the CPU.  This shows how essential TSO/LRO
> is
> if you plan to move data at real world speeds and still have enough
> CPU left to operate on that data.
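
(Just to put rough numbers on that claim, taking the iperf aggregates
above at face value, with 534 and 642 Mbit/s written as Gbit/s:)

$ echo "scale=1; (14.6 + 14.9) / (0.534 + 0.642)" | bc   # aggregate throughput ratio
25.0
$ echo "scale=2; 7.711 / 3.935" | bc                     # sys time ratio
1.95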
> 
> 
> > I recall you saying you tried turning off TSO with no effect. You
> > might also try turning off checksum offload. I doubt it will be
> > where things are broken, but might be worth a try.
> 
> That was not me, that was someone else.  If there is a problem with
> NFS and TSO, the solution is *not* to disable TSO.  That is, at best,
> a workaround that produces much more CPU load and much less
> throughput.  The solution is to find the problem and fix it.
> 
But disabling it will identify whether that is causing the problem. And it
is a workaround that often helps people get things working. (With real
hardware, there may be no way to "fix" such things, depending on the
chipset, etc.)
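
For anyone who wants to try that experiment, something along these lines
should do it on the FreeBSD guest (the interface name here is just an
example; substitute whatever your system actually uses, e.g. vtnet0 for
virtio):

# disable TSO, LRO and checksum offload on the interface
ifconfig vtnet0 -tso -lro -txcsum -rxcsum
# check which offload capabilities are currently enabled
ifconfig vtnet0 | grep options
# turn them back on when done testing
ifconfig vtnet0 tso lro txcsum rxcsum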

rick
ps: If you had looked at the link I had in the email, you would have
    seen that he gets very good performance once he disables TSO. As
    they say, your mileage may vary.

> More data to follow.
> 
> Thanks!
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
> 


