Date: Mon, 27 Jan 2014 14:50:47 +0900
From: Yonghyeon PYUN <pyunyh@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-net@freebsd.org, Adam McDougall <mcdouga9@egr.msu.edu>
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <20140127055047.GA1368@michelle.cdnetworks.com>
In-Reply-To: <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca>
References: <52DC1241.7010004@egr.msu.edu> <1629593139.16590858.1390789014324.JavaMail.root@uoguelph.ca>
On Sun, Jan 26, 2014 at 09:16:54PM -0500, Rick Macklem wrote:
> Adam McDougall wrote:
> > Also try rsize=32768,wsize=32768 in your mount options; it made a
> > huge difference for me. I've noticed slow file transfers on NFS in 9
> > and finally did some searching a couple months ago. Someone
> > suggested it, and they were on to something.
> >
> I have a "hunch" that might explain why 64K NFS reads/writes perform
> poorly in some network environments.
> A 64K NFS read reply/write request consists of a list of 34 mbufs when
> passed to TCP via sosend(), with a total data length of around 65680
> bytes. Looking at a couple of drivers (virtio and ixgbe), they seem to
> expect no more than 32-33 mbufs in a list for a 65535 byte TSO xmit. I
> think (I don't have anything that does TSO to confirm this) that NFS
> will pass a list that is longer (34 plus a TCP/IP header).
> At a glance, it appears that the drivers call m_defrag() or
> m_collapse() when the mbuf list won't fit in their scatter table (32
> or 33 elements) and, if this fails, just silently drop the data
> without sending it.
> If I'm right, there would be considerable overhead from
> m_defrag()/m_collapse(), and near disaster if they fail to fix the
> problem and the data is silently dropped instead of xmited.
>

I think the actual number of DMA segments allocated for the mbuf chain
is determined by bus_dma(9). bus_dma(9) will coalesce the current
segment with the previous one if possible.

I'm not sure whether you're referring to ixgbe(4) or ix(4), but I see
the total length of all segments in ix(4) is limited to 65535 bytes, so
there is no room for the ethernet/VLAN header of the mbuf chain. The
driver should be fixed to be able to transmit a 64KB datagram.

I think the use of m_defrag(9) in TSO is suboptimal. All TSO-capable
controllers are able to handle multiple TX buffers, so drivers should
use m_collapse(9) rather than copying the entire chain with
m_defrag(9).

> Anyhow, I have attached a patch that makes NFS use MJUMPAGESIZE
> clusters, so the mbuf count drops from 34 to 18.
>

Could we make it conditional on size?

> If anyone has a TSO scatter/gather enabled net interface and can test
> this patch on it with NFS I/O (default of 64K rsize/wsize) when TSO is
> enabled and see what effect it has, that would be appreciated.
>
> Btw, thanks go to Garrett Wollman for suggesting the change to
> MJUMPAGESIZE clusters.
>
> rick
> ps: If the attachment doesn't make it through and you want the patch,
>     just email me and I'll send you a copy.
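[Editor's sketch] To make the fallback path Rick describes concrete, here
is a minimal sketch of the typical TX encap pattern, not taken from any
real driver. The foo(4) names (foo_softc, foo_encap, FOO_MAX_SCATTER)
are hypothetical; bus_dmamap_load_mbuf_sg(9), m_defrag(9), m_collapse(9),
and m_freem(9) are the real KPIs involved.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <sys/mbuf.h>
    #include <machine/bus.h>

    #define FOO_MAX_SCATTER	32	/* 32-33 in the drivers Rick looked at */

    struct foo_softc {
    	bus_dma_tag_t	tx_tag;
    	bus_dmamap_t	tx_map;
    };

    static int
    foo_encap(struct foo_softc *sc, struct mbuf **m_head)
    {
    	bus_dma_segment_t segs[FOO_MAX_SCATTER];
    	struct mbuf *m;
    	int error, nsegs;

    	error = bus_dmamap_load_mbuf_sg(sc->tx_tag, sc->tx_map, *m_head,
    	    segs, &nsegs, BUS_DMA_NOWAIT);
    	if (error == EFBIG) {
    		/*
    		 * The chain has more segments than the scatter table
    		 * holds.  m_defrag() copies the whole chain into fresh
    		 * clusters; m_collapse(*m_head, M_NOWAIT, FOO_MAX_SCATTER)
    		 * would be cheaper, since it only merges adjacent mbufs
    		 * until the chain fits.
    		 */
    		m = m_defrag(*m_head, M_NOWAIT);
    		if (m == NULL) {
    			/* The silent drop: data is freed, never sent. */
    			m_freem(*m_head);
    			*m_head = NULL;
    			return (ENOBUFS);
    		}
    		*m_head = m;
    		error = bus_dmamap_load_mbuf_sg(sc->tx_tag, sc->tx_map,
    		    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
    	}
    	if (error != 0) {
    		m_freem(*m_head);
    		*m_head = NULL;
    		return (error);
    	}
    	/* ... write segs[0..nsegs-1] into the TX descriptor ring ... */
    	return (0);
    }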
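[Editor's sketch] Yonghyeon's point about the ix(4) 65535 byte limit
comes down to the maxsize argument passed when the driver creates its
TX DMA tag: a full 64K TSO payload plus a 14-18 byte ethernet/VLAN
header cannot be mapped under a 65535 byte cap. The values below are
chosen for illustration, not copied from the driver.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <machine/bus.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>		/* IP_MAXPACKET == 65535 */

    static int
    foo_create_tx_tag(device_t dev, bus_dma_tag_t *tag)
    {
    	return (bus_dma_tag_create(
    	    bus_get_dma_tag(dev),	/* parent */
    	    1, 0,			/* alignment, boundary */
    	    BUS_SPACE_MAXADDR,		/* lowaddr */
    	    BUS_SPACE_MAXADDR,		/* highaddr */
    	    NULL, NULL,			/* filter, filterarg */
    	    IP_MAXPACKET,		/* maxsize: 65535 -- no header room */
    	    FOO_MAX_SCATTER,		/* nsegments */
    	    PAGE_SIZE,			/* maxsegsize */
    	    0,				/* flags */
    	    NULL, NULL,			/* lockfunc, lockfuncarg */
    	    tag));
    }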
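[Editor's sketch] The arithmetic behind the 34 -> 18 drop in Rick's
patch: 64K of data in 2K MCLBYTES clusters takes 32 data mbufs (plus
RPC header mbufs, about 34 in the chain), while 4K MJUMPAGESIZE
clusters take only 16 (about 18 in the chain). The sketch below just
illustrates the mbuf(9) allocation difference; the helper name is
hypothetical and the real change is in Rick's attached patch.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    /*
     *   65536 / MCLBYTES (2048)     = 32 data mbufs -> chain of ~34
     *   65536 / MJUMPAGESIZE (4096) = 16 data mbufs -> chain of ~18
     */
    static struct mbuf *
    nfsm_get_data_mbuf(void)
    {
    	/* Page-sized (4K) cluster instead of the traditional 2K
    	 * cluster that m_getcl(M_WAITOK, MT_DATA, 0) would return. */
    	return (m_getjcl(M_WAITOK, MT_DATA, 0, MJUMPAGESIZE));
    }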