Date: Tue, 18 Mar 2014 10:39:57 +0200
From: Alexander Motin <mav@FreeBSD.org>
To: Rick Macklem <rmacklem@uoguelph.ca>, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: review/test: NFS patch to use pagesize mbuf clusters
Message-ID: <5328065D.60201@FreeBSD.org>
In-Reply-To: <570922189.23999456.1395105983047.JavaMail.root@uoguelph.ca>
References: <570922189.23999456.1395105983047.JavaMail.root@uoguelph.ca>
Hi.

On 18.03.2014 03:26, Rick Macklem wrote:
> Several of the TSO capable network interfaces have a limit of
> 32 mbufs in the transmit mbuf chain (the drivers call these transmit
> segments, which I admit I find confusing).
>
> For a 64K read/readdir reply or 64K write request, NFS passes
> a list of 34 mbufs down to TCP. TCP will split the list, since
> it is slightly more than 64K bytes, but that split will normally
> be a copy by reference of the last mbuf cluster. As such, normally
> the network interface will get a list of 34 mbufs.
>
> For TSO enabled interfaces that are limited to 32 mbufs in the
> list, the usual workaround in the driver is to copy { real copy,
> not copy by reference } the list to 32 mbuf clusters via m_defrag().
> (A few drivers use m_collapse(), which is less likely to succeed.)
>
> As a workaround to this problem, the attached patch modifies NFS
> to use larger pagesize clusters, so that the 64K RPC message is
> in 18 mbufs (assuming a 4K pagesize).
>
> Testing on my slow hardware, which does not have TSO capability,
> shows it to be performance neutral, but I believe avoiding the
> overhead of copying via m_defrag() { and possible failures
> resulting in the message never being transmitted } makes this
> patch worth doing.
>
> As such, I'd like to request review and/or testing of this patch
> by anyone who can do so.

First, I tried to find a respective NIC to test: cxgb/cxgbe have a limit
of 36 and so are probably unaffected, ixgb -- 100, igb -- 64; only on em
did I find a limit of 32. I ran several profiles on the em NIC with and
without the patch. I can confirm that without the patch m_defrag() is
indeed called, while with the patch it no longer is. But the profiler
shows that only a very small amount of time (a few percent or even
fractions of a percent) is spent there. I can't measure the effect (my
Core i7 desktop test system has only about 5% CPU load while serving a
full 1Gbps of NFS over the em), though I can't say for sure that there
is no effect on some low-end system.

I am also not very sure about replacing M_WAITOK with M_NOWAIT. Instead
of waiting a bit while the VM finds a cluster, NFSMCLGET() will return a
single mbuf; as a result, the chain of 2K clusters gets replaced not
with 4K clusters but with a chain of 256-byte mbufs.

-- 
Alexander Motin
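P.S. To make the driver-side workaround Rick describes more concrete,
here is a rough sketch of the kind of m_defrag() fallback an em(4)-style
transmit path uses when a chain exceeds the hardware segment limit. The
xx_* names, the softc layout and the exact error handling are made up
for illustration and are not taken from any particular driver:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/errno.h>
    #include <sys/mbuf.h>
    #include <sys/bus.h>
    #include <machine/bus.h>

    #define XX_MAX_TX_SEGS  32              /* hypothetical hardware limit */

    struct xx_softc {                       /* made-up minimal softc */
            bus_dma_tag_t   xx_tx_tag;
            bus_dmamap_t    xx_tx_map;
    };

    static int
    xx_encap(struct xx_softc *sc, struct mbuf **m_head)
    {
            bus_dma_segment_t segs[XX_MAX_TX_SEGS];
            struct mbuf *m;
            int error, nsegs;

            error = bus_dmamap_load_mbuf_sg(sc->xx_tx_tag, sc->xx_tx_map,
                *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
            if (error == EFBIG) {
                    /*
                     * Too many segments for the hardware: m_defrag() does
                     * a real copy into a new chain of 2K clusters, which
                     * is exactly the overhead the patch tries to avoid.
                     */
                    m = m_defrag(*m_head, M_NOWAIT);
                    if (m == NULL) {
                            m_freem(*m_head);
                            *m_head = NULL;
                            return (ENOBUFS);
                    }
                    *m_head = m;
                    error = bus_dmamap_load_mbuf_sg(sc->xx_tx_tag,
                        sc->xx_tx_map, *m_head, segs, &nsegs,
                        BUS_DMA_NOWAIT);
            }
            return (error);
    }

With pagesize clusters the 64K reply fits in fewer segments than the
limit, so the EFBIG path (and the copy) should simply not be taken.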
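P.S. On the M_NOWAIT point, a rough illustration of the concern; this is
not the actual NFSMCLGET() macro from the NFS code, just the generic
MGET()/MCLGET() pattern, and reply_mbuf_alloc() is a made-up name:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    static struct mbuf *
    reply_mbuf_alloc(int how)       /* how: M_WAITOK or M_NOWAIT */
    {
            struct mbuf *m;

            MGET(m, M_WAITOK, MT_DATA);     /* the mbuf itself */
            MCLGET(m, how);                 /* try to attach a cluster */
            if ((m->m_flags & M_EXT) == 0) {
                    /*
                     * With M_NOWAIT the cluster attach can fail, leaving
                     * only the mbuf's small internal buffer (MLEN bytes),
                     * so a 64K RPC would need a chain of hundreds of
                     * mbufs instead of a few dozen clusters.
                     */
            }
            return (m);
    }

That is the degradation I worry about: trading an occasional short wait
for a much longer chain when the cluster zones are under pressure.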