Date:      Tue, 18 Mar 2014 20:06:52 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        araujo@FreeBSD.org
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject:   Re: review/test: NFS patch to use pagesize mbuf clusters
Message-ID:  <459657309.24706896.1395187612496.JavaMail.root@uoguelph.ca>
In-Reply-To: <CAOfEmZhYCsA8bCHW2WGokSsFeLM3XbzaTOXJv=AWdpDEo7jLZg@mail.gmail.com>

Marcelo Araujo wrote:
> 
> Hello Rick,
> 
> 
> I have a couple of machines with TSO-capable 10G interfaces.
> What kind of result are you expecting? Is it a speed-up in reads?
> 
Well, if NFS is working well on these systems, I would hope you
don't see any regression.

If your TSO-enabled interfaces can handle more than 32 transmit
segments, you should see very little effect. (There is usually a
#define constant in the driver with something like TX_SEGMAX in
its name; if that constant is >= 34, the driver can already handle
the full chain.)

Even if your network interface is one of the ones limited to 32
transmit segments, the driver usually fixes up the list via a call
to m_defrag(). Although this involves a bunch of bcopy()ing, you
still might not see any easily measured performance improvement,
assuming m_defrag() is getting the job done.
(Network latency and disk latency in the server will predominate,
 I suspect. A server built entirely using SSDs might be a different
 story?)
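
To make that workaround concrete, here is a rough sketch of the
pattern most drivers use; the names (foo_softc, foo_encap() and
FOO_MAX_TX_SEGS) are illustrative, not from any particular driver:

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/mbuf.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define	FOO_MAX_TX_SEGS	32	/* hardware limit on transmit segments */

struct foo_softc {
	bus_dma_tag_t	foo_tx_tag;
};

static int
foo_encap(struct foo_softc *sc, bus_dmamap_t map, struct mbuf **m_head)
{
	bus_dma_segment_t segs[FOO_MAX_TX_SEGS];
	struct mbuf *m;
	int error, nsegs;

	error = bus_dmamap_load_mbuf_sg(sc->foo_tx_tag, map, *m_head,
	    segs, &nsegs, BUS_DMA_NOWAIT);
	if (error == EFBIG) {
		/*
		 * The chain has more than FOO_MAX_TX_SEGS mbufs;
		 * m_defrag() bcopy()s it into the minimum number of
		 * clusters, then the DMA load is retried.
		 */
		m = m_defrag(*m_head, M_NOWAIT);
		if (m == NULL) {
			m_freem(*m_head);
			*m_head = NULL;
			return (ENOBUFS);
		}
		*m_head = m;
		error = bus_dmamap_load_mbuf_sg(sc->foo_tx_tag, map,
		    *m_head, segs, &nsegs, BUS_DMA_NOWAIT);
	}
	return (error);
}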

Thanks for doing the testing, since the absence of a regression is
what I care about most. (I am hoping this patch resolves cases where
users have had to disable TSO to make NFS work ok for them.)

rick

> 
> I'm going to run some tests today, but against 9.1-RELEASE, which
> is what my servers are running.
> 
> 
> Best Regards,
> 
> 
> 
> 
> 
> 2014-03-18 9:26 GMT+08:00 Rick Macklem <rmacklem@uoguelph.ca>:
> 
> 
> Hi,
> 
> Several of the TSO-capable network interfaces have a limit of
> 32 mbufs in the transmit mbuf chain (the drivers call these
> transmit segments, which I admit I find confusing).
> 
> For a 64K read/readdir reply or 64K write request, NFS passes
> a list of 34 mbufs down to TCP. TCP will split the list, since
> it is slightly more than 64K bytes, but that split will normally
> be a copy by reference of the last mbuf cluster. As such, normally
> the network interface will get a list of 34 mbufs.
> 
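A sanity check on the 34 (assuming standard 2K MCLBYTES clusters):

    64K of data / 2K per cluster      = 32 cluster mbufs
    + roughly 2 RPC/NFS header mbufs  = 34 mbufs in the chain
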
> For TSO enabled interfaces that are limited to 32 mbufs in the
> list, the usual workaround in the driver is to copy { real copy,
> not copy by reference } the list to 32 mbuf clusters via m_defrag().
> (A few drivers use m_collapse() which is less likely to succeed.)
> 
> As a workaround to this problem, the attached patch modifies NFS
> to use larger pagesize clusters, so that the 64K RPC message is
> in 18 mbufs (assuming a 4K pagesize).
> 
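For scale: 64K / 4K = 16 data mbufs, plus a couple of header mbufs,
which gives the 18 above. Here is a rough sketch of building such a
chain with m_getjcl(); foo_alloc_reply_chain() is a hypothetical
helper for illustration, not code from the patch:

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Hypothetical helper: build an mbuf chain of pagesize (MJUMPAGESIZE,
 * 4K) clusters to hold "len" bytes of RPC data, instead of the 2K
 * MCLBYTES clusters NFS used before the patch.
 */
static struct mbuf *
foo_alloc_reply_chain(int len)
{
	struct mbuf *m, *top = NULL, **mp = &top;

	while (len > 0) {
		/* M_WAITOK sleeps until a cluster is available. */
		m = m_getjcl(M_WAITOK, MT_DATA, 0, MJUMPAGESIZE);
		m->m_len = (len > MJUMPAGESIZE) ? MJUMPAGESIZE : len;
		len -= m->m_len;
		*mp = m;
		mp = &m->m_next;
	}
	return (top);
}
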
> Testing on my slow hardware which does not have TSO capability
> shows it to be performance neutral, but I believe avoiding the
> overhead of copying via m_defrag() { and possible failures
> resulting in the message never being transmitted } makes this
> patch worth doing.
> 
> As such, I'd like to request review and/or testing of this patch
> by anyone who can do so.
> 
> Thanks in advance for your help, rick
> ps: If you don't get the attachment, just email me and I'll
> send you a copy.
> 
> --
> Marcelo Araujo
> araujo@FreeBSD.org


