Date: Mon, 27 Jan 2014 16:28:26 -0800
From: John-Mark Gurney
To: Rick Macklem
Cc: freebsd-net@freebsd.org, Adam McDougall
Subject: Re: Terrible NFS performance under 9.2-RELEASE?
Message-ID: <20140128002826.GU13704@funkthat.com>
In-Reply-To: <222089865.17245782.1390866430479.JavaMail.root@uoguelph.ca>
References: <20140127032338.GP13704@funkthat.com> <222089865.17245782.1390866430479.JavaMail.root@uoguelph.ca>
List-Id: Networking and TCP/IP with FreeBSD

Rick Macklem wrote this message on Mon, Jan 27, 2014 at 18:47 -0500:
> John-Mark Gurney wrote:
> > Rick Macklem wrote this message on Sun, Jan 26, 2014 at 21:16 -0500:
> > > Btw, thanks go to Garrett Wollman for suggesting the change to
> > > MJUMPAGESIZE clusters.
> > >
> > > rick
> > > ps: If the attachment doesn't make it through and you want the
> > > patch, just email me and I'll send you a copy.
> >
> > The patch looks good, but we probably shouldn't change _readlink..
> > The chances of a link being >2k are pretty slim, and the chances of
> > the link being >32k are even smaller...
>
> Yea, I already thought of that, actually.  However, see below w.r.t.
> NFSv4.
>
> However, at this point I mostly want to find out if it is the long
> mbuf chain that causes problems for TSO-enabled network interfaces.

I agree, though a long mbuf chain is more of a driver issue than an
NFS issue...

> > In fact, we might want to switch _readlink to MGET (could be
> > conditional upon cnt) so that if it fits in an mbuf we don't
> > allocate a cluster for it...
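(To be concrete, the conditional allocation I had in mind is roughly the
sketch below.  It's untested, nfsm_reply_mbuf is a made-up name rather
than anything in the tree, and the real thresholds would come from the
actual reply length:)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Made-up helper, just to show the shape of the idea: pick the
 * smallest allocation that can hold "len" bytes of readlink reply.
 */
static struct mbuf *
nfsm_reply_mbuf(int len)
{

	if (len <= MLEN)
		return (m_get(M_WAITOK, MT_DATA));	/* plain mbuf, no cluster */
	if (len <= MCLBYTES)
		return (m_getcl(M_WAITOK, MT_DATA, 0));	/* standard 2K cluster */
	return (m_getjcl(M_WAITOK, MT_DATA, 0, MJUMPAGESIZE)); /* page-sized cluster */
}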
> For NFSv4, what was an RPC for NFSv3 becomes one of several Ops. in
> a compound RPC.  As such, there is no way to know how much additional
> RPC message there will be.  So, although the readlink reply won't use
> much of the 4K allocation, replies for subsequent Ops. in the compound
> certainly could.  (Is it more efficient to allocate 4K now and use
> part of it for subsequent message reply stuff or allocate additional
> mbuf clusters later for subsequent stuff, as required?  On a small
> memory constrained machine, I suspect the latter is correct, but for
> the kind of hardware that has TSO scatter/gather enabled network
> interfaces, I'm not so sure.  At this point, I wouldn't even say
> that using 4K clusters is going to be a win and my hunch is that
> any win wouldn't apply to small memory constrained machines.)

Though the code that was patched wasn't using any partial buffers, it
was always allocating a new buffer...  If the code in _read/_readlinks
starts using a previous mbuf chain, then obviously things are different
and I'd agree, always allocating a 2k/4k cluster makes sense...

> My test server has 256Mbytes of ram and it certainly doesn't show
> any improvement (big surprise;-), but it also doesn't show any
> degradation for the limited testing I've done.

I'm not too surprised; unless you're on a heavy server pushing
>200MB/sec, the allocation cost is probably cheap enough that it
doesn't show up...  Going to 4k means immediately half as many mbufs
are needed/allocated, and as they are page sized, they don't have the
problems of physical memory fragmentation, nor do they have to do an
IPI/TLB shootdown in the case of multipage allocations...  (I'm
dealing w/ this for geli.)

> Again, my main interest at this point is whether reducing the
> number of mbufs in the chain fixes the TSO issues.  I think
> the question of whether or not 4K clusters are a performance
> improvement in general is an interesting one that comes later.

Another thing I noticed is that we are getting an mbuf and then
allocating a cluster...  Is there a reason we aren't using something
like m_getm or m_getcl?  We have a special uma zone that has the mbuf
and mbuf cluster already paired, meaning we save some lock operations
for each segment allocated...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579
     "All that I will do, has been done, All that I have, has not."
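P.S. The pairing I'm referring to is roughly the fragment below
(untested, error handling and the surrounding NFS code omitted):

	struct mbuf *m;

	/* current pattern: two allocations per segment */
	MGET(m, M_WAITOK, MT_DATA);
	MCLGET(m, M_WAITOK);

	/* single call: the mbuf and its 2K cluster come out of the
	 * packet zone already paired, one allocation per segment */
	m = m_getcl(M_WAITOK, MT_DATA, 0);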