From: Garrett Wollman
To: Rick Macklem
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org
Subject: Re: NFS server bottlenecks
Date: Wed, 3 Oct 2012 16:59:16 -0400
Message-ID: <20588.42788.103863.179701@hergotha.csail.mit.edu>
In-Reply-To: <1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca>
References: <20587.47363.504969.926603@hergotha.csail.mit.edu>
	<1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca>

Rick Macklem said:

>> Simple: just use a separate mutex for each list that a cache entry
>> is on, rather than a global lock for everything.  This would reduce
>> the mutex contention, but I'm not sure how significantly since I
>> don't have the means to measure it yet.

> Well, since the cache trimming is removing entries from the lists, I
> don't see how that can be done with a global lock for list updates?

Well, the global lock is what we have now, but the cache-trimming
process only looks at one list at a time, so not locking the list
that isn't being iterated over probably wouldn't hurt, unless there's
some mechanism (that I didn't see) for entries to move from one list
to another.  Note that I'm considering each hash bucket a separate
"list".  (One issue to worry about in that case would be cache-line
contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
ought to be increased to reduce that.)
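Concretely, the sort of thing I have in mind is the following
untested sketch.  The drc_* names and the DRC_HASHSIZE value are
invented for illustration; they are not the actual nfsrvcache
structures:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

#define	DRC_HASHSIZE	500	/* made-up; larger than today's NFSRVCACHE_HASHSIZE */

struct drc_entry {
	LIST_ENTRY(drc_entry)	de_hash;	/* bucket linkage */
	uint32_t		de_xid;		/* RPC transaction ID */
	/* ... cached reply, timestamps, etc. ... */
};

/*
 * Pad each bucket out to a cache line so adjacent buckets' locks
 * don't ping-pong between CPUs (the cache-line contention mentioned
 * above).
 */
struct drc_bucket {
	struct mtx		db_lock;	/* protects db_head only */
	LIST_HEAD(, drc_entry)	db_head;
} __aligned(CACHE_LINE_SIZE);

static struct drc_bucket drc_table[DRC_HASHSIZE];

static void
drc_init(void)
{
	int i;

	for (i = 0; i < DRC_HASHSIZE; i++) {
		mtx_init(&drc_table[i].db_lock, "drcbkt", NULL, MTX_DEF);
		LIST_INIT(&drc_table[i].db_head);
	}
}

/*
 * A lookup locks only the one bucket the request hashes to; nfsd
 * threads working on other buckets never touch this lock.  (A real
 * version would take a reference on the entry before unlocking.)
 */
static struct drc_entry *
drc_lookup(uint32_t xid)
{
	struct drc_bucket *b = &drc_table[xid % DRC_HASHSIZE];
	struct drc_entry *e;

	mtx_lock(&b->db_lock);
	LIST_FOREACH(e, &b->db_head, de_hash)
		if (e->de_xid == xid)
			break;
	mtx_unlock(&b->db_lock);
	return (e);
}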
> Only doing it once/sec would result in a very large cache when
> bursts of traffic arrive.

My servers have 96 GB of memory, so that's not a big deal for me.

> I'm not sure I see why doing it as a separate thread will improve things.
> There are N nfsd threads already (N can be bumped up to 256 if you wish)
> and having a bunch more "cache trimming threads" would just increase
> contention, wouldn't it?

Only one cache-trimming thread.  The cache trim holds the (global)
mutex for much longer than any individual nfsd service thread needs
to, and having N threads doing that in parallel is why the mutex is
so heavily contended.  If there's only one thread doing the trim,
then the nfsd service threads aren't spending time contending on the
mutex (it will be held less frequently and for shorter periods) or
doing the trimming themselves.

> The only negative effect I can think of w.r.t. having the nfsd
> threads doing it would be a (I believe negligible) increase in RPC
> response times (the time the nfsd thread spends trimming the cache).
> As noted, I think this time would be negligible compared to disk I/O
> and network transit times in the total RPC response time?

With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
network connectivity, spinning on a contended mutex takes a
significant amount of CPU time.  (For the current design of the NFS
server, it may actually be a win to turn off adaptive mutexes -- I
should give that a try once I'm able to do more testing.)

-GAWollman
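P.S. The single trimming thread might look something like the sketch
below -- again untested, building on the invented drc_* structures
above; drc_expired() and drc_free() are hypothetical helpers, not the
real nfsrvcache code.  It polls about once a second and sweeps one
bucket at a time, so an nfsd thread stalls only if it hashes into the
bucket currently being trimmed:

#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/kthread.h>

static volatile u_int	drc_count;		/* total entries, kept with atomics */
static u_int		drc_highwater = 20000;	/* made-up trim threshold */

static int	drc_expired(struct drc_entry *);	/* hypothetical */
static void	drc_free(struct drc_entry *);		/* hypothetical */

static void
drc_trim_thread(void *arg __unused)
{
	struct drc_bucket *b;
	struct drc_entry *e, *tmp;
	int i;

	for (;;) {
		pause("drctrim", hz);		/* wake up about once/sec */
		if (drc_count <= drc_highwater)
			continue;
		for (i = 0; i < DRC_HASHSIZE; i++) {
			b = &drc_table[i];
			mtx_lock(&b->db_lock);
			LIST_FOREACH_SAFE(e, &b->db_head, de_hash, tmp) {
				if (drc_expired(e)) {
					LIST_REMOVE(e, de_hash);
					atomic_subtract_int(&drc_count, 1);
					drc_free(e);
				}
			}
			mtx_unlock(&b->db_lock);
		}
	}
}

The thread would be started once at boot with something like
kproc_create(drc_trim_thread, NULL, NULL, 0, 0, "drctrim").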