From: Garrett Wollman
To: Rick Macklem
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org
Subject: Re: NFS server bottlenecks
Date: Wed, 3 Oct 2012 16:59:16 -0400
Message-ID: <20588.42788.103863.179701@hergotha.csail.mit.edu>
In-Reply-To: <1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca>
References: <20587.47363.504969.926603@hergotha.csail.mit.edu>
	<1571646304.1630985.1349270466529.JavaMail.root@erie.cs.uoguelph.ca>

Rick Macklem said:

>> Simple: just use a separate mutex for each list that a cache entry
>> is on, rather than a global lock for everything.  This would reduce
>> the mutex contention, but I'm not sure how significantly since I
>> don't have the means to measure it yet.

> Well, since the cache trimming is removing entries from the lists, I
> don't see how that can be done with a global lock for list updates?

Well, the global lock is what we have now, but the cache-trimming
process only looks at one list at a time, so not locking the list
that isn't being iterated over probably wouldn't hurt, unless there's
some mechanism (that I didn't see) for entries to move from one list
to another.  Note that I'm considering each hash bucket a separate
"list".  (One issue to worry about in that case would be cache-line
contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
ought to be increased to reduce that.)
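Concretely, the sort of thing I have in mind is the following
untested sketch.  The drc_* names and the DRC_HASHSIZE value are
invented for illustration; they are not the actual nfsrvcache
structures:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

#define	DRC_HASHSIZE	500	/* made-up; larger than today's NFSRVCACHE_HASHSIZE */

struct drc_entry {
	LIST_ENTRY(drc_entry)	de_hash;	/* bucket linkage */
	uint32_t		de_xid;		/* RPC transaction ID */
	/* ... cached reply, timestamps, etc. ... */
};

/*
 * Pad each bucket out to a cache line so adjacent buckets' locks
 * don't ping-pong between CPUs (the cache-line contention mentioned
 * above).
 */
struct drc_bucket {
	struct mtx		db_lock;	/* protects db_head only */
	LIST_HEAD(, drc_entry)	db_head;
} __aligned(CACHE_LINE_SIZE);

static struct drc_bucket drc_table[DRC_HASHSIZE];

static void
drc_init(void)
{
	int i;

	for (i = 0; i < DRC_HASHSIZE; i++) {
		mtx_init(&drc_table[i].db_lock, "drcbkt", NULL, MTX_DEF);
		LIST_INIT(&drc_table[i].db_head);
	}
}

/*
 * A lookup locks only the one bucket the request hashes to; nfsd
 * threads working on other buckets never touch this lock.  (A real
 * version would take a reference on the entry before unlocking.)
 */
static struct drc_entry *
drc_lookup(uint32_t xid)
{
	struct drc_bucket *b = &drc_table[xid % DRC_HASHSIZE];
	struct drc_entry *e;

	mtx_lock(&b->db_lock);
	LIST_FOREACH(e, &b->db_head, de_hash)
		if (e->de_xid == xid)
			break;
	mtx_unlock(&b->db_lock);
	return (e);
}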
> Only doing it once/sec would result in a very large cache when
> bursts of traffic arrive.

My servers have 96 GB of memory, so that's not a big deal for me.

> I'm not sure I see why doing it as a separate thread will improve things.
> There are N nfsd threads already (N can be bumped up to 256 if you wish)
> and having a bunch more "cache trimming threads" would just increase
> contention, wouldn't it?

Only one cache-trimming thread.  The cache trim holds the (global)
mutex for much longer than any individual nfsd service thread needs
to, and having N threads doing that in parallel is why the mutex is
so heavily contended.  If there's only one thread doing the trim,
then the nfsd service threads aren't spending time contending on the
mutex (it will be held less frequently and for shorter periods) or
doing the trimming themselves.

> The only negative effect I can think of w.r.t. having the nfsd
> threads doing it would be a (I believe negligible) increase in RPC
> response times (the time the nfsd thread spends trimming the cache).
> As noted, I think this time would be negligible compared to disk I/O
> and network transit times in the total RPC response time?

With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
network connectivity, spinning on a contended mutex takes a
significant amount of CPU time.  (For the current design of the NFS
server, it may actually be a win to turn off adaptive mutexes -- I
should give that a try once I'm able to do more testing.)

-GAWollman
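P.S. The single trimming thread might look something like the sketch
below -- again untested, building on the invented drc_* structures
above; drc_expired() and drc_free() are hypothetical helpers, not the
real nfsrvcache code.  It polls about once a second and sweeps one
bucket at a time, so an nfsd thread stalls only if it hashes into the
bucket currently being trimmed:

#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/kthread.h>

static volatile u_int	drc_count;		/* total entries, kept with atomics */
static u_int		drc_highwater = 20000;	/* made-up trim threshold */

static int	drc_expired(struct drc_entry *);	/* hypothetical */
static void	drc_free(struct drc_entry *);		/* hypothetical */

static void
drc_trim_thread(void *arg __unused)
{
	struct drc_bucket *b;
	struct drc_entry *e, *tmp;
	int i;

	for (;;) {
		pause("drctrim", hz);		/* wake up about once/sec */
		if (drc_count <= drc_highwater)
			continue;
		for (i = 0; i < DRC_HASHSIZE; i++) {
			b = &drc_table[i];
			mtx_lock(&b->db_lock);
			LIST_FOREACH_SAFE(e, &b->db_head, de_hash, tmp) {
				if (drc_expired(e)) {
					LIST_REMOVE(e, de_hash);
					atomic_subtract_int(&drc_count, 1);
					drc_free(e);
				}
			}
			mtx_unlock(&b->db_lock);
		}
	}
}

The thread would be started once at boot with something like
kproc_create(drc_trim_thread, NULL, NULL, 0, 0, "drctrim").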