From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 3 21:37:17 2012
Date: Wed, 3 Oct 2012 17:36:59 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Garrett Wollman
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org, hackers@freebsd.org
Subject: Re: NFS server bottlenecks
Message-ID: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20588.42788.103863.179701@hergotha.csail.mit.edu>

Garrett Wollman wrote:
> <rmacklem@uoguelph.ca> said:
>
> >> Simple: just use a separate mutex for each list that a cache entry
> >> is on, rather than a global lock for everything. This would reduce
> >> the mutex contention, but I'm not sure how significantly since I
> >> don't have the means to measure it yet.
> >>
> > Well, since the cache trimming is removing entries from the lists,
> > I don't see how that can be done with a global lock for list
> > updates?
>
> Well, the global lock is what we have now, but the cache trimming
> process only looks at one list at a time, so not locking the list
> that isn't being iterated over probably wouldn't hurt, unless there's
> some mechanism (that I didn't see) for entries to move from one list
> to another. Note that I'm considering each hash bucket a separate
> "list". (One issue to worry about in that case would be cache-line
> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE
> ought to be increased to reduce that.)
>
Yea, a separate mutex for each hash list might help. There is also the
LRU list that all entries end up on, which gets used by the trimming
code. (I think? I wrote this stuff about 8 years ago, so I haven't
looked at it in a while.) Also, increasing the hash table size is
probably a good idea, especially if you reduce how aggressively the
cache is trimmed.
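[For illustration, the per-bucket locking being discussed might look
roughly like the sketch below. Every name in it (drc_bucket,
drc_insert, DRC_HASHSIZE) is made up, and pthread mutexes stand in for
the kernel's mtx(9) locks; this is not the actual nfsrvcache code.]

/*
 * Sketch only: a duplicate request cache with one mutex per hash
 * bucket.  All names are hypothetical; pthread mutexes stand in for
 * the kernel's mtx(9) locks.
 */
#include <pthread.h>
#include <stdint.h>
#include <sys/queue.h>

#define DRC_HASHSIZE 500    /* cf. NFSRVCACHE_HASHSIZE; a larger table
                               spreads out lock and cache-line traffic */

struct drc_entry {
        LIST_ENTRY(drc_entry)  de_hash; /* per-bucket chain */
        TAILQ_ENTRY(drc_entry) de_lru;  /* global LRU for trimming */
        uint32_t               de_xid;  /* RPC transaction id */
};

struct drc_bucket {
        pthread_mutex_t        db_lock; /* protects db_head only */
        LIST_HEAD(, drc_entry) db_head;
};

static struct drc_bucket drc_table[DRC_HASHSIZE];

/* The LRU that all entries end up on still needs a single lock. */
static pthread_mutex_t drc_lru_lock = PTHREAD_MUTEX_INITIALIZER;
static TAILQ_HEAD(, drc_entry) drc_lru = TAILQ_HEAD_INITIALIZER(drc_lru);

void
drc_init(void)
{
        for (int i = 0; i < DRC_HASHSIZE; i++)
                pthread_mutex_init(&drc_table[i].db_lock, NULL);
}

/*
 * Insert an entry, touching one bucket lock plus the LRU lock; two
 * inserts for different xids usually contend on nothing but the LRU.
 */
void
drc_insert(struct drc_entry *e)
{
        struct drc_bucket *b = &drc_table[e->de_xid % DRC_HASHSIZE];

        pthread_mutex_lock(&b->db_lock);
        LIST_INSERT_HEAD(&b->db_head, e, de_hash);
        pthread_mutex_unlock(&b->db_lock);

        pthread_mutex_lock(&drc_lru_lock);
        TAILQ_INSERT_TAIL(&drc_lru, e, de_lru);
        pthread_mutex_unlock(&drc_lru_lock);
}

[How much this wins depends on how often the trim has to walk the LRU,
since that path still serializes everything on drc_lru_lock.]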
> > Only doing it once/sec would result in a very large cache when
> > bursts of traffic arrive.
>
> My servers have 96 GB of memory so that's not a big deal for me.
>
This code was originally "production tested" on a server with 1 Gbyte,
so times have changed a bit ;-)

> > I'm not sure I see why doing it as a separate thread will improve
> > things. There are N nfsd threads already (N can be bumped up to 256
> > if you wish) and having a bunch more "cache trimming threads" would
> > just increase contention, wouldn't it?
>
> Only one cache-trimming thread. The cache trim holds the (global)
> mutex for much longer than any individual nfsd service thread has any
> need to, and having N threads doing that in parallel is why it's so
> heavily contended. If there's only one thread doing the trim, then
> the nfsd service threads aren't spending time contending on the mutex
> (it will be held less frequently and for shorter periods).
>
I think the little drc2.patch, which will keep the nfsd threads from
acquiring the mutex and doing the trimming most of the time, might be
sufficient (see the sketch at the end of this message). I still don't
see why a separate trimming thread will be an advantage. I'd also be
worried that the one cache-trimming thread won't get the job done soon
enough. When I did production testing on a 1 Gbyte server that saw a
peak load of about 100 RPCs/sec, it was necessary to trim aggressively.
(Although I'd be tempted to say that a server with 1 Gbyte is no longer
relevant, I recall someone recently trying to run FreeBSD on an i486,
though I doubt they wanted to run nfsd on it.)

> > The only negative effect I can think of w.r.t. having the nfsd
> > threads doing it would be a (I believe negligible) increase in RPC
> > response times (the time the nfsd thread spends trimming the
> > cache). As noted, I think this time would be negligible compared to
> > disk I/O and network transit times in the total RPC response time?
>
> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G
> network connectivity, spinning on a contended mutex takes a
> significant amount of CPU time. (For the current design of the NFS
> server, it may actually be a win to turn off adaptive mutexes -- I
> should give that a try once I'm able to do more testing.)
>
Have fun with it. Let me know when you have what you think is a good
patch.

rick

> -GAWollman
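[For illustration, the kind of check attributed to drc2.patch above
might look roughly like the sketch below: each nfsd thread reads an
unlocked size estimate first, so when the cache is under its highwater
mark the RPC path skips both the mutex and the trim entirely. The
names (drc_maybe_trim, DRC_HIGHWATER) are made up, a pthread mutex
stands in for the kernel lock, and this is not the actual patch.]

/*
 * Sketch only: trim the duplicate request cache lazily.  Names are
 * hypothetical and the trim body is stubbed out; this is not what
 * drc2.patch actually does line-for-line.
 */
#include <pthread.h>
#include <stdatomic.h>

#define DRC_HIGHWATER 10000             /* made-up trim threshold */

static atomic_int drc_count;            /* approximate cache size */
static pthread_mutex_t drc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Evict old entries from the LRU; real body elided in this sketch. */
static void
drc_trim_locked(void)
{
        /* Pretend we walked the LRU and freed enough entries. */
        atomic_store(&drc_count, DRC_HIGHWATER / 2);
}

/*
 * Called from each nfsd thread per RPC.  The common case is a single
 * unlocked read, so most RPCs never touch drc_lock at all.
 */
void
drc_maybe_trim(void)
{
        if (atomic_load(&drc_count) <= DRC_HIGHWATER)
                return;

        pthread_mutex_lock(&drc_lock);
        /* Re-check under the lock; another thread may have trimmed. */
        if (atomic_load(&drc_count) > DRC_HIGHWATER)
                drc_trim_locked();
        pthread_mutex_unlock(&drc_lock);
}

[The trade-off is the one Rick raises: a cache that blows past the
highwater mark during a burst gets trimmed late, so the threshold has
to leave headroom.]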