From owner-freebsd-fs@FreeBSD.ORG Sun Oct 14 02:18:29 2012
Date: Sat, 13 Oct 2012 22:18:22 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ivan Voras
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: NFS server bottlenecks
Message-ID: <1472578725.2201172.1350181102240.JavaMail.root@erie.cs.uoguelph.ca>
List-Id: Filesystems

Ivan Voras wrote:
> On 13 October 2012 23:43, Rick Macklem wrote:
>
> > If, as you proposed, you use separate LRU lists for each hash bucket,
> > then how do you know if the least recently used entry for one hash
> > bucket isn't much more recently used than the least recently used
> > entry for another hash
> > bucket? (The hash code is using xid, which might be about the same
> > for different clients at the same time.)
>
> I'm not familiar enough with the code to judge: would that be a
> problem, other than a (seemingly slight) loss of efficiency?
>
> Is there any other purpose to the LRU list except to help remove stale
> entries? I haven't done any real examination of how it works, but
> looking at the code in:
>
> http://fxr.watson.org/fxr/source/fs/nfsserver/nfs_nfsdcache.c#L780
>
> ... I don't see how the LRU property of the list actually helps
> anything (i.e. - would the correctness of the code be damaged if this
> was an ordinary list without the LRU property?)

The concept behind the DRC is (published in Usenix long ago; the
reference is in a comment in the code):
- When NFS is run over UDP, the client waits for a reply from the
  server with a timeout. When the timeout expires, the client resends
  the RPC request.
- If the timeout occurs because the server was slow to reply (due to
  heavy load or ???) or the reply was lost by the network, this
  retransmit of the RPC request would result in the RPC being re-done
  on the server.
  - for idempotent RPCs (like read), this increases load on the server
  - for non-idempotent RPCs, this can result in corrupted data
- The DRC minimizes the likelihood of this occurring by caching replies
  for non-idempotent RPCs, so the server can reply from the cache
  instead of re-doing the RPC.

As such, replies need to be cached long enough that it is unlikely the
server will still be seeing retries of the RPC. Unfortunately, there is
no well-defined time limit, since retry timeout and network delay vary
for different clients. Therefore, the server wants to hold onto a
cached reply as long as possible. This means that if you don't replace
the least recently used cached reply, you make the DRC less effective.

rick

> > ps: I hope you didn't mind me adding the mailing list.
> > I'd like others to be able to comment/read the discussion.
>
> For the others to catch up, I was proposing this approach to Rick:
>
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>
> (this patch is far from being complete, it's just a sketch of an
> idea). Basically, I'd like to break the global hash lock into
> per-bucket locks and to break the global LRU list into per-bucket
> lists, protected by the same locks.