From: John Baldwin
To: Kostik Belousov
Cc: Rick Macklem, fs@freebsd.org, Peter Wemm
Date: Thu, 19 Jan 2012 10:26:09 -0500
Subject: Re: Race in NFS lookup can result in stale namecache entries
Message-Id: <201201191026.09431.jhb@freebsd.org>
In-Reply-To: <20120119140613.GD31224@deviant.kiev.zoral.com.ua>

On Thursday, January 19, 2012 9:06:13 am Kostik Belousov wrote:
> On Wed, Jan 18, 2012 at 05:07:21PM -0500, John Baldwin wrote:
> ...
> > What I concluded is that it would really be far simpler and more
> > obvious if the cached timestamps were stored in the namecache entry
> > directly rather than having multiple name cache entries validated by
> > shared state in the nfsnode. This does mean allowing the name cache
> > to hold some filesystem-specific state. However, I felt this was much
> > cleaner than adding a lot more complexity to nfs_lookup(). Also, this
> > turns out to be fairly non-invasive to implement since nfs_lookup()
> > calls cache_lookup() directly, but other filesystems only call it
> > indirectly via vfs_cache_lookup(). I considered letting filesystems
> > store a void * cookie in the name cache entry and having them provide
> > a destructor, etc. However, that would require extra allocations for
> > NFS lookups. Instead, I just adjusted the name cache API to
> > explicitly allow the filesystem to store a single timestamp in a name
> > cache entry by adding a new 'cache_enter_time()' that accepts a struct
> > timespec that is copied into the entry. 'cache_enter_time()' also
> > saves the current value of 'ticks' in the entry. 'cache_lookup()' is
> > modified to add two new arguments used to return the timespec and
> > ticks value used for a namecache entry when a hit in the cache occurs.
> >
> > One wrinkle with this is that the name cache does not create actual
> > entries for ".", and thus it would not store any timestamps for those
> > lookups. To fix this I changed the NFS client to explicitly fast-path
> > lookups of "." by always returning the current directory as set up by
> > cache_lookup() and never bothering to do a LOOKUP or check for stale
> > attributes in that case.
> >
> > The current patch against 8 is at
> > http://www.FreeBSD.org/~jhb/patches/nfs_lookup.patch
> ...
>
> So now you add 8*2+4 bytes to each namecache entry on amd64 unconditionally.
> Current size of the struct namecache invariant part on amd64 is 72 bytes,
> so addition of 20 bytes looks slightly excessive. I am not sure about
> the typical distribution of the namecache nc_name length, so it is not
> obvious whether the change affects memory usage significantly.
>
> A flag could be added to nc_flags to indicate the presence of a timestamp.
> The timestamps would be conditionally placed after nc_nlen; we could
> probably use a union to ease the access. Then, the direct dereferences of
> nc_name would need to be converted to some inline function.
>
> I can do this after your patch is committed, if you consider the memory
> usage saving worth it.

Hmm, if the memory usage really is worrying then I could move to using
the void * cookie method instead.

-- 
John Baldwin
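
For reference, the flag-based layout suggested in the quoted text could
look roughly like the userland sketch below. It is only an illustration
under stated assumptions, not the patch and not the real struct
namecache: the NCF_TS flag value, the nc_hdr/nc_plain/nc_ts struct names
and the nc_entry_size(), nc_get_name(), nc_get_time() and nc_alloc()
helpers are all hypothetical, and list linkage and vnode pointers are
left out. Two structs with a common header are used here in place of a
union, since standard C does not allow a flexible array member inside a
union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical flag bit; not taken from the real nc_flags definitions. */
#define NCF_TS 0x01

/* Fields common to both layouts (simplified, linkage omitted). */
struct nc_hdr {
	unsigned char nc_flags;
	unsigned char nc_nlen;
};

/* Entry without a timestamp: the name follows the header directly. */
struct nc_plain {
	struct nc_hdr hdr;
	char nc_name[];
};

/* Entry with a timestamp: time and ticks sit between header and name. */
struct nc_ts {
	struct nc_hdr hdr;
	struct timespec nc_time;
	int nc_ticks;
	char nc_name[];
};

/* Bytes allocated for an entry whose name has 'len' characters. */
static size_t
nc_entry_size(size_t len, int has_ts)
{
	size_t base = has_ts ? sizeof(struct nc_ts) : sizeof(struct nc_plain);

	return (base + len + 1);
}

/* Accessor that replaces direct dereferences of nc_name. */
static char *
nc_get_name(struct nc_hdr *h)
{
	if (h->nc_flags & NCF_TS)
		return (((struct nc_ts *)h)->nc_name);
	return (((struct nc_plain *)h)->nc_name);
}

/* Returns the stored timestamp, or NULL if the entry carries none. */
static struct timespec *
nc_get_time(struct nc_hdr *h)
{
	if (h->nc_flags & NCF_TS)
		return (&((struct nc_ts *)h)->nc_time);
	return (NULL);
}

/* Allocates an entry; a NULL 'ts' selects the smaller layout. */
static struct nc_hdr *
nc_alloc(const char *name, const struct timespec *ts, int ticks)
{
	struct nc_hdr *h;
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len > 255)
		return (NULL);	/* nc_nlen is a single byte here */
	h = malloc(nc_entry_size(len, ts != NULL));
	if (h == NULL)
		return (NULL);
	h->nc_flags = (ts != NULL) ? NCF_TS : 0;
	h->nc_nlen = (unsigned char)len;
	if (ts != NULL) {
		struct nc_ts *t = (struct nc_ts *)h;

		t->nc_time = *ts;
		t->nc_ticks = ticks;
	}
	memcpy(nc_get_name(h), name, len + 1);
	return (h);
}
```

With this shape only entries created with a timestamp pay for the extra
fields; a caller would use nc_alloc("foo", NULL, 0) for an ordinary
entry and nc_alloc("foo", &ts, ticks) for an NFS one, and every reader
goes through nc_get_name() instead of touching nc_name directly.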