Date: Thu, 19 Jan 2012 11:17:28 -0500
From: John Baldwin <jhb@freebsd.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: Rick Macklem <rmacklem@freebsd.org>, fs@freebsd.org, Peter Wemm <peter@freebsd.org>
Subject: Re: Race in NFS lookup can result in stale namecache entries
Message-ID: <201201191117.28128.jhb@freebsd.org>
In-Reply-To: <20120119160156.GF31224@deviant.kiev.zoral.com.ua>
References: <201201181707.21293.jhb@freebsd.org> <201201191026.09431.jhb@freebsd.org> <20120119160156.GF31224@deviant.kiev.zoral.com.ua>

On Thursday, January 19, 2012 11:01:56 am Kostik Belousov wrote:
> On Thu, Jan 19, 2012 at 10:26:09AM -0500, John Baldwin wrote:
> > On Thursday, January 19, 2012 9:06:13 am Kostik Belousov wrote:
> > > On Wed, Jan 18, 2012 at 05:07:21PM -0500, John Baldwin wrote:
> > > ...
> > > > What I concluded is that it would really be far simpler and more
> > > > obvious if the cached timestamps were stored in the namecache entry
> > > > directly rather than having multiple name cache entries validated by
> > > > shared state in the nfsnode. This does mean allowing the name cache
> > > > to hold some filesystem-specific state. However, I felt this was much
> > > > cleaner than adding a lot more complexity to nfs_lookup(). Also, this
> > > > turns out to be fairly non-invasive to implement since nfs_lookup()
> > > > calls cache_lookup() directly, but other filesystems only call it
> > > > indirectly via vfs_cache_lookup(). I considered letting filesystems
> > > > store a void * cookie in the name cache entry and having them provide
> > > > a destructor, etc. However, that would require extra allocations for
> > > > NFS lookups. Instead, I just adjusted the name cache API to
> > > > explicitly allow the filesystem to store a single timestamp in a name
> > > > cache entry by adding a new 'cache_enter_time()' that accepts a struct
> > > > timespec that is copied into the entry. 'cache_enter_time()' also
> > > > saves the current value of 'ticks' in the entry. 'cache_lookup()' is
> > > > modified to add two new arguments used to return the timespec and
> > > > ticks value used for a namecache entry when a hit in the cache occurs.
> > > >
> > > > One wrinkle with this is that the name cache does not create actual
> > > > entries for ".", and thus it would not store any timestamps for those
> > > > lookups. To fix this I changed the NFS client to explicitly fast-path
> > > > lookups of "." by always returning the current directory as setup by
> > > > cache_lookup() and never bothering to do a LOOKUP or check for stale
> > > > attributes in that case.
> > > >
> > > > The current patch against 8 is at
> > > > http://www.FreeBSD.org/~jhb/patches/nfs_lookup.patch
> > > ...
> > >
> > > So now you add 8*2+4 bytes to each namecache entry on amd64 unconditionally.
> > > Current size of the struct namecache invariant part on amd64 is 72 bytes,
> > > so addition of 20 bytes looks slightly excessive. I am not sure about
> > > typical distribution of the namecache nc_name length, so it is unobvious
> > > does the change changes the memory usage significantly.
> > >
> > > A flag could be added to nc_flags to indicate the presence of timestamp.
> > > The timestamps would be conditionally placed after nc_nlen, we probably
> > > could use union to ease the access. Then, the direct dereferences of
> > > nc_name would need to be converted to some inline function.
> > >
> > > I can do this after your patch is committed, if you consider the memory
> > > usage saving worth it.
> >
> > Hmm, if the memory usage really is worrying then I could move to using the
> > void * cookie method instead.
>
> I think the current approach is better then cookie that again will be
> used only for NFS. With the cookie, you still has 8 bytes for each ncp.
> With union, you do not have the overhead for !NFS.
>
> Default setup allows for ~300000 vnodes on not too powerful amd64 machine,
> the ncsizefactor 2 together with 8 bytes for cookie is 4.5MB. For 20 bytes
> per ncp, we get 12MB overhead.

Ok. If you want to tackle the union bits I'm happy to let you do so. That
will at least break up the changes a bit.

-- 
John Baldwin
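
[Editorial sketch, not part of the patch discussed above.] The layout idea
from the quoted exchange (a flag marking entries that carry a timestamp, the
timestamp stored only in those entries, and nc_name reached through an inline
accessor instead of a direct dereference) might look roughly like the
user-space C below. Every identifier here (nc_hdr, nc_plain, nc_timed,
NCF_TS_SKETCH, nc_get_name, nc_alloc, nc_overhead_bytes) is hypothetical and
chosen for illustration; none of them is taken from the FreeBSD sources or
from nfs_lookup.patch, and the real struct namecache has more fields. The
last helper only reproduces the back-of-the-envelope overhead arithmetic
from the thread.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical flag: this entry carries a timestamp and a ticks value. */
#define NCF_TS_SKETCH 0x01

/* Fields common to both entry variants (heavily simplified). */
struct nc_hdr {
	uint8_t nc_flag;
	uint8_t nc_nlen;
};

/* Entry without timestamps: the name follows the header. */
struct nc_plain {
	struct nc_hdr h;
	char nc_name[];
};

/* Entry with timestamps: the name follows the extra fields. */
struct nc_timed {
	struct nc_hdr h;
	struct timespec nc_time;	/* filesystem-supplied timestamp */
	int nc_ticks;			/* tick count saved at insertion */
	char nc_name[];
};

/* Inline accessor that replaces direct dereferences of nc_name. */
static inline char *
nc_get_name(struct nc_hdr *h)
{
	if (h->nc_flag & NCF_TS_SKETCH)
		return (((struct nc_timed *)h)->nc_name);
	return (((struct nc_plain *)h)->nc_name);
}

/* Allocate an entry; only callers that pass a timestamp pay for it. */
static struct nc_hdr *
nc_alloc(const char *name, const struct timespec *ts, int ticks)
{
	size_t len = strlen(name);
	size_t base = (ts != NULL) ? sizeof(struct nc_timed) :
	    sizeof(struct nc_plain);
	struct nc_hdr *h;

	if (len > UINT8_MAX)
		return (NULL);
	h = calloc(1, base + len + 1);
	if (h == NULL)
		return (NULL);
	h->nc_nlen = (uint8_t)len;
	if (ts != NULL) {
		h->nc_flag |= NCF_TS_SKETCH;
		((struct nc_timed *)h)->nc_time = *ts;
		((struct nc_timed *)h)->nc_ticks = ticks;
	}
	memcpy(nc_get_name(h), name, len + 1);
	return (h);
}

/* Overhead estimate from the thread: vnodes * ncsizefactor * bytes. */
static unsigned long
nc_overhead_bytes(unsigned long vnodes, unsigned long ncsizefactor,
    unsigned long bytes_per_entry)
{
	return (vnodes * ncsizefactor * bytes_per_entry);
}
```

With the figures quoted above, nc_overhead_bytes(300000, 2, 8) gives
4800000 bytes (about 4.5MB) for the cookie approach, and
nc_overhead_bytes(300000, 2, 20) gives 12000000 bytes (12MB) for an
unconditional 20-byte addition. In this sketch an entry created without a
timestamp pays neither cost.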