Date:      Thu, 19 Jan 2012 18:01:56 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Rick Macklem <rmacklem@freebsd.org>, fs@freebsd.org, Peter Wemm <peter@freebsd.org>
Subject:   Re: Race in NFS lookup can result in stale namecache entries
Message-ID:  <20120119160156.GF31224@deviant.kiev.zoral.com.ua>
In-Reply-To: <201201191026.09431.jhb@freebsd.org>
References:  <201201181707.21293.jhb@freebsd.org> <20120119140613.GD31224@deviant.kiev.zoral.com.ua> <201201191026.09431.jhb@freebsd.org>

On Thu, Jan 19, 2012 at 10:26:09AM -0500, John Baldwin wrote:
> On Thursday, January 19, 2012 9:06:13 am Kostik Belousov wrote:
> > On Wed, Jan 18, 2012 at 05:07:21PM -0500, John Baldwin wrote:
> > ...
> > > What I concluded is that it would really be far simpler and more
> > > obvious if the cached timestamps were stored in the namecache entry
> > > directly rather than having multiple name cache entries validated by
> > > shared state in the nfsnode.  This does mean allowing the name cache
> > > to hold some filesystem-specific state.  However, I felt this was much
> > > cleaner than adding a lot more complexity to nfs_lookup().  Also, this
> > > turns out to be fairly non-invasive to implement since nfs_lookup()
> > > calls cache_lookup() directly, but other filesystems only call it
> > > indirectly via vfs_cache_lookup().  I considered letting filesystems
> > > store a void * cookie in the name cache entry and having them provide
> > > a destructor, etc.  However, that would require extra allocations for
> > > NFS lookups.  Instead, I just adjusted the name cache API to
> > > explicitly allow the filesystem to store a single timestamp in a name
> > > cache entry by adding a new 'cache_enter_time()' that accepts a struct
> > > timespec that is copied into the entry.  'cache_enter_time()' also
> > > saves the current value of 'ticks' in the entry.  'cache_lookup()' is
> > > modified to add two new arguments used to return the timespec and
> > > ticks value used for a namecache entry when a hit in the cache occurs.
> > >
> > > One wrinkle with this is that the name cache does not create actual
> > > entries for ".", and thus it would not store any timestamps for those
> > > lookups.  To fix this I changed the NFS client to explicitly fast-path
> > > lookups of "." by always returning the current directory as setup by
> > > cache_lookup() and never bothering to do a LOOKUP or check for stale
> > > attributes in that case.
> > >
> > > The current patch against 8 is at
> > > http://www.FreeBSD.org/~jhb/patches/nfs_lookup.patch
> > ...
> >
> > So now you add 8*2+4 bytes to each namecache entry on amd64 unconditionally.
> > Current size of the struct namecache invariant part on amd64 is 72 bytes,
> > so the addition of 20 bytes looks slightly excessive. I am not sure about
> > the typical distribution of the namecache nc_name length, so it is not
> > obvious whether the change affects memory usage significantly.
> >
> > A flag could be added to nc_flags to indicate the presence of timestamp.
> > The timestamps would be conditionally placed after nc_nlen, we probably
> > could use union to ease the access. Then, the direct dereferences of
> > nc_name would need to be converted to some inline function.
> >
> > I can do this after your patch is committed, if you consider the memory
> > usage saving worth it.
>
> Hmm, if the memory usage really is worrying then I could move to using the
> void * cookie method instead.

I think the current approach is better than the cookie, which again would
be used only for NFS. With the cookie, you still have 8 bytes for each ncp.
With the union, you do not have the overhead for !NFS.
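
A rough sketch of the flag-gated layout, written as plain userland C rather
than kernel code. Two overlaid struct types stand in for the union here. All
names (NCF_TS_SKETCH, struct nc_hdr, struct nc_plain, struct nc_ts,
nc_get_name(), nc_alloc()) are made up for illustration, and the real
struct namecache has many more fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NCF_TS_SKETCH	0x01	/* hypothetical flag: entry carries a timestamp */

/* Fields common to every entry (heavily simplified, illustrative only). */
struct nc_hdr {
	unsigned char	nc_flag;
	unsigned char	nc_nlen;
};

/* Entry without a timestamp: the name directly follows the header. */
struct nc_plain {
	struct nc_hdr	nc_hdr;
	char		nc_name[];
};

/* Entry with a timestamp: the extra fields sit between header and name. */
struct nc_ts {
	struct nc_hdr	nc_hdr;
	struct timespec	nc_time;
	int		nc_ticks;
	char		nc_name[];
};

/* Accessor that replaces direct dereferences of nc_name. */
static inline char *
nc_get_name(struct nc_hdr *h)
{
	if (h->nc_flag & NCF_TS_SKETCH)
		return (((struct nc_ts *)h)->nc_name);
	return (((struct nc_plain *)h)->nc_name);
}

/* Allocate an entry; only timestamped entries pay for the extra fields. */
static struct nc_hdr *
nc_alloc(const char *name, const struct timespec *ts, int ticks)
{
	size_t len = strlen(name);
	size_t base = ts != NULL ? offsetof(struct nc_ts, nc_name) :
	    offsetof(struct nc_plain, nc_name);
	struct nc_hdr *h;

	assert(len <= 255);		/* nc_nlen is a single byte here */
	h = calloc(1, base + len + 1);
	if (h == NULL)
		return (NULL);
	h->nc_nlen = (unsigned char)len;
	if (ts != NULL) {
		h->nc_flag |= NCF_TS_SKETCH;
		((struct nc_ts *)h)->nc_time = *ts;
		((struct nc_ts *)h)->nc_ticks = ticks;
	}
	memcpy(nc_get_name(h), name, len + 1);
	return (h);
}
```

Callers that never pass a timestamp get the short layout, so the extra
bytes are only spent on entries that actually use them.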

The default setup allows for ~300000 vnodes on a not too powerful amd64
machine. With ncsizefactor 2 that is ~600000 namecache entries, so 8 bytes
per entry for the cookie is about 4.5MB. For 20 bytes per ncp, we get about
12MB of overhead.
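
The arithmetic behind these figures, as a small helper. This is only a
sketch: nc_overhead_bytes() is a made-up name, and it assumes the namecache
may grow to vnodes * ncsizefactor entries, each paying the same fixed
extra cost:

```c
#include <assert.h>
#include <limits.h>

/*
 * Extra memory, in bytes, if every namecache entry grows by
 * bytes_per_entry.  Illustrative helper, not a kernel function.
 */
static unsigned long
nc_overhead_bytes(unsigned long vnodes, unsigned long ncsizefactor,
    unsigned long bytes_per_entry)
{
	unsigned long entries;

	/* Guard against overflow of the two products. */
	assert(ncsizefactor == 0 || vnodes <= ULONG_MAX / ncsizefactor);
	entries = vnodes * ncsizefactor;
	assert(bytes_per_entry == 0 || entries <= ULONG_MAX / bytes_per_entry);
	return (entries * bytes_per_entry);
}
```

nc_overhead_bytes(300000, 2, 8) gives 4800000 bytes for the cookie, and
nc_overhead_bytes(300000, 2, 20) gives 12000000 bytes for the timestamp
plus ticks.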

