Date:      Sun, 15 Oct 2006 11:37:06 -0400
From:      "Chuck Lever" <chucklever@gmail.com>
To:        "Bruce Evans" <bde@zeta.org.au>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, mjacob@freebsd.org, cvs-all@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: cvs commit: src/sys/nfsclient nfs_vnops.c
Message-ID:  <76bd70e30610150837w61689cf6ya2499d100a15c3e8@mail.gmail.com>
In-Reply-To: <20061015153454.G59979@delplex.bde.org>
References:  <200610140725.k9E7PC37008454@repoman.freebsd.org> <20061014231502.GA38708@rink.nu> <20061015105809.M59123@delplex.bde.org> <20061015051044.GA42764@xor.obsecurity.org> <20061014222221.H97880@ns1.feral.com> <20061014222437.N4701@ns1.feral.com> <20061015153454.G59979@delplex.bde.org>

Hi Bruce-

On 10/15/06, Bruce Evans <bde@zeta.org.au> wrote:
> My previous mail more or less explained why rc.conf began setting it to
> 2 in 1998:  It didn't exist before then, so it was initially set to a
> conservative default of 2.  Only the mount options for the _attribute_
> cache existed before then.  rc.conf and fstab never had any special
> support for these, so I think rc.conf shouldn't have any special support
> for the _access_ cache timeout (it now defaults to setting it to its
> kernel default value).

I agree with this... other clients I'm familiar with don't tune the
access cache timeout separately from the attribute cache.

> I just noticed even sillier configuration bogusness for the default
> attribute cache timeouts:
> - in 1994, the default timeouts were ifdefed.  I think there was no
>    other (easy) way to change the timeouts.
> - in May 1998, mount_nfs started supporting setting the timeouts per-mount.
> - in June 1998, 6 weeks after the 1994 hack became unnecessary, the timeouts
>    were turned into first class kernel options (put in an options header).
>
> I just got around to looking at some nfs RFCs.  nfs4 has a lot to say
> about the problem of too many RPCs.  nfs3 has less to say.

What you're discussing here is known as "close-to-open cache
consistency" and is not discussed thoroughly in the NFS RFCs.  A
better source for learning about expected behavior is Callaghan's "NFS
Illustrated."

The idea is that RPCs can be avoided because we can assume that most
files are shared serially.  Application A on client 1 opens a file,
writes to it, then closes it.  Application A on client 2 opens the
same file, reads from it, then closes it.  Thus we want client 1 to
flush the changes made by Application A on close(), and client 2 to
check for changes when Application A opens the same file.

This is in fact what Solaris (the NFS reference implementation) does,
and most good clients have followed suit.
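
For what it's worth, the contract is small enough to sketch in a few
lines of C.  Everything below (the structs, the *_rpc() helpers) is
invented for illustration -- it's just the shape of the logic, not
your client or anyone else's:

    /* Toy model of close-to-open consistency; all names invented. */
    #include <stdbool.h>
    #include <stdint.h>

    struct xattrs {                     /* stand-in for the NFS fattr */
            uint64_t fileid;            /* file identity on the server */
            uint64_t change;            /* change counter (or mtime) */
            uint64_t size;
    };

    struct xnode {                      /* stand-in for an nfsnode */
            struct xattrs cached;       /* last attributes we saw */
            bool dirty;                 /* unwritten data in the page cache */
            bool data_valid;            /* cached file data may be used */
    };

    /* stub "RPCs" -- a real client would go over the wire here */
    static struct xattrs getattr_rpc(struct xnode *xp) { return xp->cached; }
    static void write_commit_rpc(struct xnode *xp) { xp->dirty = false; }

    /* open(2): one GETATTR; invalidate data only if the file changed */
    static void cto_open(struct xnode *xp)
    {
            struct xattrs fresh = getattr_rpc(xp);

            if (fresh.change != xp->cached.change)
                    xp->data_valid = false;   /* another client wrote it */
            xp->cached = fresh;               /* attributes stay cached */
    }

    /* close(2): flush so the next opener's GETATTR sees our data */
    static void cto_close(struct xnode *xp)
    {
            if (xp->dirty)
                    write_commit_rpc(xp);
    }

The only mandatory wire traffic is one GETATTR per open and one flush
per close of a dirty file.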

We usually simplify this to "GETATTR on open(2), flush on close(2)".
It is somewhat more complicated in the face of asynchronous writes --
after a file is closed and all the writes are flushed, another GETATTR
is required to snapshot the attributes *after* all the async writes
have completed.  That way, another GETATTR on the next open can tell
if the file was subsequently changed again by another client.
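
Concretely, the ordering on close is what matters; reusing the
invented types from the sketch above:

    /*
     * close(2) with async writes pending: flush first, *then* snapshot
     * the attributes, so the saved change/mtime reflects our own writes.
     */
    static void cto_close_async(struct xnode *xp)
    {
            if (xp->dirty)
                    write_commit_rpc(xp);     /* wait for WRITEs + COMMIT */
            xp->cached = getattr_rpc(xp);     /* post-flush snapshot */
            /*
             * The next cto_open() compares a fresh GETATTR against this
             * snapshot; any difference must have come from another
             * client, not from our own (now completed) async writes.
             */
    }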

> I think I now know how to fix the second largest source of extra RPCs
> properly:
>
> We used to clear the _attribute_ cache (and the access cache as a side
> effect?) at the _end_ of nfs_open(), but we are supposed to clear it
> at the _start_ of every open().  We can't do the latter properly because
> a fresh set of attributes are needed or at least preferred when
> nfs_lookup() is called before nfs_open().

The fact that open is not an atomic operation in the VFS layer is
problematic.  You have to consider how the VFS will recover in the
case where the client believes a file still exists because it has just
been accessing it, but another client has just removed it.

One reason for the "GETATTR on open" is to make sure the file being
opened is the same file as what may be cached.  I think FreeBSD does
this by completely clearing the attribute cache on each open?  That is
not especially efficient, since it implies you are never actually using
your cache.
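
What I have in mind is roughly the following -- again just a model,
not a patch against nfs_open(), where "fresh" would come from the
open-time GETATTR (or the post-op attributes of the LOOKUP):

    /* Verify instead of clear; reuses the invented types above. */
    static void verify_open(struct xnode *xp, struct xattrs fresh)
    {
            if (fresh.fileid != xp->cached.fileid) {
                    /*
                     * The name now refers to a different file;
                     * everything cached under the old node is dead.
                     */
                    xp->data_valid = false;
            } else if (fresh.change != xp->cached.change) {
                    xp->data_valid = false;   /* same file, new contents */
            }
            xp->cached = fresh;               /* attribute cache stays warm */
    }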

> Clearing the attribute cache
> at the end of nfs_open() seems to be just a hack which gives a good
> chance of the clearing living until the next open().  It doesn't always
> work.  Clearing the cache in nfs_close() is further from always working.
> It fails if the file is re-open()ed before the first open() instance
> is close()d, and wasn't done.
>
> Now we clear the attribute cache at the start of nfs_open() and clear
> it in nfs_close().  For simple open-close sequences, this gives 2
> clearings and 2 refreshes.  First, nfs_lookup() refreshes; then
> nfs_open() clears and refreshes; finally, nfs_close() clears.

Again, I'm not sure why it is necessary to clear the attribute cache,
when simply verifying that the file object remains the same is all
that is necessary.

> I think the correct refreshing is:
> - force a clear in nfs_lookup() only if we are sure that the lookup is for
>    open().  Then refresh normally (if we just cleared or the cache was
>    already clear).  Implement this using a namei() flag?

Lookup intents are typical of most implementations.  If the lookup is
done via a local cache (for example, if the client is using
READDIRPLUS, or has done the lookup recently and is relying on a
cache), the parent's attributes should also be verified at this point
to ensure the parent has not been modified (i.e., that the cached lookup
is still valid).
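
As a sketch of that revalidation (the directory type and the helper
below are invented, same disclaimer as before):

    /* Revalidate a cached name->node mapping before trusting it. */
    struct xdir {
            struct xattrs cached;       /* parent directory's attributes */
    };

    /* stub; a real client would GETATTR the directory's filehandle */
    static struct xattrs
    dir_getattr_rpc(struct xdir *dp) { return dp->cached; }

    static bool cached_lookup_ok(struct xdir *dp)
    {
            struct xattrs fresh = dir_getattr_rpc(dp);
            bool ok = (fresh.change == dp->cached.change);

            dp->cached = fresh;
            /*
             * If the directory changed, the cached entry may name a
             * file that was renamed, removed, or replaced: fall back
             * to a real LOOKUP, which also returns fresh attributes
             * for the child (the open-intent case described above).
             */
            return ok;
    }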

> - force a clear in nfs_open() only if we aren't sure that nfs_lookup()
>    didn't already do it.  This is for safety in cases like core dumps
>    where namei() isn't called by open(2) and we forget to tell namei()
>    that the lookup is for open().  Then refresh normally.  Implement this
>    using a generation count or another flag?

I think you also need to clear in the case where the file being opened
is different from what is in the client's cache (the case where
another client has replaced the file).

You haven't spelled out how the access cache should behave in these
cases.  Usually the access cache is treated identically to the
attribute cache.
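
One way to get that is to stamp each cached ACCESS result with the
attribute snapshot it was computed against, so the access cache can
never outlive the attributes.  Purely illustrative, same invented
names as above:

    /* Access cache slaved to the attribute cache. */
    struct xaccess {
            uint32_t mode_bits;     /* rights the server granted us */
            uint64_t stamp;         /* xp->cached.change at ACCESS time */
    };

    static bool access_cache_valid(const struct xnode *xp,
                                    const struct xaccess *ac)
    {
            /*
             * Whenever verify_open() refreshes (or invalidates)
             * xp->cached, stale ACCESS results fall out automatically.
             */
            return ac->stamp == xp->cached.change;
    }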

-- 
"We who cut mere stones must always be envisioning cathedrals"
   -- Quarry worker's creed


