From owner-cvs-all@FreeBSD.ORG Sun Oct 15 15:37:25 2006 Return-Path: X-Original-To: cvs-all@freebsd.org Delivered-To: cvs-all@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6B38B16A40F for ; Sun, 15 Oct 2006 15:37:25 +0000 (UTC) (envelope-from chucklever@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.188]) by mx1.FreeBSD.org (Postfix) with ESMTP id A67C543D8C for ; Sun, 15 Oct 2006 15:37:07 +0000 (GMT) (envelope-from chucklever@gmail.com) Received: by nf-out-0910.google.com with SMTP id p77so1553483nfc for ; Sun, 15 Oct 2006 08:37:06 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=l2sxP3fTpOwotob+SwcfmjS4TqaHSJe6749jpgyevIouoziwE4K7ZMID4pxdttD5ZxeHIO4z5lZPWeuZlRemZl3qc/ahegdKDrdcgqnvrXO+ePQY0P8pRJobsb7hzYHsEd/Z3JeNbqiPuRzRiiE+YIhu8KHzHkxR3JNV4iUUCg4= Received: by 10.78.90.10 with SMTP id n10mr6497210hub; Sun, 15 Oct 2006 08:37:06 -0700 (PDT) Received: by 10.78.200.15 with HTTP; Sun, 15 Oct 2006 08:37:06 -0700 (PDT) Message-ID: <76bd70e30610150837w61689cf6ya2499d100a15c3e8@mail.gmail.com> Date: Sun, 15 Oct 2006 11:37:06 -0400 From: "Chuck Lever" To: "Bruce Evans" In-Reply-To: <20061015153454.G59979@delplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <200610140725.k9E7PC37008454@repoman.freebsd.org> <20061014231502.GA38708@rink.nu> <20061015105809.M59123@delplex.bde.org> <20061015051044.GA42764@xor.obsecurity.org> <20061014222221.H97880@ns1.feral.com> <20061014222437.N4701@ns1.feral.com> <20061015153454.G59979@delplex.bde.org> Cc: cvs-src@freebsd.org, src-committers@freebsd.org, mjacob@freebsd.org, cvs-all@freebsd.org, Kris Kennaway Subject: Re: cvs commit: src/sys/nfsclient nfs_vnops.c X-BeenThere: cvs-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the entire tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 15 Oct 2006 15:37:25 -0000 Hi Bruce- On 10/15/06, Bruce Evans wrote: > My previous mail more or less explained why rc.conf began setting it to > 2 in 1998: It didn't exist before then, so it was initially set to a > conservative default of 2. Only the mount options for the _attribute_ > cache existed before then. rc.conf and fstab never had any special > support for these, so I think rc.conf shouldn't have any special support > for the _access_ cache timeout (it now defaults to setting it to its > kernel default value). I agree with this... other clients I'm familiar with don't tune the access cache timeout separately from the attribute cache. > I just noticed even sillier configuration bogusness for the default > attribute cache timeouts: > - in 1994, the default timeouts were ifdefed. I think there was no > other (easy) way to change the timeouts. > - in May 1998, mount_nfs started supporting setting the timeouts per-mount. > - in June 1998, 6 weeks after 1994 hack became unnecesary, the timeouts > were turned into first class kernel options (put in an options header). > > I just got around to looking at some nfs RFCs. nfs4 has a lot to say > about the problem of too many RPCs. nfs3 has less to say. What you're discussing here is known as "close-to-open cache consistency" and is not discussed thoroughly in the NFS RFCs. A better source for learning about expected behavior is Callaghan's "NFS Illustrated." The idea is that RPCs can be avoided because we can assume that most files are shared serially. Application A on client 1 opens a file, writes to it, then closes it. Application A on client 2 opens the same file, reads from it, then closes it. Thus we want client 1 to flush the changes made by Application A on close(), and client 2 to check for changes when Application A opens the same file. This is in fact what Solaris (the NFS reference implementation) does, and most good clients have followed suit. We usually simplify this to "GETATTR on open(2), flush on close(2)". It is somewhat more complicated in the face of asynchronous writes -- after a file is closed and all the writes are flushed, another GETATTR is required to snapshot the attributes *after* all the async writes have completed. That way, another GETATTR on the next open can tell if the file was subsequently changed again by another client. > I think I now know how to fix the second largest source of extra RPCs > properly: > > We used clear the _attribute_ cache (and the access cache as a side > effect?) at the _end_ of nfs_open(), but we are supposed to clear it > at the _start_ of every open(). We can't do the latter properly because > a fresh set of attributes are needed or at least preferred when > nfs_lookup() is called before nfs_open(). The fact that open is not an atomic operation in the VFS layer is problematic. You have to consider how the VFS will recover in the case where the client believes a file is there because it has just been accessing it, but it was just removed by another client. One reason for the "GETATTR on open" is to make sure the file being opened is the same file as what may be cached. I think FreeBSD does this by completely clearing the attribute cache on each open? Kind of not efficient computationally, since that implies you are never using your cache. > Clearing the attribute cache > at the end of nfs_open() seems to be just a hack which gives a good > chance of the clearing living until the next open(). It doesn't always > work. Clearing the cache in nfs_close() is further from always working. > It fails if the file is re-open()ed before the first open() instance > is close()d, and wasn't done. > > Now we clear the attribute cache at the start of nfs_open() and clear > it in nfs_close(). For simple open-close sequences, this gives 2 > clearings and 2 refreshes. First, nfs_lookup() refreshes; then > nfs_open() clears and refreshes; finally, nfs_close() clears. Again, I'm not sure why it is necessary to clear the attribute cache, when simply verifying that the file object remains the same is all that is necessary. > I think the correct refreshing is: > - force a clear in nfs_lookup() only if we are sure that the lookup is for > open(). Then refresh normally (if we just cleared or the cache was > already clear). Implement this using a namei() flag? Lookup intents are typical of most implementations. If the lookup is done via a local cache (for example, if the client is using READDIRPLUS, or has done the lookup recently and is relying on a cache), the parent's attributes should also be verified at this point to ensure the parent has not been modified (ie that the cached lookup is still valid). > - force a clear in nfs_open() only if we aren't sure that nfs_lookup() > didn't already do it. This is for safety in cases like core dumps > where namei() isn't called by open(2) and we forget to tell namei() > that the lookup is for open(). Then refresh normally. Implement this > using a generation count or another flag? I think you also need to clear in the case where the file being opened is different that what is in the client's cache (the case where another client has replaced the file). You haven't spelled out how the access cache should behave in these cases. Usually the access cache is treated identically to the attribute cache. -- "We who cut mere stones must always be envisioning cathedrals" -- Quarry worker's creed