From: Terry Lambert <tlambert2@mindspring.com>
To: Andrey Alekseyev
Cc: freebsd-hackers@freebsd.org
Subject: Re: open() and ESTALE error
Date: Fri, 20 Jun 2003 02:03:44 -0700

Andrey Alekseyev wrote:
> Terry,
>
> Thanks much for your comments, but see below.
>
> > The real problem here is that you know you did an operation
> > on the file which would break the name/nfsnode relationship,
> > but did not flush the cached name and nfsnode data.
>
> nfs_request() actually calls cache_purge() on ESTALE, and vn_open()
> frees the vnode with vput() if the lookup was successful but there was
> an error from the underlying filesystem (like ESTALE resulting from
> nfs_request(), which is eventually called from VOP_ACCESS or VOP_OPEN).

The place to correct this is probably the underlying FS.  I'd argue
that getting ESTALE is a poke with a sharp stick that makes this more
likely to happen.  ;^).

> > A more correct solution would resync the nfsnode.
>
> I think this is exactly what happens :)  Actually, I believe I'm just
> getting another namecache entry with another vnode/nfsnode/file handle.

You can't have this for other reasons; specifically, if you have the
file open at the time of the rename, and it becomes a ".#nfs..." file
(or whatever) on the server.

> > The main problem with your solution is that it doesn't work
> > in the case that you don't know the name of the remote file
> > (in which case, all you really have is a stale file handle,
> > with no way to unstale it).
>
> I think, in this case (if the file was rm'd on the server), I'll just
> get ENOENT from the second vn_open() attempt, which would be more
> than appropriate.  A real drawback is that for a stale "current"
> directory it'll take another lookup to detect "true" ESTALE.

This is more a problem with the ESTALE handling.  In the case where
you are doing a lookup and get an ESTALE, it's probably correct to
translate it based on the semantics you are expecting in the upper
layer.

The problem here is that a given VOP can be called from multiple
system call implementations, and a given system call implementation
can call multiple VOPs to implement its functionality.  This means
that you'd have to model the system call layer state machine within
the filesystem itself in order to return the "expected" error for
every possible case.  This isn't a reasonable thing to expect.
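For illustration only, here is a minimal userland sketch of what
"retry the open on ESTALE" amounts to from an application's point of
view; the helper name and the retry limit are made up, and this is
not the kernel change under discussion:

#include <errno.h>
#include <fcntl.h>

#define	OPEN_ESTALE_RETRIES	1	/* hypothetical retry limit */

int
open_retry_estale(const char *path, int flags, int mode)
{
	int fd, tries;

	for (tries = 0; ; tries++) {
		fd = open(path, flags, mode);
		if (fd >= 0 || errno != ESTALE ||
		    tries >= OPEN_ESTALE_RETRIES)
			return (fd);
		/*
		 * ESTALE: a cached handle for the file (or for a
		 * directory in the path) went stale behind our back.
		 * A fresh lookup by name may now succeed, or fail
		 * with ENOENT if the file really is gone on the
		 * server.
		 */
	}
}

The point being that the retry is a by-name re-lookup: it can only
help when the name is still known and still resolves to something on
the server.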
> > This would fix a lot more cases than the single failure you
> > are fixing.
>
> Actually, as I said, I played with different parts of the code to solve
> this (including nfs_open(), nfs_access(), nfs_lookup() and vn_open()),
> only to find the previously mentioned solution to be the simplest and
> most suitable for all situations (for me!) :)

Don Lewis has a good posting in response to you; you will likely have
read it before you read this response, so feel free to not respond
directly to this point.

Don points out that Solaris tries to fix this via the "noac" mount
option for client NFS.  What his quote:

     noac      Suppress data and attribute caching.  The data
               caching that is suppressed is the write-behind.
               The local page cache is still maintained, but data
               copied into it is immediately written to the server.

hints at, but doesn't come right out and say, is that the cache is
flushed on write operations ("the data caching that is suppressed is
write-behind").

What this means practically, in terms of the implementation of the
NFS client code, is that everywhere there is a client-triggered change
of state for metadata on the server that could result in an ESTALE,
the client's cached information is flushed out and has to be
reacquired.

If this were happening in the NFS client today, then your rename would
not end up giving you an ESTALE, because the stale data would have
been discarded.

I'd also like to point out the following case:

	{ A, B }
	fd1 open on B
	rename B -> C
	rename A -> B

In this case, the FH in question would still work for B.  What would
happen if it were:

	{ A, B, C }
	fd1 open on B
	fd2 open on C
	rename B -> C
	rename A -> B

?  With your patch, I think we would potentially convert fd2 to point
to B when it really *should* be "ESTALE", which is wrong (think in
terms of 2 or more clients doing the operations).

-- Terry
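P.S.: For concreteness, a throwaway sketch of the second case, run
against a local filesystem (the file names and the final inode check
are only for demonstration).  Locally, fd2 keeps referring to the file
that used to be named C; over NFS the server-side rename removes that
file, so the handle behind fd2 goes stale instead, and rebinding the
descriptor by name would silently attach it to a different file, which
is the case argued above to deserve ESTALE:

#include <sys/stat.h>

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct stat sb_fd1, sb_fd2, sb_c;
	int fd1, fd2;

	/* Set up { A, B, C }. */
	close(open("A", O_CREAT | O_WRONLY, 0644));
	close(open("B", O_CREAT | O_WRONLY, 0644));
	close(open("C", O_CREAT | O_WRONLY, 0644));

	fd1 = open("B", O_RDONLY);
	fd2 = open("C", O_RDONLY);

	rename("B", "C");	/* old C is unlinked; old B is now "C" */
	rename("A", "B");	/* old A is now "B" */

	fstat(fd1, &sb_fd1);	/* still the file that was named B */
	fstat(fd2, &sb_fd2);	/* still the file that was named C */
	stat("C", &sb_c);	/* "C" now names the old B */

	printf("fd1 inode %ju, fd2 inode %ju, name \"C\" inode %ju\n",
	    (uintmax_t)sb_fd1.st_ino, (uintmax_t)sb_fd2.st_ino,
	    (uintmax_t)sb_c.st_ino);
	return (0);
}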