From: Don Lewis
To: uitm@blackflag.ru
Cc: freebsd-hackers@FreeBSD.org
Date: Fri, 20 Jun 2003 13:39:05 -0700 (PDT)
Subject: Re: open() and ESTALE error
Message-Id: <200306202039.h5KKd5M7060679@gw.catspoiler.org>
In-Reply-To: <200306201835.WAA00763@slt.oz>

On 20 Jun, Andrey Alekseyev wrote:
> Don,
>
>> One case where there is a difference between timing out old file handles
>> and just invalidating them on ESTALE:
>
> Frankly, I just didn't find any mechanism in the STABLE kernel that
> does "timing out" for file handles. Do you mean, it would be nice to have
> it or are you trying to point it out to me? ;-P

If there isn't such a mechanism, there should be.

>> 	client% cmd1 > file1; cmd2 > file2
>> 	server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2
>>
>> 	wait an hour
>>
>> 	client% cat /dev/null > file1
>>
>> If file handles are cached indefinitely, and the client didn't recycle
>> the vnode for file1, which file on the server got truncated?
>> Since neither file was deleted on the server, you can't rely on ESTALE to
>> detect this situation.
>
> Eh, but the generation number for file1 should have been changed! This will
> result in a definite ESTALE error for file1 from the server. That is, I
> believe that if you attempt to open("file1", O_CREAT) after an hour, you'll
> get ESTALE from the server (on which nfs_request() will invalidate "file1"
> namecache entry and vnode+nfsnode+old-file-handle) and the second vn_open()
> will re-lookup file1 and get a valid new file handle.

If the client still has a cached copy of the file handle for file1,
won't it just use that and truncate file2 on the server? The handle never
goes stale because the file was never deleted on the server.

> Actually, this is what indeed happens if the second open() comes from the
> userland application :) I'm just trying to eliminate the need of modifying
> a generic application.
>
> For my example with moves, the next "cat" will always(!) succeed.
>
>> Question: does the timeout of the directory attributes cause open() to
>> do an NFS lookup on the file, or does open() just find the vnode in the
>> cache and use its cached handle?
>
> Well, for open() without O_CREAT the sequence is this:
> open() -> vn_open() -> namei() -> lookup() -> VOP_LOOKUP() -> nfs_lookup()
>             |
>             VOP_ACCESS() -> nfs_access() [ -> nfs3_access_otw() ]
>             |
>             VOP_OPEN() -> nfs_open()
>
> Lookup is always done first (obviously). It may return cached name which
> contains a pointer to a cached vnode/nfsnode. Cached vnode/nfsnode is used
> further in VOP_ACCESS() and VOP_OPEN(). Either function may or may not
> update file attributes cached inside nfsnode. Neither VOP_ACCESS() nor
> VOP_OPEN() ever updates the *file handle*. File handle comes from
> VOP_LOOKUP(). And VOP_LOOKUP() only places it there if vnode/nfsnode isn't
> cached. Which I believe happens only if there is no cached filename in
> the namecache.
> I really tried to do my best to describe everything in:
> http://www.blackflag.ru/patches/nfs_attr.txt
> Please take a look.

If the client is mostly idle, then the cached filename is unlikely to be
flushed, so even after a long period of time, namei() will return the
old vnode and its associated file handle. If the file on the server was
renamed and not deleted, the server won't return ESTALE for the handle,
and open() will return a descriptor for the original file on the server
that has since been renamed, not for the new file on the server that
lives at the path name passed to open() on the client.

Another example:

	client% cmd1 > file1
	client% cmd2 > file2
	client% more file1
	^Z
	suspended
	server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2

	wait 24 hours

	client% cat /dev/null > file1
	client% fg

The last cat command should truncate file1 on the server, which is the
output of cmd2. When the more command resumes, it should still be able
to see the output of cmd1.

The old file1 vnode and file handle should remain valid, but the lookup
to open file1 for the last cat command needs to know that the cache
entry has timed out and that the handle associated with the cached vnode
for file1 hasn't been validated in a while. Lookup() needs to bypass
the cache in this case and pass the lookup request to the server. If the
file handle returned is the same as before, the cache entry should be
freshened; if the file handle is different, then a new vnode needs to be
allocated and associated with the name cache entry and the new handle.
The old vnode and its handle need to be retained until either an RPC
using this handle returns ESTALE, or the file is closed and the vnode is
recycled.

> Whether ESTALE came from VOP_ACCESS() or VOP_OPEN() depends on several
> factors. Namely, the value of nfsaccess_cache_timeout sysctl, acmin/acmax
> and the age of the file in question.
>
> Generally speaking, if nfsaccess_cache_timeout is less than acmin,
> VOP_ACCESS() that comes right before VOP_OPEN() in vn_open() will try to do
> an "access" RPC request and it'll fail if the file handle is stale. If
> nfsaccess_cache_timeout is greater than acmin, then it's possible that
> VOP_ACCESS() will answer "yes" based on the cached attributes, but
> VOP_GETATTR(), which is called from nfs_open() (which is VOP_OPEN() for
> NFS) will in turn "go to the wire" and still nfs_request() will fail with
> ESTALE.
>
> Hope, I'm making it clear :)

Yeah, but the solution that you propose doesn't fix the case where
ESTALE is not returned but namei() returns a cached vnode associated
with a file on the server that no longer exists at the specified path
name. Also, fixing open() doesn't fix similar problems that can occur
with other syscalls that take path names, such as stat() and readlink().

If the lookup code is changed so that it revalidates the
name->vnode->handle entries more frequently, then the window where
open() can fail due to ESTALE would be greatly reduced.
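
To make the rename case concrete, here is a toy model in Python. It is
only a sketch, not kernel code, and every name in it (Server, Client,
name_timeout, swap_demo) is made up for illustration. It assumes the
server hands out one handle per file and keeps that handle across
renames, and that the client caches name->handle entries with a
timestamp. It does not model open files keeping their old vnode.

```python
class Server:
    """Toy NFS server: names map to handles, handles map to contents."""

    def __init__(self):
        self.names = {}        # path name -> file handle
        self.data = {}         # file handle -> file contents
        self.next_handle = 1

    def create(self, name, contents):
        handle = self.next_handle
        self.next_handle += 1
        self.names[name] = handle
        self.data[handle] = contents
        return handle

    def rename(self, old, new):
        # A rename changes the name only; the handle stays the same.
        self.names[new] = self.names.pop(old)

    def lookup(self, name):
        return self.names[name]

    def truncate(self, handle):
        if handle not in self.data:
            raise OSError("ESTALE")   # only a deleted file goes stale
        self.data[handle] = ""


class Client:
    """Toy client with a name cache (hypothetical, for illustration)."""

    def __init__(self, server, name_timeout):
        self.server = server
        self.name_timeout = name_timeout   # None means cache forever
        self.cache = {}                    # name -> (handle, validated_at)

    def lookup(self, name, now):
        entry = self.cache.get(name)
        if entry is not None:
            handle, validated_at = entry
            fresh = (self.name_timeout is None
                     or now - validated_at < self.name_timeout)
            if fresh:
                return handle              # trust the cached handle
        # Entry missing or timed out: ask the server and refresh the cache.
        handle = self.server.lookup(name)
        self.cache[name] = (handle, now)
        return handle

    def truncate(self, name, now):
        self.server.truncate(self.lookup(name, now))


def swap_demo(name_timeout):
    server = Server()
    client = Client(server, name_timeout)
    server.create("file1", "output of cmd1")
    server.create("file2", "output of cmd2")
    client.lookup("file1", now=0)          # client caches both names
    client.lookup("file2", now=0)
    # server% mv file1 tmpfile; mv file2 file1; mv tmpfile file2
    server.rename("file1", "tmpfile")
    server.rename("file2", "file1")
    server.rename("tmpfile", "file2")
    # client% cat /dev/null > file1   (an hour later)
    client.truncate("file1", now=3600)
    return {name: server.data[h] for name, h in server.names.items()}
```

With name_timeout=None the model truncates the output of cmd1, which
the server now calls file2, and no ESTALE is ever raised. With a finite
timeout (say 60 seconds) the lookup goes back to the server and the
file that is now named file1 is the one truncated.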