Message-Id: <200306200617.h5K6HaM7058935@gw.catspoiler.org>
Date: Thu, 19 Jun 2003 23:17:36 -0700 (PDT)
From: Don Lewis
To: uitm@blackflag.ru
In-Reply-To: <200306190955.NAA00538@slt.oz>
cc: freebsd-hackers@FreeBSD.org
Subject: Re: open() and ESTALE error
List-Id: Technical Discussions relating to FreeBSD

On 19 Jun, Andrey Alekseyev wrote:
> Hello,
>
> I've been trying lately to develop a solution for the problem with
> open() that manifests itself in ESTALE error in the following situation:
>
> 1. NFS server: echo "1111" > file01
> 2. NFS client: cat file01
> 3. NFS server: echo "2222" > file02 && mv file02 file01
> 4. NFS client: cat file01 (either old file01 contents or ESTALE)
>
> My study shows that actually the problem appears to be in VOP_ACCESS()
> which is called from vn_open(). If nfs_access() decides to "go to the wire"
> in #4, it then uses a cached file handle which is indeed stale. Thus,
> open() eventually fails with ESTALE too (ESTALE comes from underlying
> nfs_request()).
>
> I understand all the fundamental NFS-related integrity problems, but not
> this one :) That is, I see no reason for open() to fail to open a file for
> reading or writing if the system knows the problem is it's own. Why not
> just do another lookup and try obtain a valid file handle?
>
> I was playing with different parts of the kernel while "fixing" this for
> myself. However, I believe, the simpliest patch would be for
> vfs_syscalls.c:open() (I've also made a working patch against vn_open(),
> though).
>
> Could anyone please be so kind to comment this issue?
>
> TIA
>
> --- kern/vfs_syscalls.c.orig Thu Jun 19 13:22:50 2003
> +++ kern/vfs_syscalls.c Thu Jun 19 13:29:11 2003
> @@ -1008,6 +1008,7 @@
>  	int type, indx, error;
>  	struct flock lf;
>  	struct nameidata nd;
> +	int stale = 0;
>  
>  	oflags = SCARG(uap, flags);
>  	if ((oflags & O_ACCMODE) == O_ACCMODE)
> @@ -1025,8 +1026,15 @@
>  	 * the descriptor while we are blocked in vn_open()
>  	 */
>  	fhold(fp);
> +again:
>  	error = vn_open(&nd, flags, cmode);
>  	if (error) {
> +		/*
> +		 * if the underlying filesystem returns ESTALE
> +		 * we must have used a cached file handle.
> +		 */
> +		if (error == ESTALE && stale++ == 0)
> +			goto again;
>  		/*
>  		 * release our own reference
>  		 */

I can't get very enthusiastic about changing the file system independent
code to fix a deficiency in the NFS implementation.

If the name of the file you are attempting to open is relative to your
current working directory, and your current working directory is nuked
on the server, vn_open() will return ESTALE on the retry as well, so
your patch above will not help in that case.

NFS really doesn't work very well if modifications are made by both a
client and the server, or by multiple clients. Solaris attempts to
compensate with a mount option:

     noac  Suppress data and attribute caching. The data
           caching that is suppressed is the write-behind.
           The local page cache is still maintained, but
           data copied into it is immediately written to
           the server.

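Coming back to the retry in the quoted patch: the same bounded-retry idea can be sketched in user space, which keeps it out of the file system independent kernel code. This is only an illustration under stated assumptions. The names open_fn, real_open, always_stale_open and open_retry_estale are made up for this example and are not part of any FreeBSD interface. The always_stale_open stub stands in for the dead working directory case, where every attempt fails and the retry has to give up.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical callback type so the open step can be swapped for a stub. */
typedef int (*open_fn)(const char *path, int flags);

/* Thin wrapper around open(2) that matches open_fn. */
int
real_open(const char *path, int flags)
{
	return open(path, flags);
}

/* Stub for the "cwd nuked on the server" case: always fails with ESTALE. */
int stale_calls = 0;

int
always_stale_open(const char *path, int flags)
{
	(void)path;
	(void)flags;
	stale_calls++;
	errno = ESTALE;
	return -1;
}

/*
 * Call do_open(); if it fails with ESTALE, try again at most max_retries
 * more times. Any other error, or ESTALE after the last retry, is returned
 * to the caller as -1 with errno left as set by the last attempt.
 */
int
open_retry_estale(open_fn do_open, const char *path, int flags, int max_retries)
{
	int fd, tries = 0;

	assert(max_retries >= 0);
	for (;;) {
		fd = do_open(path, flags);
		if (fd >= 0 || errno != ESTALE || tries++ >= max_retries)
			return fd;
	}
}
```

With max_retries set to 1 this mirrors the single retry in the patch: open_retry_estale(real_open, "file01", O_RDONLY, 1) makes at most two attempts, and with always_stale_open it still ends in ESTALE.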
If the rename on the server was done within the attribute validity time
on the client, vn_open() will succeed even without your patch, but you
may encounter the ESTALE error when you actually try to read or write
the file.

Unless you have some sort of locking protocol or other way of
synchronizing this sequence of operations on the client and server, the
server could do the rename while the client has the file open, after
which some I/O operation on the client will encounter ESTALE.

If the problem is that open() is failing a long time after the server
did the rename, then the best solution may be for the client to time out
file handles more aggressively. If the vnode on the client is closed,
the file handle could be timed out after acregmin/acregmax or
acdirmin/acdirmax, or a new handle timeout parameter. This may decrease
performance, but nothing is free ...
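
To make the handle timeout idea a bit more concrete, here is a rough sketch of the expiry check. It is not taken from the FreeBSD NFS client. The struct fh_cache_entry, its fields, and fh_expired() are hypothetical names for this example, and the timeout is assumed to be passed in by the caller, whether it comes from acregmax, acdirmax, or a new tunable.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical client-side cache entry for one NFS file handle. */
struct fh_cache_entry {
	time_t last_validated;	/* when the server last confirmed the handle */
	int    open_count;	/* number of local opens of the vnode */
};

/*
 * Decide whether a cached handle should be dropped and looked up again.
 * A handle that is still in use (open_count > 0) is kept. An idle one
 * expires once timeout_secs have passed since it was last validated.
 */
bool
fh_expired(const struct fh_cache_entry *e, time_t now, time_t timeout_secs)
{
	assert(timeout_secs >= 0);
	if (e->open_count > 0)
		return false;
	return (now - e->last_validated) >= timeout_secs;
}
```

A lookup path could call fh_expired(entry, time(NULL), 60) before reusing a cached handle and fall back to a fresh lookup when it returns true. The value 60 is only an illustrative timeout. The cost is the extra lookup traffic already mentioned.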