From owner-freebsd-hackers@FreeBSD.ORG  Thu Jun 19 23:18:01 2003
Message-Id: <200306200617.h5K6HaM7058935@gw.catspoiler.org>
Date: Thu, 19 Jun 2003 23:17:36 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: uitm@blackflag.ru
In-Reply-To: <200306190955.NAA00538@slt.oz>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: freebsd-hackers@FreeBSD.org
Subject: Re: open() and ESTALE error

On 19 Jun, Andrey Alekseyev wrote:
> Hello,
> 
> I've been trying lately to develop a solution for the problem with
> open() that manifests itself in ESTALE error in the following situation:
> 
> 1. NFS server: echo "1111" > file01
> 2. NFS client: cat file01
> 3. NFS server: echo "2222" > file02 && mv file02 file01
> 4. NFS client: cat file01 (either old file01 contents or ESTALE)
> 
> My study shows that actually the problem appears to be in VOP_ACCESS()
> which is called from vn_open(). If nfs_access() decides to "go to the wire"
> in #4, it then uses a cached file handle which is indeed stale. Thus,
> open() eventually fails with ESTALE too (ESTALE comes from underlying
> nfs_request()).
> 
> I understand all the fundamental NFS-related integrity problems, but not
> this one :) That is, I see no reason for open() to fail to open a file for
> reading or writing if the system knows the problem is its own. Why not
> just do another lookup and try to obtain a valid file handle?
> 
> I was playing with different parts of the kernel while "fixing" this for
> myself. However, I believe the simplest patch would be for
> vfs_syscalls.c:open() (I've also made a working patch against vn_open(),
> though).
> 
> Could anyone please be so kind as to comment on this issue?
> 
> TIA
> 
> --- kern/vfs_syscalls.c.orig	Thu Jun 19 13:22:50 2003
> +++ kern/vfs_syscalls.c	Thu Jun 19 13:29:11 2003
> @@ -1008,6 +1008,7 @@
>  	int type, indx, error;
>  	struct flock lf;
>  	struct nameidata nd;
> +	int stale = 0;
>  
>  	oflags = SCARG(uap, flags);
>  	if ((oflags & O_ACCMODE) == O_ACCMODE)
> @@ -1025,8 +1026,15 @@
>  	 * the descriptor while we are blocked in vn_open()
>  	 */
>  	fhold(fp);
> +again:
>  	error = vn_open(&nd, flags, cmode);
>  	if (error) {
> +		/*
> +		 * if the underlying filesystem returns ESTALE
> +		 * we must have used a cached file handle.
> +		 */
> +		if (error == ESTALE && stale++ == 0)
> +			goto again;
>  		/*
>  		 * release our own reference
>  		 */

I can't get very enthusiastic about changing the file system independent
code to fix a deficiency in the NFS implementation.

If the name of the file you are attempting to open is relative to your
current working directory, and your current working directory is nuked
on the server, vn_open will keep returning ESTALE, and the retry in your
patch above will not help.
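
For illustration only, here is a userland sketch of a retry that is
explicitly bounded, so it cannot spin when every attempt returns ESTALE
(the nuked working directory case).  The names open_fn,
open_retry_estale, always_stale and stale_once are hypothetical, and the
two stubs merely simulate a stale handle; none of this is kernel code.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical type: any function with open(2)-like semantics. */
typedef int (*open_fn)(const char *path, int flags);

/*
 * Retry an open at most max_retries times when it fails with ESTALE.
 * Returns the descriptor, or -1 with errno set by the last attempt.
 */
static int
open_retry_estale(open_fn do_open, const char *path, int flags,
    int max_retries)
{
	int fd, tries = 0;

	for (;;) {
		fd = do_open(path, flags);
		if (fd >= 0 || errno != ESTALE || tries++ >= max_retries)
			return (fd);
	}
}

static int stub_calls;	/* number of attempts seen by the stubs */

/* Stub: every attempt is stale, as with a nuked working directory. */
static int
always_stale(const char *path, int flags)
{
	(void)path;
	(void)flags;
	stub_calls++;
	errno = ESTALE;
	return (-1);
}

/* Stub: stale on the first attempt only, as after a server-side rename. */
static int
stale_once(const char *path, int flags)
{
	(void)path;
	(void)flags;
	if (stub_calls++ == 0) {
		errno = ESTALE;
		return (-1);
	}
	return (3);	/* arbitrary fake descriptor */
}
```

With max_retries set to 1, always_stale is called twice and the caller
still gets ESTALE, while stale_once succeeds on the second attempt.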

NFS really doesn't work very well if modifications are made by both a
client and the server, or by multiple clients.  Solaris attempts to
compensate with a mount option:
           noac  Suppress data and attribute caching.  The data
                 caching that is suppressed is the write-behind.
                 The local page cache is still maintained, but
                 data copied into it is immediately written to
                 the server.


If the rename on the server was done within the attribute validity time
on the client, vn_open() will succeed even without your patch, but you
may encounter the ESTALE error when you actually try to read or write
the file.

Unless you have some sort of locking protocol or other way of
synchronizing this sequence of operations on the client and server, the
server could do the rename while the client has the file open, after
which some I/O operation on the client will encounter ESTALE.
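
Absent such synchronization, about all an application can do is cope
with ESTALE when it shows up during I/O.  A minimal sketch, assuming the
application still knows the path it opened: reopen by name and retry the
read once.  pread_reopen is a hypothetical helper, and after the reopen
the data comes from the new file, so it may not be consistent with
anything read earlier.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Read len bytes at offset off.  If the read fails with ESTALE, reopen
 * path, swap the new descriptor into *fdp, and retry exactly once.
 */
static ssize_t
pread_reopen(int *fdp, const char *path, void *buf, size_t len, off_t off)
{
	ssize_t n;
	int nfd;

	n = pread(*fdp, buf, len, off);
	if (n >= 0 || errno != ESTALE)
		return (n);

	/* The handle behind *fdp is stale; do a fresh lookup by name. */
	nfd = open(path, O_RDONLY);
	if (nfd < 0)
		return (-1);
	close(*fdp);
	*fdp = nfd;
	return (pread(*fdp, buf, len, off));
}
```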

If the problem is that open() is failing a long time after the server
did the rename, then the best solution may be for the client to time out
file handles more aggressively.  If the vnode on the client is closed,
the file handle could be timed out after acregmin/acregmax or
acdirmin/acdirmax, or a new handle timeout parameter.  This may decrease
performance, but nothing is free ...
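
To make the idea concrete, the expiry check could look roughly like the
sketch below.  struct cached_fh, fh_expired and the fh_timeout parameter
are all hypothetical; this is not the layout of the real NFS client
data structures, only the shape of the test.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cached-handle record; not the real nfsnode layout. */
struct cached_fh {
	time_t	fetched;	/* when the handle was obtained by lookup */
	int	open_count;	/* open references held on the client */
};

/*
 * A handle is eligible for expiry only while the vnode is closed, and
 * only once fh_timeout seconds have passed since it was fetched.
 */
static bool
fh_expired(const struct cached_fh *fh, time_t now, time_t fh_timeout)
{
	if (fh->open_count > 0)
		return (false);
	return (now - fh->fetched >= fh_timeout);
}
```

An expired handle would force a fresh lookup on the next open(), which
is where the performance cost comes from.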