From: Andrey Alekseyev
Message-Id: <200306190955.NAA00538@slt.oz>
To: freebsd-hackers@freebsd.org
Date: Thu, 19 Jun 2003 13:54:31 +0400 (MSD)
Subject: open() and ESTALE error

Hello,

I've been trying lately to develop a solution for a problem with open()
that manifests itself as an ESTALE error in the following situation:

1. NFS server: echo "1111" > file01
2. NFS client: cat file01
3. NFS server: echo "2222" > file02 && mv file02 file01
4. NFS client: cat file01 (either old file01 contents or ESTALE)

My study shows that the problem actually appears to be in VOP_ACCESS(),
which is called from vn_open(). If nfs_access() decides to "go to the
wire" in step 4, it uses a cached file handle, which is indeed stale.
Thus, open() eventually fails with ESTALE too (ESTALE comes from the
underlying nfs_request()).

I understand all the fundamental NFS-related integrity problems, but not
this one :) That is, I see no reason for open() to fail to open a file
for reading or writing if the system knows the problem is its own. Why
not just do another lookup and try to obtain a valid file handle?

I was playing with different parts of the kernel while "fixing" this for
myself. However, I believe the simplest patch would be against
vfs_syscalls.c:open() (I've also made a working patch against vn_open(),
though).

Could anyone please be so kind as to comment on this issue?

TIA

--- kern/vfs_syscalls.c.orig	Thu Jun 19 13:22:50 2003
+++ kern/vfs_syscalls.c	Thu Jun 19 13:29:11 2003
@@ -1008,6 +1008,7 @@
 	int type, indx, error;
 	struct flock lf;
 	struct nameidata nd;
+	int stale = 0;
 
 	oflags = SCARG(uap, flags);
 	if ((oflags & O_ACCMODE) == O_ACCMODE)
@@ -1025,8 +1026,15 @@
 	 * the descriptor while we are blocked in vn_open()
 	 */
 	fhold(fp);
+again:
 	error = vn_open(&nd, flags, cmode);
 	if (error) {
+		/*
+		 * if the underlying filesystem returns ESTALE
+		 * we must have used a cached file handle.
+		 */
+		if (error == ESTALE && stale++ == 0)
+			goto again;
 		/*
 		 * release our own reference
 		 */
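
For illustration only, the retry-once logic of the patch can also be
sketched in user space. This is not part of the patch. The names
open_retry_estale(), open_fn_t, real_open() and fake_open() are
hypothetical, and whether a second open() from user space really obtains
a fresh file handle depends on how the NFS client handles its caches,
which is an assumption here. The fake opener exists only so the control
flow can be exercised without an NFS mount.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical hook type, so the retry logic can be tested without NFS. */
typedef int (*open_fn_t)(const char *path, int flags);

/*
 * Call do_open() and retry exactly once if it fails with ESTALE.
 * The "stale++ == 0" guard is the same one used in the patch: it
 * allows at most one extra attempt, so a persistently stale handle
 * still surfaces as an error instead of looping forever.
 */
int
open_retry_estale(open_fn_t do_open, const char *path, int flags)
{
	int fd, stale = 0;

	assert(do_open != NULL);
again:
	fd = do_open(path, flags);
	if (fd == -1 && errno == ESTALE && stale++ == 0)
		goto again;
	return (fd);
}

/* Thin wrapper so that open(2) matches open_fn_t. */
int
real_open(const char *path, int flags)
{
	return (open(path, flags));
}

/*
 * Hypothetical stand-in for an NFS-backed open(): fails with ESTALE
 * fake_failures times, then returns a pretend descriptor.
 */
int fake_failures;
int fake_calls;

int
fake_open(const char *path, int flags)
{
	(void)path;
	(void)flags;
	fake_calls++;
	if (fake_failures > 0) {
		fake_failures--;
		errno = ESTALE;
		return (-1);
	}
	return (3);	/* pretend descriptor, never used for I/O */
}
```

A caller would use open_retry_estale(real_open, "file01", O_RDONLY) in
place of a plain open(). Limiting the retry to one attempt keeps the
failure visible when the handle stays stale, as the kernel patch does.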