From owner-freebsd-fs@FreeBSD.ORG  Wed May 19 15:44:16 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1EEB81065670;
	Wed, 19 May 2010 15:44:16 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id CDF138FC16;
	Wed, 19 May 2010 15:44:15 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 52A5246BC5;
	Wed, 19 May 2010 11:44:15 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPA id 57F228A01F;
	Wed, 19 May 2010 11:44:14 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: fs@freebsd.org
Date: Wed, 19 May 2010 11:44:00 -0400
User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; )
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <201005191144.00382.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Wed, 19 May 2010 11:44:14 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00 autolearn=ham
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: Rick Macklem <rmacklem@freebsd.org>, Robert Watson <rwatson@freebsd.org>
Subject: [PATCH] Better handling of stale filehandles in open() in the NFS
	client
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 May 2010 15:44:16 -0000

One of the things the NFS client does to provide close-to-open consistency is 
that the client mandates that at least one ACCESS or GETATTR RPC is sent over 
the wire as part of every open(2) system call.  However, we currently only 
enforce that during nfs_open() (VOP_OPEN()).  If nfs_open() encounters a stale 
file handle, it fails the open(2) system call with ESTALE.

A much nicer user experience is for nfs_lookup() to actually send the ACCESS 
or GETATTR RPC instead.  If that RPC fails with ESTALE, then nfs_lookup() will 
send a LOOKUP RPC which will find the new file handle (assuming a rename has 
caused the file handle for a given filename to change) and the open(2) will 
succeed with the new file handle.  I believe that this in fact used to happen 
quite often until I merged a change from Yahoo! which stopped flushing cached 
attributes during nfs_close().  With that change an open() -> close() -> 
open() sequence in quick succession will now use cached attributes during the 
lookup and only notice a stale filehandle in nfs_open().

This can lead to some astonishing behavior.  To reproduce, run 'cat 
/some/file' in an loop every 2 seconds or so on an NFS client.  In another 
window, login to the NFS server and replace /some/file with /some/otherfile 
using mv(1).  The next cat in the NFS client window will usually fail with 
ESTALE.  The subsequent cat will work as it will relookup the filename and 
find the new filehandle.

The fix I came up with is to modify the NFS client lookup routine.  Before we 
trust a hit in the namecache, we check the attributes to see if we should 
trust the namecache hit.  What my patch does is to force that attribute check 
to send a GETATTR or ACCESS RPC over the wire instead of using cached 
attributes when doing a lookup on the last component of an ISOPEN lookup (so a 
lookup for open(2) or execve(2)).  This forces the ESTALE error to occur 
during the VOP_LOOKUP() stage of open(2) instead of VOP_OPEN().

Thoughts?

Index: nfs_vnops.c
===================================================================
--- nfs_vnops.c	(revision 208707)
+++ nfs_vnops.c	(working copy)
@@ -875,7 +875,7 @@
 	struct mbuf *mreq, *mrep, *md, *mb;
 	long len;
 	nfsfh_t *fhp;
-	struct nfsnode *np;
+	struct nfsnode *np, *newnp;
 	int error = 0, attrflag, fhsize;
 	int v3 = NFS_ISV3(dvp);
 	struct thread *td = cnp->cn_thread;
@@ -901,8 +901,24 @@
 		 * change time of the file matches our cached copy.
 		 * Otherwise, we discard the cache entry and fallback
 		 * to doing a lookup RPC.
+		 *
+		 * To better handle stale file handles, clear the
+		 * attribute cache of this node if it is a leaf
+		 * component and part of an open() call before
+		 * fetching the attributes.  This should allow stale
+		 * file handles to be detected here where we can fall
+		 * back to a LOOKUP RPC to recover rather than having
+		 * nfs_open() detect the stale file handle and failing
+		 * open(2) with ESTALE.
 		 */
 		newvp = *vpp;
+		if ((cnp->cn_flags & (ISLASTCN | ISOPEN)) ==
+		    (ISLASTCN | ISOPEN)) {
+			newnp = VTONFS(newvp);
+			mtx_lock(&newnp->n_mtx);
+			newnp->n_attrstamp = 0;
+			mtx_unlock(&newnp->n_mtx);
+		}
 		if (!VOP_GETATTR(newvp, &vattr, cnp->cn_cred, td)
 		    && vattr.va_ctime.tv_sec == VTONFS(newvp)->n_ctime) {
 			nfsstats.lookupcache_hits++;


-- 
John Baldwin