From owner-freebsd-fs@FreeBSD.ORG Wed May 19 15:44:16 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1EEB81065670; Wed, 19 May 2010 15:44:16 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id CDF138FC16; Wed, 19 May 2010 15:44:15 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 52A5246BC5; Wed, 19 May 2010 11:44:15 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 57F228A01F; Wed, 19 May 2010 11:44:14 -0400 (EDT) From: John Baldwin To: fs@freebsd.org Date: Wed, 19 May 2010 11:44:00 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201005191144.00382.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 19 May 2010 11:44:14 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Rick Macklem , Robert Watson Subject: [PATCH] Better handling of stale filehandles in open() in the NFS client X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 May 2010 15:44:16 -0000 One of the things the NFS client does to provide close-to-open consistency is that the client mandates that at least one ACCESS or GETATTR RPC is sent over the wire as part of every open(2) system call. However, we currently only enforce that during nfs_open() (VOP_OPEN()). If nfs_open() encounters a stale file handle, it fails the open(2) system call with ESTALE. A much nicer user experience is for nfs_lookup() to actually send the ACCESS or GETATTR RPC instead. If that RPC fails with ESTALE, then nfs_lookup() will send a LOOKUP RPC which will find the new file handle (assuming a rename has caused the file handle for a given filename to change) and the open(2) will succeed with the new file handle. I believe that this in fact used to happen quite often until I merged a change from Yahoo! which stopped flushing cached attributes during nfs_close(). With that change an open() -> close() -> open() sequence in quick succession will now use cached attributes during the lookup and only notice a stale filehandle in nfs_open(). This can lead to some astonishing behavior. To reproduce, run 'cat /some/file' in an loop every 2 seconds or so on an NFS client. In another window, login to the NFS server and replace /some/file with /some/otherfile using mv(1). The next cat in the NFS client window will usually fail with ESTALE. The subsequent cat will work as it will relookup the filename and find the new filehandle. The fix I came up with is to modify the NFS client lookup routine. Before we trust a hit in the namecache, we check the attributes to see if we should trust the namecache hit. What my patch does is to force that attribute check to send a GETATTR or ACCESS RPC over the wire instead of using cached attributes when doing a lookup on the last component of an ISOPEN lookup (so a lookup for open(2) or execve(2)). This forces the ESTALE error to occur during the VOP_LOOKUP() stage of open(2) instead of VOP_OPEN(). Thoughts? Index: nfs_vnops.c =================================================================== --- nfs_vnops.c (revision 208707) +++ nfs_vnops.c (working copy) @@ -875,7 +875,7 @@ struct mbuf *mreq, *mrep, *md, *mb; long len; nfsfh_t *fhp; - struct nfsnode *np; + struct nfsnode *np, *newnp; int error = 0, attrflag, fhsize; int v3 = NFS_ISV3(dvp); struct thread *td = cnp->cn_thread; @@ -901,8 +901,24 @@ * change time of the file matches our cached copy. * Otherwise, we discard the cache entry and fallback * to doing a lookup RPC. + * + * To better handle stale file handles, clear the + * attribute cache of this node if it is a leaf + * component and part of an open() call before + * fetching the attributes. This should allow stale + * file handles to be detected here where we can fall + * back to a LOOKUP RPC to recover rather than having + * nfs_open() detect the stale file handle and failing + * open(2) with ESTALE. */ newvp = *vpp; + if ((cnp->cn_flags & (ISLASTCN | ISOPEN)) == + (ISLASTCN | ISOPEN)) { + newnp = VTONFS(newvp); + mtx_lock(&newnp->n_mtx); + newnp->n_attrstamp = 0; + mtx_unlock(&newnp->n_mtx); + } if (!VOP_GETATTR(newvp, &vattr, cnp->cn_cred, td) && vattr.va_ctime.tv_sec == VTONFS(newvp)->n_ctime) { nfsstats.lookupcache_hits++; -- John Baldwin