From: John Baldwin
To: Rick Macklem
Cc: Rick Macklem, Robert Watson, fs@freebsd.org
Date: Thu, 20 May 2010 09:22:17 -0400
Subject: Re: [PATCH] Better handling of stale filehandles in open() in the NFS client
Message-Id: <201005200922.17245.jhb@freebsd.org>
References: <201005191144.00382.jhb@freebsd.org>

On Wednesday 19 May 2010 8:12:10 pm Rick Macklem wrote:
>
> On Wed, 19 May 2010, John Baldwin wrote:
>
> > One of the things the NFS client does to provide close-to-open consistency
> > is that the client mandates that at least one ACCESS or GETATTR RPC is sent
> > over the wire as part of every open(2) system call. However, we currently
> > only enforce that during nfs_open() (VOP_OPEN()). If nfs_open() encounters
> > a stale file handle, it fails the open(2) system call with ESTALE.
> >
> > A much nicer user experience is for nfs_lookup() to send the ACCESS or
> > GETATTR RPC instead. If that RPC fails with ESTALE, then nfs_lookup() will
> > send a LOOKUP RPC, which will find the new file handle (assuming a rename
> > has caused the file handle for a given filename to change), and the open(2)
> > will succeed with the new file handle. I believe that this in fact used to
> > happen quite often until I merged a change from Yahoo! which stopped
> > flushing cached attributes during nfs_close(). With that change, an
> > open() -> close() -> open() sequence in quick succession will now use
> > cached attributes during the lookup and only notice a stale file handle in
> > nfs_open().
> >
> > This can lead to some astonishing behavior. To reproduce, run
> > 'cat /some/file' in a loop every 2 seconds or so on an NFS client. In
> > another window, log in to the NFS server and replace /some/file with
> > /some/otherfile using mv(1) ('mv /some/otherfile /some/file'). The next
> > cat in the NFS client window will usually fail with ESTALE. The subsequent
> > cat will work, as it will look up the filename again and find the new
> > file handle.
> >
>
> Not astonishing at all:-) That's just NFS not having any cache coherency
> protocol. (Many moons ago, I tried to add one via nqnfs, but nobody
> cared. :-)
> Btw, many servers don't change a file handle upon a rename and it was
> once considered bad form to do so, but nowadays some don't and some do.

True, though I guess that implies that CTO (close-to-open consistency)
doesn't cover renames, only opens and closes of a given file handle. That's
probably non-obvious to many users of NFS, though.
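The cat loop in the reproduction can also be driven from a small C program. This is a minimal sketch assuming a POSIX client; read_once() and poll_file() are hypothetical helper names, and /some/file is only the placeholder path from the text. Each iteration opens, reads and closes the file the way one run of cat does, so on an affected client the iteration right after the server-side mv(1) should report ESTALE and the following one should succeed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open, read to EOF and close `path`, roughly what one `cat` does.
 * Returns 0 on success or the errno value of the first failing call. */
static int read_once(const char *path)
{
    char buf[4096];
    ssize_t n;
    int fd, err = 0;

    assert(path != NULL);
    fd = open(path, O_RDONLY);   /* an ESTALE from VOP_OPEN() surfaces here */
    if (fd < 0)
        return errno;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;                        /* discard the data; only errors matter */
    if (n < 0)
        err = errno;             /* save errno before close() can change it */
    close(fd);
    return err;
}

/* Hypothetical driver: call read_once() `iterations` times, sleeping
 * `interval` seconds between attempts and logging every failure.
 * Returns how many attempts failed with ESTALE. */
static int poll_file(const char *path, int iterations, unsigned interval)
{
    int stale = 0;

    for (int i = 0; i < iterations; i++) {
        int err = read_once(path);
        if (err != 0)
            fprintf(stderr, "attempt %d: %s: %s\n", i, path, strerror(err));
        if (err == ESTALE)
            stale++;
        if (interval > 0 && i + 1 < iterations)
            sleep(interval);
    }
    return stale;
}
```

For example, a main() that calls poll_file("/some/file", 60, 2) polls for about two minutes; run the mv(1) on the server while it is going and check whether the returned count is nonzero.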
> > The fix I came up with is to modify the NFS client lookup routine. Before
> > we trust a hit in the namecache, we check the attributes to see if we
> > should trust the namecache hit. What my patch does is to force that
> > attribute check to send a GETATTR or ACCESS RPC over the wire, instead of
> > using cached attributes, when doing a lookup on the last component of an
> > ISOPEN lookup (that is, a lookup for open(2) or execve(2)). This forces the
> > ESTALE error to occur during the VOP_LOOKUP() stage of open(2) instead of
> > during VOP_OPEN().
> >
> > Thoughts?
> >
>
> It sounds fine but seems like it's going to increase the Getattr RPC count
> since nfs_open() invalidates the attribute cache for some cases?

It doesn't change the RPC count, because of changes that Mohan added to the
NFS client a while ago: nfs_open() no longer invalidates the attribute cache
if the attributes were already updated by nfs_lookup() during the same
system call. With Mohan's changes in place, all this change does is move the
GETATTR/ACCESS RPC earlier in the case of a namecache hit.

-- 
John Baldwin
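To see why the patch moves the ESTALE from VOP_OPEN() to VOP_LOOKUP() without adding RPCs, here is a toy user-space model in C. It is only a sketch under stated assumptions and not FreeBSD code: the structs, the LAST_COMPONENT/OPEN_LOOKUP flags (standing in for the last-component and ISOPEN namei flags), the per-syscall attribute stamp (standing in for Mohan's same-syscall check) and every function name are hypothetical, and file handles are reduced to integers. With force_rpc false it behaves like the pre-patch client; with force_rpc true the lookup always sends the GETATTR over the wire.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define LAST_COMPONENT 0x1   /* hypothetical stand-in: last path component */
#define OPEN_LOOKUP    0x2   /* hypothetical stand-in: ISOPEN lookup */

struct server { int fh; };   /* current file handle for the name */

struct client {
    int cached_fh;      /* namecache hit: name -> file handle */
    bool attrs_valid;   /* attribute cache considered fresh */
    int attr_syscall;   /* syscall in which attributes were last fetched */
    int getattr_rpcs;   /* GETATTR/ACCESS RPCs sent over the wire */
    int lookup_rpcs;    /* LOOKUP RPCs sent over the wire */
};

/* Over-the-wire GETATTR: fails with ESTALE if the handle is gone. */
static int getattr_rpc(struct client *c, const struct server *s, int fh, int sc)
{
    c->getattr_rpcs++;
    if (fh != s->fh) {
        c->attrs_valid = false;         /* purge the stale cached state */
        return ESTALE;
    }
    c->attrs_valid = true;
    c->attr_syscall = sc;
    return 0;
}

/* VOP_LOOKUP() model: trust the namecache hit if the attributes check out.
 * With force_rpc, an open lookup of the last component always goes to the
 * wire, and ESTALE falls back to a LOOKUP RPC for the new handle. */
static int lookup_model(struct client *c, const struct server *s, int flags,
                        bool force_rpc, int sc)
{
    const int want = LAST_COMPONENT | OPEN_LOOKUP;
    bool must_rpc = force_rpc && (flags & want) == want;

    if ((!c->attrs_valid || must_rpc) &&
        getattr_rpc(c, s, c->cached_fh, sc) == ESTALE) {
        c->lookup_rpcs++;               /* LOOKUP finds the new handle */
        c->cached_fh = s->fh;
        c->attrs_valid = true;          /* LOOKUP reply carries attributes */
        c->attr_syscall = sc;
    }
    return c->cached_fh;
}

/* VOP_OPEN() model: close-to-open needs attributes fetched during this
 * syscall; skip the RPC if the lookup already fetched them. */
static int nfs_open_model(struct client *c, const struct server *s, int fh, int sc)
{
    if (c->attrs_valid && c->attr_syscall == sc)
        return 0;
    return getattr_rpc(c, s, fh, sc);
}

/* One open(2) system call, numbered `sc`. */
static int open_model(struct client *c, const struct server *s,
                      bool force_rpc, int sc)
{
    int fh = lookup_model(c, s, LAST_COMPONENT | OPEN_LOOKUP, force_rpc, sc);
    return nfs_open_model(c, s, fh, sc);
}
```

In this model, without a rename both paths send exactly one GETATTR per open. After the server's handle changes, the pre-patch path returns ESTALE from the open step and only succeeds on the next call, while the patched path sends one failing GETATTR plus one LOOKUP and succeeds on the first call.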