From: Rick Macklem
To: John Baldwin
Cc: Rick Macklem, Robert Watson, fs@freebsd.org
Date: Wed, 19 May 2010 20:12:10 -0400 (EDT)
In-Reply-To: <201005191144.00382.jhb@freebsd.org>
Subject: Re: [PATCH] Better handling of stale filehandles in open() in the NFS client

On Wed, 19 May 2010, John Baldwin wrote:

> One of the things the NFS client does to provide close-to-open consistency is
> that the client mandates that at least one ACCESS or GETATTR RPC is sent over
> the wire as part of every open(2) system call. However, we currently only
> enforce that during nfs_open() (VOP_OPEN()). If nfs_open() encounters a stale
> file handle, it fails the open(2) system call with ESTALE.
>
> A much nicer user experience is for nfs_lookup() to actually send the ACCESS
> or GETATTR RPC instead. If that RPC fails with ESTALE, then nfs_lookup() will
> send a LOOKUP RPC which will find the new file handle (assuming a rename has
> caused the file handle for a given filename to change) and the open(2) will
> succeed with the new file handle. I believe that this in fact used to happen
> quite often until I merged a change from Yahoo! which stopped flushing cached
> attributes during nfs_close(). With that change an open() -> close() ->
> open() sequence in quick succession will now use cached attributes during the
> lookup and only notice a stale filehandle in nfs_open().
>
> This can lead to some astonishing behavior. To reproduce, run 'cat
> /some/file' in a loop every 2 seconds or so on an NFS client. In another
> window, log in to the NFS server and replace /some/file with /some/otherfile
> using mv(1). The next cat in the NFS client window will usually fail with
> ESTALE. The subsequent cat will work as it will relookup the filename and
> find the new filehandle.
>
Not astonishing at all:-) That's just NFS not having any cache coherency
protocol.
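The failure sequence described above can be modeled with a toy simulation in C. This is a sketch under stated assumptions, not the actual NFS client code: every name below (struct server, client_lookup(), client_open(), and so on) is hypothetical, a single file name stands in for /some/file, cached attributes never expire on their own (modeling open/close/open "in quick succession"), and a failed GETATTR in nfs_open() is assumed to drop the namecache entry so that the next open redoes the LOOKUP. The force_rpc flag models having the lookup send the ACCESS/GETATTR RPC itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy server: one name whose file handle changes when mv(1) replaces it. */
struct server {
    int name_fh;   /* handle the name currently resolves to */
    int next_fh;   /* handle the next replacement file will get */
};

static int srv_getattr(const struct server *s, int fh)
{
    /* Any handle other than the current one is treated as stale. */
    return fh == s->name_fh ? 0 : ESTALE;
}

static void srv_mv_replace(struct server *s)
{
    s->name_fh = s->next_fh++;   /* old handle now names a removed file */
}

/* Toy client: one namecache entry plus its attribute-cache state. */
struct client {
    bool nc_valid;     /* namecache holds an entry for the name */
    int nc_fh;         /* cached file handle */
    bool attrs_fresh;  /* cached attributes still considered valid */
    int rpcs;          /* total over-the-wire RPCs sent */
};

/* VOP_LOOKUP(): trust a namecache hit only if the attributes check out.
 * Returns the file handle to use. */
static int client_lookup(struct client *c, const struct server *s,
                         bool force_rpc)
{
    if (c->nc_valid) {
        if (c->attrs_fresh && !force_rpc)
            return c->nc_fh;          /* cached attributes: no RPC */
        c->rpcs++;                    /* GETATTR/ACCESS RPC */
        if (srv_getattr(s, c->nc_fh) == 0) {
            c->attrs_fresh = true;
            return c->nc_fh;
        }
        c->nc_valid = false;          /* ESTALE: fall back to LOOKUP */
    }
    c->rpcs++;                        /* LOOKUP RPC finds the new handle */
    c->nc_fh = s->name_fh;
    c->nc_valid = true;
    c->attrs_fresh = true;
    return c->nc_fh;
}

/* open(2) = lookup, then nfs_open(), which ensures >= 1 RPC per open. */
static int client_open(struct client *c, const struct server *s,
                       bool force_rpc)
{
    int before, fh;

    assert(c != NULL && s != NULL);
    before = c->rpcs;
    fh = client_lookup(c, s, force_rpc);
    if (c->rpcs == before) {          /* lookup used only cached state */
        c->rpcs++;                    /* nfs_open()'s GETATTR RPC */
        if (srv_getattr(s, fh) != 0) {
            c->nc_valid = false;      /* assumption: next open re-looks up */
            return ESTALE;
        }
    }
    return 0;
}

/* nfs_close(): before the Yahoo! change, cached attributes were flushed here. */
static void client_close(struct client *c, bool flush_attrs_on_close)
{
    if (flush_attrs_on_close)
        c->attrs_fresh = false;
}
```

With force_rpc false and no flush on close, the open right after srv_mv_replace() fails with ESTALE and the following open succeeds, matching the report. With flush_attrs_on_close set (the older behavior) or with force_rpc set, the lookup itself hits ESTALE and recovers with a LOOKUP RPC.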
(Many moons ago, I tried via nqnfs, but nobody cared.:-) Btw, many servers
don't change a file handle upon a rename; it was once considered bad form to
do so, but nowadays some servers do and some don't.

> The fix I came up with is to modify the NFS client lookup routine. Before we
> trust a hit in the namecache, we check the attributes to see if we should
> trust the namecache hit. What my patch does is to force that attribute check
> to send a GETATTR or ACCESS RPC over the wire instead of using cached
> attributes when doing a lookup on the last component of an ISOPEN lookup (so a
> lookup for open(2) or execve(2)). This forces the ESTALE error to occur
> during the VOP_LOOKUP() stage of open(2) instead of VOP_OPEN().
>
> Thoughts?
>
It sounds fine, but it seems like it's going to increase the Getattr RPC
count, since nfs_open() invalidates the attribute cache in some cases? Did you
happen to try something like a "make buildworld" with and without the patch
and compare RPC counts?

I'd say it sounds great so long as the RPC counts don't go up much. If they
do, I suspect somebody won't be happy. (When I talked to Alfred last week, all
Juniper cares about is build performance; they don't care diddly w.r.t.
coherence between multiple clients/client and server.)

Have fun with it, rick
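The RPC-count concern can be made concrete with a toy accounting sketch in C. It rests on assumptions the thread does not confirm: that with the patch, the lookup of the last component of an open(2) always sends one GETATTR or ACCESS RPC, and that nfs_open() still sends its own GETATTR whenever it has invalidated the attribute cache for the file. The function names are hypothetical, and isopen/islastcn are plain booleans standing in for the namei ISOPEN and last-component (ISLASTCN) flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the patch bypasses cached attributes only for the
 * last component of an ISOPEN lookup (open(2) or execve(2)). */
static bool lookup_forces_wire_check(bool isopen, bool islastcn)
{
    return isopen && islastcn;
}

/*
 * Toy count of GETATTR/ACCESS RPCs for one open(2) whose lookup hits the
 * namecache with fresh cached attributes.
 *   patched:          lookup applies lookup_forces_wire_check()
 *   open_invalidates: assumption that nfs_open() has dropped the attribute
 *                     cache for this file and sends its own GETATTR anyway
 */
static int getattr_rpcs_per_open(bool patched, bool open_invalidates)
{
    int rpcs = 0;

    /* open(2) looks up its last component with ISOPEN set. */
    if (patched && lookup_forces_wire_check(true, true))
        rpcs++;                 /* RPC sent during VOP_LOOKUP() */
    if (open_invalidates || rpcs == 0)
        rpcs++;                 /* close-to-open RPC in nfs_open() */

    assert(rpcs >= 1);          /* close-to-open: at least one RPC per open */
    return rpcs;
}
```

Under these assumptions the patch only adds traffic in the invalidating case (two RPCs per open instead of one), which is the difference a "make buildworld" run with and without the patch would expose when comparing client RPC counts.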