From: John Baldwin
To: Rick Macklem
Cc: freebsd-fs@freebsd.org, Steve Polyack, bseklecki@noc.cfi.pgh.pa.us, User Questions
Date: Mon, 22 Mar 2010 09:46:57 -0400
Subject: Re: FreeBSD NFS client goes into infinite retry loop
Message-Id: <201003220946.57087.jhb@freebsd.org>
References: <4BA3613F.4070606@comcast.net> <4BA432C8.4040707@comcast.net>

On Friday 19
March 2010 11:27:13 pm Rick Macklem wrote:
>
> On Fri, 19 Mar 2010, Steve Polyack wrote:
>
> [good stuff snipped]
> >
> > This makes sense.  According to wireshark, the server is indeed transmitting
> > "Status: NFS3ERR_IO (5)".  Perhaps this should be STALE instead; it sounds
> > more correct than marking it a general IO error.  Also, the NFS server is
> > serving its share off of a ZFS filesystem, if it makes any difference.  I
> > suppose ZFS could be talking to the NFS server threads with some mismatched
> > language, but I doubt it.
> >
> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
> ESTALE when the file no longer exists, the NFS server returns whatever
> error it has returned.
>
> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
> would be a problem that needs to be fixed within ZFS
> OR
> ZFS returns an error other than ESTALE when it doesn't exist.
>
> Try the following patch on the server (which just makes any error
> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
>
> --- nfsserver/nfs_srvsubs.c.sav	2010-03-19 22:06:43.000000000 -0400
> +++ nfsserver/nfs_srvsubs.c	2010-03-19 22:07:22.000000000 -0400
> @@ -1127,6 +1127,8 @@
>   		}
>   	}
>   	error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
> +	if (error != 0)
> +		error = ESTALE;
>   	vfs_unbusy(mp);
>   	if (error)
>   		goto out;
>
> Please let me know if the patch helps, rick

I can confirm that ZFS's FHTOVP() method never returns ESTALE.  Perhaps this
patch would fix it?  It changes zfs_fhtovp() to return ESTALE if the generation
count doesn't match.  If this doesn't help, you can try changing some of the
other return cases in this function to ESTALE (many use EINVAL) until you find
the one that matches this condition.

Index: zfs_vfsops.c
===================================================================
--- zfs_vfsops.c	(revision 205334)
+++ zfs_vfsops.c	(working copy)
@@ -1256,7 +1256,7 @@
 		dprintf("znode gen (%u) != fid gen (%u)\n", zp_gen, fid_gen);
 		VN_RELE(ZTOV(zp));
 		ZFS_EXIT(zfsvfs);
-		return (EINVAL);
+		return (ESTALE);
 	}
 
 	*vpp = ZTOV(zp);

-- 
John Baldwin
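
For readers following the thread, a small userspace sketch of why the errno chosen by the file-handle lookup matters. It is not the kernel code: toy_fhtovp(), toy_errmap(), struct toy_fid and struct toy_node are hypothetical names, and the mapping rule (only ESTALE becomes NFS3ERR_STALE, every other failure becomes NFS3ERR_IO) is a simplifying assumption chosen to match the "NFS3ERR_IO (5)" seen in the wireshark trace. The status values themselves are the ones defined for NFSv3 in RFC 1813.

```c
#include <errno.h>
#include <stdint.h>

/* NFSv3 status codes (values from RFC 1813). */
enum nfsstat3 {
	NFS3_OK       = 0,
	NFS3ERR_IO    = 5,
	NFS3ERR_STALE = 70
};

/* Hypothetical stand-ins for the file ID in a handle and the on-disk object. */
struct toy_fid  { uint64_t object; uint32_t gen; };
struct toy_node { uint64_t object; uint32_t gen; };

/*
 * Toy file-handle lookup.  mismatch_errno is the errno handed back when the
 * handle no longer matches the object: EINVAL models the behaviour before
 * the patch, ESTALE the behaviour after it.
 */
int
toy_fhtovp(const struct toy_node *np, const struct toy_fid *fid,
    int mismatch_errno)
{
	if (np->object != fid->object || np->gen != fid->gen)
		return (mismatch_errno);
	return (0);
}

/*
 * Assumed, simplified server-side mapping from errno to NFSv3 status:
 * ESTALE is passed through as NFS3ERR_STALE, anything else is reported
 * as a generic I/O error.
 */
enum nfsstat3
toy_errmap(int error)
{
	if (error == 0)
		return (NFS3_OK);
	if (error == ESTALE)
		return (NFS3ERR_STALE);
	return (NFS3ERR_IO);
}
```

With a handle whose generation is older than the object's, toy_errmap(toy_fhtovp(&node, &fid, EINVAL)) gives NFS3ERR_IO, while passing ESTALE instead gives NFS3ERR_STALE. As the thread suggests, the stale status is the one that lets the client stop retrying.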