From owner-freebsd-questions@FreeBSD.ORG  Mon Mar 22 14:52:46 2010
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6CF04106566C
	for <freebsd-questions@freebsd.org>;
	Mon, 22 Mar 2010 14:52:46 +0000 (UTC)
	(envelope-from korvus@comcast.net)
Received: from qmta06.westchester.pa.mail.comcast.net
	(qmta06.westchester.pa.mail.comcast.net [76.96.62.56])
	by mx1.freebsd.org (Postfix) with ESMTP id 15CE68FC1A
	for <freebsd-questions@freebsd.org>;
	Mon, 22 Mar 2010 14:52:44 +0000 (UTC)
Received: from omta14.westchester.pa.mail.comcast.net ([76.96.62.60])
	by qmta06.westchester.pa.mail.comcast.net with comcast
	id wBqR1d0071HzFnQ56Esl7Y; Mon, 22 Mar 2010 14:52:45 +0000
Received: from [10.0.0.51] ([71.199.122.142])
	by omta14.westchester.pa.mail.comcast.net with comcast
	id wEsk1d00D34Sj4f3aEskB9; Mon, 22 Mar 2010 14:52:45 +0000
Message-ID: <4BA78444.4040707@comcast.net>
Date: Mon, 22 Mar 2010 10:52:52 -0400
From: Steve Polyack <korvus@comcast.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB;
	rv:1.9.1.8) Gecko/20100227 Lightning/1.0b1 Thunderbird/3.0.3
MIME-Version: 1.0
To: Rick Macklem <rmacklem@uoguelph.ca>
References: <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org>
	<4BA37AE9.4060806@comcast.net> <4BA392B1.4050107@comcast.net>
	<4BA3DEBC.2000608@comcast.net>
	<Pine.GSO.4.63.1003192120470.17841@muncher.cs.uoguelph.ca>
	<4BA432C8.4040707@comcast.net>
	<Pine.GSO.4.63.1003192322420.6295@muncher.cs.uoguelph.ca>
In-Reply-To: <Pine.GSO.4.63.1003192322420.6295@muncher.cs.uoguelph.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us,
	User Questions <freebsd-questions@freebsd.org>,
	John Baldwin <jhb@freebsd.org>
Subject: Re: FreeBSD NFS client goes into infinite retry loop
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Mar 2010 14:52:46 -0000

On 3/19/2010 11:27 PM, Rick Macklem wrote:
>
>
> On Fri, 19 Mar 2010, Steve Polyack wrote:
>
> [good stuff snipped]
>>
>> This makes sense.  According to wireshark, the server is indeed 
>> transmitting "Status: NFS3ERR_IO (5)".  Perhaps this should be STALE 
>> instead; it sounds more correct than marking it a general IO error.  
>> Also, the NFS server is serving its share off of a ZFS filesystem, if 
>> it makes any difference.  I suppose ZFS could be talking to the NFS 
>> server threads with some mismatched language, but I doubt it.
>>
> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
> ESTALE when the file no longer exists, the NFS server returns whatever
> error it has returned.
>
> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
> would be a problem that needs to be fixed within ZFS
> OR
> ZFS returns an error other than ESTALE when it doesn't exist.
>
> Try the following patch on the server (which just makes any error
> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
>
> --- nfsserver/nfs_srvsubs.c.sav    2010-03-19 22:06:43.000000000 -0400
> +++ nfsserver/nfs_srvsubs.c    2010-03-19 22:07:22.000000000 -0400
> @@ -1127,6 +1127,8 @@
>          }
>      }
>      error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
> +    if (error != 0)
> +        error = ESTALE;
>      vfs_unbusy(mp);
>      if (error)
>          goto out;
>
> Please let me know if the patch helps, rick
>
>
Your patch seems to fix the bad behavior.  Running with it applied, I see 
the following output from my own debugging patch (which prints the return 
code of nfs_doio from within nfsiod):
nfssvc_iod: iod 0 nfs_doio returned errno: 70
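
For reference, errno 70 on FreeBSD is ESTALE, and it has the same numeric 
value as NFS3ERR_STALE in RFC 1813.  Below is a minimal userland sketch of 
what the two added lines in the patch do.  The helper name 
coerce_fhtovp_error() and the FBSD_* constants are made up for 
illustration; the real change lives inline in nfsserver/nfs_srvsubs.c, and 
this sketch does not model how the server maps other errors to NFS status 
codes.

```c
/* FreeBSD errno values, hard-coded so the sketch behaves the same on any
 * OS (ESTALE has a different number on Linux, for example). */
enum {
    FBSD_ENOENT = 2,
    FBSD_EIO    = 5,
    FBSD_EINVAL = 22,
    FBSD_ESTALE = 70
};

/* NFSv3 status codes as listed in RFC 1813. */
enum {
    NFS3_OK       = 0,
    NFS3ERR_NOENT = 2,
    NFS3ERR_IO    = 5,
    NFS3ERR_INVAL = 22,
    NFS3ERR_STALE = 70
};

/*
 * Hypothetical helper mirroring the patch: any nonzero error returned by
 * the file-handle-to-vnode lookup is turned into ESTALE; success (0) is
 * passed through unchanged.
 */
int
coerce_fhtovp_error(int error)
{
    if (error != 0)
        error = FBSD_ESTALE;
    return (error);
}
```

So an EINVAL (22) from the filesystem would come out as 70, which matches 
the errno my debugging output shows above.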

Furthermore, when inspecting the transaction with Wireshark after 
deleting the file on the NFS server, it looks like there is only a single 
error.  This time it is a reply to a V3 Lookup call that contains 
a status of "NFS3ERR_NOENT (2)" coming from the NFS server.  The client 
also does not repeatedly try to complete the failed request.

Any suggestions on the next step here?  Based on what you said it looks 
like ZFS is falsely reporting an IO error to VFS instead of ESTALE / 
NOENT.  I tried looking around zfs_fhtovp() and only saw returns of 
EINVAL, but I'm not even sure I'm looking in the right place.