From owner-freebsd-fs@FreeBSD.ORG  Sat Mar 20 00:58:16 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Date: Fri, 19 Mar 2010 21:11:03 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: John Baldwin <jhb@freebsd.org>
In-Reply-To: <201003190831.00950.jhb@freebsd.org>
Message-ID: <Pine.GSO.4.63.1003192054080.17841@muncher.cs.uoguelph.ca>
References: <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, User Questions <freebsd-questions@freebsd.org>,
	bseklecki@noc.cfi.pgh.pa.us
Subject: Re: FreeBSD NFS client goes into infinite retry loop


On Fri, 19 Mar 2010, John Baldwin wrote:

> On Friday 19 March 2010 7:34:23 am Steve Polyack wrote:
>> Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an
>> NFS server to provide user home directories which get mounted across a
>> few machines (all 6.3-RELEASE).  For the past few weeks we have been
>> running into problems where one particular client will go into an
>> infinite loop where it is repeatedly trying to write data which causes
>> the NFS server to return "reply ok 40 write ERROR: Input/output error
>> PRE: POST:".  This retry loop can cause between 20mbps and 500mbps of

I'm afraid I don't quite understand what you mean by "causes the NFS
server to return 'reply ok 40 write ERROR...'". Is this something
logged by syslog (I can't find a printf like this in the kernel
sources), or is it output from tcpdump, or something else?

I ask because it seems to say that the server is returning EIO
(or maybe 40 == EMSGSIZE).

The server should return ESTALE (NFSERR_STALE) after a file has
been deleted. If it is returning EIO, then that will cause the
client to keep trying to write the dirty block to the server.
(EIO is interpreted by the client as a "transient error".)
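
To make the distinction concrete, here is a minimal sketch of how a
client could sort write replies into "give up" and "try again". This is
not the actual FreeBSD client code; the enum and the function name
classify_write_error() are made up for illustration, and treating every
error other than ESTALE as transient is a simplifying assumption.

```c
#include <errno.h>

/* Hypothetical dispositions for a dirty block after a write attempt. */
enum write_disposition {
    WRITE_DONE,   /* server accepted the write */
    WRITE_RETRY,  /* keep the block dirty and try again later */
    WRITE_FAIL    /* invalidate the block and report the error to the app */
};

/*
 * Illustrative only: ESTALE means the file handle is no longer valid
 * (e.g. the file was deleted), so the write can never succeed. EIO is
 * treated as transient, as described above. Lumping all other errors
 * in with EIO is an assumption made to keep the sketch short.
 */
static enum write_disposition
classify_write_error(int error)
{
    switch (error) {
    case 0:
        return WRITE_DONE;
    case ESTALE:
        return WRITE_FAIL;
    case EIO:
    default:
        return WRITE_RETRY;
    }
}
```

With this split, a server that answers EIO for a deleted file leaves
the client in the WRITE_RETRY case forever, which matches the loop
being reported.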

[good stuff snipped]
>>
>> I have a feeling that using NFS in such a manner may simply be prone to
>> such problems, but what confuses me is why the NFS client system is
>> infinitely retrying the write operation and causing itself so much grief.
>
> Yes, your feeling is correct.  This sort of race is inherent to NFS if you do
> not use some sort of locking protocol to resolve the race.  The infinite
> retries sound like a client-side issue.  Have you been able to try a newer OS
> version on a client to see if it still causes the same behavior?
>
As John notes, having one client delete a file while another is trying
to write it is not a good thing.

However, the server should return ESTALE after the file is deleted and
that tells the client that the write can never succeed, so it marks the
buffer cache block invalid and returns the error to the app. (The app.
may not see it, if it doesn't check for error returns upon close as well
as write, but that's another story...)
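
For what it's worth, here is a rough userland sketch of an app that
would see such an error. The helper name write_file_checked() is
hypothetical. The point is only that the return values of write(),
fsync() and close() all get checked, since on NFS a failed write-back
may only be reported by the later calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path. Returns 0 on success or an errno
 * value from the first call that failed (open, write, fsync or close).
 */
static int
write_file_checked(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd, error = 0;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return errno;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            error = errno;
            break;
        }
        if (n == 0) {           /* no progress; avoid spinning forever */
            error = EIO;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Push dirty data to the server so a write-back error shows up here. */
    if (error == 0 && fsync(fd) == -1)
        error = errno;

    /* close() can also report a deferred write error; don't ignore it. */
    if (close(fd) == -1 && error == 0)
        error = errno;

    return error;
}
```

If the file were removed on the server by another client, the
expectation from the description above is that one of these calls
would return ESTALE rather than the data being silently lost.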

If you could look at a packet trace via wireshark when the problem
occurs, it would be nice to see what the server is returning. (If it
isn't ESTALE and the file no longer exists on the server, then that's
a server problem.) If it is returning ESTALE, then the client is busted.
(At a glance, the client code looks like it would handle ESTALE as a
fatal error for the buffer cache, but that doesn't mean it isn't broken,
just that it doesn't appear wrong. Also, it looks like mmap'd writes
won't recognize a fatal write error and will just keep trying to write
the dirty page back to the server. Take this with a big grain of salt,
since I just took a quick look at the sources. FreeBSD6->8 appear to
be pretty much the same as far as this goes, in the client.)

Please let us know if you can see the server's error reply code.

Good luck with it, rick
ps: If the server isn't returning ESTALE, you could try switching to
     the experimental nfs server and see if it exhibits the same
     behaviour. ("-e" option on both mountd and nfsd, assuming the
     server is FreeBSD8.)