From owner-freebsd-questions@FreeBSD.ORG  Fri Mar 19 11:34:24 2010
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 69AF3106566C
	for <freebsd-questions@freebsd.org>;
	Fri, 19 Mar 2010 11:34:24 +0000 (UTC)
	(envelope-from korvus@comcast.net)
Received: from mx04.pub.collaborativefusion.com
	(mx04.pub.collaborativefusion.com [206.210.72.84])
	by mx1.freebsd.org (Postfix) with ESMTP id 36F948FC1D
	for <freebsd-questions@freebsd.org>;
	Fri, 19 Mar 2010 11:34:23 +0000 (UTC)
Received: from [192.168.2.164] ([206.210.89.202])
	by mx04.pub.collaborativefusion.com (StrongMail Enterprise
	4.1.1.4(4.1.1.4-47689)); Fri, 19 Mar 2010 07:49:59 -0400
X-VirtualServerGroup: Default
X-MailingID: 00000::00000::00000::00000::::25
X-SMHeaderMap: mid="X-MailingID"
X-Destination-ID: freebsd-questions@freebsd.org
X-SMFBL: ZnJlZWJzZC1xdWVzdGlvbnNAZnJlZWJzZC5vcmc=
Message-ID: <4BA3613F.4070606@comcast.net>
Date: Fri, 19 Mar 2010 07:34:23 -0400
From: Steve Polyack <korvus@comcast.net>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.1.7) Gecko/20100311 Thunderbird/3.0.1
MIME-Version: 1.0
To: User Questions <freebsd-questions@freebsd.org>,
 freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: bseklecki@noc.cfi.pgh.pa.us
Subject: FreeBSD NFS client goes into infinite retry loop
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Mar 2010 11:34:24 -0000

Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an 
NFS server to provide user home directories, which are mounted on a few 
client machines (all 6.3-RELEASE).  For the past few weeks we have been 
running into a problem where one particular client goes into an 
infinite loop, repeatedly retrying a write to which the NFS server 
replies "reply ok 40 write ERROR: Input/output error PRE: POST:".  This 
retry loop can generate between 20 Mbps and 500 Mbps of constant 
traffic on our network, depending on the size of the data associated 
with the failed write.

We spent some time on the issue and determined that something on one of 
the clients is deleting a file while another NFS client is still 
writing to it.  We were able to enable the NFS lockmgr and use lockf(1) 
to fix most of these conditions, and the frequency of the problem has 
dropped from once a night to once a week.  However, it is still a 
problem, and we can't necessarily force all of our users to "play nice" 
and use lockf/flock.
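
For readers unfamiliar with the cooperative locking mentioned above, here 
is a minimal sketch in Python of what "playing nice" could look like.  It 
is not the setup used on these machines: the helper names append_locked 
and remove_locked are hypothetical, and it assumes every writer and every 
deleter goes through them and that the NFS lock manager is running on the 
server and on all clients.

```python
import fcntl
import os


def append_locked(path, data):
    """Append bytes to path while holding an exclusive advisory lock."""
    # Open for appending; create the file if it does not exist yet.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def remove_locked(path):
    """Remove path only after acquiring the same advisory lock."""
    # An exclusive lockf lock needs a descriptor opened for writing.
    fd = os.open(path, os.O_WRONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.unlink(path)
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

These locks are advisory, so a program that skips them can still delete 
the file mid-write.  A writer that opened the file before the removal 
can also still end up writing to a removed file, so this narrows the 
window rather than closing it.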

Has anyone seen this before?  No errors are being logged on the NFS 
server itself, but the "Server Ret-Failed" counter begins to increase 
rapidly whenever a client gets stuck in this infinite retry loop:
Server Ret-Failed
         224768961
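
To put a number on "increase rapidly", the counter can be sampled twice 
and turned into a rate.  The sketch below is only an illustration: it 
assumes the counter is captured as text in the two-line layout shown 
above (presumably from nfsstat(1), which is a guess, not something 
stated here), and the function names are hypothetical.

```python
import re


def parse_ret_failed(text):
    """Return the integer on the line after 'Server Ret-Failed', or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Server Ret-Failed" and i + 1 < len(lines):
            match = re.match(r"\s*(\d+)\s*$", lines[i + 1])
            if match:
                return int(match.group(1))
    return None


def failures_per_second(before, after, interval_seconds):
    """Average growth of the counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (after - before) / interval_seconds
```

A rate that is near zero in normal operation and jumps sharply would be 
one way to notice that a client has entered the retry loop.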

I have a feeling that using NFS in such a manner may simply be prone to 
such problems, but what confuses me is why the NFS client keeps retrying 
the write operation indefinitely and causing itself so much grief.

Thanks for any suggestions anyone can provide,
Steve Polyack