From: Steve Polyack
Date: Fri, 19 Mar 2010 07:34:23 -0400
To: User Questions, freebsd-fs@freebsd.org
Cc: bseklecki@noc.cfi.pgh.pa.us
Subject: FreeBSD NFS client goes into infinite retry loop
Message-ID: <4BA3613F.4070606@comcast.net>

Hi,

We use a FreeBSD 8-STABLE system (from shortly after release) as an NFS server to provide user home directories, which get mounted across a few machines (all 6.3-RELEASE).
For the past few weeks we have been running into a problem where one particular client goes into an infinite loop, repeatedly trying to write data, and the NFS server answers each attempt with "reply ok 40 write ERROR: Input/output error PRE: POST:". This retry loop can generate between 20 Mbps and 500 Mbps of constant traffic on our network, depending on the size of the data associated with the failed write.

We spent some time on the issue and determined that something on one of the clients is deleting a file while another NFS client is writing to it. We were able to enable the NFS lockmgr and use lockf(1) to fix most of these conditions, and the frequency of the problem has dropped from once a night to once a week. However, it's still a problem, and we can't necessarily force all of our users to "play nice" and use lockf/flock.

Has anyone seen this before? No errors are being logged on the NFS server itself, but the "Server Ret-Failed" counter begins to increase rapidly whenever a client gets stuck in this infinite retry loop:

    Server Ret-Failed 224768961

I have a feeling that using NFS in such a manner may simply be prone to such problems, but what confuses me is why the NFS client system retries the write operation indefinitely and causes itself so much grief.

Thanks for any suggestions anyone can provide,
Steve Polyack
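
For illustration, the kind of cooperative locking described above (writers and deleters both taking a lock before touching a shared file) might look roughly like the sketch below. It is not taken from the setup in this post: the function names, the separate lock-file convention, and the use of Python's `fcntl.lockf` are all assumptions, and it only helps when every program involved takes the same lock and NFS locking is working on both client and server.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def hold_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    lock_path is a hypothetical side file that all cooperating programs
    agree on; it is never deleted, so the lock itself cannot go stale.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def append_record(data_path, text, lock_path):
    """Append text to data_path while holding the shared lock."""
    with hold_lock(lock_path):
        with open(data_path, "a") as fh:
            fh.write(text)


def remove_file(data_path, lock_path):
    """Delete data_path only while holding the shared lock, so no
    cooperating writer is in the middle of a write when it disappears."""
    with hold_lock(lock_path):
        if os.path.exists(data_path):
            os.unlink(data_path)
```

A writer would call something like `append_record("out.log", "line\n", "out.log.lock")` and a cleanup job `remove_file("out.log", "out.log.lock")`. Programs that skip the lock are unaffected by it, which is the limitation noted above about users who do not "play nice".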