From owner-freebsd-stable@FreeBSD.ORG  Mon Oct 25 16:11:51 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 091C916A4CE
	for <freebsd-stable@freebsd.org>;
	Mon, 25 Oct 2004 16:11:51 +0000 (GMT)
Received: from grummit.biaix.org (86.Red-213-97-212.pooles.rima-tde.net
	[213.97.212.86])	by mx1.FreeBSD.org (Postfix) with SMTP id AB39043D53
	for <freebsd-stable@freebsd.org>;
	Mon, 25 Oct 2004 16:11:49 +0000 (GMT)
	(envelope-from lists-freebsd-stable@biaix.org)
Received: (qmail 42188 invoked by uid 1000); 25 Oct 2004 16:09:30 -0000
Date: Mon, 25 Oct 2004 18:09:30 +0200
From: Joan Picanyol <lists-freebsd-stable@biaix.org>
To: Robert Watson <rwatson@freebsd.org>
Message-ID: <20041025160930.GA41784@grummit.biaix.org>
Mail-Followup-To: Robert Watson <rwatson@freebsd.org>,
	freebsd-stable@freebsd.org
References: <20041025092330.GB39457@grummit.biaix.org>
	<Pine.NEB.3.96L.1041025131419.3203A-100000@fledge.watson.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.NEB.3.96L.1041025131419.3203A-100000@fledge.watson.org>
User-Agent: Mutt/1.4.1i
cc: freebsd-stable@freebsd.org
Subject: Re: process stuck in nfsfsync state
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
X-List-Received-Date: Mon, 25 Oct 2004 16:11:51 -0000

* Robert Watson <rwatson@freebsd.org> [20041025 14:24]:
> 
> On Mon, 25 Oct 2004, Joan Picanyol wrote:
> 
> > > Is there an response to the request?  If not, that might suggest the
> > > server is wedged, not the client.  If you are willing to share the results
> > > of a tcpdump -s 1500 -w <whatever> output from a few seconds during the
> > > wedge, that would be very useful.
> > 
> > Available at http://biaix.org/pk/debug/nfs/ These are from just after
> > logging in to GNOME until gconfd-2 goes to nfsfsync, and the nfs server
> > not responding messages start appearing. 
> 

[snip *much* appreciated detailed analysis]

>   So if possible, I might try some of the following:
[...]
> - I think someone already suggested disabling hardware checksumming, but
>   if you haven't tried that, it would be worth trying it.

No difference. 

> - It would be useful to see if less complicated NFS meta-transactions than
>   "Start GTK" can trigger the problem.  For example, doing a large dd to a
>   file in NFS, varying the blocksize to see if you can find useful
>   thresholds that trigger the problem.  I see a lot of successful 512 byte
>   writes in the trace, but larger datagram sizes of 8192 for writes seem
>   to have problems.

Now this is interesting:

dd if=/dev/urandom of=/fs/bulk/mount/dummy bs=512 count=14

wedges the NFS mount point 100% of the time. Lowering the count to 13
(6656 bytes instead of 7168) doesn't reproduce the hang.
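
A rough sketch of how the threshold could be narrowed down further, by
writing the same totals with different block sizes. This is untested
against the wedging mount: TARGET, total_bytes and probe are made-up
names, and the default target is a local temporary file, so TARGET has
to be pointed at a file on the NFS mount to exercise the problem. On a
mount that wedges, dd will most likely just block, so each case is
printed before it is run.

```shell
#!/bin/sh
# Sketch only: TARGET, total_bytes and probe are illustrative names.
# Point TARGET at a file on the affected NFS mount to test it; the
# default is a local temporary file so the script is harmless to run.
TARGET=${TARGET:-$(mktemp)}

# Total payload written by dd for a given block size and count.
total_bytes() {
    echo $(( $1 * $2 ))
}

# Announce the case first, so that if dd wedges, the last line printed
# identifies the block size and count that triggered it.
probe() {
    bs=$1 count=$2
    echo "trying bs=$bs count=$count ($(total_bytes "$bs" "$count") bytes)"
    dd if=/dev/urandom of="$TARGET" bs="$bs" count="$count" 2>/dev/null
}

# The 13/14 boundary at bs=512, then the same 7168 bytes written
# with larger block sizes.
for args in "512 13" "512 14" "1024 7" "7168 1"; do
    probe $args || echo "dd failed for: $args"
done
```

If only the bs=512 count=14 case hangs, the block count matters; if
every 7168-byte case hangs, the total size is the more likely trigger.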

Another possibly interesting data point is that NFS over TCP works
fine. For this I'm particularly grateful, since I can now mount my /home
fs and do my work.

Am I the only one seeing this?

tks
-- 
pica