From: Robert Watson
Date: Mon, 25 Oct 2004 13:25:49 +0100 (BST)
To: Joan Picanyol
cc: freebsd-stable@freebsd.org
In-Reply-To: <20041025092330.GB39457@grummit.biaix.org>
Subject: Re: process stuck in nfsfsync state

On Mon, 25 Oct 2004, Joan Picanyol wrote:

> > Is there a response to the request?  If not, that might suggest the
> > server is wedged, not the client.  If you are willing to share the results
> > of a tcpdump -s 1500 -w output from a few seconds during the
> > wedge, that would be very useful.
>
> Available at http://biaix.org/pk/debug/nfs/ These are from just after
> logging in to GNOME until gconfd-2 goes to nfsfsync, and the "nfs server
> not responding" messages start appearing.
Comparing the client and server traces, it looks like fragments of the
client-generated writes are being lost.  For example, frame 4175 in the
client trace is the start of a fragmented NFSv3 write over UDP.  The write
is 8192 bytes, and the resulting datagram is broken down into six IP
fragments:

    Frame   IP offset   Length   Arrived?
    4175            0     1480   Yes
    4176         1480     1480   Yes
    4177         2960     1480   Yes
    4178         4440     1480   Yes
    4179         5920     1480   No
    4180         7400      944   Yes

Without the missing fragments, the datagrams (and hence the RPCs) can't be
reassembled, and with 6-fragment datagrams, even a fairly low probability
of loss for individual packets adds up (or multiplies up!).

So the question is: where are your fragments going?  Since the fragments
all ended up in the BPF trace on the client, we know that sufficient mbufs
could be allocated on that side to build not only the datagram but also the
fragment stream, and to insert it into the interface queue without an
overflow; they could still have been dropped at a low level in the driver.
Since they don't appear, even corrupted, in the server trace, we know they
either didn't reach the server or were dropped very early in processing in
the driver.  A drop in the IP stack would occur after the packet was
submitted to BPF, so it would still show up in the trace.

So if possible, I might try some of the following:

- Substituting a different switch or hub between the two systems, and
  looking for possible chronic sources of packet loss between them.

- If possible, getting a trace of the packets on an intermediate node to
  see whether the packets were really sent or not.  Maybe on a monitor port
  on the switch, or by inserting a bridging node.

  My suspicion is either that the sender is dropping them at a low level in
  the driver, perhaps due to a resource leak, or that they're dropped on
  the way through an intermediate node.  Maybe something is particularly
  sensitive to the rapid sequential send of the 6 fragments.

- Perhaps instrumenting the device drivers on the sender and recipient to
  look for possible areas where packet drops are being triggered.

- I think someone already suggested disabling hardware checksumming, but if
  you haven't tried that, it would be worth trying it.

- It would be useful to see if less complicated NFS meta-transactions than
  "Start GTK" can trigger the problem.  For example, doing a large dd to a
  file in NFS, varying the blocksize to see if you can find useful
  thresholds that trigger the problem.  I see a lot of successful 512 byte
  writes in the trace, but larger datagram sizes of 8192 for writes seem to
  have problems.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research
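
The fragment arithmetic and the "multiplies up" remark in the message can be illustrated with a small sketch. It is not taken from the traces. It assumes a 1500-byte Ethernet MTU and a 20-byte IPv4 header with no options (which matches the 1480-byte fragments in the table), an IP payload of 8344 bytes (7400 + 944, the sum implied by the table), and independent per-fragment loss, which real loss often is not. The function names are hypothetical.

```python
MTU = 1500        # assumed Ethernet MTU
IP_HEADER = 20    # assumed IPv4 header length, no options


def fragments(ip_payload_len, mtu=MTU, ip_header=IP_HEADER):
    """Return (offset, length) pairs for the IP fragments of one datagram."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return [(offset, min(per_fragment, ip_payload_len - offset))
            for offset in range(0, ip_payload_len, per_fragment)]


def datagram_loss(n_fragments, p_fragment_loss):
    """Probability that at least one of n fragments is lost.

    Assumes each fragment is lost independently with the same probability;
    losing any single fragment makes the whole datagram unusable.
    """
    return 1.0 - (1.0 - p_fragment_loss) ** n_fragments


if __name__ == "__main__":
    # 8344 = 7400 + 944: the IP payload implied by the fragment table.
    for offset, length in fragments(8344):
        print(f"offset {offset:5d}  length {length:5d}")

    # Compare a single-fragment datagram with a six-fragment one.
    for p in (0.001, 0.01, 0.05):
        print(f"p={p}: 1 fragment {datagram_loss(1, p):.4f}, "
              f"6 fragments {datagram_loss(6, p):.4f}")
```

Under these assumptions a 1% per-fragment loss rate costs a single-fragment write about 1% of its datagrams but a six-fragment write roughly 5.9%, which is consistent with small writes succeeding while 8192-byte writes struggle.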