From owner-freebsd-fs@FreeBSD.ORG  Sun Jan 10 21:26:17 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 61D02106566B
	for <freebsd-fs@freebsd.org>; Sun, 10 Jan 2010 21:26:17 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 18F028FC18
	for <freebsd-fs@freebsd.org>; Sun, 10 Jan 2010 21:26:16 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApoEAPzUSUuDaFvH/2dsb2JhbADReYIhgg4E
X-IronPort-AV: E=Sophos;i="4.49,251,1262581200"; d="scan'208";a="60717293"
Received: from danube.cs.uoguelph.ca ([131.104.91.199])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 10 Jan 2010 16:26:15 -0500
Received: from localhost (localhost.localdomain [127.0.0.1])
	by danube.cs.uoguelph.ca (Postfix) with ESMTP id 4DD3A1084454;
	Sun, 10 Jan 2010 16:26:14 -0500 (EST)
X-Virus-Scanned: amavisd-new at danube.cs.uoguelph.ca
Received: from danube.cs.uoguelph.ca ([127.0.0.1])
	by localhost (danube.cs.uoguelph.ca [127.0.0.1]) (amavisd-new,
	port 10024)
	with ESMTP id dalOjxewKhYe; Sun, 10 Jan 2010 16:26:13 -0500 (EST)
Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102])
	by danube.cs.uoguelph.ca (Postfix) with ESMTP id 971CD108440B;
	Sun, 10 Jan 2010 16:26:12 -0500 (EST)
Received: from localhost (rmacklem@localhost)
	by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id
	o0ALaIg09132; Sun, 10 Jan 2010 16:36:18 -0500 (EST)
X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing
	-bs
Date: Sun, 10 Jan 2010 16:36:18 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: Mikolaj Golub <to.my.trociny@gmail.com>
In-Reply-To: <86ocl272mb.fsf@kopusha.onet>
Message-ID: <Pine.GSO.4.63.1001101623540.4616@muncher.cs.uoguelph.ca>
References: <86ocl272mb.fsf@kopusha.onet>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@FreeBSD.org
Subject: Re: FreeBSD NFS client/Linux NFS server issue
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 10 Jan 2010 21:26:17 -0000


On Sun, 10 Jan 2010, Mikolaj Golub wrote:

>
> For one of the incident we were tcpdumping "problem" NFS connection for about
> 1 hour and during this hour an activity was observed only once:
>
> 08:20:38.281422 IP (tos 0x0, ttl 64, id 56110, offset 0, flags [DF], proto TCP (6), length 140) 172.30.10.27.344496259 > 172.30.10.121.2049: 88 access fh[1:9300:10df8001] 003f
> 08:20:38.281554 IP (tos 0x0, ttl 64, id 26624, offset 0, flags [DF], proto TCP (6), length 52) 172.30.10.121.2049 > 172.30.10.27.971: ., cksum 0xca5e (correct), 89408667:89408667(0) ack 1517941890 win 46 <nop,nop,timestamp 901975640 111169517>
>
> The client sent rpc ACCESS request for root exported inode, received tcp ack
> response (so tcp connection was ok) but did not receive any RPC reply from the
> server.
>
> So it looks like the problem on NFS server side. But for me it looks a bit
> strange that freebsd client is sending rpc packets so rarely. Shouldn't it
> retransmit them more frequently? For another incident we monitored tcp
> connection for 4 minutes and did not see any packets then. Unfortunately we
> can't run tcpdumping long time as these are production servers and we need to
> reboot hosts to restore normal operations.
>

For NFSv3 over TCP, there was no RFC specification, so client behaviour 
when the server failed to reply to an RPC was essentially undefined. (For
NFSv4, a client isn't allowed to retry a non-NULL RPC on the same TCP
connection and a server is expected to reply to all RPCs received on the
connection or do a disconnect, but that's NFSv4 not NFSv3.)

I think the new krpc in FreeBSD8 does do a slow timeout on RPCs over TCP
for NFSv3 and eventually does a retry, but I didn't write the code, so I'm
not absolutely sure. (I'll try to remember to take a look, or maybe dfr
can comment?) However, this krpc code isn't used in FreeBSD7.

Bottom line is I don't think the FreeBSD7 client does a retry until it
sees the TCP connection break, so if the server is neither replying to the
RPC nor disconnecting the TCP connection, it'll be stuck as you describe.

I think you have three choices:
1 - Fix the NFS server so that it either replies or disconnects, if that
     is possible. (I have no idea whether the Linux NFS server can be
     reconfigured.)
2 - Switch to using UDP (which will retry RPCs when no reply is received).
3 - Try a FreeBSD8 system and see if it works ok, then upgrade if that's
     practical.
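
To make the difference between the two behaviours concrete, here is a
rough toy model in Python. It is only a sketch and not the actual client
or krpc code: the function name rpc_attempts, its parameters and the
"list of booleans" server model are all made up for illustration. It
counts how many times a request is transmitted before a reply arrives,
for a client that retransmits on each RPC timeout (roughly what a UDP
mount does) versus one that sends once and then just waits on the same
connection (roughly what I believe the FreeBSD7 TCP client does).

```python
def rpc_attempts(server_answers, retry_on_timeout, max_waits=5):
    """Toy model of a single RPC (hypothetical, for illustration only).

    server_answers[i] is True if the server replies to the i-th
    transmission of the request. Returns the number of transmissions
    needed to get a reply, or None if no reply arrives within
    max_waits timeout periods.
    """
    sent = 0
    for _ in range(max_waits):
        # Always send the first time; resend only if the policy allows it.
        if sent == 0 or retry_on_timeout:
            answered = sent < len(server_answers) and server_answers[sent]
            sent += 1
            if answered:
                return sent
        # Otherwise keep waiting on the same connection without resending.
    return None


# A server that silently drops the first request but would answer a resend.
dropped_once = [False, True]
print(rpc_attempts(dropped_once, retry_on_timeout=True))   # 2
print(rpc_attempts(dropped_once, retry_on_timeout=False))  # None
```

In this model the retrying client recovers on its second transmission,
while the non-retrying one never gets a reply unless something else
(such as the TCP connection breaking) makes it send again.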

rick
ps: As an historical note, I think I implemented NFS over TCP before
     anyone else and assumed that a server would reply to all RPC
     requests, so retries at the RPC level wouldn't be necessary.
     Others, like Sun, implemented NFS over TCP with RPC timeout/retries
     and then slowly came over to my way of thinking, but it wasn't
     spelled out until NFSv4.