From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 15 18:41:44 2005
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
X-Original-To: fs@freebsd.org
Delivered-To: freebsd-fs@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 887CA16A420
	for <fs@freebsd.org>; Sat, 15 Oct 2005 18:41:44 +0000 (GMT)
	(envelope-from rick@snowhite.cis.uoguelph.ca)
Received: from ccshst09.cs.uoguelph.ca (ccshst09.cs.uoguelph.ca
	[131.104.96.18])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2CF1D43D4C
	for <fs@freebsd.org>; Sat, 15 Oct 2005 18:41:43 +0000 (GMT)
	(envelope-from rick@snowhite.cis.uoguelph.ca)
Received: from snowhite.cis.uoguelph.ca (snowhite.cis.uoguelph.ca
	[131.104.48.1])
	by ccshst09.cs.uoguelph.ca (8.13.1/8.13.1) with ESMTP id j9FIfcAc023938;
	Sat, 15 Oct 2005 14:41:38 -0400
Received: (from rick@localhost)
	by snowhite.cis.uoguelph.ca (8.9.3/8.9.3) id OAA37331;
	Sat, 15 Oct 2005 14:43:01 -0400 (EDT)
Date: Sat, 15 Oct 2005 14:43:01 -0400 (EDT)
From: rick@snowhite.cis.uoguelph.ca
Message-Id: <200510151843.OAA37331@snowhite.cis.uoguelph.ca>
To: fs@freebsd.org
X-Scanned-By: MIMEDefang 2.52 on 131.104.96.18
Cc: 
Subject: FreeBSD NFS server not responding to TCP SYN packets from
	Linux/SunOS clients
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 15 Oct 2005 18:41:44 -0000

>> When Sun first did NFS over TCP, I believe they did
>> do retries (using a conservative timeout). I think I eventually convinced Sun
>> that it wasn't a good idea and I think that Solaris no longer
>> does them, but I'm not sure. (For this to work correctly, a server is required
>> to disconnect whenever it can't generate a reply to an RPC over TCP for any
>> reason.)
>
>yes, this is a difficult semantic.

For v3 and v4 it shouldn't be necessary, except in extreme circumstances,
since the server can always just reply NFSERR_DELAY. For v2, I'd be tempted
to discourage v2 over TCP, arguing that v2 is only there for old clients
that can't do anything else, and let them use UDP.

In other words, NFSERR_DELAY is your friend:-)

> it means that there is now a race that allows a server to redo a 
> non-idempotent request if the client reconnects on another port and 
> sends a retransmit of a stuck request.  i've seen this in practice, and 
> for certain applications this will cause data corruption.
> 
> most Linux NFS clients will not reconnect on the same port after the 
> server disconnects (a bug i recently addressed).  for servers with a 
> duplicate reply cache, this means the client can retransmit 
> non-idempotent requests and the DRC will not stop the requests from 
> being reapplied.  such servers are dependent on identifying RPC requests 
> by the tuple of [ XID, source port, client IP ] -- if source port 
> changes, then the DRC is rendered ineffective.

I'd argue that the DRC shouldn't depend on the same port#. (It can even
be argued that it shouldn't depend on the same client host IP#, since those
can change dynamically via DHCP, etc.) I think you'll find a very brief
(and crappy) description of what I use for my current DRC on the ftp
site (ftp.cis.uoguelph.ca/pub/nfsv4/server-cache.algorithm and some
notes in ftp.cis.uoguelph.ca/pub/nfsv4/doc.tar.gz).
Basically, it uses the XID, plus a checksum of the first N bytes of the
request and a few other checks.
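
To make the idea concrete, here is a rough sketch of a cache lookup keyed
on the XID plus a checksum of the first N bytes of the request, with no
port# or IP# in the key. This is not the algorithm from the files above:
the names (drc_entry, drc_lookup, drc_insert), the value of N, the
rotate-and-add checksum, the direct-mapped table and the request-length
comparison (standing in for the "few other checks") are all assumptions
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DRC_CKSUM_BYTES 64   /* hypothetical "N": bytes of request summed */
#define DRC_SLOTS       128  /* hypothetical table size */

/* One cached reply, identified without client port or IP address. */
struct drc_entry {
    int      in_use;
    uint32_t xid;
    uint32_t cksum;
    size_t   req_len;
    int      cached_status;  /* stand-in for the saved reply */
};

static struct drc_entry drc_table[DRC_SLOTS];

/* Rotate-and-add checksum over at most the first DRC_CKSUM_BYTES bytes. */
static uint32_t drc_cksum(const unsigned char *req, size_t len)
{
    size_t n = len < DRC_CKSUM_BYTES ? len : DRC_CKSUM_BYTES;
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum = ((sum << 1) | (sum >> 31)) + req[i];
    return sum;
}

/* Return the cached entry for a retransmitted request, or NULL if new. */
static struct drc_entry *drc_lookup(uint32_t xid, const unsigned char *req,
                                    size_t len)
{
    struct drc_entry *e = &drc_table[xid % DRC_SLOTS];

    if (e->in_use && e->xid == xid && e->req_len == len &&
        e->cksum == drc_cksum(req, len))
        return e;
    return NULL;
}

/* Record the outcome of a request, overwriting whatever held the slot. */
static struct drc_entry *drc_insert(uint32_t xid, const unsigned char *req,
                                    size_t len, int status)
{
    struct drc_entry *e = &drc_table[xid % DRC_SLOTS];

    e->in_use = 1;
    e->xid = xid;
    e->cksum = drc_cksum(req, len);
    e->req_len = len;
    e->cached_status = status;
    return e;
}
```

With a key like this, a retransmit that arrives on a new connection from a
different source port still hits the cache, while a different request that
happens to reuse the same XID is very likely to miss because its checksum
differs.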
[good stuff snipped]

> if a client *doesn't* retransmit, is there any guarantee that a 
> hard-mounted client can make forward progress?

Probably not. But I don't think it has been a problem in practice for
FreeBSD. (I suspect that servers only fail to reply to requests when they
are "dead in the water".)

The BSD server never drops a request in progress. It does MGET()s and
MALLOC()s with M_WAITOK. The problem is that most BSDen are pretty well
toast by the time those allocations actually have to wait. I am thinking
that I should change the server to use M_NOWAIT and then return
NFSERR_DELAY when it gets a NULL ptr. (For v2, only allow UDP and drop
the request.) But I haven't gotten around to coding it. (Lots of cases
where NULL ptrs have to be checked for--> lots of work:-)
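
Roughly what I have in mind, as a userland sketch only (nothing here is
real server code). sk_alloc_nowait() stands in for an MGET()/MALLOC() done
with M_NOWAIT, and the sk_ names and the sk_fail_alloc switch are made up
for illustration. 10008 is the value the v3 and v4 specs assign to
NFS3ERR_JUKEBOX/NFS4ERR_DELAY; I am assuming NFSERR_DELAY maps to it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed value: NFS3ERR_JUKEBOX / NFS4ERR_DELAY in the v3 and v4 specs. */
#define SK_NFSERR_DELAY 10008

enum sk_vers   { SK_NFSV2 = 2, SK_NFSV3 = 3, SK_NFSV4 = 4 };
enum sk_action { SK_REPLY_OK, SK_REPLY_DELAY, SK_DROP };

/* Hypothetical switch that simulates memory exhaustion. */
static int sk_fail_alloc;

/* Stand-in for a non-sleeping allocation: may return NULL. */
static void *sk_alloc_nowait(size_t len)
{
    if (sk_fail_alloc)
        return NULL;
    return malloc(len);
}

/*
 * Handle one request. If the reply buffer cannot be allocated, v3 and v4
 * clients are told to try again later; a v2 request (UDP only in this
 * scheme) is dropped and the client's own retry timer takes over.
 */
static enum sk_action sk_handle_request(enum sk_vers vers, size_t reply_len,
                                        int *status)
{
    void *buf = sk_alloc_nowait(reply_len);

    if (buf == NULL) {
        if (vers == SK_NFSV2)
            return SK_DROP;
        *status = SK_NFSERR_DELAY;
        return SK_REPLY_DELAY;
    }

    /* ... the real work of building the reply would go here ... */
    free(buf);
    *status = 0;
    return SK_REPLY_OK;
}
```

The real job is the same check repeated after every allocation in the
server, which is where the work comes from.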

rick