FreeBSD Mail Archives

Date:      Fri, 14 Oct 2005 17:44:58 -0400
From:      Chuck Lever <cel@citi.umich.edu>
To:        rick@snowhite.cis.uoguelph.ca
Cc:        fs@freebsd.org
Subject:   Re: FreeBSD NFS server not responding to TCP SYN packets from Linux/SunOS clients
Message-ID:  <435026DA.5050101@citi.umich.edu>
In-Reply-To: <200510142020.QAA26662@snowhite.cis.uoguelph.ca>
References:  <200510142020.QAA26662@snowhite.cis.uoguelph.ca>

rick@snowhite.cis.uoguelph.ca wrote:
>>where is that rule stated?  most NFS clients i am aware of retransmit an 
>>RPC after 60 seconds over TCP.
> 
> 
> For NFSv4, it's in RFC3530, Sec. 3.1.1 (actually applies to RPCs other
> than NULL).

i recently had a thorough discussion of this with the author of that 
section, Mike Eisler.

> For NFSv2,3 it was never required by the RFCs, so it is
> questionable what the correct behaviour is. Being the first to do NFS over
> TCP, I only did retransmits after reconnect. I think I described it that
> way in the ancient Usenix paper. (http://snowhite.cis.uoguelph.ca/nfsv4,
> then click on it)

i will try to grab that.

> When Sun first did NFS over TCP, I believe they did
> do retries (using a conservative timeout). I think I eventually convinced Sun
> that it wasn't a good idea and I think that Solaris no longer
> does them, but I'm not sure. (For this to work correctly, a server is required
> to disconnect whenever it can't generate a reply to an RPC over TCP for any
> reason.)

yes, this is a difficult semantic.

it means that there is now a race that allows a server to redo a 
non-idempotent request if the client reconnects on another port and 
sends a retransmit of a stuck request.  i've seen this in practice, and 
for certain applications this will cause data corruption.

most Linux NFS clients will not reconnect on the same port after the 
server disconnects (a bug i recently addressed).  for servers with a 
duplicate reply cache, this means the client can retransmit 
non-idempotent requests and the DRC will not stop the requests from 
being reapplied.  such servers are dependent on identifying RPC requests 
by the tuple of [ XID, source port, client IP ] -- if source port 
changes, then the DRC is rendered ineffective.
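
as a rough illustration of the point above, here is a minimal sketch of a duplicate reply cache keyed on that tuple. everything in it (the `DuplicateReplyCache` class, the `remove_file` handler, the port numbers, the address and the status strings) is hypothetical and made up for the example; it is not taken from any real server. it only shows that a retransmit from the same source port is answered from the cache, while the same XID arriving from a new source port is executed a second time.

```python
from collections import OrderedDict


class DuplicateReplyCache:
    """Toy DRC keyed by (xid, source port, client IP). Hypothetical, for illustration."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()

    def handle(self, xid, src_port, client_ip, execute):
        key = (xid, src_port, client_ip)
        if key in self.entries:
            # Retransmit recognised: replay the cached reply, do not re-execute.
            return self.entries[key]
        reply = execute()
        self.entries[key] = reply
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        return reply


applied = []  # record of how many times the operation really ran


def remove_file():
    # A non-idempotent operation: the second execution fails because
    # the file is already gone. Status strings are illustrative only.
    applied.append("REMOVE foo")
    return "OK" if len(applied) == 1 else "NOENT"


drc = DuplicateReplyCache()
first = drc.handle(0x1234, 801, "10.0.0.5", remove_file)      # original request
same_port = drc.handle(0x1234, 801, "10.0.0.5", remove_file)  # retransmit, same port
new_port = drc.handle(0x1234, 802, "10.0.0.5", remove_file)   # retransmit, new port
```

in this sketch the retransmit on the original port gets the cached success reply, but the retransmit from the new port misses the cache, so the remove runs again and the client sees a spurious failure.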

servers that don't have a DRC for TCP are also exposed to this problem.  when 
they disconnect the TCP connection, they've lost all stream transport 
guarantees (no request reordering, no duplicate requests).  on reconnect 
a client can retransmit any requests it hasn't received a reply for, 
which are then reapplied by the server.  if the server doesn't guarantee 
that these retransmitted requests are applied in the same order that the 
original requests were applied, there is opportunity for data corruption.
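
a small sketch of why the replay order matters, assuming two outstanding writes that overlap the same byte range. the `apply_write` and `replay` helpers and the payloads are hypothetical; the "server" here is just a byte buffer with no DRC, which reapplies whatever is retransmitted after a reconnect.

```python
def apply_write(data, offset, payload):
    """Apply one write to an in-memory file, extending it with zeros if needed."""
    end = offset + len(payload)
    if len(data) < end:
        data.extend(b"\0" * (end - len(data)))
    data[offset:end] = payload


# Two overlapping writes the client sent, in this order, before the disconnect.
requests = [(0, b"AAAA"), (0, b"BBBB")]


def replay(order):
    """Apply the original requests, then reapply them in the given order."""
    data = bytearray()
    for offset, payload in requests:  # original application, in order
        apply_write(data, offset, payload)
    for index in order:               # retransmits after reconnect
        apply_write(data, *requests[index])
    return bytes(data)


same_order = replay([0, 1])  # retransmits applied in the original order
reordered = replay([1, 0])   # retransmits applied in the opposite order
```

when the retransmits are applied in the original order the file ends up with the contents the client intended; when they are reordered, the older write lands last and overwrites the newer data.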

retransmitting an idempotent request will cause a connection drop, 
meaning any non-idempotent requests that were outstanding at the time 
will have to be retransmitted.

this is load-dependent behavior.  when a server slows down, a client 
that retransmits on TCP is more likely to retransmit one or more 
non-idempotent requests.  this means the server will disconnect, 
creating even more work for server, network, and client, and it means 
the likelihood of data corruption increases as load increases.

if a client *doesn't* retransmit, is there any guarantee that a 
hard-mounted client can make forward progress?

> So, for NFSv2,3 I don't know of a stated "rule". I don't think it is covered
> in the NFS interoperability RFC that appeared a while back, but can't
> remember for sure.

we've been looking for a while, but haven't seen anything.
