Date: Fri, 14 Oct 2005 17:44:58 -0400
From: Chuck Lever <cel@citi.umich.edu>
To: rick@snowhite.cis.uoguelph.ca
Cc: fs@freebsd.org
Subject: Re: FreeBSD NFS server not responding to TCP SYN packets from Linux/SunOS clients
Message-ID: <435026DA.5050101@citi.umich.edu>
In-Reply-To: <200510142020.QAA26662@snowhite.cis.uoguelph.ca>
References: <200510142020.QAA26662@snowhite.cis.uoguelph.ca>
rick@snowhite.cis.uoguelph.ca wrote:
>>where is that rule stated?  most NFS clients i am aware of retransmit an
>>RPC after 60 seconds over TCP.
>
>
> For NFSv4, it's in RFC3530, Sec. 3.1.1 (actually applies to RPCs other
> than NULL).

i recently had a thorough discussion of this with the author of that
section, Mike Eisler.

> For NFSv2,3 it was never required by the RFCs, so it is
> questionable what the correct behaviour is. Being the first to do NFS over
> TCP, I only did retransmits after reconnect. I think I described it that
> way in the ancient Usenix paper. (http://snowhite.cis.uoguelph.ca/nfsv4,
> then click on it)

i will try to grab that.

> When Sun first did NFS over TCP, I believe they did
> do retries (using a conservative timeout). I think I eventually convinced Sun
> that it wasn't a good idea and I think that Solaris no longer
> does them, but I'm not sure. (For this to work correctly, a server is required
> to disconnect whenever it can't generate a reply to an RPC over TCP for any
> reason.)

yes, this is a difficult semantic.

it means that there is now a race that allows a server to redo a
non-idempotent request if the client reconnects on another port and
sends a retransmit of a stuck request.  i've seen this in practice, and
for certain applications this will cause data corruption.

most Linux NFS clients will not reconnect on the same port after the
server disconnects (a bug i recently addressed).  for servers with a
duplicate reply cache, this means the client can retransmit
non-idempotent requests and the DRC will not stop the requests from
being reapplied.  such servers are dependent on identifying RPC requests
by the tuple of [ XID, source port, client IP ] -- if source port
changes, then the DRC is rendered ineffective.

servers that don't have a DRC for TCP are exposed to this problem.  when
they disconnect the TCP connection, they've lost all stream transport
guarantees (no request reordering, no duplicate requests).  on reconnect
a client can retransmit any requests it hasn't received a reply for,
which are then reapplied by the server.  if the server doesn't guarantee
that these retransmitted requests are applied in the same order that the
original requests were applied, there is opportunity for data corruption.

retransmitting an idempotent request will cause a connection drop,
meaning any non-idempotent requests that were outstanding at the time
will have to be retransmitted.

this is load dependent behavior.  when a server slows down, a client
that retransmits on TCP is more likely to retransmit one or more
non-idempotent requests.  this means the server will disconnect,
creating even more work for server, network, and client, and it means
the likelihood of data corruption increases as load increases.

if a client *doesn't* retransmit, is there any guarantee that a
hard-mounted client can make forward progress?

> So, for NFSv2,3 I don't know of a stated "rule". I don't think it is covered
> in the NFS interoperability RFC that appeared a while back, but can't
> remember for sure.

we've been looking for a while, but haven't seen anything.
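Editorial illustration of the duplicate reply cache point made in the message: a minimal sketch, assuming a toy server whose cache is keyed by the tuple [ XID, source port, client IP ]. The class ToyServer, the append-to-log operation, and the addresses, ports, and XID are all hypothetical and are not taken from any real NFS implementation. It only shows that a retransmit from the same source port is absorbed by the cache, while the same retransmit from a new source port is applied a second time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical server with a DRC keyed by (xid, source port, client IP)."""
    log: list = field(default_factory=list)   # stands in for server-side state
    drc: dict = field(default_factory=dict)   # cached replies, keyed by tuple

    def handle(self, client_ip, src_port, xid, data):
        key = (xid, src_port, client_ip)
        if key in self.drc:
            # Duplicate detected: replay the cached reply, do not re-execute.
            return self.drc[key]
        # Non-idempotent operation: applying it twice changes the state.
        self.log.append(data)
        reply = len(self.log)
        self.drc[key] = reply
        return reply


server = ToyServer()

# Original request, then a retransmit on the same connection/source port.
server.handle("10.0.0.5", 800, xid=42, data="record")
server.handle("10.0.0.5", 800, xid=42, data="record")
same_port_log = list(server.log)

# Client reconnects from a different source port and retransmits the same XID.
server.handle("10.0.0.5", 801, xid=42, data="record")
new_port_log = list(server.log)

print(same_port_log)
print(new_port_log)
```

In this sketch the only thing that changes between the second and third call is the source port, which is enough to turn a cache hit into a miss. A real cache would also bound its size and expire entries, which is left out here.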