Date: Sat, 15 Oct 2005 14:43:01 -0400 (EDT)
From: rick@snowhite.cis.uoguelph.ca
Message-Id: <200510151843.OAA37331@snowhite.cis.uoguelph.ca>
To: fs@freebsd.org
Subject: FreeBSD NFS server not responding to TCP SYN packets from Linux/SunOS clients

>> When Sun first did NFS over TCP, I believe they did
>> do retries (using a conservative timeout). I think I eventually convinced Sun
>> that it wasn't a good idea and I think that Solaris no longer
>> does them, but I'm not sure. (For this to work correctly, a server is required
>> to disconnect whenever it can't generate a reply to an RPC over TCP for any
>> reason.)
>
>yes, this is a difficult semantic.

For v3,4 it shouldn't be necessary, except in extreme circumstances, since the
server can always just reply NFSERR_DELAY.
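
As a rough illustration of that reasoning, here is a minimal C sketch of the choice a server faces when it cannot build a reply for a request. The enum and function names are hypothetical, not taken from the FreeBSD NFS server. It assumes, as in the quoted exchange, that a client does not retransmit over a TCP connection that stays open, and it relies on the fact that NFSv2 has no error equivalent to NFSERR_DELAY.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the FreeBSD server's own code. */
enum nfs_vers { NFS_V2 = 2, NFS_V3 = 3, NFS_V4 = 4 };
enum nfs_xprt { XPRT_UDP, XPRT_TCP };

/* What to do with a request the server cannot currently build a reply for. */
enum cant_reply_action {
    ACTION_REPLY_DELAY,  /* reply NFSERR_DELAY; the client retries later */
    ACTION_DISCONNECT,   /* close the TCP connection to force a retransmit */
    ACTION_DROP          /* drop it; the UDP client times out and retransmits */
};

enum cant_reply_action
cannot_reply_action(enum nfs_vers vers, enum nfs_xprt xprt)
{
    assert(vers == NFS_V2 || vers == NFS_V3 || vers == NFS_V4);

    /* v3 and v4 can always send a "try again later" error reply. */
    if (vers != NFS_V2)
        return ACTION_REPLY_DELAY;

    /*
     * v2 has no such error. Assuming the client does not retransmit on an
     * open TCP connection, the server must disconnect; over UDP, dropping
     * is enough because the client's RPC timeout triggers a retransmit.
     */
    return xprt == XPRT_TCP ? ACTION_DISCONNECT : ACTION_DROP;
}
```

Under this policy the only case that forces a disconnect (and so risks the reconnect-on-a-new-port problem) is v2 over TCP.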

For v2, I'd be tempted to discourage v2 over TCP, arguing that v2 is just there
for old clients that can't do anything else and let them use UDP. In other
words, NFSERR_DELAY is your friend:-)

> it means that there is now a race that allows a server to redo a
> non-idempotent request if the client reconnects on another port and
> sends a retransmit of a stuck request. i've seen this in practice, and
> for certain applications this will cause data corruption.
>
> most Linux NFS clients will not reconnect on the same port after the
> server disconnects (a bug i recently addressed). for servers with a
> duplicate reply cache, this means the client can retransmit
> non-idempotent requests and the DRC will not stop the requests from
> being reapplied. such servers are dependent on identifying RPC requests
> by the tuple of [ XID, source port, client IP ] -- if source port
> changes, then the DRC is rendered ineffective.

I'd argue that the DRC shouldn't depend on the same port#. (It can even be
argued that it shouldn't depend on the same client host IP#, since that can
change dynamically via DHCP, etc.) I think you'll find a very brief (and
crappy) description of what I use for my current DRC on the ftp site
(ftp.cis.uoguelph.ca/pub/nfsv4/server-cache.algorithm and some notes in
ftp.cis.uoguelph.ca/pub/nfsv4/doc.tar.gz). Basically, it uses the XID, plus a
checksum of the first N bytes of the request and a few other checks.

[good stuff snipped]

> if a client *doesn't* retransmit, is there any guarantee that a
> hard-mounted client can make forward progress?

Probably not. But I don't think it has been a problem, in practice, for
FreeBSD? (I suspect that servers only fail to reply to requests when they are
"dead in the water".) The BSD server never drops a request in progress. It
does MGET()s and MALLOC()s with M_WAITOK. The problem is that most BSDen are
pretty well toast by the time this happens.
I am thinking that I should change the server to use M_NOWAIT and then return
NFSERR_DELAY when it gets a NULL ptr. (For v2, only allow UDP and drop the
request.) But I haven't gotten around to coding it. (Lots of cases where NULL
ptrs have to be checked for --> lots of work:-)

rick
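
The duplicate request cache idea from earlier in this message (match on the XID plus a checksum of the first N bytes of the request, not on the source port) could look roughly like the following C sketch. Everything here is an assumption for illustration: the constants DRC_CKSUM_BYTES and DRC_SLOTS, the 32-bit FNV-1a hash standing in for "a checksum", and the direct-mapped table. The "few other checks" mentioned above are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DRC_CKSUM_BYTES 100  /* hypothetical N: leading bytes covered */
#define DRC_SLOTS       64   /* hypothetical cache size */

/*
 * Cache key. Neither the client's source port nor its IP address is part
 * of it, so a client that reconnects from a new port still matches.
 */
struct drc_key {
    uint32_t xid;    /* RPC transaction id */
    uint32_t cksum;  /* hash of the first DRC_CKSUM_BYTES of the request */
};

struct drc_entry {
    bool in_use;
    struct drc_key key;
    int reply_status;  /* stand-in for the cached reply */
};

static struct drc_entry drc_table[DRC_SLOTS];  /* zeroed: all slots unused */

/* 32-bit FNV-1a over at most DRC_CKSUM_BYTES bytes (stand-in checksum). */
uint32_t
drc_cksum(const unsigned char *req, size_t len)
{
    uint32_t h = 2166136261u;
    size_t n = len < DRC_CKSUM_BYTES ? len : DRC_CKSUM_BYTES;

    assert(req != NULL || len == 0);
    for (size_t i = 0; i < n; i++) {
        h ^= req[i];
        h *= 16777619u;
    }
    return h;
}

struct drc_key
drc_make_key(uint32_t xid, const unsigned char *req, size_t len)
{
    struct drc_key k = { .xid = xid, .cksum = drc_cksum(req, len) };
    return k;
}

bool
drc_key_equal(const struct drc_key *a, const struct drc_key *b)
{
    return a->xid == b->xid && a->cksum == b->cksum;
}

/* Returns the cached entry for a retransmission, or NULL for a new request. */
struct drc_entry *
drc_lookup(const struct drc_key *k)
{
    struct drc_entry *e = &drc_table[k->xid % DRC_SLOTS];

    if (e->in_use && drc_key_equal(&e->key, k))
        return e;
    return NULL;
}

/* Records a reply; a colliding older entry in the same slot is overwritten. */
void
drc_insert(const struct drc_key *k, int reply_status)
{
    struct drc_entry *e = &drc_table[k->xid % DRC_SLOTS];

    e->in_use = true;
    e->key = *k;
    e->reply_status = reply_status;
}
```

Because the key omits the port, a retransmission that arrives on a new TCP connection from a different port still finds the cached reply, and a non-idempotent request is not reapplied. A real cache would also need entry aging and better collision handling than a direct-mapped slot.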