Date: Fri, 14 Oct 2005 12:55:49 -0400 (EDT)
From: rick@snowhite.cis.uoguelph.ca
To: fs@freebsd.org
Subject: FreeBSD NFS server not responding to TCP SYN packets from Linux/SunOS clients
Message-Id: <200510141655.MAA22902@snowhite.cis.uoguelph.ca>

>> rick
>> ps: It would be nice if someone with the right expertise could explore
>> other things in TCP specifically for NFS. For example, I don't see
>> why a retransmit timeout should go above about 100msec, since net
>> delays are well below that level, even half way around the world
>> these days. Having said that, I don't know enough about TCP retransmit
>> to say that one second retry intervals aren't correct?
>
>Wouldn't this be a problem for a server under high disk load?
>If the disks are very very busy, and clients are requesting stat's on
>files, etc, then the server would be waiting on disk, and the time could
>be way more than 100ms, even more than 1s. Of course, this would be a
>slow server because of the load, however it does occur, and so lowering
>it to 100msec might be too aggressive. If you have many many clients, all
>attempting lots of NFS activity, during times of load you could make the
>server even more overloaded with all the retransmits, right?

It is a concern. If the previously sent request is still in the server's
TCP socket receive queue, then TCP will throw away the retransmit. If the
request is in progress via an nfsd thread, then the recent request cache
code should wait for the reply created from the first one and then both
requests get copies of the reply. (This introduces overhead, but at least
no additional disk I/O or risk of repeating a non-idempotent request.)
nb: My current server cache code does this, but I don't believe the one
    currently in FreeBSD does?

The trick is to not have the nfsd threads remove a request from the
socket receive queue until the disk subsystem isn't backlogged. Since
delayed ACK is disabled for NFS over TCP, the server will then throw away
the retransmitted request and generate an ACK to the client right away,
so the TCP layer in the client won't retransmit it again.

The problem is "how do you make sure the nfsd threads don't start a
request if the disk I/O subsystem is backlogged". An interesting question
and I'd appreciate hearing suggestions. Part of the problem is that many
requests can be satisfied out of caches in the server (such as the
vnode/inode in memory, for a Getattr) and the server doesn't know if a
request will be doing disk I/O (it's hidden behind the VFS/Vnode layer).

One possibility is for nfsd threads to time how long they take to do a
request.
When the thread sees that time increasing dramatically, it could
assume a backlog in the disk I/O subsystem and sleep for a while before
getting the next request off a socket receive queue.

Sounds like something worth looking at. Unfortunately I think it will
require a pretty high resolution time clock and I don't think I can count
on that in FreeBSD unless the server has the right hardware?

Any other ideas? rick
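
A rough user-space sketch of the "time each request and back off" idea
above. This is not FreeBSD nfsd code. The names (svc_timer,
svc_timer_record, nfsd_serve_one_timed), the smoothing constants, the
4x threshold and the 10 msec sleep are all hypothetical. It also assumes
that clock_gettime(CLOCK_MONOTONIC) has enough resolution, which is
exactly the open question raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* All constants below are illustrative guesses, not tuned values. */
#define BASELINE_ALPHA 0.01   /* slow average: what "normal" looks like */
#define RECENT_ALPHA   0.25   /* fast average: what is happening now */
#define BACKLOG_FACTOR 4.0    /* recent > 4x baseline => assume backlog */
#define WARMUP_SAMPLES 16     /* never report a backlog before this many */
#define BACKOFF_NSEC   (10L * 1000L * 1000L)   /* 10 msec sleep */

/* Hypothetical per-thread state; zero-initialize before first use. */
struct svc_timer {
    double baseline_us;   /* long-term moving average of service time */
    double recent_us;     /* short-term moving average of service time */
    int    samples;       /* number of requests timed so far */
};

/* Current monotonic time in microseconds. */
static double now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/*
 * Fold one service time into both averages. Returns 1 when the recent
 * average has pulled well away from the baseline, which is taken here
 * as a hint (only a hint) that the disk I/O subsystem is backlogged.
 */
int svc_timer_record(struct svc_timer *t, double elapsed_us)
{
    if (t->samples == 0) {
        t->baseline_us = elapsed_us;
        t->recent_us = elapsed_us;
    } else {
        t->recent_us += RECENT_ALPHA * (elapsed_us - t->recent_us);
        t->baseline_us += BASELINE_ALPHA * (elapsed_us - t->baseline_us);
    }
    t->samples++;
    return t->samples > WARMUP_SAMPLES &&
        t->recent_us > BACKLOG_FACTOR * t->baseline_us;
}

/*
 * Time one request handler call. If a backlog is suspected, sleep
 * before returning, so the caller is delayed in pulling the next
 * request off the socket receive queue. Returns 1 if it slept.
 */
int nfsd_serve_one_timed(struct svc_timer *t, void (*handle)(void *),
    void *req)
{
    struct timespec delay = { 0, BACKOFF_NSEC };
    double start = now_us();

    handle(req);
    if (!svc_timer_record(t, now_us() - start))
        return 0;
    nanosleep(&delay, NULL);
    return 1;
}
```

Each nfsd thread would keep its own svc_timer and call
nfsd_serve_one_timed() once per request. Cache hits (a Getattr answered
from an in-memory vnode, say) pull both averages down, so a mostly
cached workload could look "backlogged" on its first few real disk
reads; the factor and warmup count would need experimenting with.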