Date: Thu, 29 Oct 2009 16:10:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-current@freebsd.org
Cc: O.Seibert@cs.ru.nl
Subject: NFS over TCP patch testing/review, please!!

I think the following patch fixes the problem reported by O. Seibert w.r.t.
NFS over TCP taking 5 minutes to reconnect to a server after a period of
inactivity. (I think others have been bitten by this too, but those were
vague reports of trouble with NFS over TCP.) I didn't see the problem
myself, because I was mainly testing against a FreeBSD server and/or using
NFSv4. (NFSv4 does a Renew every 30 seconds, so the TCP connection is never
inactive long enough for a Solaris server to disconnect it.)

clnt_vc_call() in sys/rpc/clnt_vc.c checks for the server closing down the
connection while the RPC is in progress, but it doesn't check whether that
has already happened before the request is queued. If it has, there is no
upcall to prompt a wakeup of the msleep() that waits for a reply, etc. This
patch adds a check for the connection having already been closed by the
server, just before queuing the request and sending it. (I think this fixes
the problem; a small userspace sketch of the ordering problem is appended
after the patch, for illustration.)

What I really need is some people to test NFS over TCP with the patch
applied to their kernel. It doesn't matter if you aren't seeing the problem
(i.e. you are using a FreeBSD server), since I am more concerned about the
patch breaking something else than about whether it fixes the problem.
(This seems serious enough that I'd like to try and get a fix into 8.0,
which is why I'm hoping some folks can test it quickly.)
Thanks in advance for help with this, rick

--- patch for sys/rpc/clnt_vc.c ---
--- rpc/clnt_vc.c.sav	2009-10-28 15:44:20.000000000 -0400
+++ rpc/clnt_vc.c	2009-10-29 15:40:37.000000000 -0400
@@ -413,6 +413,22 @@
 	cr->cr_xid = xid;
 
 	mtx_lock(&ct->ct_lock);
+	/*
+	 * Check to see if the other end has already started to close down
+	 * the connection. The upcall will have set ct_error.re_status
+	 * to RPC_CANTRECV if this is the case.
+	 * If the other end starts to close down the connection after this
+	 * point, it will be detected later when cr_error is checked,
+	 * since the request is in the ct_pending queue.
+	 */
+	if (ct->ct_error.re_status == RPC_CANTRECV) {
+		if (errp != &ct->ct_error) {
+			errp->re_errno = ct->ct_error.re_errno;
+			errp->re_status = RPC_CANTRECV;
+		}
+		stat = RPC_CANTRECV;
+		goto out;
+	}
 	TAILQ_INSERT_TAIL(&ct->ct_pending, cr, cr_link);
 	mtx_unlock(&ct->ct_lock);
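
For anyone who'd like to see the ordering problem in isolation, here is a
small userspace sketch. This is my own illustration and not the kernel
code: the pthread mutex/condvar stand in for ct_lock and the per-request
msleep()/wakeup(), "upcall_thread" stands in for the socket upcall that
notices the close, and all the names are made up.

	/*
	 * Illustration only -- a userspace model of the ordering problem,
	 * not the kernel code.
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
	static int conn_closed;		/* models ct_error.re_status == RPC_CANTRECV */
	static int request_queued;	/* models the request sitting in ct_pending */
	static int request_failed;	/* models cr_error being set for a pending request */

	/* Models the socket upcall: it only notifies requests already queued. */
	static void *
	upcall_thread(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&lock);
		conn_closed = 1;
		if (request_queued) {
			request_failed = 1;
			pthread_cond_broadcast(&cv);
		}
		pthread_mutex_unlock(&lock);
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, upcall_thread, NULL);
		sleep(1);		/* make sure the "close" happens first */

		pthread_mutex_lock(&lock);
		/*
		 * This is the check the patch adds. Comment it out and the
		 * program blocks forever in pthread_cond_wait() below, which
		 * corresponds to the long stall seen for the first RPC after
		 * the server closes an idle TCP connection.
		 */
		if (conn_closed) {
			pthread_mutex_unlock(&lock);
			printf("connection already closed; failing the request early\n");
			pthread_join(t, NULL);
			return (1);
		}
		request_queued = 1;
		while (!request_failed)
			pthread_cond_wait(&cv, &lock);	/* models the msleep() for a reply */
		pthread_mutex_unlock(&lock);
		pthread_join(t, NULL);
		return (0);
	}

The patch takes the same approach as the early check above: if the upcall
has already marked the connection RPC_CANTRECV, fail the call right away
instead of queuing it and waiting for a wakeup that will never come.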