Date: Thu, 29 Oct 2009 16:10:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-current@freebsd.org
Cc: O.Seibert@cs.ru.nl
Subject: NFS over TCP patch testing/review, please!!

I think the following patch fixes the problem reported by O. Seibert w.r.t.
NFS over TCP taking 5 minutes to reconnect to a server after a period of
inactivity. (I think others have been bitten by this too, but those were
vague reports of trouble with NFS over TCP.) I didn't see the problem
myself, because I was mainly testing against a FreeBSD server and/or using
NFSv4. (NFSv4 does a Renew every 30 seconds, so the TCP connection is never
inactive long enough for a Solaris server to disconnect it.)

clnt_vc_call() in sys/rpc/clnt_vc.c checks for the server closing down the
connection while the RPC is in progress, but it doesn't check whether that
has already happened before the request is queued. If it has, there is no
upcall to prompt a wakeup of the msleep() that waits for a reply, etc. This
patch adds a check for the connection having already been closed by the
server, just before queuing the request and sending it. (I think this fixes
the problem; a small userspace sketch of the ordering problem is appended
after the patch, for illustration.)

What I really need is some people to test NFS over TCP with the patch
applied to their kernel. It doesn't matter if you aren't seeing the problem
(i.e. you are using a FreeBSD server), since I am more concerned about the
patch breaking something else than about whether it fixes the problem.
(This seems serious enough that I'd like to try and get a fix into 8.0,
which is why I'm hoping some folks can test it quickly.)
Thanks in advance for help with this, rick

--- patch for sys/rpc/clnt_vc.c ---
--- rpc/clnt_vc.c.sav	2009-10-28 15:44:20.000000000 -0400
+++ rpc/clnt_vc.c	2009-10-29 15:40:37.000000000 -0400
@@ -413,6 +413,22 @@
 	cr->cr_xid = xid;
 
 	mtx_lock(&ct->ct_lock);
+	/*
+	 * Check to see if the other end has already started to close down
+	 * the connection. The upcall will have set ct_error.re_status
+	 * to RPC_CANTRECV if this is the case.
+	 * If the other end starts to close down the connection after this
+	 * point, it will be detected later when cr_error is checked,
+	 * since the request is in the ct_pending queue.
+	 */
+	if (ct->ct_error.re_status == RPC_CANTRECV) {
+		if (errp != &ct->ct_error) {
+			errp->re_errno = ct->ct_error.re_errno;
+			errp->re_status = RPC_CANTRECV;
+		}
+		stat = RPC_CANTRECV;
+		goto out;
+	}
 	TAILQ_INSERT_TAIL(&ct->ct_pending, cr, cr_link);
 	mtx_unlock(&ct->ct_lock);
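
For anyone who'd like to see the ordering problem in isolation, here is a
small userspace sketch. This is my own illustration and not the kernel
code: the pthread mutex/condvar stand in for ct_lock and the per-request
msleep()/wakeup(), "upcall_thread" stands in for the socket upcall that
notices the close, and all the names are made up.

	/*
	 * Illustration only -- a userspace model of the ordering problem,
	 * not the kernel code.
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
	static int conn_closed;		/* models ct_error.re_status == RPC_CANTRECV */
	static int request_queued;	/* models the request sitting in ct_pending */
	static int request_failed;	/* models cr_error being set for a pending request */

	/* Models the socket upcall: it only notifies requests already queued. */
	static void *
	upcall_thread(void *arg)
	{
		(void)arg;
		pthread_mutex_lock(&lock);
		conn_closed = 1;
		if (request_queued) {
			request_failed = 1;
			pthread_cond_broadcast(&cv);
		}
		pthread_mutex_unlock(&lock);
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, upcall_thread, NULL);
		sleep(1);		/* make sure the "close" happens first */

		pthread_mutex_lock(&lock);
		/*
		 * This is the check the patch adds. Comment it out and the
		 * program blocks forever in pthread_cond_wait() below, which
		 * corresponds to the long stall seen for the first RPC after
		 * the server closes an idle TCP connection.
		 */
		if (conn_closed) {
			pthread_mutex_unlock(&lock);
			printf("connection already closed; failing the request early\n");
			pthread_join(t, NULL);
			return (1);
		}
		request_queued = 1;
		while (!request_failed)
			pthread_cond_wait(&cv, &lock);	/* models the msleep() for a reply */
		pthread_mutex_unlock(&lock);
		pthread_join(t, NULL);
		return (0);
	}

The patch takes the same approach as the early check above: if the upcall
has already marked the connection RPC_CANTRECV, fail the call right away
instead of queuing it and waiting for a wakeup that will never come.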