Date: Thu, 29 Oct 2009 16:10:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-current@freebsd.org
Cc: O.Seibert@cs.ru.nl
Subject: NFS over TCP patch testing/review, please!!
Message-ID: <Pine.GSO.4.63.0910291601050.19312@muncher.cs.uoguelph.ca>

I think the following patch fixes the problem reported by O. Seibert
w.r.t. NFS over TCP taking 5min to reconnect to a server after a period
of inactivity. (I think there have been others bit by this, but they
were vague reports of trouble with NFS over TCP.)

I didn't see the problem, because I was mainly testing against a FreeBSD
server and/or using NFSv4 (NFSv4 does a Renew every 30sec, so the TCP
connection isn't inactive for long enough for a Solaris server to
disconnect it.)

clnt_vc_call() in sys/rpc/clnt_vc.c checks for the server closing down
the connection while the RPC is in progress, but doesn't check to see
if it has already happened. If it has already happened, there would be
no upcall to prompt a wakeup of the msleep() waiting for a reply, etc.

This patch adds a check for the connection being closed by the server,
just before queuing the request and sending it. (I think this fixes the
problem.)

What I really need is some people to test NFS over TCP with the patch
applied to their kernel. It doesn't matter if you aren't seeing the
problem (ie. using a FreeBSD server), since I am more concerned with
the patch breaking something else than fixing the problem.
(This seems serious enough that I'd like to try and get a fix into 8.0,
which is why I'm hoping some folks can test this quickly?)

Thanks in advance for help with this, rick

--- patch for sys/rpc/clnt_vc.c ---
--- rpc/clnt_vc.c.sav	2009-10-28 15:44:20.000000000 -0400
+++ rpc/clnt_vc.c	2009-10-29 15:40:37.000000000 -0400
@@ -413,6 +413,22 @@
 	cr->cr_xid = xid;
 	mtx_lock(&ct->ct_lock);
+	/*
+	 * Check to see if the other end has already started to close down
+	 * the connection. The upcall will have set ct_error.re_status
+	 * to RPC_CANTRECV if this is the case.
+	 * If the other end starts to close down the connection after this
+	 * point, it will be detected later when cr_error is checked,
+	 * since the request is in the ct_pending queue.
+	 */
+	if (ct->ct_error.re_status == RPC_CANTRECV) {
+		if (errp != &ct->ct_error) {
+			errp->re_errno = ct->ct_error.re_errno;
+			errp->re_status = RPC_CANTRECV;
+		}
+		stat = RPC_CANTRECV;
+		goto out;
+	}
 	TAILQ_INSERT_TAIL(&ct->ct_pending, cr, cr_link);
 	mtx_unlock(&ct->ct_lock);
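
For anyone who wants to poke at the idea outside the kernel, here is a
minimal user-space sketch of the same check. It is only an illustration
under stated assumptions, not the code in sys/rpc/clnt_vc.c: every
model_* name is hypothetical, the ct_lock mutex is left out (the sketch
is single-threaded), and a hand-rolled list stands in for the ct_pending
TAILQ. It shows a request being refused up front once the "upcall" has
already recorded that the peer closed the connection, instead of being
queued to wait for a wakeup that would never arrive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RPC status codes used in the patch. */
enum model_stat { MODEL_SUCCESS = 0, MODEL_CANTRECV = 1 };

struct model_err {
    enum model_stat re_status;
    int re_errno;
};

struct model_request {
    struct model_request *next;
    unsigned int xid;
};

/* Simplified connection state; the real structure also holds a mutex. */
struct model_conn {
    struct model_err ct_error;          /* set when the peer closes */
    struct model_request *pending_head; /* requests awaiting a reply */
    struct model_request *pending_tail;
    size_t npending;
};

/* Models the socket upcall noticing that the peer closed the connection. */
static void
model_peer_closed(struct model_conn *ct, int err)
{
    ct->ct_error.re_status = MODEL_CANTRECV;
    ct->ct_error.re_errno = err;
}

/*
 * Queue a request unless the connection is already known to be closed.
 * On refusal, the stored error is copied to the caller's error slot.
 */
static enum model_stat
model_enqueue(struct model_conn *ct, struct model_request *cr,
    struct model_err *errp)
{
    assert(ct != NULL && cr != NULL && errp != NULL);

    if (ct->ct_error.re_status == MODEL_CANTRECV) {
        if (errp != &ct->ct_error) {
            errp->re_errno = ct->ct_error.re_errno;
            errp->re_status = MODEL_CANTRECV;
        }
        return (MODEL_CANTRECV);
    }

    /* Connection still looks alive: append to the pending list. */
    cr->next = NULL;
    if (ct->pending_tail != NULL)
        ct->pending_tail->next = cr;
    else
        ct->pending_head = cr;
    ct->pending_tail = cr;
    ct->npending++;
    return (MODEL_SUCCESS);
}
```

In the sketch, a caller that gets MODEL_CANTRECV back would be expected
to reconnect and retry rather than sleep; how the real client reacts to
RPC_CANTRECV is outside what this model covers.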