Date: Thu, 5 Nov 2009 11:36:23 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-current@freebsd.org
Subject: Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)
Message-ID: <Pine.GSO.4.63.0911051121340.5409@muncher.cs.uoguelph.ca>
In-Reply-To: <4AF0B7DF.9030405@freebsd.org>
References: <Pine.GSO.4.63.0911011644410.19276@muncher.cs.uoguelph.ca> <4AF0B7DF.9030405@freebsd.org>
Rick Macklem wrote:
> I can now reproduce what I think others were seeing as slow reconnects
> when using NFSv3 over TCP against a server that disconnects inactive
> TCP connections. I have had some luck figuring out what is going on
> and can reproduce it fairly easily, but I really need help from someone
> who understands the FreeBSD TCP stack.
>
Ok, I haven't made much progress on this, but here's what little I
currently know about it.

The problem occurs after a server has dropped an inactive TCP connection
for an NFS over TCP mount (in my case a Solaris10 server). When the client
creates a new connection, it sends, for some reason, an RST at almost
exactly the same time as the first RPC request on the new TCP connection,
causing the server to shut that connection down.

Ok, things I now know don't affect this are:
- doing the soshutdown(), soclose() on the old connection. I commented
  them out and it still happened.
- avoiding the sobind() on the new connection, done before the soconnect().
- using a non-reserved port number.
(The above tests shot down pretty well all the "theories" I could come up
with.)

The only thing I've found that avoids the problem:
- putting a 2sec delay right before the soconnect() call. (A 1sec delay
  made it hard to reproduce, and I haven't been able to reproduce it yet
  with a 2sec delay.)
Not much of a fix, though.

Now, here's where someone may be able to help. Grepping around, I found 4
places where the TCP stack calls ip_output() (one in each of tcp_output.c,
tcp_subr.c, tcp_timewait.c and tcp_syncache.c), and I put a printf like
this just before each of them:

    if (flags & TH_RST)
        printf("sent a reset\n");

(The exact format varies for each one, because the TCP header flags are
kept in a different place at each call site, and each one prints a
different message.)

Now, the weird part is that when the extraneous RST is sent to the server,
I don't get any printf. (I do get a few of these, but at other times, for
what appear to be legitimate RSTs.)
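One way to make the four trace points easier to tell apart is to have a single macro print the call site instead of a hand-written message per file. A minimal sketch, assuming only the standard TCP header layout (RFC 793): TRACE_RST and tcp_hdr_has_rst are hypothetical names, not FreeBSD functions, and TH_* are defined here only so the sketch compiles outside the kernel (FreeBSD's <netinet/tcp.h> uses the same bit values).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* TCP flag bits as laid out in byte 13 of the TCP header.
 * Defined locally (hypothetically) so this compiles stand-alone. */
#ifndef TH_SYN
#define TH_SYN 0x02
#endif
#ifndef TH_RST
#define TH_RST 0x04
#endif
#ifndef TH_ACK
#define TH_ACK 0x10
#endif

/* Hypothetical trace point: placed just before each ip_output() call,
 * it reports which source file and line emitted a segment with RST set,
 * so one message format serves every call site. */
#define TRACE_RST(flags)                                            \
    do {                                                            \
        if ((flags) & TH_RST)                                       \
            printf("sent a reset at %s:%d\n", __FILE__, __LINE__);  \
    } while (0)

/* Hypothetical helper: returns 1 if the raw TCP header bytes have RST
 * set, 0 otherwise. The flag bits live in byte 13 of the header. */
static int
tcp_hdr_has_rst(const uint8_t *th, size_t len)
{
    if (len < 20)   /* shorter than a minimal TCP header */
        return 0;
    return (th[13] & TH_RST) != 0;
}
```

In place of the per-file printf, TRACE_RST(flags) would name the file and line that produced the segment. tcp_hdr_has_rst() applies the same test to raw header bytes, for example from a packet capture taken on the client, which can help confirm whether an RST on the wire matches any of the traced sends.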
I can't see anywhere else that the TCP stack would send an RST, so I'm
stuck w.r.t. figuring out what is sending them. Does anyone know of another
place where the TCP stack would make the send happen? (Or is the RST queued
earlier, when I see the printf message, and then the send is "triggered"
inside the ip layer when the first data is sent on the new connection?)

rick, who is getting sick of looking at this:-)