Date:      Thu, 5 Nov 2009 11:36:23 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        freebsd-current@freebsd.org
Subject:   Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)
Message-ID:  <Pine.GSO.4.63.0911051121340.5409@muncher.cs.uoguelph.ca>
In-Reply-To: <4AF0B7DF.9030405@freebsd.org>
References:  <Pine.GSO.4.63.0911011644410.19276@muncher.cs.uoguelph.ca> <4AF0B7DF.9030405@freebsd.org>


Rick Macklem wrote:
> I can now reproduce what I think others were seeing as slow reconnects
> when using NFSv3 over TCP against a server that disconnects inactive
> TCP connections. I have had some luck figuring out what is going on
> and can reproduce it fairly easily, but I really need help from someone
> who understands the FreeBSD TCP stack.
>
Ok, I haven't made much progress on this, but here's what little I
currently know about it.

The problem occurs after a server has dropped an inactive TCP connection
for an NFS over TCP mount (in my case a Solaris 10 server). When the
client makes a new connection, it sends a RST, for some reason, at almost
exactly the same time as the first RPC request on the new TCP connection,
causing the server to shut it down.

Ok, things I now know don't affect this:
- Doing the soshutdown(), soclose() on the old connection. I commented
   them out and it still happened.
- Skipping the sobind() on the new connection (normally done before the
   soconnect()).
- Using a non-reserved port#.
(The above tests shot down pretty well all the "theories" I could come up
with.)

The only thing I've found that avoids the problem:
- Putting a 2sec delay right before the soconnect() call. (With a 1sec
   delay it was hard to reproduce, and I haven't reproduced it yet with
   a 2sec delay.)
   Not much of a fix, though.
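
For anyone who wants to try the same workaround from userland, here is
a rough sketch of the idea. It is not the kernel change described above:
it uses the ordinary sockets API (sleep() and connect()) rather than
soconnect(), and the function name delayed_connect() is made up for
illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait delay_sec seconds, then connect.
 * Userland analogue of putting a fixed delay before soconnect().
 * Returns whatever connect() returns (0 on success, -1 on error).
 */
int
delayed_connect(int fd, const struct sockaddr *addr, socklen_t addrlen,
    unsigned int delay_sec)
{
	unsigned int left = delay_sec;

	/* sleep() returns the unslept seconds if a signal interrupts it. */
	while (left > 0)
		left = sleep(left);

	return (connect(fd, addr, addrlen));
}
```

Calling delayed_connect(fd, addr, addrlen, 2) would correspond to the
2sec case. Like the kernel delay, it only hides the symptom.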

Now, here's where someone may be able to help.

Grepping around, I found 4 places where the TCP stack calls ip_output()
(one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and
tcp_syncache.c) and I put a printf like this just before them:
 	if (flags & TH_RST)
 		printf("sent a reset\n");

 	(The exact form varies for each, because the TCP header flags
 	 live in different places, and each has a different printf
 	 message.)
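
One way to make the four printfs easier to tell apart is to have each
one name its call site and decode the full flag byte. The following is
a standalone sketch of that idea, not the code that was actually added
to the kernel: the helper names describe_flags() and log_if_rst() are
hypothetical, and the SK_TH_RST value is the standard RST bit from the
TCP header (0x04), redefined locally so the example builds in userland.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SK_TH_RST 0x04	/* same bit value as TH_RST */

static const struct {
	uint8_t bit;
	const char *name;
} flag_names[] = {
	{ 0x01, "FIN" }, { 0x02, "SYN" }, { 0x04, "RST" },
	{ 0x08, "PSH" }, { 0x10, "ACK" }, { 0x20, "URG" },
};

/* Hypothetical helper: write the set flags into buf as e.g. "RST|ACK". */
void
describe_flags(uint8_t flags, char *buf, size_t len)
{
	size_t i, used = 0;
	int n;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		if ((flags & flag_names[i].bit) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used ? "|" : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
}

/*
 * Hypothetical helper: if RST is set, log the call site and all flags.
 * Returns 1 if a reset was logged, 0 otherwise.
 */
int
log_if_rst(const char *site, uint8_t flags)
{
	char names[32];

	if ((flags & SK_TH_RST) == 0)
		return (0);
	describe_flags(flags, names, sizeof(names));
	fprintf(stderr, "%s: sent a reset (flags=%s)\n", site, names);
	return (1);
}
```

Each call site would pass its own label, for example
log_if_rst("tcp_output", flags), so a RST that shows up on the wire
without any matching log line points at a path outside these four.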

Now, the weird part is that when the extraneous RST is sent to the
server, I don't get any printf. (I do get a few of these, but at other
times, for what appear to be legitimate RSTs.)

I can't see anywhere else that the TCP stack would send an RST, so
I'm stuck w.r.t. figuring out what is sending them.

Anyone know of another place where the TCP stack would make the send
happen? (Or is it queued earlier, when I see the printf message, and then
the send is "triggered" inside the IP layer when the first data is sent
on the new connection?)

rick, who is getting sick of looking at this:-)



