From: Lars Erik Gullerud
To: Len Conrad
cc: freebsd-net@freebsd.org
Date: Tue, 11 Jan 2005 03:23:42 +0100 (CET)
Subject: Re: buildup of Windows time_wait talking to fbsd 4.10
Message-ID: <20050111025252.L88996@electra.nolink.net>
In-Reply-To: <6.1.1.1.2.20050110103857.045a9a68@81.255.84.73>

On Mon, 10 Jan 2005, Len Conrad wrote:

> We have a windows mailserver that relays its outbound to a fbsd gateway. We
> changed to a different fbsd gateway running 4.10. Windows then began having
> trouble sending to 4.10. Windows "netstat -an" shows dozens of lines like
> this:
>
>         source IP              destination IP
> ======================================================================
>  TCP    10.1.16.3:1403         192.168.200.59:25      TIME_WAIT

[snip]

> Eventually, the windows SMTP logs line like "cannot connect to remote IP" or
> "address already in use" because no local tcp/ip sockets are available, we
> think.
>
> The new gateway/fbsd 4.10 "sockstat -4" shows no corresponding tcp
> connections when the Windows server is showing as above. On the fbsd 4.10
> machines, smtp logs, syslog, and dmesg show no errors.
>
> We switch the windows box to smtp gateway towards the old box/fbsd 4.7, all
> is cool.

OK, let me play a wild hunch here: if you look at netstat -na output on
the 4.7 machine (the one that works) while you are using that one, do you
see a large number of connections in the TIME_WAIT state on that side, and
none on the Windows server?

I had a similar situation with an application we use that also opens a
large number of TCP sessions from a Windows server to a FreeBSD server. It
suddenly stopped working when the application in question was upgraded on
the server it connected to. In our case, it turned out to be a timing
issue that changed in the new version of the application.

When a TCP connection is closing, one side of the connection typically
initiates the close and sends a FIN,ACK packet to the other side. After
going through the steps of closing down the socket, the side that
initiated the close will leave the socket in TIME-WAIT state for 2 MSL
(Maximum Segment Lifetime, which defaults to 2 mins, so a 4 min wait),
while the other end transitions to CLOSED state (and tears down the
socket) as soon as its own FIN has been acknowledged, without this wait
period. (The exception is when both ends send FIN,ACK at the same time,
in which case they both go to TIME-WAIT.)

What happened in our case, on the old version of the application, was
that as soon as the client started to log off the session, the server-side
application (on the FreeBSD server) would initiate closing of the TCP
session. That made the FreeBSD side the originator of the close, so it
collected a large number of sessions in TIME-WAIT, which was not a problem
for the BSD box, while the Windows machine closed its socket immediately
and was happy all the time.
However, after we upgraded the application, when the client logged off at
the application level, the server-side app would first take 2-3 seconds to
process various shutdown-related activities, and the client end (on the
Windows machine) got "impatient" and initiated the TCP session close from
its side. That left all the TIME-WAIT sockets hanging on the Windows side,
rather than the FreeBSD side. Now, newer versions of Windows have a
ridiculously low number of max simultaneous connections configured, and we
started seeing exactly the same kinds of errors you are describing, due to
a large number of TIME-WAIT sockets.

We had to adjust the server-side application to tear down the TCP socket
first, THEN do its internal shutdown processing, in order to not leave the
Windows client in a jam. The alternative was to increase the number of
simultaneous connections on the Windows machine, which involves some
registry black magic, and we found this to be the easier way out (then -
we will probably hack the Windows regkeys if we start seeing the issue
again).

You didn't mention what MTA you are using, so I don't know if this is a
similar (application-level) issue, or if it's FreeBSD 4.10 that causes
some additional delay before initiating a TCP CLOSE. Either way, this
might be the behaviour you are observing. If so, you will need to figure
out how to get the FreeBSD side to tear down the connection first, or
preferably look at tuning some registry settings on your Windows server,
like setting the MSL time (default 2 minutes) to a much lower value, and
perhaps upping the no. of max simultaneous connections.

HTH,
/leg
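
A back-of-the-envelope sketch of why a pile of TIME-WAIT sockets on the
side that closes first can end in "address already in use". It only does
the arithmetic from the explanation above: TIME-WAIT lasts 2 x MSL, and
each local port used towards the same remote ip:port is tied up for that
long. The port range 1025-5000 is an assumption about the Windows default
of that era and is not taken from this thread; the 15-second MSL is a
purely hypothetical tuned value.

```python
def time_wait_seconds(msl_seconds):
    """TIME-WAIT lasts 2 * MSL on the side that initiated the close."""
    return 2 * msl_seconds


def max_sustained_rate(ephemeral_ports, msl_seconds):
    """Rough upper bound on new connections per second to one remote
    ip:port before every local port is parked in TIME-WAIT."""
    return ephemeral_ports / time_wait_seconds(msl_seconds)


# Assumed values (not from the thread): local ports 1025..5000 inclusive.
PORTS = 5000 - 1025 + 1
MSL = 120  # seconds; the 2-minute default mentioned above

print(time_wait_seconds(MSL))                     # 240
print(round(max_sustained_rate(PORTS, MSL), 1))   # 16.6
# Hypothetical tuned MSL of 15 seconds, for comparison:
print(round(max_sustained_rate(PORTS, 15), 1))    # 132.5
```

Under these assumptions, a client that always closes first runs out of
local ports at a sustained rate of roughly 17 connections per second to
the gateway. Lowering the MSL or widening the port range raises that
ceiling, and having the server close first moves the TIME-WAIT sockets off
the client entirely.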