From owner-freebsd-questions  Thu Jan  6 11:23:48 2000
Delivered-To: freebsd-questions@freebsd.org
Received: from bomber.avantgo.com (ws1.avantgo.com [207.214.200.194])
	by hub.freebsd.org (Postfix) with ESMTP id 5A9FB156FE
	for <questions@freebsd.org>; Thu,  6 Jan 2000 11:23:44 -0800 (PST)
	(envelope-from scott@avantgo.com)
Received: from river ([10.0.128.30]) by bomber.avantgo.com
          (Netscape Messaging Server 3.5)  with SMTP id 276;
          Thu, 6 Jan 2000 11:19:42 -0800
Message-ID: <0ad901bf587b$6fde13f0$1e80000a@avantgo.com>
From: "Scott Hess" <scott@avantgo.com>
To: "Fabian Thylmann" <fthylmann@stats.net>, <questions@freebsd.org>
References: <004301bf58c6$9f33bc40$0593e289@oph.rwthaachen.de>
Subject: Re: Problem with not disappearing sockets.
Date: Thu, 6 Jan 2000 11:22:56 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Fabian Thylmann <fthylmann@stats.net> wrote:
> I am having major problems with sockets that do never go away or that
> take a VERY long time to do so at least. More than an hour as far as I
> can see.
>
> This causes my buffer space to run out and even ifconfig doesn't work
> anymore.
>
> most of the sockets sit in CLOSING state.
>
> On one of my servers for example I have almost 22000 closing sockets. And
> that number just climbs all the time.. Till it reaches about 60000 and
> then the server has no more buffer space.

We get a similar problem, and as best we can tell, it's due to broken TCP
stacks which don't close their socket correctly, or perhaps due to
transient network problems.  In any case, there is a known problem in the
TCP protocol which can leave sockets in the CLOSING state if a client stops
sending packets at a certain point in the socket teardown sequence.
Unfortunately, it's against spec to have a timeout to fix this (I can go
look up a reference if you feel that's necessary; I'm pretty sure it was
in Comer, or perhaps Stevens).

[We decided on the broken-stacks problem because we _only_ see this on
systems that receive connections from the Internet at large.  Certain
backend operations involve far more activity, but never accrue sockets in
this state.]

[BTW, could you please tell us how you managed to reach tens of thousands
of sockets in this state?  Our machines always rebooted at around 1500.]
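
For comparing counts like these, one rough way to see how many sockets sit
in each TCP state is to tally the last column of netstat output.  This is
only a sketch: it assumes TCP lines begin with "tcp" and end with the state
name, and count_tcp_states is a made-up helper name, not an existing tool.

```sh
#!/bin/sh
# Tally TCP sockets by state from `netstat -an`-style text on stdin.
# Assumption: TCP lines start with "tcp" and the state is the last field.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use (not run here):
#   netstat -an | count_tcp_states
```

Watching the CLOSING line over time should show whether the count is still
climbing.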

> Also, if there IS no way to change the timeout, there HAS to be a way to
> remove those sockets from the buffer, no? I really see no reason why that
> isn't possible.

We fixed the problem by running

    /sbin/sysctl -w net.inet.tcp.always_keepalive=1

on the machines which receive Internet connections.  After a couple of
hours of inactivity on a socket, keep-alive packets will be sent, and if
there is no response to them, the socket will be closed out.  If you run
this on a live server, be prepared for weird things to happen as various
connections get nuked - we noticed that Squid lost connections to its
dnsserver processes if this command was run on a system that had been up
for a while.  The solution was to tell the server processes to do a
graceful restart.
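
A value set with sysctl -w does not survive a reboot.  If your release
reads /etc/sysctl.conf at boot (an assumption; check your version), a small
helper along these lines could record the setting there.  persist_sysctl is
a hypothetical name, and the sketch only appends a line when no line for
that name exists yet.

```sh
#!/bin/sh
# persist_sysctl FILE NAME VALUE
# Append "NAME=VALUE" to FILE unless a line starting with "NAME=" is
# already present.  Does not change the running kernel's value.
persist_sysctl() {
    file=$1
    name=$2
    value=$3
    grep -q "^${name}=" "$file" 2>/dev/null ||
        printf '%s=%s\n' "$name" "$value" >> "$file"
}

# Typical use as root (not run here):
#   persist_sysctl /etc/sysctl.conf net.inet.tcp.always_keepalive 1
```

Running it twice leaves a single line in the file, so it should be safe to
call from a setup script.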

Later,
scott

