From: "Scott Hess"
To: "Fabian Thylmann"
Subject: Re: Problem with not disappearing sockets.
Date: Thu, 6 Jan 2000 11:22:56 -0800

Fabian Thylmann wrote:
> I am having major problems with sockets that do never go away or that take a
> VERY long time to do so at least. More than an hour as far as I can see.
>
> This causes my buffer space to run out and even ifconfig doesn't work
> anymore.
>
> most of the sockets sit in CLOSING state.
>
> On one of my servers for example I have almost 22000 closing sockets. And
> that number just climbs all the time.. Till it reaches about 60000 and then
> the server has no more buffer space.

We get a similar problem, and as best we can tell, it's due to broken TCP
stacks which don't close their sockets correctly, or perhaps due to
transient network problems. In any case, there is a known problem in the
TCP protocol which can leave sockets in the CLOSING state if a client stops
sending packets at a certain point in the socket teardown sequence.
Unfortunately, it's against spec to have a timeout to fix this (I can go
look up a reference if you feel that's necessary. I'm pretty sure it was in
Comer, or perhaps Stevens).

[We decided on the broken-stacks explanation because we _only_ see this on
systems that receive connections from the Internet at large. Certain
backend operations involve far more activity, but never accrue sockets in
this state.]

[BTW, could you tell us how you managed to get to tens of thousands of
sockets in this state? Our machines always rebooted at around 1500.]

> Also, if there IS no way to change the timeout, there HAS to be a way to
> remove those sockets from the buffer, no? I really see no reason why that
> isn't possible.

We fixed the problem by doing

    /sbin/sysctl -w net.inet.tcp.always_keepalive=1

on the machines which receive Internet connections. After a couple of hours
of inactivity on a socket, keep-alive packets will be sent, and if there is
no response to them, the socket will be closed out.

If you run this on a live server, be prepared for weird things to happen as
various connections get nuked. We noticed that Squid lost connections to
its dnsserver processes if this command was run on a system that had been
up for a while. The solution was to tell the server processes to do a
graceful restart.

Later,
scott
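A footnote to the sysctl fix above: net.inet.tcp.always_keepalive=1 makes the kernel behave as though SO_KEEPALIVE were set on every TCP socket. A server that would rather not change this system-wide can request keep-alives on its own sockets only. The following is a minimal Python sketch of that per-socket variant, not the setup described in this message. The helper name enable_keepalive is hypothetical, and how long a connection must sit idle before probes start is still governed by the system's keep-alive settings.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keep-alive probes for one socket.

    Hypothetical helper for illustration. Returns True if the kernel
    reports the option as set afterwards.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # getsockopt returns a nonzero value when the option is enabled;
    # the exact value differs between platforms, so only test for nonzero.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("keep-alive enabled:", enable_keepalive(s))
    finally:
        s.close()
```

In a server, the call would go on each socket returned by accept(), so that connections abandoned by a silent peer are eventually probed and closed out.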