From: Maxim Konovalov
To: "M. Warner Losh"
Cc: joe@joeholden.co.uk, net@freebsd.org
Date: Sun, 17 Jun 2007 23:29:05 +0400 (MSD)
Subject: Re: Issue with huge numbers of connections
Message-ID: <20070617232404.U73282@mp2.macomnet.net>
In-Reply-To: <20070617.130238.-1435629453.imp@bsdimp.com>
References: <20070617.114133.778151882.imp@bsdimp.com> <46757818.5030005@joeholden.co.uk> <20070617.130238.-1435629453.imp@bsdimp.com>

On Sun, 17 Jun 2007, 13:02-0600, M. Warner Losh wrote:

> In message: <46757818.5030005@joeholden.co.uk>
>             Joe Holden writes:
> : M. Warner Losh wrote:
> : > Greetings,
> : >
> : > I have a friend who is having problems with a service he's running.
> : > He gets billions and billions of connections to this service a day.
> : > Somewhere between 10^8 and 10^9 connections, he notices that his
> : > servers lose the ability to accept new connections.  These are TCP
> : > connections.
> : >
> : > This is with FreeBSD 6.1R.  My first question is: does anybody know if
> : > the fixes to -current/7.0 have fixed this?  Is there a fix that can be
> : > back ported?  He's currently working around the problem by having a
> : > number of different machines that reboot in a round robin fashion, but
> : > would like a better solution.
> : >
> : > Warner
> : > _______________________________________________
> : Warner, if he hasn't done so already, have you suggested tweaking the
> : sysctl variables, such as:
> : kern.maxfilesperproc
> : kern.ipc.nmbclusters
> : kern.maxprocperuid
> : kern.maxfiles
> : kern.ipc.somaxconn
> : kern.maxvnodes
> :
> : Tweaking those may help, or he may just be exhausting available
> : resources, IIRC it's limited to 65k connections per interface, someone
> : correct me if I am wrong.
>
> Here's the bug report I got:
>
> There is still a vague problem with the FreeBSD network interface --
> especially the part that handles TCP.  Something strange happens after
> about a week or so (after handling about 10^8 or 10^9
> connections).  The system becomes unreachable for TCP connections.  I
> have fixed this problem by having all of the FreeBSD systems reboot
> automatically once a week using a cron job.  I have not been able to
> isolate this issue, but I suspect that there is some kind of problem
> with the error handling and some resource gets depleted slowly.  I
> realize that this is pretty vague, but I have not been able to find
> out what actually happens in this case.
>
> I believe that each connection lasts on the order of tens or
> hundreds of milliseconds, given what I know about the systems in place.
> My earlier rephrase omitted a few key points.  I suggested that he
> try to use a newer version of FreeBSD, but since this is a
> production system, he's hesitant to mess with it...
>
> Doing the math on 10^9 connections in a week translates to ~1650/s,
> so we'd expect there are on the order of 100-200 connections steady
> state at any time.  I suspect that the peak load may be up to 100
> times that, which is still only 20000 connections.  The hangs don't
> seem to happen at a peak, but randomly.
>
> Given all that, I'm not sure which of the above to try.
>

Several obvious sysctls can affect this: net.inet.ip.portrange.randomized
and the rest of net.inet.ip.portrange.*.

We definitely need more debug info from his system: vmstat -zm,
netstat -anp tcp, netstat -m, and sysctl net.inet.  It would be nice if
he could give us a shell on the problem box.

-- 
Maxim Konovalov
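
As a sanity check on the arithmetic quoted above, here is a minimal sketch of the same estimate using Little's law (concurrent connections = arrival rate x connection lifetime). The lifetimes of 0.06 s and 0.12 s are assumptions chosen to stand in for "tens or hundreds of milliseconds"; they are not measurements from the affected systems, and the function names are illustrative only.

```python
# Back-of-envelope check of the figures quoted in the thread.
# ASSUMPTION: connection lifetimes of 0.06 s and 0.12 s are illustrative
# stand-ins for "tens or hundreds of milliseconds"; they were not measured.

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def arrival_rate(connections, seconds=SECONDS_PER_WEEK):
    """Average new connections per second over the given period."""
    return connections / seconds


def concurrent(rate, lifetime_s):
    """Little's law: average connections open at once = rate * lifetime."""
    return rate * lifetime_s


if __name__ == "__main__":
    rate = arrival_rate(1e9)  # 10^9 connections in one week
    print(f"average arrival rate: {rate:.0f} connections/s")
    for lifetime in (0.06, 0.12):  # assumed lifetimes, in seconds
        steady = concurrent(rate, lifetime)
        peak = 100 * steady  # the "up to 100 times" peak guess from the thread
        print(f"lifetime {lifetime:.2f} s: steady ~{steady:.0f}, peak ~{peak:.0f}")
```

Under these assumptions the steady-state figure lands in the 100-200 range and the peak stays near 20000 concurrent connections, which is consistent with the estimate in the quoted message.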