In-Reply-To: <20080420103258.D67663@fledge.watson.org>
References: <48087C98.8060600@delphij.net> <382258DB-13B8-4108-B8F4-157F247A7E4B@hughes.net> <20080420103258.D67663@fledge.watson.org>
Message-Id: <33AC96BF-B9AC-4303-9597-80BC341B7309@hughes.net>
From: Chris Pratt
Date: Sun, 20 Apr 2008 09:53:49 -0700
To: Robert Watson
Cc: net@freebsd.org
Subject: Re: zonelimit issues...

On Apr 20, 2008, at 2:43 AM, Robert Watson wrote:

>
> On Fri, 18 Apr 2008, Chris Pratt wrote:
>
>> Doesn't 7.0 fix this? I'd like to see an official definitive
>> answer and all I've been going on is that the problem description
>> is no longer in the errata.
>
> Unfortunately, bugs of this sort don't really "work" that way --
> specific bugs are a property of a problem in code (or a problem in
> design), but what we have right now is a report of a symptom that
> might reflect zero or more specific bugs. It's unclear that the
> problem described in errata is the problem you've been
> experiencing, or that the (at least one) fixed bug with the same
> symptoms is the one you've been experiencing. For better or
> worse, the only way to really tell if a generic class of hang or
> wedging is fixed is to try out the new version and see. In most
> cases, "zonelimit" wedging reflects one of two things:
>
> (1) Inadequate resource allocation to the network stack or some other
>     component, try tuning up the memory tunable for clusters (for
>     example).
>

For several months I did quite a bit of tuning. I never increased
nmbclusters beyond the 32768 shown in the docs because man tuning
doesn't define its use of "arbitrarily high". A machine that fails
to boot could mean travel to the site. Kris Kennaway had provided
instructions for getting a dump. I set up for that but have never
gotten one. The only respite came from adding another circuit and
another NIC and spreading the traffic across them.
That stretched the time between lockups from every couple of days
during the heavy bot period of late 2006 to about a month now, or
even two months during traditionally slow periods. For example, we
ran a record 72 days last summer, which was a very dead summer
traffic-wise.

I will try to increase nmbclusters dramatically if I can figure out
what a safe upper limit is, but it sounds like the jump to 7.0
RELEASE may be worth the effort. I would want to wait until this
issue with TCP, Windows and certain routers is well past. I have not
seen the fix for that applied to 7_0_0 yet, and that would be a show
stopper. Is there a way to know what is safe for nmbclusters on a
system with 8 GB of RAM?

I collected vmstat data for a couple of months when things were at
their worst. The results were hard for me to interpret because I
don't know the code. All I actually found was that a certain counter
would drop to 0 and never recover. I didn't know if it was
meaningful and received no replies when I asked FreeBSD-Questions.
It was 128-Bucket or something like that.

> (2) A memory leak in a network device driver or other network part,
>     which needs to be debugged and fixed.
>

Initially I thought the problem might be related to the bge driver
and moved the high-traffic apps onto an em interface. This didn't
seem to help much, nor did polling. I am most willing to collect
data if I can figure out how to collect something meaningful. I
gather from what you say that 7.0 would provide this.

I really appreciate both of your responses. Just based on this one
problem, 6.x has been a bad experience after years of seemingly
impossible uptime on FreeBSD 4 and 5.x.

> On at least one prior occasion, there has been a bug in UMA itself
> that led to getting stuck in zonelimit, and it's not impossible
> there's a scheduler sleep/wakeup bug that would lead to a similar
> symptom but for a different reason.
>
> In FreeBSD 7-STABLE, you can now use procstat -k to print kernel
> stack traces of user threads blocked in kernel, which may make
> diagnosing the general class of problem a bit easier without using
> a kernel debugger. "zonelimit" is the generic wait channel across
> all memory type and allocation paths, so doesn't reveal a lot about
> *which* limit is being hit. Using a kernel stack trace, we can see
> which specific memory type and allocation context is involved.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
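
On the question above about a safe nmbclusters value: a rough first
step is to see how much memory the cluster zone could consume if
every cluster were in use. The sketch below does only that
arithmetic. It assumes the standard 2048-byte cluster size
(MCLBYTES); the function name and the candidate values other than
32768 are hypothetical, chosen for illustration. It does not model
mbuf headers, other zones, or kernel address space limits, so it
gives a lower bound on the cost and not a verdict on what is safe.

```python
# Rough footprint of the mbuf cluster zone for a given nmbclusters value.
# Assumption: standard 2048-byte clusters (MCLBYTES). This counts only the
# cluster buffers themselves, not mbuf headers or any other kernel zone.

MCLBYTES = 2048  # bytes per standard mbuf cluster (assumed)


def cluster_footprint_mib(nmbclusters: int) -> float:
    """Return the memory used if every cluster were allocated, in MiB."""
    return nmbclusters * MCLBYTES / (1024 * 1024)


if __name__ == "__main__":
    # 32768 is the value mentioned in the message; the rest are
    # hypothetical larger settings for comparison.
    for n in (32768, 65536, 131072, 262144):
        print(f"nmbclusters={n:>7}: {cluster_footprint_mib(n):6.0f} MiB")
```

Under that assumption, 32768 clusters come to 64 MiB, a small
fraction of 8 GB of RAM, which suggests physical memory is unlikely
to be the first constraint when raising the value.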