Date:      Sun, 20 Apr 2008 10:43:20 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Chris Pratt <eagletree@hughes.net>
Cc:        gnn@freebsd.org, d@delphij.net, net@freebsd.org
Subject:   Re: zonelimit issues...
Message-ID:  <20080420103258.D67663@fledge.watson.org>
In-Reply-To: <382258DB-13B8-4108-B8F4-157F247A7E4B@hughes.net>
References:  <m2hcdztsx2.wl%gnn@neville-neil.com> <48087C98.8060600@delphij.net> <382258DB-13B8-4108-B8F4-157F247A7E4B@hughes.net>


On Fri, 18 Apr 2008, Chris Pratt wrote:

> Doesn't 7.0 fix this? I'd like to see an official definitive answer and all 
> I've been going on is that the problem description is no longer in the 
> errata.

Unfortunately, bugs of this sort don't really "work" that way -- specific bugs
are a property of a problem in code (or a problem in design), but what we have
right now is a report of a symptom that might reflect zero or more specific
bugs.  It's unclear that the problem described in the errata is the problem
you've been experiencing, or that the (at least one) fixed bug with the same
symptoms is the one you've been experiencing.  For better or worse, the only
way to really tell if a generic class of hang or wedging is fixed is to try
out the new version and see.  In most cases, "zonelimit" wedging reflects one
of two things:

(1) Inadequate resource allocation to the network stack or some other
     component; try raising the memory tunable for mbuf clusters (for
     example).

(2) A memory leak in a network device driver or other part of the network
     stack, which needs to be debugged and fixed.

On at least one prior occasion, there has been a bug in UMA itself that led
to getting stuck in zonelimit, and it's not impossible there's a scheduler
sleep/wakeup bug that would lead to a similar symptom but for a different
reason.

In FreeBSD 7-STABLE, you can now use procstat -k to print kernel stack traces
of user threads blocked in the kernel, which may make diagnosing the general
class of problem a bit easier without using a kernel debugger.  "zonelimit" is
the generic wait channel across all memory types and allocation paths, so it
doesn't reveal a lot about *which* limit is being hit.  Using a kernel stack
trace, we can see which specific memory type and allocation context is
involved.
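
As a rough sketch of that workflow (not part of any FreeBSD tooling), the
Python below collects "procstat -k -a" output and keeps only the threads whose
kernel stack includes a given function.  The helper names are made up for
illustration, and it assumes the stack frames are printed as space-separated
function names on each thread's line; the frames actually seen will vary by
kernel version.

```python
import subprocess


def collect_kernel_stacks():
    """Run procstat -k -a and return its text output.

    Only meaningful on a FreeBSD system that ships procstat(1);
    typically needs to be run as root to see every process.
    """
    result = subprocess.run(
        ["procstat", "-k", "-a"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def stacks_mentioning(procstat_output, function_name):
    """Return the thread lines whose kernel stack contains function_name.

    Assumes each line holds whitespace-separated fields, with the stack
    frames listed as plain function names (hypothetical helper).
    """
    hits = []
    for line in procstat_output.splitlines():
        if function_name in line.split():
            hits.append(line)
    return hits
```

For example, stacks_mentioning(collect_kernel_stacks(), "uma_zalloc_arg")
would list threads currently inside the UMA allocator; uma_zalloc_arg is only
a plausible starting point, and the callers further up each stack are what
indicate which memory type and allocation context is involved.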

Robert N M Watson
Computer Laboratory
University of Cambridge


