From: Robert Watson
Date: Sun, 20 Apr 2008 10:43:20 +0100 (BST)
To: Chris Pratt
Cc: gnn@freebsd.org, d@delphij.net, net@freebsd.org
Subject: Re: zonelimit issues...
Message-ID: <20080420103258.D67663@fledge.watson.org>
In-Reply-To: <382258DB-13B8-4108-B8F4-157F247A7E4B@hughes.net>
References: <48087C98.8060600@delphij.net> <382258DB-13B8-4108-B8F4-157F247A7E4B@hughes.net>

On Fri, 18 Apr 2008, Chris Pratt wrote:

> Doesn't 7.0 fix this? I'd like to see an official definitive answer and all
> I've been going on is that the problem description is no longer in the
> errata.

Unfortunately, bugs of this sort don't really "work" that way: specific bugs
are a property of a problem in code (or a problem in design), but what we
have right now is a report of a symptom that might reflect zero or more
specific bugs.
It's unclear that the problem described in the errata is the problem you've
been experiencing, or that the (at least one) fixed bug with the same
symptoms is the one you've been experiencing. For better or worse, the only
way to really tell if a generic class of hang or wedging is fixed is to try
out the new version and see.

In most cases, "zonelimit" wedging reflects one of two things:

(1) Inadequate resource allocation to the network stack or some other
    component; try tuning up the memory tunable for clusters (for example).

(2) A memory leak in a network device driver or other network part, which
    needs to be debugged and fixed.

On at least one prior occasion, there has been a bug in UMA itself that led
to getting stuck in zonelimit, and it's not impossible there's a scheduler
sleep/wakeup bug that would lead to a similar symptom but for a different
reason.

In FreeBSD 7-STABLE, you can now use procstat -k to print kernel stack
traces of user threads blocked in the kernel, which may make diagnosing the
general class of problem a bit easier without using a kernel debugger.
"zonelimit" is the generic wait channel across all memory types and
allocation paths, so it doesn't reveal a lot about *which* limit is being
hit. Using a kernel stack trace, we can see which specific memory type and
allocation context is involved.

Robert N M Watson
Computer Laboratory
University of Cambridge
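
To illustrate the procstat -k approach, here is a minimal Python sketch that
picks out threads whose kernel stack contains a given function name. The
helper name threads_with_frame, the sample text, and the choice of
uma_zalloc_arg as the frame of interest are assumptions made for
illustration. The sample is not captured from a real system, and the exact
columns procstat prints may differ between releases, so the sketch relies
only on each data row starting with a numeric PID and TID.

```python
def threads_with_frame(procstat_output, frame):
    """Return (pid, tid) for each row whose fields include `frame`.

    Assumes each data row begins with a numeric PID and TID; anything
    else (header, blank lines) is skipped.
    """
    matches = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which does not start with numbers.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        # Match the frame as a whole whitespace-delimited token, not a substring.
        if frame in fields[2:]:
            matches.append((int(fields[0]), int(fields[1])))
    return matches


# Hypothetical sample in the general shape of `procstat -k` output.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
 1436 100156 httpd            -                mi_switch sleepq_wait _sleep uma_zalloc_arg sosend_generic
 1502 100201 sshd             -                mi_switch sleepq_catch_signals seltdwait kern_select
"""

# List the threads that appear to be sleeping inside a UMA allocation.
print(threads_with_frame(SAMPLE, "uma_zalloc_arg"))
```

The frames above the allocator call in a matching stack are what hint at
which subsystem is asking for memory, so in practice you would read the
whole stack for each thread returned rather than stop at the match.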