Date: Sun, 20 Apr 2008 10:32:25 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: gnn@freebsd.org
In-Reply-To: <m2hcdztsx2.wl%gnn@neville-neil.com>
Message-ID: <20080420102827.U67663@fledge.watson.org>
References: <m2hcdztsx2.wl%gnn@neville-neil.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: net@freebsd.org
Subject: Re: zonelimit issues...


On Fri, 18 Apr 2008, gnn@freebsd.org wrote:

> I am wondering why this patch was never committed?
>
> http://people.freebsd.org/~delphij/misc/patch-zonelimit-workaround
>
> It does seem to address an issue I'm seeing where processes get into the 
> zonelimit state through the use of mbufs (a high speed UDP packet receiver) 
> but even after network pressure is reduced/removed the process never gets 
> out of that state again.  Applying the patch fixed the issue, but I'd like 
> to have some discussion as to the general merits of the approach.
>
> Unfortunately the test that currently causes this is tied very tightly to 
> code at work that I can't share, but I will hopefully be improving mctest to 
> try to exhibit this behavior.

When you take all load off the system, do mbufs and clusters get properly 
freed back to UMA (as visible in netstat -m)?  If not, continuing to bump up 
against the zonelimit would suggest an mbuf/cluster leak, in which case we 
need to track that bug.

You might consider adding a debugging-only zonelimit waiter count to the UMA 
zone, and checks/assertions that a wakeup is being generated properly.  That 
is, to confirm that the wakeup is generated when memory is freed up if there 
are threads waiting.  There is at least one as-yet un-MFC'd fix to the 
sleep/wakeup code, I believe, that might be relevant here.  Is the problem 
you're reporting on 7.x, or on 8.x?  If 8.x, that's probably not it, but if 
7.x, it could be.  (This same sleep/wakeup bug occasionally leads to wedging 
of dump(8), I believe).
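
To make the waiter-count idea concrete, here is a minimal userspace sketch 
rather than actual UMA code.  It models a limited zone with a pthread mutex 
and condition variable standing in for the kernel's sleep/wakeup.  All names 
here (struct dbg_zone, dbg_zone_alloc, dbg_zone_free, and the counters) are 
hypothetical and chosen for illustration; the real zone structure and its 
locking differ.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for a UMA zone with an item limit. */
struct dbg_zone {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int	limit;		/* maximum items outstanding */
	int	in_use;		/* items currently allocated */
	int	waiters;	/* debugging only: threads asleep on the limit */
	long	sleeps;		/* debugging only: times a thread went to sleep */
	long	wakeups;	/* debugging only: wakeups issued with waiters */
};

static void
dbg_zone_init(struct dbg_zone *z, int limit)
{
	pthread_mutex_init(&z->lock, NULL);
	pthread_cond_init(&z->cv, NULL);
	z->limit = limit;
	z->in_use = 0;
	z->waiters = 0;
	z->sleeps = 0;
	z->wakeups = 0;
}

/* Take one item, sleeping while the zone is at its limit. */
static void
dbg_zone_alloc(struct dbg_zone *z)
{
	pthread_mutex_lock(&z->lock);
	while (z->in_use >= z->limit) {
		z->waiters++;
		z->sleeps++;
		pthread_cond_wait(&z->cv, &z->lock);
		z->waiters--;
	}
	z->in_use++;
	pthread_mutex_unlock(&z->lock);
}

/* Return one item; if anyone is asleep on the limit, issue a wakeup. */
static void
dbg_zone_free(struct dbg_zone *z)
{
	pthread_mutex_lock(&z->lock);
	assert(z->in_use > 0);
	z->in_use--;
	if (z->waiters > 0) {
		z->wakeups++;
		pthread_cond_signal(&z->cv);
	}
	/* Sleepers present after a free implies a wakeup was issued. */
	assert(z->waiters == 0 || z->wakeups > 0);
	pthread_mutex_unlock(&z->lock);
}
```

With counters like these, a zone whose sleeps count keeps growing while 
wakeups stays flat, even though in_use has dropped below the limit, would 
point at a lost wakeup rather than a leak.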

Robert N M Watson
Computer Laboratory
University of Cambridge