Date: Wed, 23 Jul 2003 16:57:38 -0500 (CDT)
From: Mike Silbersack <silby@silby.com>
To: "Kevin A. Pieckiel" <kpieckiel-freebsd-hackers@smartrafficenter.org>
In-Reply-To: <20030723173007.GD41280@pacer.dmz.smartrafficenter.org>
Message-ID: <20030723163643.F4074@odysseus.silby.com>
References: <20030723173007.GD41280@pacer.dmz.smartrafficenter.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-hackers@freebsd.org
Subject: Re: mbuf cluster shortage caused kernel panic


On Wed, 23 Jul 2003, Kevin A. Pieckiel wrote:

> #uname -a
> FreeBSD fileserver1.smartrafficenter.net 4.7-STABLE FreeBSD 4.7-STABLE #0: Mon Dec 16 19:41:03 EST 2002     toor@fileserver1.smartrafficenter.net:/usr/obj/usr/src/sys/FILESERVER1  i386
>
> Running 4.7 stable with sources CVSed on 16 Dec 2002.
>
> My fileserver has been running since 17 Dec 2002 and suddenly lost its
> ability to talk on the network today.  Went to the console to discover
> a flood of messages that it was out of mbuf clusters, read tuning(7)
> for more info.
>
> What can I do to help solve any problems that might exist in the kernel
> code, and what suggestions do you have to keep this from happening on
> my fileserver again?
>
> Kernel, debug kernel, CVS date, kernel config, and core file can be
> made available upon request.
>
> Thanks much,
> Kevin A. Pieckiel

Your panic seems to indicate that the mbuf cluster chain became corrupted,
which could have happened in one of a few ways.  I'll address your
question in two parts:

1.  How do I prevent the system from using all mbuf clusters?

This depends on the application you're running; next time you're in a
similar situation, you may wish to run netstat -n | more and look at the
Send-Q column to see whether a large number of connections with large send
queues are holding all the mbuf clusters.
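
As a rough sketch of that check, the connections can be sorted by send
queue size instead of paged through by hand.  The helper name top_sendq is
hypothetical, and the sketch assumes that Send-Q is the third column of the
netstat -n output and that TCP lines start with "tcp":

```shell
# Hypothetical helper: list the TCP connections with the largest send queues.
# Reads `netstat -n` style output on stdin; assumes Send-Q is the third column.
# Prints "Send-Q local-address foreign-address", largest queue first.
top_sendq() {
    awk '$1 ~ /^tcp/ && $3 > 0 { print $3, $4, $5 }' |
        sort -rn |
        head -n "${1:-20}"
}

# Usage on a live system (shows the 20 largest send queues):
#   netstat -n | top_sendq 20
```

A handful of connections with very large values at the top of that list
would point at the application or its peers rather than at the kernel.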

If a large number of mbuf clusters are in use without much of anything
showing up in netstat -n, then we have some sort of mbuf cluster leak,
which is much more serious.
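
To compare the two cases, it may help to look at overall cluster usage
with netstat -m alongside the per-connection view.  The sketch below is
only illustrative: cluster_usage is a hypothetical helper, and it assumes
the summary line has the form "384/558/6656 mbuf clusters in use
(current/peak/max)", which may differ between releases:

```shell
# Hypothetical helper: summarize mbuf cluster usage from `netstat -m` on stdin.
# Assumes a line like "384/558/6656 mbuf clusters in use (current/peak/max)".
cluster_usage() {
    awk '/mbuf clusters in use/ {
        split($1, n, "/")               # n[1]=current, n[2]=peak, n[3]=max
        if (n[3] > 0)
            printf "clusters: %d of %d in use (%d%%), peak %d\n",
                   n[1], n[3], 100 * n[1] / n[3], n[2]
    }'
}

# Usage on a live system:
#   netstat -m | cluster_usage
```

High cluster usage with nothing notable in the send queues would be
consistent with the leak scenario described above.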

2.  How do I prevent the system from panicking when all mbuf clusters are
used up?

This question has a more useful answer. :)

You could cvsup to 4.8-STABLE; at least two bugs that would result in
panics during mbuf exhaustion have been fixed, and an additional
potentially panic-causing situation has been patched.  One of those bugs
may be the same as the one that affected you, but it would be very
time-consuming to confirm that.

Even if you stay with the kernel version you are at, you may want to
enable the INVARIANTS (and INVARIANT_SUPPORT) options.  This will cause
additional checks to be enabled in the kernel which will make tracking
down future panics easier.
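
For illustration, enabling these would mean adding the two option lines
below to the kernel config file (FILESERVER1, going by the uname output
quoted above) and then rebuilding and installing the kernel in the usual
way.  The comments are explanatory additions, not text from any stock
config:

```text
# Extra run-time consistency checks in the kernel
options 	INVARIANTS
# Support code required by INVARIANTS
options 	INVARIANT_SUPPORT
```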

If this problem is infrequent, I think your best course of action is to
build a 4.7 kernel with INVARIANTS for now, and plan on a 4.8-STABLE
upgrade at some point in the future.

Mike "Silby" Silbersack