From owner-freebsd-hackers@FreeBSD.ORG  Wed Jul 23 15:20:59 2003
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5FE4437B401
	for <freebsd-hackers@freebsd.org>;
	Wed, 23 Jul 2003 15:20:59 -0700 (PDT)
Received: from smartrafficenter.org (pacer.smartrafficenter.org [207.14.56.3])
	by mx1.FreeBSD.org (Postfix) with SMTP id 8680943FB1
	for <freebsd-hackers@freebsd.org>;
	Wed, 23 Jul 2003 15:20:58 -0700 (PDT)
	(envelope-from kpieckiel@smartrafficenter.org)
Received: (qmail 74737 invoked by uid 1500); 23 Jul 2003 22:20:56 -0000
Date: Wed, 23 Jul 2003 18:20:56 -0400
From: "Kevin A. Pieckiel" <kpieckiel-freebsd-hackers@smartrafficenter.org>
To: Mike Silbersack <silby@silby.com>
Message-ID: <20030723222056.GA74596@pacer.dmz.smartrafficenter.org>
References: <20030723173007.GD41280@pacer.dmz.smartrafficenter.org>
	<20030723163643.F4074@odysseus.silby.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030723163643.F4074@odysseus.silby.com>
User-Agent: Mutt/1.4i
cc: freebsd-hackers@freebsd.org
Subject: Re: mbuf cluster shortage caused kernel panic
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Jul 2003 22:20:59 -0000

Mike,

On Wed, Jul 23, 2003 at 04:57:38PM -0500, Mike Silbersack wrote:
> 
> Your panic seems to indicate that the mbuf cluster chain became corrupted,
> which could have happened in one of a few ways.  I'll address your
> question in two parts:
> 
> 1.  How do I prevent the system from using all mbuf clusters.
> 
> This depends on the application you're running; next time you're in a
> similar situation, you may wish to run netstat -n | more and look at the

You are exactly right.  Right before I read this email, I noticed I
was starting to run out again.  I was fortunate enough to catch the right
information in time.  A program one of my colleagues wrote was running
ping every couple of seconds.  The problem was that it did not pass the
-c flag, so each ping never exited, and hundreds of ping commands piled
up.  I was not able to catch this before the panic.
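
For the archive, here is a minimal sketch of how a polling program could
bound each ping so that it always exits.  It is not my colleague's code;
Python and the names build_ping_command and host_is_up are assumptions
chosen for illustration.  The only parts taken from the problem above are
ping itself and its -c (count) flag.

```python
import subprocess


def build_ping_command(host, count=1):
    """Return a ping command line that stops after `count` echo requests."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # -c makes ping exit after `count` packets instead of running forever.
    return ["ping", "-c", str(count), host]


def host_is_up(host, count=1, timeout_s=10.0):
    """Run one bounded ping; True if it exits 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # second safety net: the child is killed on expiry
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

The timeout is a second line of defense: even if -c were dropped again,
subprocess.run would kill the child once timeout_s elapsed rather than
leave it running.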

(It panicked twice more, BTW, before I was able to catch this.)

This time, I was fortunate enough to notice a high load average for
this machine.  That led to checking the process list, which showed
gazillions of ping commands running.  'killall -9 ping' was my best
friend today.
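
A check like the one I did by hand could be automated.  The following is
only a sketch under assumptions: the function names and the threshold of
50 are hypothetical, and it relies on 'ps -A -o comm=' printing one
command name per line with no header.

```python
import os
import subprocess
from collections import Counter


def count_commands(ps_output):
    """Count processes per command name from one-name-per-line ps output."""
    counts = Counter()
    for line in ps_output.splitlines():
        name = line.strip()
        if name:
            # Some systems print a full path; keep only the last component.
            counts[os.path.basename(name)] += 1
    return counts


def runaway_commands(counts, threshold=50):
    """Return the commands with at least `threshold` running instances."""
    return {name: n for name, n in counts.items() if n >= threshold}


def snapshot():
    """Take a process snapshot with ps and count it (not run on import)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_commands(out)
```

Calling runaway_commands(snapshot()) from cron and mailing any non-empty
result would have flagged the pile of pings well before the mbuf clusters
ran out.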

> 2.  How do I prevent the system from panicking when all mbuf clusters are
> used up?
> 
> This question has a more useful answer. :)
> 
> You could cvsup to 4.8-STABLE; at least two bugs which would result in
> panics during mbuf exhaustion have been fixed, and an additional potential
> panic causing situation has been patched.  One of those bugs may be the
> same as the one that affected you, but it would be very time consuming to
> figure it out.

This is a good thought.  In fact, I did use the third crash as an
opportunity to upgrade, in hopes of solving the panic problem, even if
it didn't solve the real issue.  Not panicking would give me more time
to see what was really wrong, even if I had no network.  Fortunately,
I didn't have to test this theory, but I did get the upgrade.  :)


> If this problem is infrequent, I think your best course of action is to
> build a 4.7 kernel with INVARIANTS for now, and plan on a 4.8-stable
> upgrade at some point in the future.

Mike, I am truly thankful for your response.  I appreciate your help.  Even
though I found the problem before I read your answer, I believe your advice
would have given me the insight and time I needed to find the real problem
had I not noticed the high load average.

Thank you.

Sincerely,
Kevin