Date: Wed, 23 Jul 2003 16:57:38 -0500 (CDT)
From: Mike Silbersack
To: "Kevin A. Pieckiel"
cc: freebsd-hackers@freebsd.org
Subject: Re: mbuf cluster shortage caused kernel panic
Message-ID: <20030723163643.F4074@odysseus.silby.com>
In-Reply-To: <20030723173007.GD41280@pacer.dmz.smartrafficenter.org>

On Wed, 23 Jul 2003, Kevin A. Pieckiel wrote:

> #uname -a
> FreeBSD fileserver1.smartrafficenter.net 4.7-STABLE FreeBSD 4.7-STABLE #0: Mon Dec 16 19:41:03 EST 2002 toor@fileserver1.smartrafficenter.net:/usr/obj/usr/src/sys/FILESERVER1 i386
>
> Running 4.7 stable with sources CVSed on 16 Dec 2002.
>
> My fileserver has been running since 17 Dec 2002 and suddenly lost its
> ability to talk on the network today. Went to the console to discover
> a flood of messages that it was out of mbuf clusters, read tuning(7)
> for more info.
>
> What can I do to help solve any problems that might exist in the kernel
> code, and what suggestions do you have to keep this from happening on
> my fileserver again?
>
> Kernel, debug kernel, CVS date, kernel config, and core file can be
> made available upon request.
>
> Thanks much,
> Kevin A. Pieckiel

Your panic seems to indicate that the mbuf cluster chain became
corrupted, which could have happened in one of a few ways.  I'll address
your question in two parts:

1.  How do I prevent the system from using all mbuf clusters?

This depends on the application you're running; next time you're in a
similar situation, you may wish to run netstat -n | more and look at the
Send-Q values to see if there are a large number of connections with
large send queues that are sucking up all the mbuf clusters.  If a large
number of mbuf clusters are in use without much of anything showing up
in netstat -n, then we have some sort of mbuf cluster leak, which is
much more serious.

2.  How do I prevent the system from panicking when all mbuf clusters
are used up?

This question has a more useful answer. :)

You could cvsup to 4.8-STABLE; at least two bugs which would result in
panics during mbuf exhaustion have been fixed, and an additional
potential panic-causing situation has been patched.  One of those bugs
may be the same as the one that affected you, but it would be very
time-consuming to figure it out.

Even if you stay with the kernel version you are at, you may want to
enable the INVARIANTS (and INVARIANT_SUPPORT) options.  This will cause
additional checks to be enabled in the kernel which will make tracking
down future panics easier.

If this problem is infrequent, I think your best course of action is to
build a 4.7 kernel with INVARIANTS for now, and plan on a 4.8-STABLE
upgrade at some point in the future.

Mike "Silby" Silbersack
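
The Send-Q check described under point 1 can be roughed out in a few
lines of Python.  This is only a sketch under stated assumptions: the
function names (parse_sendq, largest_sendqs), the 16384-byte threshold,
and the sample netstat output are all made up for illustration, and the
column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address)
is assumed to match what netstat -n prints on your system.  Send-Q is
a byte count, so it is only a rough proxy for mbuf cluster usage.

```python
def parse_sendq(netstat_text):
    """Return (send_q, local, foreign) tuples for each TCP line.

    Assumes the columns are: Proto Recv-Q Send-Q Local Foreign [State].
    """
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip banners, column headers, and non-TCP sockets.
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        try:
            send_q = int(fields[2])
        except ValueError:
            continue
        rows.append((send_q, fields[3], fields[4]))
    return rows


def largest_sendqs(netstat_text, threshold=16384):
    """Connections whose Send-Q is at least `threshold` bytes, largest first.

    The default threshold is an arbitrary illustrative value.
    """
    rows = [r for r in parse_sendq(netstat_text) if r[0] >= threshold]
    return sorted(rows, reverse=True)


# Hypothetical sample output, not taken from the machine in this thread.
SAMPLE = """\
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0  33304  10.0.0.5.2049          10.0.0.17.1021         ESTABLISHED
tcp4       0      0  10.0.0.5.22            10.0.0.9.51514         ESTABLISHED
tcp4       0  65160  10.0.0.5.445           10.0.0.23.1188         ESTABLISHED
udp4       0      0  *.111                  *.*
"""

if __name__ == "__main__":
    for send_q, local, foreign in largest_sendqs(SAMPLE):
        print(send_q, local, foreign)
    print("total queued bytes:", sum(r[0] for r in parse_sendq(SAMPLE)))
```

To use it on a live system, feed it the saved output of netstat -n
instead of SAMPLE.  If cluster usage is high but this turns up few or
no large send queues, that points toward the leak case rather than a
backlog of slow connections.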