From: Doug Ambrisko
Message-Id: <200410011636.i91GaYgC093428@ambrisko.com>
In-Reply-To: <200409301003.00492.durham@jcdurham.com>
To: Jim Durham
Date: Fri, 1 Oct 2004 09:36:34 -0700 (PDT)
cc: freebsd-hackers@freebsd.org
Subject: Re: Sudden Reboots

Jim Durham writes:
| I have had this problem now with at least 3 FreeBSD servers over a period of
| about 2 years. I had put it down to some hardware problem but it seems to be
| too much of a coincidence with 3 different machines doing the same thing.
|
| The first time was when I put 4.5-RELEASE on a brand new Dell Poweredge 2650.
| I ran it on the bench for a week or so, then decided all was well and put it
| in the server rack and started doing the company's email service on it. After
| a few weeks, it suddenly would 'reboot' for no apparent reason. No log
| entries, nothing at all except the usual stuff in /var/log/messages about '/
| was not unmounted correctly', etc. Just like you had pulled the power plug.

How much memory is in these systems? If you have 3G or more you end up
with very little left for the kernel in the 2G space. You can monitor
how much space you have left by compiling a debug kernel and then, as root:

    gdb -k kernel.debug /dev/mem
    print ((unsigned int)virtual_end)-((unsigned int)kernel_vm_end)

This should probably be made into a sysctl so it can be monitored better.
If you only have a few meg left, it doesn't take many processes forking,
etc. before your machine blows up. The bge driver, for example, takes 4M
each for the jumbo packet handling.

You can recover some of this memory via loader.conf tunables or by bumping
KVA_PAGES in your kernel config file. Still, once this memory is put
into the zone allocator (vmstat -z) in -stable it is gone from the
system even if that bucket isn't fully used or needed :-(

Ironically, the more memory you put in a system the less you can do with
the system! A lot of people are starting to run into this problem since
large memory machines are cheap.

Doug A.
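
For anyone who wants to turn the two numbers from the gdb session above into something readable, here is a rough sketch of the same subtraction. The function names and the sample addresses are made up for illustration (they are not readings from a real machine); only the 32-bit unsigned arithmetic and the 4M-per-bge figure come from the message.

```python
# Hypothetical helper: mirrors the gdb expression
#   ((unsigned int)virtual_end) - ((unsigned int)kernel_vm_end)
# All names here are illustrative assumptions, not kernel or gdb APIs.

MASK32 = 0xFFFFFFFF                 # emulate C unsigned int on i386
BGE_JUMBO_BYTES = 4 * 1024 * 1024   # per-interface figure quoted in the message


def kva_headroom(virtual_end, kernel_vm_end):
    """Bytes of kernel virtual address space the kernel has not yet grown into."""
    return ((virtual_end & MASK32) - (kernel_vm_end & MASK32)) & MASK32


def headroom_report(virtual_end, kernel_vm_end):
    """Summarise the headroom in MB and in units of one bge jumbo allocation."""
    free = kva_headroom(virtual_end, kernel_vm_end)
    return {
        "free_bytes": free,
        "free_mb": free / (1024 * 1024),
        "bge_jumbo_allocations": free // BGE_JUMBO_BYTES,
    }


if __name__ == "__main__":
    # Sample values only; substitute what gdb prints for the two symbols.
    print(headroom_report(0xFFC00000, 0xFF000000))
```

With the sample values the headroom works out to 12 MB, which would cover only three allocations of the size bge wants, so a figure in that range is the "few meg left" situation described above.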