Date: Sun, 1 Feb 2004 16:41:43 +0100
From: Bogdan TARU
To: freebsd-hackers@freebsd.org
Subject: Re: 4.9 kernel panics on a poweredge 2650
Message-ID: <20040201154143.GA7837@icomag.de>
In-Reply-To: <40111803.25970.2F6461BE@localhost>
References: <20040123125040.GA42187@icomag.de> <40111803.25970.2F6461BE@localhost>

	Hi Hackers,

  OK, here is some more information about my problem:

  We have 3 webservers with identical hardware configurations, and the
same kernel and applications running on all three. They get mostly the
same traffic (DNS round-robin). They all run 4.9-RELEASE. I have
experienced repeatable crashes on all three, so a hardware problem is
very unlikely. I have come to think that the problem is with the kernel
memory space, which is too low.

  I have compiled the kernel from GENERIC with the following
modifications:

  - maxusers set to 128
  - activated SMP (the CPUs are HTT-compatible)
  - KVA_PAGES set to 256 (each box has 2GB of RAM and 2GB of swap)
  - PMAP_SHPGPERPROC=401 (for apache)
  - ACCEPT_FILTER_DATA and ACCEPT_FILTER_HTTP
  - removed unnecessary drivers from the kernel

  /etc/sysctl.conf looks like:

net.inet.tcp.msl=100
net.inet.tcp.blackhole=1
# Hyperthreading
machdep.cpu_idle_hlt=1
kern.ipc.somaxconn=4096
kern.maxfiles=65535
vfs.vmiodirenable=1
kern.ipc.shm_use_phys=1
net.inet.tcp.sendspace=16384

  The boxes run without a problem for about 2-3 days, after which they
panic with 'page not present' in different processes (sshd, httpd,
etc). I guess the real reason for this is the low value for kvm_free:

(web1)[~] sysctl -a | grep vm.kvm
vm.kvm_size: 1069543424
vm.kvm_free: 4190208

  But I don't know what causes that. The boxes are not that busy (they
don't even crash during peak-traffic times), and vmstat -m shows me as
a total:

Memory Totals:  In Use    Free    Requests
                 5311K   7090K    15602606

  which also looks fairly normal.

  So, any idea where I should start looking in order to see what 'eats'
so much kvm space?

  Thank you,
  bogdan

On Fri, Jan 23, 2004 at 12:48:03PM -0800, Andrew Kinney wrote:
> On 23 Jan 2004 at 13:50, Bogdan TARU wrote:
>
> >
> > Hi hackers,
> >
> > I am experiencing kernel panics on a poweredge 2650 each day around
> > 3am (usually the machine comes up at 3:04am). The kernel panics are
> > reproducible by running: /etc/periodic/security/100.chksetuid (in
> > fact by running find on /usr with -perm). The problem lies
> > somewhere in /usr/ports. Deleting the /usr/ports tree doesn't solve
> > it; trying a cvs up of /usr/ports results in a crash again.
> >
>
> Our experience is that repetitive crashes when dealing with large
> numbers of files (like the ports tree) generally point to hitting
> some OS resource limit.
> Some things to check that may or may not apply to this particular
> problem:
>
> sysctl vm.zone
>
> Make sure you're not hitting any of those limits.
>
> sysctl vm.kvm_size
> sysctl vm.kvm_free
>
> If kvm_free is running low just prior to the crash, you might want to
> increase your KVA_PAGES (see LINT) and rebuild your kernel.
>
> Of course, this is all hit-and-miss guesswork until you have a crash
> dump, so getting a crash dump and a traceback from a kernel identical
> to your running kernel with debugging symbols would be a logical
> first step if you want to avoid any guessing. If your tracebacks
> show failures in random locations, you're probably looking at bad
> RAM. If you always fail in the same spot with each crash, then it is
> just a matter of determining why and correcting it.
>
> I believe the FreeBSD Developers' Handbook has instructions on how
> to set up a system to do an automatic crash dump for any panic. It is
> relatively straightforward.
>
> Sincerely,
> Andrew Kinney
> President and
> Chief Technology Officer
> Advantagecom Networks, Inc.
> http://www.advantagecom.net
>
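
The kvm_free check suggested in the quoted reply can be sketched in a
few lines of Python. This is only an illustration, not something from
the thread: it parses sysctl output of the form "name: value" and
reports how much of the kernel virtual address space is still free.
The sample text is the web1 output quoted in the message (about 4 MB
free out of about 1020 MB). The function names and the 5% warning
threshold are hypothetical choices, not FreeBSD defaults.

```python
# Sample input: the output of `sysctl -a | grep vm.kvm` quoted in the message.
SAMPLE = """\
vm.kvm_size: 1069543424
vm.kvm_free: 4190208
"""


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict of ints; skip other lines."""
    values = {}
    for line in text.splitlines():
        name, sep, raw = line.partition(":")
        raw = raw.strip()
        if sep and raw.isdigit():
            values[name.strip()] = int(raw)
    return values


def kvm_free_fraction(values):
    """Fraction of kernel virtual address space reported as free."""
    return values["vm.kvm_free"] / values["vm.kvm_size"]


def is_low(values, threshold=0.05):
    """True if the free fraction is below the (hypothetical) threshold."""
    return kvm_free_fraction(values) < threshold


if __name__ == "__main__":
    vals = parse_sysctl(SAMPLE)
    free_mib = vals["vm.kvm_free"] / 2**20
    size_mib = vals["vm.kvm_size"] / 2**20
    print(f"kvm_free: {free_mib:.1f} MiB of {size_mib:.0f} MiB "
          f"({kvm_free_fraction(vals):.2%} free)")
    if is_low(vals):
        print("warning: kernel virtual address space is nearly exhausted")
```

Feeding it periodic samples (for example, sysctl output saved from a
cron job) would show whether kvm_free shrinks steadily over the 2-3
days before a panic or drops suddenly.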