Date: Thu, 11 Sep 2008 03:56:31 -0700
From: Jeremy Chadwick
To: Michael Grant
Cc: Kris Kennaway, FreeBSD Stable List
Subject: Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load

On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick wrote:
> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
> >> My box crashed again:
> >>
> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
> >> cpuid = 0
> >> Uptime: 33d11h12m58s
> >> Dumping 3327 MB (2 chunks)
> >> chunk 0: 1MB (151 pages) ... ok
> >> chunk 1: 3327MB (851568 pages) <---hung here
> >>
> >> Still no valid dump.
> >>
> >> There is 4gig of physical memory in the machine.
> >>
> >> In /boot/loader.conf, I currently have the following:
> >>
> >> vm.kmem_size=1G
> >> vm.kmem_size_max=1G
> >> vm.kmem_size_scale=2
> >>
> >> and in my kernel conf file I have:
> >>
> >> options KVA_PAGES=512
> >>
> >> It stayed up for 33 days this time. Is there anything else I can do?
> >
> > First and foremost: are you using ZFS on this machine? If so, there are
> > many tunables you can apply to try and limit this; I'm willing to bet
> > it's ARC which is doing it. See below.
> >
> > In general, it appears that you need to increase the maximum range of
> > kmem. The kernel attempted to utilise more than 1GB, and your limit is
> > 1G. My machines running RELENG_7 on amd64, with only 2GB of RAM
> > installed, use the following tunables in loader.conf:
> >
> > vm.kmem_size="1536M"
> > vm.kmem_size_max="1536M"
> >
> > If ZFS is in use, I recommend these as well:
> >
> > vfs.zfs.arc_min="16M"
> > vfs.zfs.arc_max="64M"
> > vfs.zfs.prefetch_disable="1"
> >
> > Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
> > have in the machine, with regards to RELENG_7, will not help.
> > This is a known limitation which has been fixed in HEAD/CURRENT (where
> > the limit has been increased to 512GB). See the "Kernel" section below;
> > you'll see the applicable item.
> >
> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
> >
> > Your only solution may be to run HEAD/CURRENT.
>
> I am not running ZFS. My file systems are ufs.
>
> This feels like some sort of memory leak in the kernel. Giving it
> more and more memory just seems to delay the crash. Are you saying
> the crash is fixed in HEAD/CURRENT?

It's an intentional panic, not a "the program tried to access NULL,
which crashed the machine" kind of crash. The kernel wants more memory
to accomplish a certain thing, and it's not available. kris@ can
explain this in better terms than I can.

First and foremost, it would be good to find out what you are running
on this machine (process-wise). A process could be tickling something
in the kernel which requires a large amount of kernel memory. I can
imagine something like MySQL would require this.

Ideally what needs to happen is to debug the kernel or get a full map of
kmem to find out what's using what. I believe vmstat -m or vmstat -z
output might help.

Obviously, since the machine panics, you won't be able to run those
commands after the fact. I would recommend you set up a cronjob that
runs every 1-2 minutes and logs the output of both of those commands to
a file. When the panic happens, restart the system and look at the
logfile to see whether something suddenly starts taking up a large
amount of memory, or whether the growth is gradual (indicating a memory
leak).

If you can figure out what might be tickling the problem, you can
ultimately figure out if increasing kmem is the right thing to do, or if
there's a greater problem here.

> I'm running 6.3 by the way.
>
> I have put your changes into my loader.conf, we'll see how long it
> goes this time. I'm not quite in position to update everything to 7.x
> at the moment.
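
For the periodic logging suggested above, something along these lines
might do. Treat it as a rough sketch: the script name, the log path
/var/log/kmem-usage.log, the install location, and the 2-minute
interval are all my own assumptions, so adjust them to taste.

```shell
#!/bin/sh
# log_kmem.sh (hypothetical name): append timestamped vmstat -m and
# vmstat -z output to a log file, so there is a record to read after
# the next panic and reboot.
#
# Example root crontab entry (path and interval are assumptions):
#   */2 * * * * /usr/local/sbin/log_kmem.sh

# Default log location; an assumption. Pass another path as $1 to override.
DEFAULT_LOG="/var/log/kmem-usage.log"

log_kmem() {
    logfile="${1:-$DEFAULT_LOG}"
    {
        # One header per snapshot makes it easy to split the log later.
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        echo "--- vmstat -m ---"
        vmstat -m
        echo "--- vmstat -z ---"
        vmstat -z
    } >> "$logfile" 2>&1
}

log_kmem "$@"
```

After a panic, compare the last few snapshots against earlier ones and
look for an entry whose usage keeps climbing. Two caveats: the final
snapshot or two before the panic may not have made it to disk, and at
this interval the file grows quickly over a month of uptime, so some
form of rotation (newsyslog, for instance) is probably worth setting up.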

Our production webservers run RELENG_6 and RELENG_7, and we don't
encounter this kind of problem. I'm not saying what you're experiencing
is indicative of hardware issues or something like that -- I'm simply
saying I have loaded systems which don't ever hit that condition. So
figuring out what's causing it in your case would be good.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |