Date: Wed, 10 Aug 2011 09:00:19 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <20110810160018.GA40279@icarus.home.lan>
In-Reply-To: <ADF5E597D1C0428D8FB838D94BDEB3A4@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk> <20110810151256.GA38601@icarus.home.lan> <ADF5E597D1C0428D8FB838D94BDEB3A4@multiplay.co.uk>

On Wed, Aug 10, 2011 at 04:46:17PM +0100, Steven Hartland wrote:
> >On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
> >>The base stack reported is a double fault with no additional
> >>details and CTRL+ALT+ESC fails to break to the debugger as
> >>does an NMI, even though it at least tries printing the
> >>following many times some quite jumbled:-
> >>NMI ... going to debugger
>
> >If you're generating the NMI yourself (possibly via the KVM, etc.) then
> >okay, that's different. I'm trying to discern whether or not *you're*
> >generating the NMI, or if the NMI just happens and causes a panic for
> >you and that's what you're worried about.
>
> Yer generating it after panic in order to try and get to the debugger :)

Understood, thanks for clarifying.

> >Now to discuss the "jumbled console output":
> ...
> >The default (assuming your kernel configs are based off of GENERIC
> >within the past 4-5 years) is 128. However, the same developers stated
> >that they have great reservations over increasing this number
> >dramatically (meaning, something like 256 will probably work, but larger
> >"may have repercussions which are unknown at this time").
>
> Might try that if it will help but with so many production machines to
> action I'd like to try and avoid if possible.

I've used PRINTF_BUFR_SIZE=256 with success on our systems, but since it
doesn't actually *solve* the problem, I just use the default 128 and
grit my teeth when we experience it. The concern is over larger values
(e.g. 512 or 1024).

> >In combination with this, we use the following in /etc/rc.conf (the
> >dumpdev line is important, else savecore won't pick up anything):
> >
> >dumpdev="auto"
>
> I thought this was meant to be the default from back in the 6.x days but
> it didn't seem to work, so I added the gptid device from /etc/fstab

/etc/defaults/rc.conf has dumpdev="NO", which affects two things:
/etc/rc.d/dumpon (this script is a little tricky, you really have to
read it slowly and pay close attention to what's going on), and
/etc/rc.d/savecore.

I've always wondered why dumpdev="NO" is the default, not "auto", since
on a system with no swap devices in /etc/fstab dumpdev="auto" should
behave the same. Possibly the idea of the default is to ensure that
savecore(8) never gets run (e.g. there's no guarantee someone has
/var/crash, or a /var that's big enough to hold a crash dump; possibly
embedded systems or NFS-only systems, for example).

Touchy subject I guess.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
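
The dumpdev settings discussed in the message can be sketched as an
/etc/rc.conf fragment. This is an illustration only, not something
posted in the thread: the dumpdir line is an assumption added here (it
repeats what is believed to be the stock value in
/etc/defaults/rc.conf), and the comments describe the rc.d scripts'
behaviour only roughly.

```shell
# /etc/rc.conf fragment (sh syntax). Illustrative sketch only.

# Overrides the dumpdev="NO" found in /etc/defaults/rc.conf.
# With "auto", /etc/rc.d/dumpon looks for a swap device in /etc/fstab
# to use as the dump device, and /etc/rc.d/savecore is allowed to run.
dumpdev="auto"

# Assumption: directory where savecore(8) stores recovered dumps.
# It must exist and sit on a filesystem large enough to hold a dump.
dumpdir="/var/crash"
```

Naming a specific device instead of "auto" (as Steven did with the
gptid entry from /etc/fstab) takes the same form, with the device path
as the value of dumpdev.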