Date: Mon, 15 Aug 2011 11:31:35 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <4E48D967.9060804@FreeBSD.org>
In-Reply-To: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>
References: <47F0D04ADF034695BC8B0AC166553371@multiplay.co.uk>
 <A71C3ACF01EC4D36871E49805C1A5321@multiplay.co.uk>
 <4E4380C0.7070908@FreeBSD.org>
 <EBC06A239BAB4B3293C28D793329F9CA@multiplay.co.uk>
 <4E43E272.1060204@FreeBSD.org>
 <62BF25D0ED914876BEE75E2ADF28DDF7@multiplay.co.uk>
 <4E440865.1040500@FreeBSD.org>
 <6F08A8DE780545ADB9FA93B0A8AA4DA1@multiplay.co.uk>
 <4E441314.6060606@FreeBSD.org>
 <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>

on 14/08/2011 17:43 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@FreeBSD.org>
>>
>> Maybe test it on couple of machines first just in case I overlooked something
>> essential, although I have a report from another use that the patch didn't break
>> anything for him (it was tested for an unrelated issue).
>
> We've got this running on a ~40 machines and just had the first panic
> since the update. Unfortunately it doesn't seem to have changed anything :(
>
> We have 352 thread entries starting with:-
> #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
> flags=Variable "flags" is not available.
> 23 with:-
> cpustop_handler () at atomic.h:285
> and 16 with:-
> #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562

I would like to get a full output of thread apply all bt.

> The main message being:-
> panic: double fault
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

So this line, does it indicate a shutdown of a jail or of the whole system?

> Fatal double fault
> rip = 0xffffffff8053b691

Can you please provide output of 'list *0xffffffff8053b691' in kgdb?

> rsp = 0xffffff8d8f356fb0
> rbp = 0xffffff8d8f357210
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
> #1 0xffffffff8038956e at panic+0x2ae
> #2 0xffffffff805802b6 at dblfault_handler+0x96
> #3 0xffffffff8056900d at Xdblfault+0xad

I think (not 100% sure) that with DDB in the kernel we could get a better
backtrace here, possibly with pre-dblfault stack frames, because the DDB
backend is a bit smarter than the trivial stack(9) printer.

> stack: 0xffffff8d8f357000, 4

One thing I can say is that this looks like a double fault caused by stack
exhaustion (the most typical cause): the rsp value is below td_kstack.

Can you please also provide the following information:
p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is the value of the KSTACK_PAGES option (amd64 default is 4)
and PAGE_SIZE is 4096.

> rsp = 0xffffff800009ae10
[snip]
> There are some indications that stopping jails could be the
> cause of the panics so on one test box I've added in invariants
> to see if we get anything shows up from that.

OK.

-- 
Andriy Gapon
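
The stack-exhaustion reasoning above can be checked with a little arithmetic.
The following is only a sketch, not part of the original exchange: it takes the
rsp and stack base values from the quoted panic output and assumes the defaults
named in the message (KSTACK_PAGES = 4, PAGE_SIZE = 4096). The helper names
stack_bounds and bytes_below_base are made up for illustration.

```python
# Values copied from the quoted panic output.
TD_KSTACK = 0xffffff8d8f357000   # "stack: 0xffffff8d8f357000, 4"
RSP = 0xffffff8d8f356fb0         # rsp at the time of the double fault

# Assumed defaults, as stated in the message (amd64).
KSTACK_PAGES = 4
PAGE_SIZE = 4096


def stack_bounds(base, pages=KSTACK_PAGES, page_size=PAGE_SIZE):
    """Return (lowest address, one-past-highest address) of the kernel stack."""
    return base, base + pages * page_size


def bytes_below_base(rsp, base):
    """How far rsp has run below the stack base (0 if still inside the stack)."""
    return max(0, base - rsp)


low, high = stack_bounds(TD_KSTACK)
overrun = bytes_below_base(RSP, low)

print("kernel stack: %#x .. %#x" % (low, high))
print("rsp is %d bytes below the stack base" % overrun)
```

The stack grows downward from the high address, so an rsp below the low address
means the stack has been overrun. The high address is also the value that the
kgdb expression above steps back from by one struct pcb.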