From: Andriy Gapon
Date: Mon, 15 Aug 2011 11:31:35 +0300
To: Steven Hartland
Cc: freebsd-stable@FreeBSD.org
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
Message-ID: <4E48D967.9060804@FreeBSD.org>
In-Reply-To: <2C4B0D05C8924F24A73B56EA652FA4B0@multiplay.co.uk>

on 14/08/2011 17:43 Steven Hartland said the
following:
> ----- Original Message ----- From: "Andriy Gapon"
>>
>> Maybe test it on a couple of machines first just in case I overlooked something
>> essential, although I have a report from another user that the patch didn't break
>> anything for him (it was tested for an unrelated issue).
>
> We've got this running on ~40 machines and just had the first panic
> since the update. Unfortunately it doesn't seem to have changed anything :(
>
> We have 352 thread entries starting with:-
> #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
> flags=Variable "flags" is not available.
> 23 with:-
> cpustop_handler () at atomic.h:285
> and 16 with:-
> #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562

I would like to get the full output of 'thread apply all bt'.

> The main message being:-
> panic: double fault
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

Does this line indicate a shutdown of a jail or of the whole system?

> Fatal double fault
> rip = 0xffffffff8053b691

Can you please provide the output of 'list *0xffffffff8053b691' in kgdb?

> rsp = 0xffffff8d8f356fb0
> rbp = 0xffffff8d8f357210
> cpuid = 2; apic id = 02
> panic: double fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
> #1 0xffffffff8038956e at panic+0x2ae
> #2 0xffffffff805802b6 at dblfault_handler+0x96
> #3 0xffffffff8056900d at Xdblfault+0xad

I think (not 100% sure) that with DDB in the kernel we could get a better
backtrace here, possibly with pre-dblfault stack frames, because the DDB
backend is a bit smarter than the trivial stack(9) printer.

> stack: 0xffffff8d8f357000, 4

One thing I can say is that this looks like a double fault caused by stack
exhaustion (the most typical cause): the rsp value is below td_kstack.

Can you please also provide the following information:
p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is the value of the KSTACK_PAGES option (amd64 default is 4)
and PAGE_SIZE is 4096.

> rsp = 0xffffff800009ae10
[snip]
> There are some indications that stopping jails could be the
> cause of the panics so on one test box I've added in invariants
> to see if anything shows up from that.

OK.

-- 
Andriy Gapon
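
For reference, the address arithmetic behind the stack-exhaustion reading can be sketched as below. This is only an illustration using the values quoted in this message (td_kstack = 0xffffff8d8f357000, KSTACK_PAGES = 4, PAGE_SIZE = 4096). The function names are hypothetical and are not kernel or kgdb APIs, and the size of struct pcb is deliberately left as a parameter because the message does not state it.

```python
# Values taken from the message; everything else here is illustrative.
PAGE_SIZE = 4096      # amd64 page size, as stated in the message
KSTACK_PAGES = 4      # amd64 default for the KSTACK_PAGES option


def kstack_bounds(td_kstack, pages=KSTACK_PAGES, page_size=PAGE_SIZE):
    """Return (lowest address, one-past-highest address) of the kernel stack."""
    return td_kstack, td_kstack + pages * page_size


def overflow_bytes(rsp, td_kstack):
    """Bytes by which rsp lies below the stack base, or 0 if it does not."""
    return max(0, td_kstack - rsp)


def pcb_address(td_kstack, pcb_size, pages=KSTACK_PAGES, page_size=PAGE_SIZE):
    """Mirror of the gdb expression: (struct pcb *)(td_kstack + pages * page_size) - 1.

    pcb_size stands in for sizeof(struct pcb), which must come from the
    actual kernel build; it is not given in the message.
    """
    _, top = kstack_bounds(td_kstack, pages, page_size)
    return top - pcb_size


if __name__ == "__main__":
    td_kstack = 0xffffff8d8f357000
    rsp = 0xffffff8d8f356fb0
    _, top = kstack_bounds(td_kstack)
    print(hex(top))                        # 0xffffff8d8f35b000
    print(overflow_bytes(rsp, td_kstack))  # 80
```

With the reported registers, rsp sits 0x50 (80) bytes below the base of the 16 KB stack, which is consistent with an overflow of the thread's kernel stack.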