From: Don Lewis
Date: Wed, 27 Nov 2013 00:22:27 -0800 (PST)
Subject: Re: panic: double fault with 11.0-CURRENT r258504
To: kostikbel@gmail.com
Cc: freebsd-current@FreeBSD.org
Message-Id: <201311270822.rAR8MRnk039213@gw.catspoiler.org>
In-Reply-To: <20131125081047.GN59496@kib.kiev.ua>

On 25 Nov, Konstantin Belousov wrote:
> On Sat, Nov 23, 2013 at 11:43:30PM -0800, Don Lewis wrote:
>> I upgraded my 11.0-CURRENT machine to r258504 to get past the uma panic
>> that I stumbled across earlier.
>> Now I got this when I started upgrading ports:
>>
>> Unread portion of the kernel message buffer:
>>
>> Fatal double fault:
>> eip = 0xc0b158e0
>> esp = 0xe4f62000
>> ebp = 0xe4f62010
>> cpuid = 0; apic id = 00
>> panic: double fault
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper(c113340c,2,10000000,c15a0cf0,c15a0ce8,...) at db_trace_self_wrapper+0x2d/frame 0xc15a0cb0
>> kdb_backtrace(c12f143f,0,c12f2aea,c15a0d6c,0,...) at kdb_backtrace+0x30/frame 0xc15a0d18
>> vpanic(c15a0d6c,c15a0d84,c0fc14fb,c12f2aea,0,...) at vpanic+0x11f/frame 0xc15a0d54
>> panic(c12f2aea,0,0,0,e4f62010,...) at panic+0x12/frame 0xc15a0d60
>> dblfault_handler() at dblfault_handler+0xab/frame 0xc15a0d60
>> --- trap 0x17, eip = 0xc0b158e0, esp = 0xe4f62000, ebp = 0xe4f62010 ---
>> vprintf(c12f2900,c,fffffe7f,fffffeff,bfff75ed,...) at vprintf/frame 0xe4f62010
>> trap(e4f62164) at trap+0x18a/frame 0xe4f62158
>> calltrap() at calltrap+0x6/frame 0xe4f62158
>> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f621a4, ebp = 0xe4f62270 ---
>> kvprintf(c12f2900,c0b15210,e4f62290,a,e4f6235c,...) at kvprintf+0x1cd/frame 0xe4f62270
>> vprintf(c12f2900,e4f6235c,e4f6235c) at vprintf+0x7f/frame 0xe4f6233c
>> printf(c12f2900,c,ffefdfff,ebefefff,dfdffedf,...) at printf+0x1b/frame 0xe4f62350
>> trap(e4f624a4) at trap+0x18a/frame 0xe4f62498
>> calltrap() at calltrap+0x6/frame 0xe4f62498
>> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f624e4, ebp = 0xe4f625b0 ---
>> kvprintf(c12f2900,c0b15210,e4f625d0,a,e4f6269c,...) at kvprintf+0x1cd/frame 0xe4f625b0
>> vprintf(c12f2900,e4f6269c,e4f6269c) at vprintf+0x7f/frame 0xe4f6267c
>> printf(c12f2900,c,5fd7ff5f,ba77f7fb,bfffb7ff,...) at printf+0x1b/frame 0xe4f62690
>> trap(e4f627e4) at trap+0x18a/frame 0xe4f627d8
>> calltrap() at calltrap+0x6/frame 0xe4f627d8
>> --- trap 0xc, eip = 0xc0b145dd, esp = 0xe4f62824, ebp = 0xe4f628f0 ---
>> kvprintf(c12f2900,c0b15210,e4f62910,a,e4f629dc,...) at kvprintf+0x1cd/frame 0xe4f628f0
>> vprintf(c12f2900,e4f629dc,e4f629dc) at vprintf+0x7f/frame 0xe4f629bc
>> printf(c12f2900,c,0,80000000,c0,...) at printf+0x1b/frame 0xe4f629d0
>> trap(e4f62b20) at trap+0x18a/frame 0xe4f62b14
>> calltrap() at calltrap+0x6/frame 0xe4f62b14
>> --- trap 0xc, eip = 0xc0afe270, esp = 0xe4f62b60, ebp = 0xe4f62b78 ---
>> tdq_choose(c141e090,4,c113144d,917,c2425c80,...) at tdq_choose+0x60/frame 0xe4f62b78
>> sched_choose(e4f62c00,c0afc511,c141e090,14,c113144d,...) at sched_choose+0x4c/frame 0xe4f62ba4
>> choosethread(c141e090,14,c113144d,78b,c141e116,...) at choosethread+0x1f/frame 0xe4f62bac
>> sched_switch(c8f04000,0,608,1b7,ef2,...) at sched_switch+0x361/frame 0xe4f62c00
>> mi_switch(608,0,c112f4e4,d3,c,...) at mi_switch+0x1c9/frame 0xe4f62c34
>> critical_exit(0,2,c113144d,411,c141e108,...) at critical_exit+0xa4/frame 0xe4f62c50
>> sched_idletd(0,e4f62d08,c1128634,3db,0,...) at sched_idletd+0x1d6/frame 0xe4f62ccc
>> fork_exit(c0afeb00,0,e4f62d08) at fork_exit+0x7f/frame 0xe4f62cf4
>> fork_trampoline() at fork_trampoline+0x8/frame 0xe4f62cf4
>> --- trap 0, eip = 0, esp = 0xe4f62d40, ebp = 0 ---
>> KDB: enter: panic
>>
>> (kgdb) list *tdq_choose+0x60
>> 0xc0afe270 is in tdq_choose (/usr/src/sys/kern/sched_ule.c:1334).
>> 1329        td = runq_choose(&tdq->tdq_realtime);
>> 1330        if (td != NULL)
>> 1331                return (td);
>> 1332        td = runq_choose_from(&tdq->tdq_timeshare, tdq->tdq_ridx);
>> 1333        if (td != NULL) {
>> 1334                KASSERT(td->td_priority >= PRI_MIN_BATCH,
>> 1335                    ("tdq_choose: Invalid priority on timeshare queue %d",
>> 1336                    td->td_priority));
>> 1337                return (td);
>> 1338        }
>>
>> (kgdb) bt
>> #0 doadump (textdump=-1051128300) at pcpu.h:233
>> #1 0xc052766d in db_fncall (dummy1=-1051051648, dummy2=0, dummy3=-1051063684,
>>     dummy4=0xc15a0a54 "") at /usr/src/sys/ddb/db_command.c:578
>> #2 0xc0527357 in db_command (cmd_table=)
>>     at /usr/src/sys/ddb/db_command.c:449
>> #3 0xc0527090 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
>> #4 0xc0529922 in db_trap (type=, code=0)
>>     at /usr/src/sys/ddb/db_main.c:231
>> #5 0xc0b0ff38 in kdb_trap (type=,
>>     code=, tf=)
>>     at /usr/src/sys/kern/subr_kdb.c:656
>> #6 0xc0fc0c07 in trap (frame=)
>>     at /usr/src/sys/i386/i386/trap.c:712
>> #7 0xc0faa0ec in calltrap () at /usr/src/sys/i386/i386/exception.s:170
>> #8 0xc0b0f7bd in kdb_enter (why=0xc112ee39 "panic", msg=)
>>     at cpufunc.h:71
>> #9 0xc0ad6a93 in vpanic (fmt=, ap=)
>>     at /usr/src/sys/kern/kern_shutdown.c:747
>> #10 0xc0ad6ad2 in panic (fmt=0xc12f2aea "double fault")
>>     at /usr/src/sys/kern/kern_shutdown.c:683
>> #11 0xc0fc14fb in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:1072
>> #12 0x00000000 in ?? ()
>
> It seems to be a corruption of the td and probably curthread.
>
> Is it repeatable easily? If yes, you could try to manually inspect first
> elements in the (idle) runq queue of the tdq_cpu[panicked cpu].

It took a while, but I just got another double fault, though this one is
somewhat different. This time it trapped in cpu_switch(), which resulted
in calls to trap()->printf()->...->putchar()->msgbuf_addstr()->_mtx_lock_spin_flags(),
where it trapped again.

Sitting at DDB prompt ...