From owner-freebsd-emulation@FreeBSD.ORG Thu Nov 29 20:42:52 2007
From: John Baldwin
To: Juergen Lock
Cc: freebsd-hackers@freebsd.org, freebsd-emulation@freebsd.org
Date: Thu, 29 Nov 2007 14:41:03 -0500
Subject: Re: double panic, and whats apic_cmd? (kqemu crash...)
Message-Id: <200711291441.04134.jhb@freebsd.org>
In-Reply-To: <20071128235042.GA40147@saturn.kn-bremen.de>
References: <20071118020533.GA57425@saturn.kn-bremen.de> <200711270824.55839.jhb@freebsd.org> <20071128235042.GA40147@saturn.kn-bremen.de>

On Wednesday 28 November 2007 06:50:42 pm Juergen Lock wrote:
> On Tue, Nov 27, 2007 at 08:24:55AM -0500, John Baldwin wrote:
> > On Sunday 18 November 2007 05:43:45 pm Juergen Lock wrote:
> > > On Sun, Nov 18, 2007 at 03:05:33AM +0100, Juergen Lock wrote:
> > > > Ok I finally have an amd64 smp box here that i can play with, and tried
> > > > to reproduce http://www.freebsd.org/cgi/query-pr.cgi?pr=113430 - and I got
> > > > the following crash:
> > > >[...]
> > >
> > > Ok, the crashes seem to be pretty random, I got a few more:
> > > (btw I disabled -DSMP in the kqemu build since it doesn't seem to help,
> > > and it doesn't seem to be used anywhere else. Also I forgot to say
> > > I also have KDB_TRACE and KDB_UNATTENDED in the kernel config. Oh and
> > > I had a few hangs too, and never could get into ddb in those cases...)
> > >
> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd".
> > >
> > > Unread portion of the kernel message buffer:
> > > kernel trap 12 with interrupts disabled
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 01
> > > fault virtual address = 0x246
> > > fault code = supervisor read instruction, page not present
> > > instruction pointer = 0x8:0x246
> > > stack pointer = 0x10:0xffffffff9fae4b50
> > > frame pointer = 0x10:0xffffffff9fae4b80
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > >         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = resume, IOPL = 0
> > > current process = 11 (idle: cpu1)
> > > trap number = 12
> > > <0>
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 01
> > > fault virtual address = 0xc011dbfb
> > > fault code = supervisor read instruction, page not present
> > > instruction pointer = 0x8:0xc011dbfb
> > > stack pointer = 0x10:0xffffffff9fae47d0
> > > frame pointer = 0x10:0x801de4000
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > >         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = trace trap, interrupt enabled, nested task, IOPL = 3
> > > current process = 11 (idle: cpu1)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 1
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > trap_fatal() at trap_fatal+0x29f
> > > trap_pfault() at trap_pfault+0x294
> > > trap() at trap+0x2ea
> > > sendsig() at sendsig+0x2aa
> > > sched_choose() at sched_choose+0x8c
> > > choosethread() at choosethread+0x2b
> > > sched_switch() at sched_switch+0x184
> > > mi_switch() at mi_switch+0x189
> > > ast() at ast+0x1e8
> > > doreti_ast() at doreti_ast+0x1f
> > > Uptime: 37m8s
> > > Physical memory: 986 MB
> > > Dumping 152 MB: 137 121 105 89 73 57 41 25 9
> > >
> > > #0 doadump () at pcpu.h:194
> > > 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> > > (kgdb) bt
> > > #0 doadump () at pcpu.h:194
> > > #1 0xffffffff80484b18 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
> > > #2 0xffffffff80484f77 in panic (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/kern_shutdown.c:563
> > > #3 0xffffffff8070de6f in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> > > )
> > >     at ../../../amd64/amd64/trap.c:697
> > > #4 0xffffffff8070e254 in trap_pfault (frame=0xffffffff9fae4720, usermode=0)
> > >     at ../../../amd64/amd64/trap.c:614
> > > #5 0xffffffff8070ec0a in trap (frame=0xffffffff9fae4720)
> > >     at ../../../amd64/amd64/trap.c:383
> > > #6 0xffffffff806fcd4a in sendsig (catcher=0x405460, ksi=Variable "ksi" is not available.
> > > )
> > >     at ../../../amd64/amd64/machdep.c:326
> > > #7 0xffffffff804a16ec in sched_choose () at ../../../kern/sched_4bsd.c:1256
> > > #8 0xffffffff804a174b in choosethread () at kern_switch.c:137
> > > #9 0xffffffff804a2984 in sched_switch (td=0xffffff000209b680,
> > >     newtd=0xffffff00021a18c0, flags=13) at ../../../kern/sched_4bsd.c:907
> > > #10 0xffffffff8048cc99 in mi_switch (flags=2, newtd=0x0)
> > >     at ../../../kern/kern_synch.c:442
> > > #11 0xffffffff804b7068 in ast (framep=0xffffffff9fae4c70)
> > >     at ../../../kern/subr_trap.c:239
> > > #12 0xffffffff806f4999 in doreti_ast () at ../../../amd64/amd64/exception.S:468
> > > #13 0x0000000811d87d74 in ?? ()
> > > #14 0x0000000000000005 in ?? ()
> > > #15 0x00000000000010e0 in ?? ()
> > > ---Type <return> to continue, or q <return> to quit---
> > > #16 0x0000000811d87d8c in ?? ()
> > > #17 0x0000000801de4000 in ?? ()
> > > #18 0x0000000741e00000 in ?? ()
> > > #19 0x000000000215dd30 in ?? ()
> > > #20 0x0000000000d49160 in ?? ()
> > > #21 0x00000000c016fdf0 in ?? ()
> > > #22 0x0000000000000000 in ?? ()
> > > #23 0x0000000801de84d0 in ?? ()
> > > #24 0xffffffffbfffffff in ?? ()
> > > #25 0x0000000000063fff in ?? ()
> > > #26 0x0000000801de4000 in ?? ()
> > > #27 0x0000000000063fff in ?? ()
> > > #28 0x0000000000000016 in ?? ()
> > > #29 0x0000000000000000 in ?? ()
> > > #30 0x0000000000000000 in ?? ()
> > > #31 0x0000000000000000 in ?? ()
> > > #32 0x000000000215dd0c in ?? ()
> > > #33 0x000000000000002b in ?? ()
> > > #34 0x0000000000000286 in ?? ()
> > > #35 0x00007fffffffb608 in ?? ()
> > > #36 0x0000000000000023 in ?? ()
> > > #37 0x0000000000000000 in ?? ()
> > > #38 0x0000000000000000 in ?? ()
> > > ---Type <return> to continue, or q <return> to quit---
> > > #39 0x0000000000c9f000 in ?? ()
> > > #40 0x00000000fffffffd in ?? ()
> > > #41 0xffffff0001080460 in ?? ()
> > > #42 0xffffff000209b680 in ?? ()
> > > #43 0x0000000000000001 in ?? ()
> > > #44 0xffffffff9fae4bb0 in ?? ()
> > > #45 0xffffffff9fae4b68 in ?? ()
> > > #46 0xffffff00010819c0 in ?? ()
> > > #47 0xffffffff804a2984 in sched_switch (td=0xd49160, newtd=0x63fff,
> > >     flags=409599) at ../../../kern/sched_4bsd.c:907
> > > Previous frame inner to this frame (corrupt stack?)
> > > (kgdb) q
> > > iapetus# exit
> > >
> > > and
> > >
> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd".
> > >
> > > Unread portion of the kernel message buffer:
> > > kernel trap 12 with interrupts disabled
> > >
> > >
> > > Fatal trap 0: while in kernel mode
> > > cpuid = 0; apic id = 00
> > > instruction pointer = 0x4300:0xffffffff9fae41c0
> > > stack pointer = 0x10:0xffffffff9fae4190
> > > frame pointer = 0x10:0x5
> > > code segment = base 0x0, limit 0x0, type 0x0
> > >         = DPL 0, pres 0, long 0, def32 0, gran 0
> > > processor eflags = resume, IOPL = 0
> > > current process = 904 (qemu-system-x86_64)
> > > trap number = kernel trap 12 with interrupts disabled
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address = 0x46
> > > fault code = supervisor read data, page not present
> > > instruction pointer = 0x8:0xffffffff804aff9d
> > > stack pointer = 0x10:0xffffffff9fae3d20
> > > frame pointer = 0x10:0xffffffff9fae3e80
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > >         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = resume, IOPL = 0
> > > current process = 904 (qemu-system-x86_64)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > trap_fatal() at trap_fatal+0x29f
> > > trap() at trap+0x242
> > > calltrap() at calltrap+0x8
> > > --- trap 0xc, rip = 0xffffffff804aff9d, rsp = 0xffffffff9fae3d20, rbp = 0xffffffff9fae3e80 ---
> > > kvprintf() at kvprintf+0x11ed
> > > printf() at printf+0xa4
> > > uart_z8530_class() at 0x3386
> > > swapb.6687() at swapb.6687+0x13f
> > > Uptime: 19m14s
> > > Physical memory: 986 MB
> > > Dumping 113 MB: (CTRL-C to abort) 98 82 66 (CTRL-C to abort) 50 34 18 2
> > >
> > > #0 doadump () at pcpu.h:194
> > > 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> > > (kgdb) bt
> > > #0 doadump () at pcpu.h:194
> > > #1 0xffffffff80484b18 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
> > > #2 0xffffffff80484f77 in panic (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/kern_shutdown.c:563
> > > #3 0xffffffff8070de6f in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> > > )
> > >     at ../../../amd64/amd64/trap.c:697
> > > #4 0xffffffff8070eb62 in trap (frame=0xffffffff9fae3c70)
> > >     at ../../../amd64/amd64/trap.c:248
> > > #5 0xffffffff806f3e0e in calltrap () at ../../../amd64/amd64/exception.S:169
> > > #6 0xffffffff804aff9d in kvprintf (fmt=0xffffffff807febff "\n",
> > >     func=0xffffffff804b07d0 , arg=0xffffffff9fae3e90, radix=10,
> > >     ap=0xffffffff9fae3ec0) at ../../../kern/subr_prf.c:819
> > > #7 0xffffffff804b0284 in printf (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/subr_prf.c:314
> > > #8 0x0000000000003386 in ?? ()
> > > #9 0xffffffff9fae4090 in ?? ()
> > > #10 0xffffffff806f4667 in Xtimerint () at apic_vector.S:103
> > > Previous frame identical to this frame (corrupt stack?)
> > > (kgdb) q
> > > iapetus# exit
> > >
> > > Script done on Sun Nov 18 19:11:41 2007
> > >
> > > and:
> > >
> > > [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > > This GDB was configured as "amd64-marcel-freebsd".
> > >
> > > Unread portion of the kernel message buffer:
> > > kernel trap 12 with interrupts disabled
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address = 0xd
> > > fault code = supervisor read data, page not present
> > > instruction pointer = 0x8:0xffffffff8073d743
> > > stack pointer = 0x10:0xffffffff9fae4610
> > > frame pointer = 0x10:0x0
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > >         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = resume, IOPL = 0
> > > current process = 948 (qemu-system-x86_64)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > trap_fatal() at trap_fatal+0x29f
> > > dmapbase() at 0xffffff0001080460
> > > dmapbase() at 0xffffff00010819c0
> > > Uptime: 23m57s
> > > Physical memory: 986 MB
> > > Dumping 152 MB: 137 121 105 89 73 57 41 25 9
> > >
> > > #0 doadump () at pcpu.h:194
> > > 194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> > > (kgdb) bt
> > > #0 doadump () at pcpu.h:194
> > > #1 0xffffffff80484b18 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
> > > #2 0xffffffff80484f77 in panic (fmt=Variable "fmt" is not available.
> > > ) at ../../../kern/kern_shutdown.c:563
> > > #3 0xffffffff8070de6f in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> > > )
> > >     at ../../../amd64/amd64/trap.c:697
> > > #4 0xffffff0001080460 in ?? ()
> > > #5 0xffffffff80a4d8a0 in lapics ()
> > > #6 0xffffff00010819c0 in ?? ()
> > > #7 0x0000000000000000 in ?? ()
> > > #8 0xffffff0001055600 in ?? ()
> > > #9 0xffffffff9fae44e0 in ?? ()
> > > #10 0xffffffff8044ffed in hardclock_cpu (usermode=Variable "usermode" is not available.
> > > )
> > >     at ../../../kern/kern_clock.c:224
> > > #11 0xffffff00010819c0 in ?? ()
> > > #12 0x0000000000000000 in ?? ()
> > > #13 0xffffff000215b000 in ?? ()
> > > #14 0xffffffff9fae4610 in ?? ()
> > > #15 0xffffff000215b000 in ?? ()
> > > #16 0x0000000000000000 in ?? ()
> > > #17 0xffffffff80a26430 in main_console ()
> > > #18 0x00000000000213bf in ?? ()
> > > #19 0xffffff00010819c0 in ?? ()
> > > #20 0x0000000000000000 in ?? ()
> > > ---Type <return> to continue, or q <return> to quit---
> > > #21 0x0000000000000000 in ?? ()
> > > #22 0xffffffff80a2fd78 in runq ()
> > > #23 0xffffff000215b000 in ?? ()
> > > #24 0x0000000000000001 in ?? ()
> > > #25 0xffffffff8047953c in _mtx_lock_spin (m=0xffffffff80a26430, tid=136126,
> > >     opts=Variable "opts" is not available.
> > > ) at cpufunc.h:343
> > > Previous frame inner to this frame (corrupt stack?)
> > > (kgdb) q
> > > iapetus# exit
> > >
> > > kgdb still seems to be kind of confused tho, afaict runq is a variable
> > > not a function... Anyone can make head or tail of these crashes?
> >
> > I would check your hardware for bad RAM, etc.
>
> Well, I doubt its that... It works when running a up kernel, and it works
> on a 6.3beta2 i386 install on the same box with smp. Also I haven't
> seen any crashes on that box yet other than from this amd64 kqemu on the
> smp kernel (it also survived building a world and kernel with -j4),
> actually I haven't received reports of kqemu/amd64/smp actually working
> for anyone. (do you want to try? :) I _suspect_ kqemu/amd64 is doing
> either things differently than on i386, or differences between the
> i386 and amd64 kernels trigger the problem.
>
> Fwiw, I have a report of kqemu/amd64 crashing the host on a linux smp host
> too, tho there only with a windows guest; linux guests (which I was testing)
> seem to work there.
>
> Oh and I left memtest86 running on that box overnight and it found nothing...

Well, it could be a kqemu bug I guess, but your panics look like seemingly
random memory corruption, as you have stack traces where functions are calling
other functions that they don't actually call in the source code.

-- 
John Baldwin