Date: Tue, 18 Feb 2003 23:16:49 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Lars Eggert <larse@ISI.EDU>
Cc: current@freebsd.org
Subject: Re: panic starting gnome
Message-ID: <3E532F61.653A09B0@mindspring.com>
References: <3E52BB14.2040309@isi.edu>

Lars Eggert wrote:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 00000000
> fault virtual address = 0x34
                          ****
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc01b28a6
[ ... ]
> kernel: type 12 trap, code=0
> Stopped at _mtx_lock_flags+0x26: cmpl $0xc03884a0,0(%esi)
[ ... ]
> trap_fatal(e91a5780,34,c0372ee0,2e4,c658e780) at trap_fatal+0x250
> trap_pfault(e91a5780,0,34,c03e0758,34) at trap_pfault+0x17a
> trap(c21a0018,10,c0360010,9e,34) at trap+0x3e5
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc01b28a6, esp = 0xe91a57c0, ebp = 0xe91a57e0 ---
> _mtx_lock_flags(34,0,c035cf5f,9e,c658e780) at _mtx_lock_flags+0x26
                  **

Attempt to dereference the value "0x34" as if it were a pointer.

> namei(e91a5a44,c0207d5a,c749458c,0,c658e780) at namei+0x134

Called from here.

Debug:

1)	Make sure that the kernel that has the fault was created
	with "config -g", so that there is a debug version of it
	lying around in the build directory.

2)	Make sure that the kernel you installed is the stripped
	version of the debug kernel (there are two kernels created
	as a result of "config -g"; one is "kernel.debug" (the
	debug version) and the other is "kernel" (the stripped
	version)).

3)	If #1 and #2 are not true, then make them true, and
	reproduce the problem.

4)	Boot a kernel that doesn't crash instead, so that you can
	run the debugger.

5)	Go to the build directory, and look at the faulting code
	to see where it gets the value "0x34" to pass in to
	_mtx_lock_flags(); this is the bogus value.

	For example, if you had a debug kernel for the kernel that
	has the problem, and it was config'ed from i386 GENERIC,
	you would use the following sequence of commands:

		cd /sys/i386/compile/GENERIC
		gdb -k kernel.debug
		list *(namei+0x134)

6)	Change the code so the bogus value is no longer being
	passed.

7)	Live happily ever after.

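As a rough illustration of how a small value such as 0x34 can end up
being handed to a lock routine, here is a minimal sketch.  It is not
taken from the kernel source: "struct fake_obj", "struct fake_mtx",
member_address() and in_page_zero() are all hypothetical names, and the
padding is chosen only so that the lock member lands at offset 0x34.
The point is just that the address of a member of a structure is the
base address plus the member offset, so a NULL base yields the bare
offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mutex; NOT the real FreeBSD struct mtx. */
struct fake_mtx {
	uint32_t lock_word;
};

/*
 * Hypothetical container structure.  The padding is an assumption made
 * purely so that the lock member sits at offset 0x34, matching the
 * fault address in the trace.
 */
struct fake_obj {
	unsigned char	pad[0x34];
	struct fake_mtx	lock;
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct fake_obj, lock) == 0x34,
    "lock member must sit at offset 0x34");

/*
 * Address that a lock routine would receive for "&obj->lock" when the
 * structure lives at "base".  Computed with offsetof() so that no NULL
 * pointer is ever actually dereferenced here.
 */
static uintptr_t
member_address(uintptr_t base)
{
	return (base + offsetof(struct fake_obj, lock));
}

/* Non-zero if "addr" falls in the first 4K page (page zero). */
static int
in_page_zero(uintptr_t addr)
{
	return (addr < 0x1000);
}
```

With a base of 0, member_address() returns 0x34, which in_page_zero()
reports as lying in page zero; with a plausible kernel address as the
base, the result is well outside it.  Whether the real culprit is a
structure laid out like this is exactly what listing the code at
namei+0x134 would tell you.
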
Note that, to me, this looks like a problem with a dereference of a
"current" process which is not really current, as a result of a wakeup
occurring in an interrupt handler for an outstanding request which was
satisfied by the interrupt handler.

Note: Under no circumstances should a page 0 address be passed around
to anyone, since page zero is typically unmapped in order to trigger
NULL pointer dereference faults and/or structure member reference
faults for structure elements (at least in the initial 4K: range
0x00000000-0x00001000) when a structure pointer itself is NULL.

IMO, the most likely cause is that you have a NULL structure pointer,
and the element at offset 0x34 into the structure is being referenced
out of it, without checking that the pointer is not NULL, and the most
likely culprit is a proc/kse/thread type structure that's not
guaranteed to be valid in interrupt context.

Probably, the scheduler is switching directly from interrupt of a
process context "Q" to a wakeup of the same process "Q", without
restoring a register value that should normally be restored following
an interrupt.  I have no idea which of the schedulers you are using, so
I have no idea if this should be an expected omission; my best guess is
you are using the new one, though, because this is an unlikely problem
with the old one, if it's really a scheduler wakeup problem.

> namei(e91a5a44,c0207d5a,c749458c,0,c658e780) at namei+0x134
                                   ^
                                   |
> vn_open_cred(e91a5a44,e91a5a0c,0,c2195e80,0) at vn_open_cred+0x53c
                                 ^          ^
                                 |          |

...all three of these are also incredibly suspicious, at first sight...

Until you are willing to list out the code where the bogus value is
being passed to the function call, there's no way any of us are going
to be able to correlate your stack traceback to our own source trees,
in order to be able to help you, unless you are running a tagged
version (e.g. 5.0-RELEASE) with no modifications.

Just saying "the most recent current" or "I CVS'up'ed on xxx date" is
really useless to us, because CVS mirrors don't contain well known
information relative to a CVS'up date.  In many cases, we will need you
to check out (at least!) a fresh /sys source tree from the CVS
repository, using a date tag, if you are not running a -RELEASE
version.  Yes, this is a long-standing problem with the FreeBSD project
itself.

If you can do this, and reproduce the problem, then we can check out
with the same date tag, and determine what the code is supposed to be
doing, and what code you actually have, so we can narrow it down to
setup, and maybe fix it without having to rebuild an entire copy of the
Internet, from your machine's point of view.  8-).

Also, if your kernel configuration is different than the default, you
need to provide *DIFFS* -- DO NOT SEND THE WHOLE CONFIG FILE TO THE
LIST -- OR TO ME -- UNLESS YOU WANT TO BE IGNORED!

For a modified GENERIC config file from a checked out copy of the local
source tree, here is how you perform a context diff:

	cd /sys/i386/conf
	cvs diff -c GENERIC

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message