Date: Tue, 18 Feb 2003 23:16:49 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Lars Eggert <larse@ISI.EDU>
Cc: current@freebsd.org
Subject: Re: panic starting gnome
Message-ID: <3E532F61.653A09B0@mindspring.com>
References: <3E52BB14.2040309@isi.edu>
Lars Eggert wrote:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 00000000
> fault virtual address = 0x34
****
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc01b28a6
[ ... ]
> kernel: type 12 trap, code=0
> Stopped at _mtx_lock_flags+0x26: cmpl $0xc03884a0,0(%esi)
[ ... ]
> trap_fatal(e91a5780,34,c0372ee0,2e4,c658e780) at trap_fatal+0x250
> trap_pfault(e91a5780,0,34,c03e0758,34) at trap_pfault+0x17a
> trap(c21a0018,10,c0360010,9e,34) at trap+0x3e5
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc01b28a6, esp = 0xe91a57c0, ebp = 0xe91a57e0 ---
> _mtx_lock_flags(34,0,c035cf5f,9e,c658e780) at _mtx_lock_flags+0x26
**
Attempt to dereference the value "0x34" as if it were a pointer.
> namei(e91a5a44,c0207d5a,c749458c,0,c658e780) at namei+0x134
Called from here.
Debug:
1) Make sure that the kernel that has the fault was
created with "config -g", so that there is a debug
version of it lying around in the build directory.
2) Make sure that the kernel you installed is the
stripped version of the debug kernel (there are two
kernels created as a result of "config -g"; one is
"kernel.debug" (the debug version) and the other is
"kernel" (the stripped version)).
3) If #1 and #2 are not true, then make them true, and
repeat the problem.
4) Boot a kernel that doesn't crash instead, so that you
can run the debugger.
5) Go to the build directory, and look at the faulting
code to see where it gets the value "0x34" to pass in
to the _mtx_lock_flags(); this is the bogus value. For
example, if you had a debug kernel for the kernel that
has the problem, and it was config'ed from i386 GENERIC,
you would use the following sequence of commands:
    cd /sys/i386/compile/GENERIC
    gdb -k kernel.debug
    list *(namei+0x134)
6) Change the code so the bogus value is no longer being
passed.
7) Live happily ever after.
Note that, to me, this looks like a problem with a dereference of a
"current" process which is not really current, as a result of a
wakeup occurring in an interrupt handler for an outstanding request
which was satisfied by the interrupt handler.
Note: Under no circumstances should a page 0 address be passed
around to anyone, since page zero is typically unmapped in
order to trigger NULL pointer dereference faults and/or
structure member reference faults for structure elements
(at least in the initial 4K: range 0x00000000-0x00000fff)
when a structure pointer itself is NULL.
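As a rough illustration of the page-zero point, the check below classifies a fault address as "inside page zero" or not. It is only a sketch: the 4K page size is an assumption that matches i386, and the names EXAMPLE_PAGE_SIZE and addr_in_page_zero() are made up for this example and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4K pages, as on i386. Hypothetical name. */
#define EXAMPLE_PAGE_SIZE 0x1000u

static_assert((EXAMPLE_PAGE_SIZE & (EXAMPLE_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/*
 * True if addr lies in the first page of the address space.
 * A fault there usually means a NULL pointer, or a NULL structure
 * pointer plus a small member offset, was dereferenced.
 */
static bool
addr_in_page_zero(uintptr_t addr)
{
    return addr < EXAMPLE_PAGE_SIZE;
}
```

With these definitions, the fault address 0x34 from the trace falls in page zero, while the instruction pointer 0xc01b28a6 does not.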
IMO, the most likely cause is that you have a null structure
pointer, and the element at offset 0x34 into the structure is
being referenced out of it, without checking that the pointer
is not NULL, and the most likely culprit is a proc/kse/thread
type structure that's not guaranteed to be valid in interrupt
context.
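To make the arithmetic concrete, here is a minimal sketch of how taking the address of an embedded lock through a NULL structure pointer yields exactly 0x34. Both structures are hypothetical stand-ins invented for this example (they are NOT the real struct mtx or any real proc/kse/thread layout); the padding is chosen only so that the member lands at offset 0x34.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel mutex; NOT the real struct mtx. */
struct example_mtx {
    uint32_t owner;
};

/*
 * Hypothetical container. The padding is chosen so that the embedded
 * mutex lands at offset 0x34, matching the fault address in the trace.
 */
struct example_obj {
    char               pad[0x34];
    struct example_mtx mtx;
};

/* The layout assumption, checked at compile time (C11). */
static_assert(offsetof(struct example_obj, mtx) == 0x34,
              "mtx is expected at offset 0x34");

/*
 * Address that &obj->mtx evaluates to for a given base address.
 * No memory is touched; this is only the address arithmetic.
 */
static uintptr_t
mtx_addr(uintptr_t base)
{
    return base + offsetof(struct example_obj, mtx);
}

/*
 * The missing guard: refuse to hand out a lock address when the
 * containing pointer is NULL. Returns NULL in that case.
 */
static struct example_mtx *
obj_mtx_checked(struct example_obj *obj)
{
    if (obj == NULL)
        return NULL;
    return &obj->mtx;
}
```

Here mtx_addr(0) evaluates to 0x34, which is the kind of value that would then be handed to a lock routine and faulted on. A caller using something like obj_mtx_checked() would get NULL back instead and could bail out before locking.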
Probably, the scheduler is switching directly from interrupt
of a process context "Q" to a wakeup of the same process "Q",
without restoring a register value that should normally be
restored following an interrupt. I have no idea which of the
schedulers you are using, so I have no idea if this should be
an expected omission; my best guess is you are using the new
one, though, because this is an unlikely problem with the old
one, if it's really a scheduler wakeup problem.
> namei(e91a5a44,c0207d5a,c749458c,0,c658e780) at namei+0x134
^
|
> vn_open_cred(e91a5a44,e91a5a0c,0,c2195e80,0) at vn_open_cred+0x53c
^ ^
| |
...all three of these are also incredibly suspicious, at first sight...
Until you are willing to list out the code where the bogus value is
being passed to the function call, there's no way any of us are
going to be able to correlate your stack traceback to our own source
trees, in order to be able to help you, unless you are running a
tagged version (e.g. 5.0-RELEASE) with no modifications.
Just saying "the most recent current" or "I CVS'up'ed on xxx date" is
really useless to us, because CVS mirrors don't contain well known
information relative to a CVS'up date. In many cases, we will need
you to check out (at least!) a fresh /sys source tree from the CVS
repository, using a date tag, if you are not running a -RELEASE
version. Yes, this is a long-standing problem with the FreeBSD
project itself.
If you can do this, and repeat the problem, then we can check out with
the same date tag, and determine what the code is supposed to be doing,
and what code you actually have, so we can narrow it down to setup, and
maybe fix it without having to rebuild an entire copy of the Internet,
from your machine's point of view. 8-).
Also, if your kernel configuration is different than the default, you
need to provide *DIFFS* -- DO NOT SEND THE WHOLE CONFIG FILE TO THE
LIST -- OR TO ME -- UNLESS YOU WANT TO BE IGNORED!
For a modified GENERIC config file from a checked out copy of the local
source tree, here is how you perform a context diff:
cd /sys/i386/conf
cvs diff -c GENERIC
-- Terry
