Date:      Tue, 12 Aug 2003 23:38:07 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Peter Edwards <peter.edwards@openet-telecom.com>
Cc:        current@freebsd.org
Subject:   Re: 5.1, Data Corruption, Intel, Oh my! [patch] - Fatal trap 12
Message-ID:  <3F39DCCF.DB0B9C78@mindspring.com>
References:  <20030811100549.GA33392@technokratis.com> <20030811130937.GA34564@technokratis.com> <1060704284.45511.118.camel@rocklobster.openet-telecom.lan>

Peter Edwards wrote:
> > ... He might also want to look for any function pointer
> > that takes 5 arguments;
> 
> Nice tactic, but misleading in this case, methinks.
> 
> I assume you're basing this on the 5 arguments shown in the backtrace.
> The 5 arguments passed to the "function" at 0x5949 is probably just
> defaulted; I doubt it has any significance.
> 
> Long version:
> 
> ddb tries to work out the number of arguments passed to a function at a
> particular stack frame first based on symbolic information for the
> function itself (obviously not an option here), then based on the
> instruction at the return address in that frame. This works at best
> sporadically in the face of -O compiled C code. The fact that there's no
> function under the "(null)" would strongly suggest that ddb got confused
> with the frame pointer here and didn't get any useful information with
> which to work out the argument count.

I don't know how accurate this assumption is.  I don't think
DDB is confused, because the NULL is consistent with the reported
fault address.  Even if we assume that it's confused, the PC is
enough information to locate the function pointer dereference that
is occurring.  I also have to assume that the function pointer is
in scope, since it's able to call through it to fault the kernel.
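
For reference, the argument-count heuristic Peter describes can be
sketched roughly as below.  This is a hypothetical illustration, not
ddb's source: the function name guess_nargs, the two i386 opcode
patterns checked, and the text-range parameters are assumptions; only
the fallback of 5 comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_NARGS 5	/* fallback when nothing useful can be decoded */

/*
 * Guess how many 4-byte arguments the caller pushed by decoding the
 * instruction at the return address (i386, caller cleans up the stack).
 * ret_addr is the return address, [text_start, text_end) bounds the
 * known code, and code points at len readable bytes at ret_addr.
 */
static int
guess_nargs(uintptr_t ret_addr, uintptr_t text_start, uintptr_t text_end,
    const uint8_t *code, size_t len)
{
	assert(code != NULL || len == 0);

	/* A return address outside known text cannot be decoded safely. */
	if (ret_addr < text_start || ret_addr >= text_end)
		return DEFAULT_NARGS;

	/* popl %ecx: the caller discards exactly one argument. */
	if (len >= 1 && code[0] == 0x59)
		return 1;

	/* addl $imm8, %esp: the caller discards imm8 / 4 arguments. */
	if (len >= 3 && code[0] == 0x83 && code[1] == 0xc4)
		return code[2] / 4;

	/* Anything else (common with -O code): give up and default. */
	return DEFAULT_NARGS;
}
```

With a heuristic of this shape, a count of 5 carries no information by
itself; it is also what you get whenever decoding fails.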


> In the face of failure, ddb just wildly prints out the 5 words under the
> stack pointer.

I did suggest that the correct thing to do would be to decode
what those words were pointing at, and thereby what types the
arguments were...


> Given that there's no real function at 0x5949, the stack frame won't
> have been set up at all, the frame pointer is still pointing to the
> caller's frame, which could be foobar anyway.

The stack frame is set up, since you don't run at all without
a stack, period.  The stack may be corrupt in this case, but
that's an incredibly rare failure mode these days, and mostly
this still looks like a NULL pointer dereference to me.


> What can be useful is to print out the values on the stack symbolically.
> (in gdb,  p/a ((void **)$sp)[0]@100. I'm sure ddb can do something
> similar, but no idea how...). And hope to find the caller's return
> address lying in the output.

The best way would be to take a system dump, and then use GDB.
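
The manual stack scan Peter describes amounts to looking for stack
words that fall inside the kernel text range, since those are the
candidate return addresses.  A minimal sketch, assuming the stack words
have already been copied out and the text bounds are known; the name
next_text_word and its parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first stack word at or after 'start' whose
 * value lies inside [text_start, text_end), or nwords if there is none.
 * A hit is only a candidate return address; stale values left on the
 * stack by earlier calls will match too.
 */
static size_t
next_text_word(const uintptr_t *stack, size_t nwords, size_t start,
    uintptr_t text_start, uintptr_t text_end)
{
	size_t i;

	assert(stack != NULL || nwords == 0);

	for (i = start; i < nwords; i++) {
		if (stack[i] >= text_start && stack[i] < text_end)
			return i;
	}
	return nwords;
}
```

Calling it repeatedly with start set to the previous hit plus one walks
through every candidate in the dumped region.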

It turns out that, for the most part, you can rebuild a kernel
with symbols even if you didn't keep a debug kernel, and the
names you get back will be "nearby"; hopefully, though, there's
a kernel.debug lying around for this thing.

In general, we'd be seeing people reporting this all over the
place, loudly, if it wasn't a custom kernel in the first place,
so I'd probably start there.

-- Terry
