Message-ID: <3F39DCCF.DB0B9C78@mindspring.com>
Date: Tue, 12 Aug 2003 23:38:07 -0700
From: Terry Lambert
To: Peter Edwards
cc: Bosko Milekic
cc: current@freebsd.org
Subject: Re: 5.1, Data Corruption, Intel, Oh my! [patch] - Fatal trap 12
List-Id: Discussions about the use of FreeBSD-current

Peter Edwards wrote:
> > ... He might also want to look for any function pointer
> > that takes 5 arguments;
>
> Nice tactic, but misleading in this case, methinks.
>
> I assume you're basing this on the 5 arguments shown in the backtrace.
> The 5 arguments passed to the "function" at 0x5949 are probably just
> defaulted; I doubt it has any significance.
>
> Long version:
>
> ddb tries to work out the number of arguments passed to a function at a
> particular stack frame first based on symbolic information for the
> function itself (obviously not an option here), then based on the
> instruction at the return address in that frame. This works at best
> sporadically in the face of -O compiled C code. The fact that there's no
> function under the "(null)" would strongly suggest that ddb got confused
> with the frame pointer here and didn't get any useful information with
> which to work out the argument count.

I don't know how accurate this assumption is. I don't think DDB is
confused, because the NULL is consistent with the reported fault
address.

Even if we assume that it's confused, the PC is enough information to
locate the function pointer dereference that is occurring. I also have
to assume that the function pointer is in scope, since it's able to
call through it to fault the kernel.

> In the face of failure, ddb just wildly prints out the 5 words under the
> stack pointer.

I did suggest that the correct thing to do would be to decode what
those words were pointing at, and thereby what types the arguments
were...

> Given that there's no real function at 0x5949, the stack frame won't
> have been set up at all, the frame pointer is still pointing to the
> caller's frame, which could be foobar anyway.

The stack frame is set up, since you don't run at all without a stack,
period. The stack may be corrupt, in this case, but that's an
incredibly rare failure mode recently, and mostly this still looks
like a NULL pointer dereference to me.

> What can be useful is to print out the values on the stack symbolically.
> (in gdb, p/a ((void **)$sp)[0]@100. I'm sure ddb can do something
> similar, but no idea how...). And hope to find the caller's return
> address lying in the output.

The best way would be to take a system dump, and then use GDB.

It turns out that, for the most part, you can rebuild a kernel with
the symbols, even if you didn't have one, and the names you will get
back will be "nearby"; hopefully, though, there's a kernel.debug lying
around for this thing.

In general, we'd be seeing people reporting this all over the place,
loudly, if it wasn't a custom kernel in the first place, so I'd
probably start there.

-- Terry
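
The "print the stack symbolically and look for a return address" idea
quoted above can be sketched outside the debugger. The following is a
minimal illustration only, not what gdb or ddb actually do internally:
it maps each raw stack word to the nearest symbol at or below it, the
way p/a output reads. The symbol table, its addresses, the stack words
(apart from 0x5949, which is the faulting address from the thread) and
the function names symbolize and scan_stack are all hypothetical; a
real table would come from the symbols in kernel.debug.

```python
import bisect

# Hypothetical symbol table, sorted by address: (start address, name).
# Every address and name here is invented for illustration.
SYMBOLS = [
    (0xC0100000, "kern_text_start"),
    (0xC01A2340, "example_caller"),
    (0xC01B0000, "example_callee"),
    (0xC0400000, "kern_text_end"),  # end marker, not a function
]


def symbolize(addr, symbols=SYMBOLS):
    """Return 'name+0xoff' for the nearest symbol at or below addr.

    Returns None when addr lies below the first symbol or at/after the
    end marker, i.e. when it cannot be an address inside the text.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or addr >= starts[-1]:
        return None
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


def scan_stack(words, symbols=SYMBOLS):
    """Pair every stack word with its symbolic form (or None)."""
    return [(word, symbolize(word, symbols)) for word in words]


if __name__ == "__main__":
    # Hypothetical words read from the stack, lowest address first.
    stack = [0x00005949, 0xC01A235F, 0x00000000, 0xC01B0010, 0xDEADBEEF]
    for word, label in scan_stack(stack):
        print(f"{word:#010x}  {label or '(no symbol)'}")
```

Words that resolve to a symbol plus a small offset are the candidates
for a caller's return address; words such as 0x5949 that resolve to
nothing are data, or garbage. With a rebuilt rather than original
kernel, the resolved names would only be approximately right, which is
the "nearby" caveat above.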