Date: Fri, 12 Nov 2004 23:45:19 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Sean Farley <sean-freebsd@farley.org>
Cc: freebsd-hackers@freebsd.org
Subject: Re: bugs in contigmalloc*() related to "page not found in hash" panics
Message-ID: <200411130745.iAD7jJR5079986@apollo.backplane.com>
References: <200411101801.iAAI1SkK061883@apollo.backplane.com> <200411110651.iAB6pekO065188@apollo.backplane.com> <20041112212917.L1667@thor.farley.org>

:Unfortunately, it is the binary driver from Nvidia.  Maybe someone using
:DragonFly is having similar problems?

Not that I know of.  There's not much that can be done with binary-only
drivers short of throwing them away and finding hardware that works
with normal drivers.

:I ran the program on the vmcore and debug kernel from the recent crash
:since the vmcore with the "page not found in hash" panic has long since
:been deleted.  As expected, the program showed no problem with the
:vmcore.

If you ever get the page not found in hash panic again while running
FreeBSD, running that program on the kernel core may help the FreeBSD
folks track the problem down (if it turns out not to be the contigmalloc
bug that I pointed out earlier).

:> : Fatal trap 12: page fault while in kernel mode
:> : fault virtual address = 0x30
:> : fault code = supervisor read, page not present
:...
:
:I will attach it, and I will also send it to Nvidia as I did once many
:moons ago.  One interesting symptom that I just noticed very close to
:the time of instability is this message from /var/log/messages:

I couldn't get much out of the backtrace.  It looks like the filesystem
is trying to generate a core file, and the vn_open() call has failed and
is trying to clean up.  The cleanup code looks ok and the vnode is nothing
special.  It seems to have failed trying to do a NULL pointer dereference
of the inode pointer, but it's hard to tell because most of the local
variables look garbaged up (which is to be expected for a kernel compiled
-O, since the registers those variables are stored in are not accessible
to the dump).  Those are code paths that are usually pretty widely
exercised in the system, though.

:Here is near the end of strings output of vmcore just before panic:
:
:<118>Wed Nov 10 22:46:44 CST 2004
:<3>stray irq 7
:<118>Nov 10 22:47:14 thor /kernel: stray irq 7
:<3>stray irq 7
:<3>stray irq 7
:<118>Nov 10 22:47:46 thor last message repeated 2 times
:
:The parallel port is disabled, and I do not see these messages without
:the Nvidia driver.

Yah, I'm afraid there's nothing there that rings a bell.  Again, the
problem with using a binary-only driver is that there is never any
visibility into it and no way to fix bugs.  It makes such drivers of
strictly limited utility.

:> kernel (assuming the kernel is compiled with options INVARIANTS and
:> options INVARIANT_SUPPORT) mostly preclude an error path to this
:> panic from the pmap code.  However, pmap panics could be related to
:> corrupted VM pages.
:
:I have not tried compiling these options into the kernel.  Sometime this
:weekend I will give them a shot.

I recommend that even production machines always be run with INVARIANTS
and INVARIANT_SUPPORT.

-Matt

:Thank you for your help and the detailed description of the bug
:(tricksy, sneaky bug) you fixed.
:
:Sean