From: Adam McDougall
To: stable@freebsd.org
Date: Fri, 20 Dec 2013 12:34:22 -0500
Subject: "panic: vm_fault: fault on nofault entry" in nvidia module on 10
Message-ID: <20131220173422.GA1556@egr.msu.edu>

I know I should submit a PR, and I fully intend to, but I haven't gathered all the details yet and have had to defer to more pressing bugs and issues. Since 10.0 is very near, though, I should at least say something.

Six times on my home desktop, and twice this week on my work desktop, I've had a kernel panic that looks like it came from inside the nvidia kernel module.

Info from /var/crash/core.txt.#:

Unread portion of the kernel message buffer:
[175718] panic: vm_fault: fault on nofault entry, addr: fffffe0005f13000
[175718] cpuid = 3
[175718] Uptime: 2d0h48m38s
[175718] Dumping 5442 out of 16321 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0 doadump (textdump=1) at pcpu.h:219
219 pcpu.h: No such file or directory.
 in pcpu.h
(kgdb) #0 doadump (textdump=1) at pcpu.h:219
#1 0xffffffff805cb045 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447
#2 0xffffffff805cb424 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:754
#3 0xffffffff807c811d in vm_fault_hold (map=0xfffff80002000000, vaddr=, fault_type=1 '\001', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:279
#4 0xffffffff807c6be7 in vm_fault (map=0xfffff80002000000, vaddr=, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:224
#5 0xffffffff8080d01b in trap_pfault (frame=0xfffffe08491e5630, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:775
#6 0xffffffff8080c8d6 in trap (frame=0xfffffe08491e5630) at /usr/src/sys/amd64/amd64/trap.c:463
#7 0xffffffff807f2ca2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#8 0xffffffff8129056b in _nv000222rm () from /boot/modules/nvidia.ko
#9 0xfffffe000bfd0000 in ?? ()
#10 0xfffff8008597ac00 in ?? ()
#11 0xfffffe08491e5820 in ?? ()
#12 0xfffff8000dc58c00 in ?? ()
#13 0xfffff8008597ac00 in ?? ()
#14 0xffffffff81781558 in _nv000768rm () from /boot/modules/nvidia.ko
#15 0xfffffe000bfd0000 in ?? ()
#16 0xfffff8008597ac00 in ?? ()
#17 0xfffffe08491e5820 in ?? ()
#18 0xfffff8000dc58c00 in ?? ()
#19 0xfffff8008597ac00 in ?? ()
#20 0xffffffff817838c6 in rm_free_unused_clients () from /boot/modules/nvidia.ko
#21 0x0000000000018764 in ?? ()
#22 0x134198d054ad9910 in ?? ()
#23 0x134198d14318c110 in ?? ()
#24 0x134198d14318c110 in ?? ()
#25 0x134198d0cbe32d10 in ?? ()
#26 0x0000000000000000 in ?? ()
Current language: auto; currently minimal
(kgdb)

The other traces I've found are similar but not necessarily identical, although they are ALL in nvidia.ko.

My home desktop crashed with this panic string on Nov 6, Nov 25, Dec 4 (twice), Dec 14, and Dec 15. Often it would crash when I was opening thunderbird on the right monitor, which is rotated vertically. One of the panics was with the new nvidia driver from ports/184352, which I quickly abandoned for now because it didn't solve the panic and it caused my second monitor to be rotated wrongly. Otherwise it has been nvidia-driver-319.32, driving a 'G96 [Quadro FX 380]'.

When I saw the very first panic I knew I should report it, but it seemed odd that only the one desktop was having trouble. Not having had time every night lately to debug this properly, I was starting to blame the hardware until it started happening on my work desktop too. Now I have to take it seriously, although I'll only have remote access to my work desktop for the rest of the year after I go home today.

My work desktop crashed with this panic string on Dec 18 and Dec 20, but both times it happened when I was trying to start a VM in VirtualBox. Both of my monitors at work are rotated vertically (in case this is a factor). That machine has only run nvidia-driver-319.32, driving a 'G92 [GeForce 8800 GT]'.
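
Since the other traces are similar but not identical, it may help to compare them mechanically across the /var/crash/core.txt.# files. Below is a minimal Python sketch of one way to do that. It is not part of any FreeBSD or nvidia tooling. The function name summarize_backtrace and the regular expression are my own assumptions, written against the kgdb frame format shown above. It counts frames by where they resolved (kernel, a named module, or unresolved "??") and lists the named frames that came from a module.

```python
import os
import re
from collections import Counter

# Assumed frame shapes, taken from the kgdb output quoted above:
#   #1 0xffffffff805cb045 in kern_reboot (howto=260) at /usr/src/...
#   #8 0xffffffff8129056b in _nv000222rm () from /boot/modules/nvidia.ko
#   #9 0xfffffe000bfd0000 in ?? ()
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+"                 # frame number
    r"(?:0x[0-9a-f]+\s+in\s+)?"         # optional return address
    r"(?P<func>\S+)\s+\([^()]*\)"       # function name and argument list
    r"(?:\s+from\s+(?P<module>\S+))?"   # optional "from <module path>"
)


def summarize_backtrace(text):
    """Return (counts, module_frames) for a kgdb backtrace.

    counts maps "kernel", "unresolved", or a module file name to the
    number of frames attributed to it. module_frames lists
    (frame number, function, module file name) for every named frame
    that kgdb attributed to a loadable module.
    """
    counts = Counter()
    module_frames = []
    for match in FRAME_RE.finditer(text):
        func = match.group("func")
        module = match.group("module")
        if module is not None:
            name = os.path.basename(module)
            counts[name] += 1
            module_frames.append((int(match.group("num")), func, name))
        elif func == "??":
            counts["unresolved"] += 1
        else:
            counts["kernel"] += 1
    return counts, module_frames
```

To use it, read each core.txt.# file into a string and pass it to summarize_backtrace; comparing the module_frames lists between dumps shows whether the same nvidia.ko entry points recur. Note that core.txt prints frame #0 twice, so the "kernel" count will be one higher than the number of distinct kernel frames.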

Both of these computers have built-in Intel graphics of some sort, but I'm pretty sure I'd just be running away from the problem if I went that route, as interesting as it may be. All I really need is decent performance with the ability to rotate one or both DVI digital outputs. I have not configured Intel graphics for X in years, so that is even lower on the list.

I don't think the build of 10 has made any difference. Some of the panics were on r257230 (BETA2-ish) and the more recent ones on r258899 (BETA4-ish). The nvidia driver was always compiled in a poudriere jail of a similar version (I don't have exact details, and I haven't thought to try compiling it locally yet). FreeBSD 9.x was always fine in this regard. Both of these systems used to run 9.x before I switched to a fresh install of 10 in a new zfs.

I feel I could pretty easily agitate my home computer into panicking if I had a set of things to try at home, but I was hoping to think of a way to make more symbols show up in the nvidia module so the backtrace would make sense. Its largest component is a binary blob from the source, which claims to be unstripped (as does the resulting nvidia.ko, also judging by its size).

Has anyone else seen this? Does anyone have tips to try, or test scenarios I should explore that might help track it down? Is there a way to see symbols from the nvidia driver in a backtrace? I can think of some ideas to try, such as dropping rotation, and I was planning on putting together a more concrete example with more info, but I'm running low on time to spend on it this year and 10.0 is at the door. I'm not so much concerned about this issue for my own sake as for the greater good, assuming someone else will fall into it. I'll plug away at it as I have time, but good suggestions might help my efficiency.

Thanks.