From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Yuri Pankov, René Ladan, David Naylor
Date: Thu, 8 Jul 2010 08:26:32 -0400
Subject: Re: nvidia-driver crashing kernel on head
Message-Id: <201007080826.32764.jhb@freebsd.org>
In-Reply-To: <201007021855.42103.naylor.b.david@gmail.com>
References: <201007021146.46542.naylor.b.david@gmail.com> <201007021855.42103.naylor.b.david@gmail.com>
List-Id: Discussions about the use of FreeBSD-current

On Friday, July 02, 2010 12:55:38 pm David Naylor wrote:
> On Friday 02 July 2010 14:57:35 René Ladan wrote:
> > 2010/7/2 Yuri Pankov:
> > > On Fri, Jul 02, 2010 at 11:46:41AM +0200, David Naylor wrote:
> > >> Hi,
> > >>
> > >> I'm not sure this has been reported before but I am experience crashes
> > >> with nvidia-driver on -current (cvsup ~day ago).
> > >>
> > >> If I remove all the debugging options from the kernel config then it is
> > >> very usable.
> > >>
> > >> Here are the backtraces from two nvidia-driver versions:
> > >>
> > >> nvidia-driver-195.36.15 and GENERIC:
> > >> panic: mutex page lock not owned at
> > >> /home/freebsd9/src/sys/vm/vm_page.c:1638 cpuid = 1
> > >> KDB: enter: panic
> > >> [ thread pid 1815 tid 100097 ]
> > >> Stopped at kdb_enter+0x3d: movq $0,0x6bc27c(%rip)
> > >> db> bt
> > >> Tracing pid 1815 tid 100097 td 0xffffff00045af000
> > >> kdb_enter() at kdb_enter+0x3d
> > >> panic() at panic+0x176
> > >> assert_mtx() at assert_mtx
> > >> vm_page_wire() at vm_page_wire+0x37
> > >> nv_alloc_system_pages() at nv_alloc_system_pages+0x217
> > >> nv_alloc_pages() at nv_alloc_pages+0xcd
> > >> _nv019978rm() at _nv019978rm+0x7f
> > >>
> > >> nvidia-driver-256.35 and custom kernel:
> > >> panic: blockable sleep lock (sleep mutex) select mtxpool @
> > >> /home/freebsd9/src/sys/kern/sys_generic.c:1479
> > >> cpuid = 1
> > >> KDB: enter: panic
> > >> [ thread pid 1830 tid 100090 ]
> > >> Stopped at kdb_enter+0x3d: movq $0,0x51368c(%rip)
> > >> db> bt
> > >> Tracing pid 1830 tid 100090 td 0xffffff000456d3d0
> > >> kdb_enter() at kdb_enter+0x3d
> > >> panic() at panic+0x176
> > >> witness_checkorder() at witness_checkorder+0x913
> > >> _mtx_lock_flags() at _mtx_lock_flags+0x68
> > >> selrecord() at selrecord+0x71
> > >> nvidia_dev_poll() at nvidia_dev_poll+0x52
> > >> devfs_poll_f() at devfs_poll_f+0x55
> > >> kern_select() at kern_select+0x501
> > >> select() at select+0x54
> > >> syscallenter() at syscallenter+0x19b
> > >> syscall() at syscall+0x41
> > >> Xfast_syscall() at Xfast_syscall+0xe2
> > >> --- syscall (93, FreeBSD ELF64, select), rip = 0x801a17ddc, rsp =
> > >> 0x7fffffffe908, rbp = 0x100 ---
> > >>
> > >> Also of note is:
> > >> # grep '^C.*FLAGS' /etc/make.conf
> > >> CFLAGS+= -DNDEBUG
> > >>
> > >> As mentioned that without any debugging options the system is stable.
> > >>
> > >> Is there anything I can do to assist diagnosis?
> > >>
> > >> Regards,
> > >>
> > >> David
> > >
> > > http://lists.freebsd.org/pipermail/freebsd-current/2010-June/017936.html
> > > helps here, check the thread as well.
> > >
> > > You could also try to use 256.35 driver.
> >
> > The 256.35 driver works for me (without the above-referred patch), but
> > anywhere between 1 and 48 hours my laptop locks up hard without any
> > warning nor panic. This is with CURRENT r209581, GENERIC kernel, but with
> > debug.witness.watch=0 If I set debug.witness.watch to 1, the kernel
> > freezes when starting X.
>
> I experienced a lockup when using the 256.35 driver, I switched back to the
> 195.36.15 driver and no problems since. The system also freezes up when
> launching k3b so I'm not sure what caused that particular freeze...
>
> Thanks for the debug.witness.watch hint.

These freezes and panics are due to the driver using a spin mutex instead of a
regular mutex for the per-file descriptor event_mtx. If you patch the driver
to change it to be a regular mutex I think that should fix the problems.

-- 
John Baldwin
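
[Editorial illustration, not part of the original message.] The rule behind
the second backtrace can be sketched with a small userland model. This is a
toy, not kernel or driver code: every name in it (toy_mtx, toy_thread,
toy_lock, toy_unlock, toy_poll) is hypothetical, and it assumes only what the
thread shows, namely that the poll handler holds event_mtx while selrecord()
takes a sleep mutex from the "select mtxpool", and that acquiring a blockable
sleep mutex while a spin mutex is held is what the lock checker rejects. In
FreeBSD's mutex(9) API the two kinds are selected at mtx_init() time with
MTX_SPIN or MTX_DEF; how the nvidia driver actually initializes event_mtx is
not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userland model of the lock-class rule; NOT FreeBSD kernel code. */
enum lock_class { LOCK_SLEEP, LOCK_SPIN };

struct toy_mtx {
    const char *name;
    enum lock_class cls;
};

struct toy_thread {
    int spin_held;  /* number of spin mutexes this thread currently holds */
};

/*
 * Try to acquire m. Returns false (standing in for a checker panic) when a
 * blockable sleep mutex is requested while a spin mutex is held.
 */
static bool toy_lock(struct toy_thread *td, const struct toy_mtx *m)
{
    if (m->cls == LOCK_SLEEP && td->spin_held > 0) {
        fprintf(stderr, "violation: blockable sleep lock (%s) with spin lock held\n",
            m->name);
        return false;
    }
    if (m->cls == LOCK_SPIN)
        td->spin_held++;
    return true;
}

static void toy_unlock(struct toy_thread *td, const struct toy_mtx *m)
{
    if (m->cls == LOCK_SPIN) {
        assert(td->spin_held > 0);
        td->spin_held--;
    }
}

/*
 * Models a poll handler: take event_mtx, then do what a selrecord()-like
 * call does, i.e. take a sleep mutex. Returns true if no rule was broken.
 */
static bool toy_poll(enum lock_class event_mtx_class)
{
    struct toy_thread td = { 0 };
    struct toy_mtx event_mtx = { "event_mtx", event_mtx_class };
    struct toy_mtx sel_mtx = { "select mtxpool", LOCK_SLEEP };
    bool ok;

    if (!toy_lock(&td, &event_mtx))
        return false;
    ok = toy_lock(&td, &sel_mtx);   /* the selrecord()-like step */
    if (ok)
        toy_unlock(&td, &sel_mtx);
    toy_unlock(&td, &event_mtx);
    return ok;
}
```

In this model toy_poll(LOCK_SPIN) reports a violation and toy_poll(LOCK_SLEEP)
does not, which mirrors the suggested fix of making event_mtx a regular mutex.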