Date: Fri, 23 Jul 2010 14:08:36 +0200
From: David Naylor <naylor.b.david@gmail.com>
To: Christian Zander <czander@nvidia.com>
Cc: Christian Zander <chzander@nvidia.com>, "danfe@freebsd.org" <danfe@freebsd.org>, Doug Barton <dougb@freebsd.org>, Yuri Pankov <yuri.pankov@gmail.com>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Rene Ladan <rene@freebsd.org>
Subject: Re: nvidia-driver crashing kernel on head
Message-ID: <201007231408.39453.naylor.b.david@gmail.com>
In-Reply-To: <20100717152527.GA26038@panther.nvidia.com>
References: <201007021146.46542.naylor.b.david@gmail.com> <201007171624.58434.naylor.b.david@gmail.com> <20100717152527.GA26038@panther.nvidia.com>
On Saturday 17 July 2010 17:25:27 Christian Zander wrote:
> On Sat, Jul 17, 2010 at 07:24:54AM -0700, David Naylor wrote:
> (...)
>
> > > >>> These freezes and panics are due to the driver using a spin mutex
> > > >>> instead of a regular mutex for the per-file descriptor event_mtx.
> > > >>> If you patch the driver to change it to be a regular mutex I think
> > > >>> that should fix the problems.
> > > >>
> > > >> Can you give an example? :) I don't mind creating a patch for all of
> > > >> them if you can illustrate what needs to be changed.
> > > >
> > > > See the attached patch
> > >
> > > In order to use 195.36.15 it was necessary to use the patch Rene sent,
> > > the suggestion from jhb previously to remove some locks, plus a bit
> > > more. The patch that got it working on HEAD for me (specifically
> > > r209633) is attached. With that patch I could start X, and run it for a
> > > while, but performance was very poor, even in comparison with the stock
> > > nv driver, and it crashed a couple times (although not nearly as bad as
> > > previously).
> > >
> > > So based on other suggestions I tried the newest release version at
> > > nvidia, 256.35. Some of the same locking stuff was needed to patch it,
> > > a patch for the port which includes the locking patch is also
> > > attached. If you are running an amd64 system you'll have to type 'make
> > > makesum' after applying this patch to the port. I'm not sure this
> > > patch is complete, or what Alexey might want to do with the update,
> > > but it does create an accurate plist which means you can cleanly
> > > deinstall/pkg_delete when you're done.
> > >
> > > With 256.35 performance and stability have both been quite good,
> > > comparable even to before the drama started. The only concern I
> > > have at this point is that I'm periodically getting a strange sort of
> > > "flash" popping up on my screen that I didn't get while I was running
> > > the nv driver recently. It looks sort of like the default X background
> > > (the tiny gray crosshatch) is popping through for just a split second.
> >
> > I've been getting these messages on the console:
> >
> > NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d5
> > NVRM: Xid (0001:00): 8, Channel 00000000
> > NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d6
> > NVRM: Xid (0001:00): 8, Channel 00000002
> >
> > This is preceded by X locking hard. I cannot VT switch to a normal
> > console and sometimes the computer needs a hard reset (i.e. does not
> > respond to power button). It appears to only trigger when under heavy
> > load. eg
> > make -C /usr/src -j8 buildworld
> >
> > This seems to be messing with interrupts with other subsystems as my
> > network drivers are less than reliable of late. (Watchdog timeouts).
>
> The messages indicate that the NVIDIA driver hasn't received
> interrupts from the GPU @ PCI:1:00.0 over a significant
> period of time. If you are seeing similar problems with other
> system components, there's a good chance that the above is
> a symptom of some larger problem.

I think you are right. I'm not sure if this is a hardware problem or FreeBSD.
I reverted to a kernel from May 01 and the system is solid (~5 days). I'm
using the patched 256.35 driver without problem.

> > This happens with 195.36.15 unpatched and 256.35 patched.
> >
> > I have not checked if booting with WITNESS enabled works.
> >
> > Regards
> >
> > * David Naylor <naylor.b.david@gmail.com>
> > * 0xFF6916B2

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEABECAAYFAkxJhkcACgkQUaaFgP9pFrIzyQCdE3KRNNbEW98TTm/XQOA6GF9u
ff4An2FLYBBb5Bltf99fspfVW1GuJ93a
=lvRd
-----END PGP SIGNATURE-----