Date:      Fri, 23 Jul 2010 14:08:36 +0200
From:      David Naylor <naylor.b.david@gmail.com>
To:        Christian Zander <czander@nvidia.com>
Cc:        Christian Zander <chzander@nvidia.com>, "danfe@freebsd.org" <danfe@freebsd.org>, Doug Barton <dougb@freebsd.org>, Yuri Pankov <yuri.pankov@gmail.com>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Rene Ladan <rene@freebsd.org>
Subject:   Re: nvidia-driver crashing kernel on head
Message-ID:  <201007231408.39453.naylor.b.david@gmail.com>
In-Reply-To: <20100717152527.GA26038@panther.nvidia.com>
References:  <201007021146.46542.naylor.b.david@gmail.com> <201007171624.58434.naylor.b.david@gmail.com> <20100717152527.GA26038@panther.nvidia.com>

On Saturday 17 July 2010 17:25:27 Christian Zander wrote:
> On Sat, Jul 17, 2010 at 07:24:54AM -0700, David Naylor wrote:
> (...)
>
> > > >>> These freezes and panics are due to the driver using a spin mutex
> > > >>> instead of a regular mutex for the per-file descriptor event_mtx.
> > > >>> If you patch the driver to change it to be a regular mutex, I think
> > > >>> that should fix the problems.
> > > >>
> > > >> Can you give an example? :) I don't mind creating a patch for all of
> > > >> them if you can illustrate what needs to be changed.
> > > >
> > > > See the attached patch
> > >
> > > In order to use 195.36.15 it was necessary to use the patch Rene sent,
> > > jhb's earlier suggestion to remove some locks, plus a bit more. The
> > > patch that got it working on HEAD for me (specifically r209633) is
> > > attached. With that patch I could start X and run it for a while, but
> > > performance was very poor, even in comparison with the stock nv
> > > driver, and it crashed a couple of times (although not nearly as badly
> > > as previously).
> > >
> > > So based on other suggestions I tried the newest release version from
> > > NVIDIA, 256.35. Some of the same locking changes were needed to patch
> > > it; a patch for the port that includes the locking patch is also
> > > attached. If you are running an amd64 system you'll have to type 'make
> > > makesum' after applying this patch to the port. I'm not sure this
> > > patch is complete, or what Alexey might want to do with the update,
> > > but it does create an accurate plist, which means you can cleanly
> > > deinstall/pkg_delete when you're done.
> > >
> > > With 256.35, performance and stability have both been quite good,
> > > comparable even to before the drama started. The only concern I
> > > have at this point is that I'm periodically getting a strange sort of
> > > "flash" popping up on my screen that I didn't get while I was running
> > > the nv driver recently. It looks sort of like the default X background
> > > (the tiny gray crosshatch) is popping through for just a split second.
> >
> > I've been getting these messages on the console:
> >
> > NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d5
> > NVRM: Xid (0001:00): 8, Channel 00000000
> > NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d6
> > NVRM: Xid (0001:00): 8, Channel 00000002
> >
> > This is preceded by X locking hard. I cannot VT switch to a normal
> > console, and sometimes the computer needs a hard reset (i.e. it does
> > not respond to the power button). It appears to trigger only under
> > heavy load, e.g.
> > make -C /usr/src -j8 buildworld
> >
> > This also seems to be interfering with interrupts for other
> > subsystems, as my network drivers have been less than reliable of
> > late (watchdog timeouts).
>
> The messages indicate that the NVIDIA driver hasn't received
> interrupts from the GPU @ PCI:1:00.0 over a significant
> period of time. If you are seeing similar problems with other
> system components, there's a good chance that the above is
> a symptom of some larger problem.

I think you are right. I'm not sure if this is a hardware problem or a
FreeBSD problem. I reverted to a kernel from May 01 and the system has
been solid (~5 days). I'm using the patched 256.35 driver without
problems.

> > This happens with 195.36.15 unpatched and 256.35 patched.
> >
> > I have not checked if booting with WITNESS enabled works.
> >
> > Regards
> >
> > * David Naylor <naylor.b.david@gmail.com>
> > * 0xFF6916B2
