Date: Sat, 10 Jun 2017 00:11:22 +0900
From: Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To: freebsd-current@freebsd.org
Cc: blubee blubeeme <gurenchan@gmail.com>
Subject: Re: nvidia drivers mutex lock
Message-ID: <20170610001122.9ea858d8feb2c7b58cfc00a6@dec.sakura.ne.jp>
In-Reply-To: <CALM2mEnkCzLiArzuVGJHkt2neqBEYFJUuzsRShKGCMZdh5e-Dw@mail.gmail.com>
References: <1100140349.1166835.1496498112171.ref@mail.yahoo.com>
 <1100140349.1166835.1496498112171@mail.yahoo.com>
 <CALM2mEnSFBp9z%2BV443SqQaeXDqQG6X%2B357LYEpbGk61kC9U7bQ@mail.gmail.com>
 <20170604165320.f4c06ed7ad867f4ec0280f09@dec.sakura.ne.jp>
 <CALM2mE=Spiq2r8XM-pm3=1LmQe=KbmPVdjWHffgr%2Bw1Fp9%2BJtw@mail.gmail.com>
 <20170604175911.6926dc73386d211c4a39bbc0@dec.sakura.ne.jp>
 <CALM2mE=Dum4P3Csavv8G75WesrJfmyKRkpbjC9F=ZjN0fX-0vQ@mail.gmail.com>
 <CALM2mEkRkOM2BF%2B83v6djs=qGOtFjULgGqjbSVGPG9YVc%2Bpaag@mail.gmail.com>
 <CALM2mEnkCzLiArzuVGJHkt2neqBEYFJUuzsRShKGCMZdh5e-Dw@mail.gmail.com>

Hmm, I now strongly suspect a hardware or noise issue, as the nvidia GPU
seems to fall off and re-appear on the bus several times.
If it were a desktop machine with the GPU attached via a PCIe connector,
I would immediately power off and re-connect the card, with some physical
dust cleaning, but this time the GPU is onboard...

 *Not sure, but possibly a too-short timeout in the driver initialization
  code can show problems like this (too short to initialize).


On Thu, 8 Jun 2017 02:27:51 +0800
blubee blubeeme <gurenchan@gmail.com> wrote:

> I was just looking through dmesg and noticed these:
>
> Jun 6 21:40:52 blubee kernel: nvidia-modeset: Allocated GPU:0 (GPU-54a7b304-c99d-efee-0117-0ce119063cd6) @ PCI:0000:01:00.0
> Jun 6 21:41:05 blubee kernel: NVRM: GPU at PCI:0000:01:00: GPU-54a7b304-c99d-efee-0117-0ce119063cd6
> Jun 6 21:41:05 blubee kernel: NVRM: GPU Board Serial Number:
> Jun 6 21:41:05 blubee kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
> Jun 6 21:41:05 blubee kernel:
> Jun 6 21:41:05 blubee kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
> Jun 6 21:41:05 blubee kernel: NVRM: GPU is on Board .
> Jun 6 21:41:05 blubee kernel: NVRM: A GPU crash dump has been created. If possible, please run
> Jun 6 21:41:05 blubee kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
> Jun 6 21:41:05 blubee kernel: NVRM: the NVIDIA kernel module is unloaded.
> Jun 6 21:41:05 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun 6 21:41:05 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun 6 21:41:05 blubee kernel: vgapci0: child nvidia0 requested pci_enable_io
> Jun 6 21:41:05 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun 6 21:41:06 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun 6 21:41:22 blubee kernel: .
>
> then that lead me to this nvidia forum thread:
> https://devtalk.nvidia.com/default/topic/985037/gtx-1070-quot-gpu-has-fallen-off-the-bus-quot-running-3d-games-in-arch-linux-/
>
> maybe it could help somehow?
>
> Best,
> Owen
>
> On Tue, Jun 6, 2017 at 10:08 PM, blubee blubeeme <gurenchan@gmail.com> wrote:
>
> > This is getting out of hand. I can't even keep x going for ten minutes
> > sometimes.
> > I've tested all the suggestions in this thread and they just don't work.
> >
> > I have put out a print of sysctl hw. here : https://paste2.org/
> >
> > With this CPU: hw.model: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
> > The bios on this laptop I can either set graphics to discrete or mshybrid.
> >
> > I've tried in the past to disable discrete and run mshybrid but that
> > always comes up with 0 screens found. Even just doing Xorg -configure.
> >
> > Anyone have some tips on disabling nvidia drivers, running this cpu with
> > igpu for a while?
> >
> > Best,
> > Owen
> >
> > On Sun, Jun 4, 2017, 18:11 blubee blubeeme <gurenchan@gmail.com> wrote:
> >
> >> Thanks a lot! I'll give it a shot in a bit.
> >>
> >> Best,
> >> Owen
> >>
> >> On Sun, Jun 4, 2017, 16:59 Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> >>
> >>> Yes. FreeBSD patches in x11/nvidia-drivers/files are applied as usual.
> >>>
> >>> But beware! Sometimes upstream changes make any of FreeBSD patches not
> >>> applicable (incorporating any of these, incompatible modifies, ...).
> >>>
> >>> For 381.22, current patchset applies and builds fine for me.
> >>>
> >>>
> >>> On Sun, 04 Jun 2017 08:04:50 +0000
> >>> blubee blubeeme <gurenchan@gmail.com> wrote:
> >>>
> >>> > I'm running with svn and I build by make.
> >>> > If in use these steps, the BSD related patches will be applied, etc?
> >>> >
> >>> > Best,
> >>> > Owen
> >>> >
> >>> > On Sun, Jun 4, 2017, 15:53 Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> >>> >
> >>> > > Hi.
> >>> > >
> >>> > > Not in ports tree, but easily overridden by adding
> >>> > >
> >>> > > DISTVERSION=381.22 -DNO_CHECKSUM
> >>> > >
> >>> > > on make command line. Makefile of x11/nvidia-driver has a mechanism
> >>> > > to do so for someone requires newer version (newer GPU support, etc.).
> >>> > >
> >>> > > If you're using portupgrade,
> >>> > >
> >>> > > portupgrade -m 'DISTVERSION=381.22 -DNO_CHECKSUM' -f x11/nvidia-driver
> >>> > >
> >>> > > would do the same.
> >>> > >
> >>> > > If you installed it via pkg, there's no way to try. :-(
> >>> > > (As it's pre-built.)
> >>> > >
> >>> > >
> >>> > > On Sun, 04 Jun 2017 07:04:01 +0000
> >>> > > blubee blubeeme <gurenchan@gmail.com> wrote:
> >>> > >
> >>> > > > Hi @tomoaki
> >>> > > > Is that version of nvidia drivers currently in the ports tree? I just
> >>> > > > checked but it seems not to be.
> >>> > > >
> >>> > > > @jeffrey
> >>> > > > I just generated a new xorg based on the force composition setting. I
> >>> > > > merged it with my previous xorg I'll reboot, see if it gives better
> >>> > > > performance.
> >>> > > >
> >>> > > > It seems like my system is locking up more frequently now. Sometimes right
> >>> > > > after a reboot the system, the screen locks and it's reboot and pray.
> >>> > > >
> >>> > > > Best,
> >>> > > > Owen
> >>> > > >
> >>> > > > On Sat, Jun 3, 2017, 21:59 Jeffrey Bouquet <jeffreybouquet@yahoo.com> wrote:
> >>> > > >
> >>> > > > > SOME LINES BOTTOM POSTED, SEE...
> >>> > > > > --------------------------------------------
> >>> > > > > On Fri, 6/2/17, Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> >>> > > > >
> >>> > > > > Subject: Re: nvidia drivers mutex lock
> >>> > > > > To: freebsd-current@freebsd.org
> >>> > > > > Cc: "Jeffrey Bouquet" <jeffreybouquet@yahoo.com>, "blubee blubeeme" <gurenchan@gmail.com>
> >>> > > > > Date: Friday, June 2, 2017, 11:25 PM
> >>> > > > >
> >>> > > > > Hi.
> >>> > > > > Version 381.22 (5 days newer than 375.66) of the driver states... [1]
> >>> > > > >
> >>> > > > > Fixed hangs and crashes that could occur when an OpenGL context is
> >>> > > > > created while the system is out of available memory.
> >>> > > > >
> >>> > > > > Can this be related with your hang?
> >>> > > > >
> >>> > > > > IMHO, possibly allocating new resource (using os.lock_mtx guard)
> >>> > > > > without checking the lock first while previous request is waiting for
> >>> > > > > another can cause the duplicated lock situation. And high memory
> >>> > > > > pressure would easily cause the situation.
> >>> > > > >
> >>> > > > > [1] http://www.nvidia.com/Download/driverResults.aspx/118527/en-us
> >>> > > > >
> >>> > > > > Hope it helps.
> >>> > > > >
> >>> > > > >
> >>> > > > > On Thu, 1 Jun 2017 22:35:46 +0000 (UTC)
> >>> > > > > Jeffrey Bouquet <jeffreybouquet@yahoo.com> wrote:
> >>> > > > >
> >>> > > > > > I see the same message, upon load, ...
> >>> > > > > >
> >>> > > > > > --------------------------------------------
> >>> > > > > > On Thu, 6/1/17, blubee blubeeme <gurenchan@gmail.com> wrote:
> >>> > > > > >
> >>> > > > > > Subject: nvidia drivers mutex lock
> >>> > > > > > To: freebsd-ports@freebsd.org, freebsd-current@freebsd.org
> >>> > > > > > Date: Thursday, June 1, 2017, 11:35 AM
> >>> > > > > >
> >>> > > > > > I'm running nvidia-drivers 375.66 with a GTX 1070 on FreeBSD-Current
> >>> > > > > >
> >>> > > > > > This problem just started happening recently but, every so often my
> >>> > > > > > laptop screen will just blank out and then I have to power cycle to
> >>> > > > > > get the machine up and running again.
> >>> > > > > >
> >>> > > > > > It seems to be a problem with nvidia drivers acquiring duplicate lock.
> >>> > > > > > Any info on this?
> >>> > > > > >
> >>> > > > > > Jun 2 02:29:41 blubee kernel: acquiring duplicate lock of same type: "os.lock_mtx"
> >>> > > > > > Jun 2 02:29:41 blubee kernel: 1st os.lock_mtx @ nvidia_os.c:841
> >>> > > > > > Jun 2 02:29:41 blubee kernel: 2nd os.lock_mtx @ nvidia_os.c:841
> >>> > > > > > Jun 2 02:29:41 blubee kernel: stack backtrace:
> >>> > > > > > Jun 2 02:29:41 blubee kernel: #0 0xffffffff80ab7770 at witness_debugger+0x70
> >>> > > > > > Jun 2 02:29:41 blubee kernel: #1 0xffffffff80ab7663 at witness_checkorder+0xe23
> >>> > > > > > Jun 2 02:29:41 blubee kernel: #2 0xffffffff80a35b93 at __mtx_lock_flags+0x93
> >>> > > > > > Jun 2 02:29:41 blubee kernel: #3 0xffffffff82f4397b at os_acquire_spinlock+0x1b
> >>> > > > > > Jun 2 02:29:41 blubee kernel: #4 0xffffffff82c48b15 at _nv012002rm+0x185
> >>> > > > > > Jun 2 02:29:41 blubee kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-205)
> >>> > > > > > Jun 2 02:29:42 blubee kernel: nvidia-modeset: Allocated GPU:0 (GPU-54a7b304-c99d-efee-0117-0ce119063cd6) @ PCI:0000:01:00.0
> >>> > > > > >
> >>> > > > > > Best,
> >>> > > > > > Owen
> >>> > > > > >
> >>> > > > > > _______________________________________________
> >>> > > > > > freebsd-ports@freebsd.org mailing list
> >>> > > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-ports
> >>> > > > > > To unsubscribe, send any mail to "freebsd-ports-unsubscribe@freebsd.org"
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > ... then Xorg will run happily twelve hours or so. The lockups here
> >>> > > > > > happen usually when too large or too many of number of tabs/ large
> >>> > > > > > web pages with complex CSS etc are opened at a time.
> >>> > > > > > So no help, just a 'me too'.
> >>> > > > > >
> >>> > > > > > _______________________________________________
> >>> > > > > > freebsd-current@freebsd.org mailing list
> >>> > > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> >>> > > > > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> >>> > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > --
> >>> > > > > Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > ........................
> >>> > > > > might be a workaround
> >>> > > > > Xorg/nvidia ran all night with this:
> >>> > > > > nvidia-settings >> X server display configuration >> Advanced >> Force Full Composition Pipeline
> >>> > > > > ... for the laptop freezing. Could not hurt to try. " merge with
> >>> > > > > Xorg.conf " from nvidia-settings...
> >>> > > > > ......................
> >>> > > > > 18 hours uptime so far, even past the 3 am periodic scripts. Have not
> >>> > > > > rebooted out of the Xorg though so may require edit-out of xorg.conf
> >>> > > > > if that is the case, in other words differing from real-time apply
> >>> > > > > and xorg initially start applies.
> >>> > > > > ........
> >>> > > > >
> >>> > > > >
> >>> > > > _______________________________________________
> >>> > > > freebsd-current@freebsd.org mailing list
> >>> > > > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> >>> > > > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> >>> > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> >>> > >
> >>>
> >>>
> >>> --
> >>> Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
> >>>
> >>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
>

--
Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
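
To see how often the GPU falls off the bus, the "NVRM: Xid" lines in the
quoted dmesg output can be picked out with a short script. This is only a
sketch: the function name find_xid_events, the regular expression and the
sample lines are assumptions modelled on the log lines quoted in this thread,
not anything provided by the NVIDIA driver.

```python
import re

# Assumed pattern, modelled on the log line quoted in this thread:
#   NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def find_xid_events(lines):
    """Return (log_prefix, pci_address, xid_code) for every NVRM Xid line."""
    events = []
    for line in lines:
        # Drop mail quote markers ("> ") and trailing whitespace, if any.
        line = line.lstrip("> ").rstrip()
        match = XID_RE.search(line)
        if match:
            # Everything before " kernel:" is the syslog date, time and host.
            log_prefix = line.split(" kernel:")[0]
            events.append((log_prefix, match.group(1), int(match.group(2))))
    return events


# Hypothetical sample input, copied from the quoted dmesg output.
SAMPLE = [
    "Jun 6 21:40:52 blubee kernel: nvidia-modeset: Allocated GPU:0",
    "Jun 6 21:41:05 blubee kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.",
]

for log_prefix, pci_address, xid_code in find_xid_events(SAMPLE):
    print(log_prefix, pci_address, "Xid", xid_code)
```

Feeding it the lines of /var/log/messages instead of SAMPLE would list every
Xid event logged there, which may help to tell a one-off glitch from a GPU
that repeatedly drops off the bus.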
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170610001122.9ea858d8feb2c7b58cfc00a6>