Date: Sun, 11 Jul 2010 13:14:44 -0700
From: Doug Barton <dougb@FreeBSD.org>
To: Rene Ladan <rene@freebsd.org>
Cc: danfe@FreeBSD.org, Christian Zander <czander@nvidia.com>, David Naylor <naylor.b.david@gmail.com>, Yuri Pankov <yuri.pankov@gmail.com>, freebsd-current@freebsd.org
Subject: Re: nvidia-driver crashing kernel on head
Message-ID: <4C3A2634.5050003@FreeBSD.org>
In-Reply-To: <4C36488A.6030203@freebsd.org>
References: <201007021146.46542.naylor.b.david@gmail.com> <AANLkTimT4UwDzB6jF2eML4U7jQubOs1slwBPHwy_5U3b@mail.gmail.com> <201007021855.42103.naylor.b.david@gmail.com> <201007080826.32764.jhb@freebsd.org> <alpine.BSF.2.00.1007081304590.5061@yncgbc.qbhto.arg> <4C36488A.6030203@freebsd.org>
This is a multi-part message in MIME format.
--------------040401050506090804000701
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 07/08/10 14:52, Rene Ladan wrote:
> On 08-07-2010 22:09, Doug Barton wrote:
>> On Thu, 8 Jul 2010, John Baldwin wrote:
>>
>>> These freezes and panics are due to the driver using a spin mutex
>>> instead of a
>>> regular mutex for the per-file descriptor event_mtx. If you patch the
>>> driver
>>> to change it to be a regular mutex I think that should fix the problems.
>>
>> Can you give an example? :) I don't mind creating a patch for all of
>> them if you can illustrate what needs to be changed.
>>
> See the attached patch

In order to use 195.36.15 it was necessary to use the patch Rene sent,
the suggestion from jhb previously to remove some locks, plus a bit
more. The patch that got it working on HEAD for me (specifically
r209633) is attached. With that patch I could start X, and run it for a
while, but performance was very poor, even in comparison with the stock
nv driver, and it crashed a couple times (although not nearly as bad as
previously).

So based on other suggestions I tried the newest release version at
nvidia, 256.35. Some of the same locking stuff was needed to patch it; a
patch for the port, which includes the locking patch, is also attached.
If you are running an amd64 system you'll have to type 'make makesum'
after applying this patch to the port. I'm not sure this patch is
complete, or what Alexey might want to do with the update, but it does
create an accurate plist which means you can cleanly
deinstall/pkg_delete when you're done.

With 256.35 performance and stability have both been quite good,
comparable even to before the drama started. The only concern I have at
this point is that I'm periodically getting a strange sort of "flash"
popping up on my screen that I didn't get while I was running the nv
driver recently.
It looks sort of like the default X background (the tiny gray
crosshatch) is popping through for just a split second.


hth,

Doug

-- 

... and that's just a little bit of history repeating.
-- Propellerheads

Improve the effectiveness of your Internet presence with
a domain name makeover! http://SupersetSolutions.com/

--------------040401050506090804000701
Content-Type: text/plain; name="nvidia-port-locking-256-35.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="nvidia-port-locking-256-35.diff"

Index: Makefile
===================================================================
RCS file: /home/pcvs/ports/x11/nvidia-driver/Makefile,v
retrieving revision 1.98
diff -u -r1.98 Makefile
--- Makefile	24 May 2010 03:01:56 -0000	1.98
+++ Makefile	11 Jul 2010 20:02:47 -0000
@@ -6,7 +6,7 @@
 #
 
 PORTNAME=	nvidia-driver
-DISTVERSION?=	195.36.15
+DISTVERSION?=	256.35
 PORTREVISION?=	0	# As a reminder it can be overridden
 CATEGORIES=	x11 kld
 MASTER_SITES=	${MASTER_SITE_NVIDIA}
@@ -143,9 +143,6 @@
 .endif
 .if ${NVVERSION} < 1802900
 	@${REINPLACE_CMD} '/vdpau/d' ${TMPPLIST}
-.else
-	@${MKDIR} ${PREFIX}/include/vdpau
-	@${LN} -sf ${DOCSDIR}/vdpau*.h ${PREFIX}/include/vdpau
 .endif
 .if ${NVVERSION} < 1851829
 	@${REINPLACE_CMD} '/libcuda/d' ${TMPPLIST}
Index: distinfo
===================================================================
RCS file: /home/pcvs/ports/x11/nvidia-driver/distinfo,v
retrieving revision 1.36
diff -u -r1.36 distinfo
--- distinfo	10 Apr 2010 13:40:07 -0000	1.36
+++ distinfo	11 Jul 2010 20:02:47 -0000
@@ -1,15 +1,3 @@
-MD5 (NVIDIA-FreeBSD-x86-195.36.15.tar.gz) = 2537ca726240344c7eaa44857e2b134e
-SHA256 (NVIDIA-FreeBSD-x86-195.36.15.tar.gz) = 21fc89fa59e2cc96e560af856a3fa583ce4bfb7975465c71170c64962201e7a1
-SIZE (NVIDIA-FreeBSD-x86-195.36.15.tar.gz) = 25614326
-MD5 (NVIDIA-FreeBSD-x86_64-195.36.15.tar.gz) = 95af03aedc818a3dfd8ae9f289746ba4
-SHA256 (NVIDIA-FreeBSD-x86_64-195.36.15.tar.gz) = d64c664398cb4dade24af6b108e03607614f1f7584c71449230c646c313d0e7e
-SIZE (NVIDIA-FreeBSD-x86_64-195.36.15.tar.gz) = 26449559
-MD5 (NVIDIA-FreeBSD-x86-173.14.25.tar.gz) = 1eca3916a9ae86b953f54405e1881774
-SHA256 (NVIDIA-FreeBSD-x86-173.14.25.tar.gz) = c432ed94ce71e297b2d9304d9f34f906b58e2c7c4bc13d8dbac264ed52fd6261
-SIZE (NVIDIA-FreeBSD-x86-173.14.25.tar.gz) = 16682722
-MD5 (NVIDIA-FreeBSD-x86-96.43.16.tar.gz) = 3fc5c2bb537d4a7664d84a7a0df09c7c
-SHA256 (NVIDIA-FreeBSD-x86-96.43.16.tar.gz) = 38bf334284dc600d92d8436333c98d5577e34d69456ed71f1cccc75caa6dffcd
-SIZE (NVIDIA-FreeBSD-x86-96.43.16.tar.gz) = 11842453
-MD5 (NVIDIA-FreeBSD-x86-71.86.13.tar.gz) = 19000b906225ebd39ca3edc1b0c3c7a5
-SHA256 (NVIDIA-FreeBSD-x86-71.86.13.tar.gz) = 27ae01cd6fe050871f7785c2146b18e74ea882f6262e46dc965bf26061238447
-SIZE (NVIDIA-FreeBSD-x86-71.86.13.tar.gz) = 8066159
+MD5 (NVIDIA-FreeBSD-x86-256.35.tar.gz) = 599908c9ffd8999ab0333cab34ea15a0
+SHA256 (NVIDIA-FreeBSD-x86-256.35.tar.gz) = 897c711acdca188da26868aec510c732d34f415ae621c35e5556ed8de493f26e
+SIZE (NVIDIA-FreeBSD-x86-256.35.tar.gz) = 26047458
Index: pkg-plist
===================================================================
RCS file: /home/pcvs/ports/x11/nvidia-driver/pkg-plist,v
retrieving revision 1.27
diff -u -r1.27 pkg-plist
--- pkg-plist	10 Apr 2010 13:40:07 -0000	1.27
+++ pkg-plist	11 Jul 2010 20:02:47 -0000
@@ -10,15 +10,13 @@
 @unexec mv -f %D/%%MODULESDIR%%/extensions/XXX-libglx.so.%%%%.%%XSERVVERSION%% %D/%%MODULESDIR%%/extensions/libglx.so
 @exec mv -f %D/lib/libGL.so.1 %D/lib/XXX-libGL.so.1.%%%%.%%LIBGLVERSION%%
 @unexec mv -f %D/lib/XXX-libGL.so.1.%%%%.%%LIBGLVERSION%% %D/lib/libGL.so.1
-include/vdpau/vdpau.h
-include/vdpau/vdpau_x11.h
-@dirrm include/vdpau
+lib/libGL.so
+lib/libnvidia-glcore.so.1
+lib/libnvidia-glcore.so
 lib/libnvidia-tls.so.1
 lib/libnvidia-tls.so
 lib/libnvidia-cfg.so.1
 lib/libnvidia-cfg.so
-lib/libGLcore.so.1
-lib/libGLcore.so
 lib/libvdpau.so.1
 lib/libvdpau.so
 lib/vdpau/libvdpau_nvidia.so.1
@@ -41,12 +39,10 @@
 %%LINUX%%@cwd %%LINUXBASE%%
 %%LINUX%%usr/lib/libGL.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libGL.so.1
-%%LINUX%%usr/lib/libGLcore.so.%%SHLIB_VERSION%%
-%%LINUX%%usr/lib/libGLcore.so.1
 %%LINUX%%usr/lib/libcuda.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libcuda.so.1
+%%LINUX%%usr/lib/libnvidia-glcore.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libnvidia-tls.so.%%SHLIB_VERSION%%
-%%LINUX%%usr/lib/libnvidia-tls.so.1
 %%LINUX%%usr/lib/libvdpau.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libvdpau.so.1
 %%LINUX%%usr/lib/libvdpau_nvidia.so
Index: files/patch-nvidia-locking-256-35
===================================================================
RCS file: files/patch-nvidia-locking-256-35
diff -N files/patch-nvidia-locking-256-35
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ files/patch-nvidia-locking-256-35	11 Jul 2010 20:02:47 -0000
@@ -0,0 +1,100 @@
+diff -ur NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_ctl.c src/nvidia_ctl.c
+--- NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_ctl.c	2010-06-16 18:36:40.000000000 -0700
++++ src/nvidia_ctl.c	2010-07-11 01:22:55.000000000 -0700
+@@ -53,7 +53,7 @@
+     }
+ 
+     filep->nv = nv;
+-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
++    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
+     STAILQ_INIT(&filep->event_queue);
+ 
+     nv_lock_api(nv);
+@@ -123,7 +123,7 @@
+     if (status != 0)
+         return status;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     et = STAILQ_FIRST(&filep->event_queue);
+ 
+     if (et == NULL)
+@@ -131,7 +131,7 @@
+     else
+         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
+ 
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+ 
+     return mask;
+ }
+diff -ur NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_dev.c src/nvidia_dev.c
+--- NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_dev.c	2010-06-16 18:36:40.000000000 -0700
++++ src/nvidia_dev.c	2010-07-11 01:22:55.000000000 -0700
+@@ -52,7 +52,7 @@
+     }
+ 
+     filep->nv = nv;
+-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
++    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
+     STAILQ_INIT(&filep->event_queue);
+ 
+     nv_lock_api(nv);
+@@ -123,7 +123,7 @@
+     if (status != 0)
+         return status;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     et = STAILQ_FIRST(&filep->event_queue);
+ 
+     if (et == NULL)
+@@ -131,7 +131,7 @@
+     else
+         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
+ 
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+ 
+     return mask;
+ }
+diff -ur NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_subr.c src/nvidia_subr.c
+--- NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_subr.c	2010-06-16 18:36:40.000000000 -0700
++++ src/nvidia_subr.c	2010-07-11 01:22:55.000000000 -0700
+@@ -987,9 +987,9 @@
+     et->event.hObject = hObject;
+     et->event.index = index;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     STAILQ_INSERT_TAIL(&filep->event_queue, et, queue);
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+ 
+     selwakeup(&filep->event_rsel);
+ }
+@@ -1004,7 +1004,7 @@
+     struct nvidia_filep *filep = file;
+     struct nvidia_event *et;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     et = STAILQ_FIRST(&filep->event_queue);
+ 
+     if (et != NULL) {
+@@ -1013,13 +1013,13 @@
+         STAILQ_REMOVE(&filep->event_queue, et, nvidia_event, queue);
+         *pending = !STAILQ_EMPTY(&filep->event_queue);
+ 
+-        mtx_unlock_spin(&filep->event_mtx);
++        mtx_unlock(&filep->event_mtx);
+         free(et, M_NVIDIA);
+ 
+         return RM_OK;
+     }
+ 
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+     return RM_ERROR;
+ }
+ 

--------------040401050506090804000701
Content-Type: text/plain; name="patch-nvidia-locking-195-36-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patch-nvidia-locking-195-36-15"

diff -ur NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_ctl.c src/nvidia_ctl.c
--- NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_ctl.c	2010-03-12 09:21:51.000000000 -0800
+++ src/nvidia_ctl.c	2010-07-10 16:36:49.000000000 -0700
@@ -54,7 +54,7 @@
     }
 
     filep->nv = nv;
-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
+    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
     STAILQ_INIT(&filep->event_queue);
 
     nv_lock_api(nv);
@@ -126,7 +126,7 @@
     if (status != 0)
         return status;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     et = STAILQ_FIRST(&filep->event_queue);
 
     if (et == NULL)
@@ -134,7 +134,7 @@
     else
         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
 
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
 
     return mask;
 }
diff -ur NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_dev.c src/nvidia_dev.c
--- NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_dev.c	2010-03-12 09:21:51.000000000 -0800
+++ src/nvidia_dev.c	2010-07-10 16:36:49.000000000 -0700
@@ -52,7 +52,7 @@
     }
 
     filep->nv = nv;
-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
+    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
     STAILQ_INIT(&filep->event_queue);
 
     nv_lock_api(nv);
@@ -125,7 +125,7 @@
     if (status != 0)
         return status;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     et = STAILQ_FIRST(&filep->event_queue);
 
     if (et == NULL)
@@ -133,7 +133,7 @@
     else
         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
 
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
 
     return mask;
 }
diff -ur NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_subr.c src/nvidia_subr.c
--- NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_subr.c	2010-03-12 09:21:52.000000000 -0800
+++ src/nvidia_subr.c	2010-07-10 16:37:43.000000000 -0700
@@ -967,9 +967,9 @@
     et->event.hObject = hObject;
     et->event.index = index;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     STAILQ_INSERT_TAIL(&filep->event_queue, et, queue);
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
 
     selwakeup(&filep->event_rsel);
 }
@@ -984,7 +984,7 @@
     struct nvidia_filep *filep = file;
     struct nvidia_event *et;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     et = STAILQ_FIRST(&filep->event_queue);
 
     if (et != NULL) {
@@ -993,13 +993,13 @@
         STAILQ_REMOVE(&filep->event_queue, et, nvidia_event, queue);
         *pending = !STAILQ_EMPTY(&filep->event_queue);
 
-        mtx_unlock_spin(&filep->event_mtx);
+        mtx_unlock(&filep->event_mtx);
         free(et, M_NVIDIA);
 
         return RM_OK;
     }
 
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
     return RM_ERROR;
 }
 
@@ -1301,9 +1301,6 @@
 
     for (i = 0; i < count; i++) {
         pte_array[i] = at->pte_array[i].physical_address;
-        vm_page_lock_queues();
-        vm_page_wire(PHYS_TO_VM_PAGE(pte_array[i]));
-        vm_page_unlock_queues();
         sglist_append_phys(at->sg_list, pte_array[i], PAGE_SIZE);
     }
 
@@ -1365,9 +1362,6 @@
     os_flush_cpu_cache();
 
     for (i = 0; i < count; i++) {
-        vm_page_lock_queues();
-        vm_page_unwire(PHYS_TO_VM_PAGE(at->pte_array[i].physical_address), 0);
-        vm_page_unlock_queues();
         kmem_free(kernel_map,
                 at->pte_array[i].virtual_address, PAGE_SIZE);
         malloc_type_freed(M_NVIDIA, PAGE_SIZE);

--------------040401050506090804000701--