From owner-freebsd-current@FreeBSD.ORG Sat Aug 10 08:37:11 2013
Date: Sat, 10 Aug 2013 10:37:05 +0200
From: Gary Jennejohn
To: David Wolfskill
Cc: FreeBSD CURRENT
Subject: Re: CURRENT crashes with nvidia GPU BLOB : vm_radix_insert: key 23c078 is already present
Message-ID: <20130810103705.022ce7be@ernst.home>
In-Reply-To: <20130809171237.GN1746@albert.catwhisker.org>
References: <20130808201018.1215f733@munin.geoinf.fu-berlin.de>
 <1375997961.1451.3.camel@localhost>
 <20130809073251.376c9206@munin.geoinf.fu-berlin.de>
 <20130809171237.GN1746@albert.catwhisker.org>
Reply-To: gljennjohn@googlemail.com

On Fri, 9 Aug 2013 10:12:37 -0700
David Wolfskill wrote:

> On Fri, Aug 09, 2013 at 07:32:51AM +0200, O. Hartmann wrote:
> > ...
> > > > On 8 August 2013 11:10, O. Hartmann
> > > > wrote:
> > > > > The most recent CURRENT doesn't work with the x11/nvidia-driver
> > > > > (which is at 319.25 in the ports and 325.15 from nVidia).
> > > > >
> > > > > After build- and installworld AND successfully rebuilding port
> > > > > x11/nvidia-driver, the system crashes immediately after a reboot
> > > > > as soon as the kernel module nvidia.ko seems to get loaded (in my
> > > > > case, I load nvidia.ko via /etc/rc.conf.local since the nVidia
> > > > > BLOB doesn't load cleanly every time when loaded
> > > > > from /boot/loader.conf).
> > > > >
> > > > > The crash occurs on systems with default compilation options set
> > > > > while building world and with settings like -O3 -march=native. It
> > > > > doesn't matter.
> > > > >
> > > > > FreeBSD and the port x11/nvidia-driver have been compiled with
> > > > > CLANG.
> > > > >
> > > > > Most recent FreeBSD revision still crashing is r254097.
> > > > >
> > > > > When vmcore is saved, I always see something like
> > > > >
> > > > > savecore: reboot after panic: vm_radix_insert: key 23c078 is
> > > > > already present
> > > > >
> > > > >
> > > > > Does anyone have any idea what's going on?
> > > > >
> > > > > Thanks for helping in advance,
> > > > >
> > > > > Oliver
> > >
> > > I'm seeing a complete deadlock on my T520 with today's current and
> > > latest portsnap'd versions of ports for the nvidia-driver updates.
> > >
> > > A little bisection and help from others seems to point the finger at
> > > Jeff's r254025
> > >
> > > I'm getting a complete deadlock on X starting, but loading the module
> > > seems to have no ill effects.
> > >
> > > Sean
> >
> > Right, I loaded the module also via /boot/loader.conf and it loads
> > cleanly. I start xdm and then the deadlock occurs.
> >
> > I tried recompiling the whole xorg suite via "portmaster -f xorg xdm",
> > it took a while, but no effect, still dying.
> > .....
>
> Sorry to be rather late to the party; the Internet connection I'm using
> at the moment is a bit flaky. (I'm out of town.)
>
> I managed to get head/i386 @r254135 built and booting ... by removing
> the "options DEBUG_MEMGUARD" from my kernel.
>
> However, that merely prevented a (very!) early panic, and got me to the
> point where trying to start xdm with the x11/nvidia-driver as the
> display driver causes an immediate reboot (no crash dump, despite
> 'dumpdev="AUTO"' in /etc/rc.conf). No drop to debugger, either.
>
> Booting & starting xdm with the nv driver works -- that's my present
> environment as I am typing this.
>
> However, the panic with DEBUG_MEMGUARD may offer a clue. Unfortunately,
> it's early enough that screen lock/scrolling doesn't work, and I only
> had the patience to write down part of the panic information. (This is
> on my laptop; no serial console, AFAICT -- and no device to capture the
> output if I did, since I'm not at home.)
>
> The top line of the screen (at the panic) reads:
>
> s/kern/subr_vmem.c:1050
>
> The backtrace has the expected stuff near the top (about kbd, panic, and
> memguard stuff); just below that is:
>
> vmem_alloc(c1226100,6681000,2,c1820cc0,3b5,...) at 0xc0ac5673=vmem_alloc+0x53/frame 0xc1820ca0
>
> Caveat: that was hand-transcribed from the screen to paper, then
> hand-transcribed from paper to this email message. And my highest grade
> in "Penmanship" was a D+.
>
> Be that as it may, here's the relevant section of subr_vmem.c with line
> numbers (cut/pasted, so tabs get munged):
>
>   1039  /*
>   1040   * vmem_alloc: allocate resource from the arena.
>   1041   */
>   1042  int
>   1043  vmem_alloc(vmem_t *vm, vmem_size_t size, int flags, vmem_addr_t *addrp)
>   1044  {
>   1045      const int strat __unused = flags & VMEM_FITMASK;
>   1046      qcache_t *qc;
>   1047
>   1048      flags &= VMEM_FLAGS;
>   1049      MPASS(size > 0);
>   1050      MPASS(strat == M_BESTFIT || strat == M_FIRSTFIT);
>   1051      if ((flags & M_NOWAIT) == 0)
>   1052          WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "vmem_alloc");
>   1053
>   1054      if (size <= vm->vm_qcache_max) {
>   1055          qc = &vm->vm_qcache[(size - 1) >> vm->vm_quantum_shift];
>   1056          *addrp = (vmem_addr_t)uma_zalloc(qc->qc_cache, flags);
>   1057          if (*addrp == 0)
>   1058              return (ENOMEM);
>   1059          return (0);
>   1060      }
>   1061
>   1062      return vmem_xalloc(vm, size, 0, 0, 0, VMEM_ADDR_MIN, VMEM_ADDR_MAX,
>   1063          flags, addrp);
>   1064  }
>
>
> This is at r254025.
>

The REINPLACE_CMD at line 160 of nvidia-driver/Makefile is incorrect.

How do I know that? Because I made a patch which results in a working
nvidia-driver-319.32 with r254050. That's what I'm running right now.

Here's the patch (loaded with :r in vi, so all spaces etc. are correct):

--- src/nvidia_subr.c.orig  2013-08-09 11:32:26.000000000 +0200
+++ src/nvidia_subr.c  2013-08-09 11:33:23.000000000 +0200
@@ -945,7 +945,7 @@
         return ENOMEM;
     }
 
-    address = kmem_alloc_contig(kernel_map, size, flags, 0,
+    address = kmem_alloc_contig(kmem_arena, size, flags, 0,
             sc->dma_mask, PAGE_SIZE, 0, attr);
     if (!address) {
         status = ENOMEM;
@@ -994,7 +994,7 @@
         os_flush_cpu_cache();
 
     if (at->pte_array[0].virtual_address != NULL) {
-        kmem_free(kernel_map,
+        kmem_free(kmem_arena,
                 at->pte_array[0].virtual_address, at->size);
         malloc_type_freed(M_NVIDIA, at->size);
     }
@@ -1021,7 +1021,7 @@
     if (at->attr != VM_MEMATTR_WRITE_BACK)
         os_flush_cpu_cache();
 
-    kmem_free(kernel_map, at->pte_array[0].virtual_address,
+    kmem_free(kmem_arena, at->pte_array[0].virtual_address,
             at->size);
     malloc_type_freed(M_NVIDIA, at->size);
 
@@ -1085,7 +1085,7 @@
     }
 
     for (i = 0; i < count; i++) {
-        address = kmem_alloc_contig(kernel_map, PAGE_SIZE, flags, 0,
+        address = kmem_alloc_contig(kmem_arena, PAGE_SIZE, flags, 0,
                 sc->dma_mask, PAGE_SIZE, 0, attr);
         if (!address) {
             status = ENOMEM;
@@ -1139,7 +1139,7 @@
     for (i = 0; i < count; i++) {
         if (at->pte_array[i].virtual_address == 0)
             break;
-        kmem_free(kernel_map,
+        kmem_free(kmem_arena,
                 at->pte_array[i].virtual_address, PAGE_SIZE);
         malloc_type_freed(M_NVIDIA, PAGE_SIZE);
     }
@@ -1169,7 +1169,7 @@
         os_flush_cpu_cache();
 
     for (i = 0; i < count; i++) {
-        kmem_free(kernel_map,
+        kmem_free(kmem_arena,
                 at->pte_array[i].virtual_address, PAGE_SIZE);
         malloc_type_freed(M_NVIDIA, PAGE_SIZE);
     }

The primary differences are
1) use kmem_arena instead of kernel_map everywhere. The REINPLACE_CMD
   uses kernel_arena
2) DO NOT use kva_free, but kmem_free as previously

To use the patch
Delete or comment out the 4 lines starting at 160 in Makefile
Run ``make patch''
cd work/NVIDIA-FreeBSD-x86_64-319.32/src
patch < [wherever the patch is]
cd ../../..
make deinstall install clean
kldunload the old nvidia.ko
kldload the new nvidia.ko
start X

--
Gary Jennejohn
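
For anyone who prefers to script it, the steps above can be sketched
roughly as follows. This is only a sketch under assumptions: the ports
location (PORTDIR), the patch file name (PATCHFILE) and the DRY_RUN
switch are made up for illustration and are not part of the steps as
posted. Commenting out the REINPLACE_CMD lines in the Makefile is left
as a manual step, since the line number may differ between port
revisions. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the manual steps from the message. Assumptions (not from the
# original mail): the PORTDIR and PATCHFILE defaults and the DRY_RUN switch.
PORTDIR=${PORTDIR:-/usr/ports/x11/nvidia-driver}   # assumed ports location
PATCHFILE=${PATCHFILE:-/tmp/nvidia_subr.patch}     # hypothetical file name
SRCDIR=work/NVIDIA-FreeBSD-x86_64-319.32/src       # path given in the mail
DRY_RUN=${DRY_RUN:-1}                              # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

nvidia_patch_steps() {
    # Manual step first: comment out the 4 REINPLACE_CMD lines starting
    # at line 160 of the port Makefile before running this for real.
    run cd "$PORTDIR" &&
    run make patch &&
    run cd "$SRCDIR" &&
    run patch -i "$PATCHFILE" &&
    run cd ../../.. &&
    run make deinstall install clean &&
    run kldunload nvidia &&
    run kldload nvidia
    # Starting X (xdm, startx, ...) is left to the reader.
}

nvidia_patch_steps
```

Run it once as is to review the command list, then as root with
DRY_RUN=0 to actually execute it. The chain stops at the first failing
command, so if no old nvidia.ko is loaded the kldunload step will fail
and the kldload has to be done by hand.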