Date: Wed, 13 Feb 2013 17:41:14 +0100
From: Henri Hennebert <hlh@restart.be>
To: freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: sysctl -a causes kernel trap 12
Message-ID: <511BC22A.1030701@restart.be>
In-Reply-To: <511A25EC.8070000@restart.be>
References: <50EB602F.9050300@delphij.net> <20130108000233.GZ82219@kib.kiev.ua> <50EB63A9.50903@delphij.net> <CALBk6yK_%2BpcSA_Rgioe-2ed8KujpDK79GMG8jX3GMeqGV8ifrA@mail.gmail.com> <50EB870D.3020306@delphij.net> <50EF3FEC.60605@delphij.net> <CALBk6y%2BgYuTt4tqUUzn=8HMijtEbeSohrVScxHQ0Tq5AhUQQHA@mail.gmail.com> <50F9B70A.5040305@delphij.net> <CALBk6yLZ7m=5-RAypz3C3DE2hjw8E8iTdXyOosfP8zMh%2Bmqubw@mail.gmail.com> <511A25EC.8070000@restart.be>
On 02/12/2013 12:22, Henri Hennebert wrote:
> On 01/19/2013 06:58, Brandon Gooch wrote:
>> On Fri, Jan 18, 2013 at 2:56 PM, Xin Li <delphij@delphij.net> wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA512
>>>
>>> On 01/18/13 12:50, Brandon Gooch wrote:
>>>> On Thu, Jan 10, 2013 at 4:25 PM, Xin Li <delphij@delphij.net
>>>> <mailto:delphij@delphij.net>> wrote:
>>>>
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA256
>>>>
>>>> To all: this has become harder and harder to replicate lately. I've
>>>> tried these options and the most important progress is that it's
>>>> possible to get a crashdump when debug.debugger_on_panic=0, and I
>>>> managed to get a backtrace which indicates the panic occurs when
>>>> trying to do mtx_lock(&Giant) -> __mtx_lock_sleep -> turnstile_wait
>>>> -> propagate_priority. But after I added some instrumentation to
>>>> the surrounding code and enabled INVARIANTS and/or WITNESS, it
>>>> mysteriously went away.
>>>>
>>>> Reverting my instrumentation code and updating to the latest svn made
>>>> the issue disappear for one day. I hit it again today but
>>>> unfortunately didn't get a successful dump, and after rebooting I
>>>> can't reproduce it again :(
>>>>
>>>> Still trying...
>>>>
>>>>
>>>> Any updates Xin?
>>>
>>> No, it mysteriously disappeared for now. As far as I can tell from
>>> recent svn commits, nobody has committed anything that fixes it, but
>>> I can no longer panic my system, with or without debugging code :(
>>>
>>>> I was actually hitting what I believe to be exactly the same issue
>>>> as you on one of my systems, and, as you've seen, adding any extra
>>>> debugging or diagnostics seemed to eliminate the issue.
>>>>
>>>> I was able to generate quite a few vmcores and still have these
>>>> sitting around in my filesystem (along with the kernels that helped
>>>> produce them).
>>>>
>>>> I can recreate this crash on my system by compiling the NVIDIA
>>>> driver with clang at -O1 and above. Although it's been noted that
>>>> this issue has been seen in scenarios without an NVIDIA driver in
>>>> the mix, whatever is happening in the kernel to cause the panic is
>>>> somehow triggered by this, at least on my system.
>>>
>>> I'm not sure if this is the same problem. Could you please try using
>>> gcc to compile the nVidia driver and see if that "fixes" the problem?
>>>
>>> Cheers,
>>> - --
>>> Xin LI <delphij@delphij.net> https://www.delphij.net/
>>> FreeBSD - The Power to Serve! Live free or die
>>>
>>
>> Indeed, a gcc-compiled NVIDIA module eliminates the issue; sorry if I
>> hadn't mentioned this earlier.
>>
>> What was happening to me at first was that my system would just hang while
>> booting. I was able to figure out that it was during /etc/rc.d/initrandom.
>> I actually got to a point where I removed the call to sysctl -a from
>> 'better_than_nothing()' in /etc/rc.d/initrandom to have a booting system. I
>> finally had a situation where I could get a panic by adding SW_WATCHDOG to
>> my kernel and running watchdogd(8).
>>
>> For me, this panic would come and go seemingly at random as well, and I
>> couldn't fumble my way around in the debugger to learn much of anything
>> when I first started seeing it. I just started a process of modularizing
>> everything I could in my kernel config, then loading modules one by one and
>> booting over and over until I finally found what appeared to be the
>> problem, which was the NVIDIA module compiled with clang.
>>
>> Oh, another thing: at times it seemed as though it was the number of
>> modules loaded, as I could get the hang with 41 modules loaded, but not 40
>> or 42?! I admit, when I was seeing that behavior, I hadn't eliminated the
>> NVIDIA driver from my loaded modules. I need to revisit the panic situation
>> to confirm this particular strangeness.
>>
>> Here's the last panic I had:
>>
>> Unread portion of the kernel message buffer:
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 1175 (sysctl)
>>
>> (kgdb) bt
>> #0 doadump (textdump=1694704112) at pcpu.h:229
>> #1 0xffffffff802fab82 in db_fncall (dummy1=<value optimized out>,
>> dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value
>> optimized out>) at /usr/src/sys/ddb/db_command.c:578
>> #2 0xffffffff802fa85a in db_command (last_cmdp=<value optimized out>,
>> cmd_table=<value optimized out>, dopager=1) at
>> /usr/src/sys/ddb/db_command.c:449
>> #3 0xffffffff802fa612 in db_command_loop () at
>> /usr/src/sys/ddb/db_command.c:502
>> #4 0xffffffff802fcf60 in db_trap (type=<value optimized out>, code=0) at
>> /usr/src/sys/ddb/db_main.c:231
>> #5 0xffffffff804a7b93 in kdb_trap (type=12, code=0, tf=<value optimized
>> out>) at /usr/src/sys/kern/subr_kdb.c:654
>> #6 0xffffffff807157c5 in trap_fatal (frame=0xffffff8865032670, eva=<value
>> optimized out>) at /usr/src/sys/amd64/amd64/trap.c:867
>> #7 0xffffffff80715adb in trap_pfault (frame=0x0, usermode=0) at
>> /usr/src/sys/amd64/amd64/trap.c:698
>> #8 0xffffffff8071529b in trap (frame=0xffffff8865032670) at
>> /usr/src/sys/amd64/amd64/trap.c:463
>> #9 0xffffffff806ff382 in calltrap () at exception.S:228
>> #10 0xffffffff8047bd50 in sysctl_sysctl_next_ls (lsp=<value optimized out>,
>> name=0xffffff8865032a80, namelen=<value optimized out>,
>> next=0xffffff8865032898, len=0xffffff8865032904, level=3) at
>> /usr/src/sys/kern/kern_sysctl.c:759
>> #11 0xffffffff8047be5e in sysctl_sysctl_next_ls (lsp=0xfffffe000d3f0080,
>> name=0xffffff8865032a7c, namelen=<value optimized out>,
>> next=0xffffff8865032894, len=0xffffff8865032904, level=2) at
>> /usr/src/sys/kern/kern_sysctl.c:786
>> #12 0xffffffff8047be5e in sysctl_sysctl_next_ls (lsp=0xfffffe000d3f0080,
>> name=0xffffff8865032a78, namelen=<value optimized out>,
>> next=0xffffff8865032890, len=0xffffff8865032904, level=1) at
>> /usr/src/sys/kern/kern_sysctl.c:786
>> #13 0xffffffff8047bca3 in sysctl_sysctl_next (oidp=<value optimized out>,
>> arg1=0xffffff8865032a78, arg2=4, req=0xffffff88650329a8) at
>> /usr/src/sys/kern/kern_sysctl.c:808
>> #14 0xffffffff8047b03f in sysctl_root (arg1=<value optimized out>,
>> arg2=<value optimized out>) at /usr/src/sys/kern/kern_sysctl.c:1513
>> #15 0xffffffff8047b5d8 in userland_sysctl (td=<value optimized out>,
>> name=0xffffff8865032a70, namelen=<value optimized out>, old=<value
>> optimized out>, oldlenp=<value optimized out>, inkernel=<value optimized
>> out>, new=<value optimized out>, newlen=<value optimized out>,
>> retval=<value optimized out>, flags=1694706064) at
>> /usr/src/sys/kern/kern_sysctl.c:1623
>> #16 0xffffffff8047b3c4 in sys___sysctl (td=0xfffffe001e2d4900,
>> uap=0xffffff8865032b80) at /usr/src/sys/kern/kern_sysctl.c:1549
>> #17 0xffffffff807160f7 in amd64_syscall (td=0xfffffe001e2d4900, traced=0)
>> at subr_syscall.c:135
>> #18 0xffffffff806ff66b in Xfast_syscall () at exception.S:387
>> #19 0x000000080093697a in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> Current language: auto; currently minimal
>>
>> Any ideas on where to look through this vmcore?
>>
>> -Brandon
>
> FWIW
>
> Just going from 9.1-STABLE r245423M to 9.1-STABLE #0 r246457M triggers
> this problem.
>
> I dropped sysctl -a from /etc/rc.d/initrandom and all is back to normal.
>
> I have nvidia-driver-304.64 compiled with gcc, as are all my ports.
>
> Henri

Just a follow-up: sysctl hw.nvidia generates a page fault:

morzine.restart.bel dumped core - see /var/crash/vmcore.86

Wed Feb 13 17:29:14 CET 2013

FreeBSD morzine.restart.bel 9.1-STABLE FreeBSD 9.1-STABLE #0 r246457M: Thu Feb 7 15:09:16 CET 2013 root@morzine.restart.bel:/usr/obj/usr/src/sys/MORZINE i386

panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x14
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xa07647d4
stack pointer           = 0x28:0xfd1f0ac8
frame pointer           = 0x28:0xfd1f0aec
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 2369 (sysctl)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper(a0a44a64,0,70617200,46,fc,...) at db_trace_self_wrapper+0x2d/frame 0xfd1f07c0
kdb_backtrace(a0a760c5,1,a0a0bc3c,fd1f087c,1,...) at kdb_backtrace+0x30/frame 0xfd1f0828
panic(a0a0bc3c,a0a76eaf,cfcb0d74,1,1,...) at panic+0x1bb/frame 0xfd1f0870
trap_fatal(fd1f0900,a09d2ee1,a0b17130,b5592000,0,...) at trap_fatal+0x33a/frame 0xfd1f08c0
trap_pfault(14,c,1,ffffffff,fd1f0994,...) at trap_pfault+0x31d/frame 0xfd1f0940
trap(fd1f0a88) at trap+0x4ef/frame 0xfd1f0a7c
calltrap() at calltrap+0x6/frame 0xfd1f0a7c
--- trap 0xc, eip = 0xa07647d4, esp = 0xfd1f0ac8, ebp = 0xfd1f0aec ---
turnstile_broadcast(0,0,a7b80b40,0,fd1f0b38,...) at turnstile_broadcast+0xa4/frame 0xfd1f0aec
_mtx_unlock_sleep(a0b6a00c,0,0,0,fd1f0b58,...) at _mtx_unlock_sleep+0x57/frame 0xfd1f0b04
sysctl_root(fd1f0b58,fd1f0b64,4,a09c87fe,bfc0d450,...) at sysctl_root+0x248/frame 0xfd1f0b38
userland_sysctl(cfcb0bc0,fd1f0bd4,5,0,9fbfca2c,...) at userland_sysctl+0x1da/frame 0xfd1f0b9c
sys___sysctl(cfcb0bc0,fd1f0cc8,1,fd1f0cb0,0,...) at sys___sysctl+0x95/frame 0xfd1f0c40
syscall(fd1f0d08) at syscall+0x452/frame 0xfd1f0cfc
Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xfd1f0cfc
--- syscall (202, FreeBSD ELF32, sys___sysctl), eip = 0x33d65f6b, esp = 0x9fbfc9e4, ebp = 0x9fbfd2ac ---
Uptime: 5h45m16s
Physical memory: 3046 MB
<CLIP>

(kgdb) #0 doadump (textdump=1) at pcpu.h:249
#1 0xa071b78a in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
#2 0xa071bc17 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:636
#3 0xa09cc21a in trap_fatal (frame=<value optimized out>, eva=20) at /usr/src/sys/i386/i386/trap.c:1043
#4 0xa09cc54d in trap_pfault (frame=0x0, usermode=<value optimized out>, eva=0) at /usr/src/sys/i386/i386/trap.c:858
#5 0xa09cbb3f in trap (frame=0xfd1f0a88) at /usr/src/sys/i386/i386/trap.c:555
#6 0xa09b5c0c in calltrap () at exception.s:169
#7 0xa07647d4 in turnstile_broadcast (ts=0x0, queue=0) at /usr/src/sys/kern/subr_turnstile.c:837
#8 0xa0707217 in _mtx_unlock_sleep (m=0xa0b6a00c, opts=-48297228, file=0xfd1f0af4 "", line=-48297228) at /usr/src/sys/kern/kern_mutex.c:715
#9 0xa0728418 in sysctl_root (arg1=<value optimized out>, arg2=<value optimized out>) at /usr/src/sys/kern/kern_sysctl.c:1515
#10 0xa072899a in userland_sysctl (td=0x4, old=<value optimized out>, oldlenp=<value optimized out>, inkernel=<value optimized out>, new=<value optimized out>, newlen=<value optimized out>, retval=<value optimized out>, flags=-1603107360) at /usr/src/sys/kern/kern_sysctl.c:1623
#11 0xa0728785 in sys___sysctl (uap=0xfd1f0cc8) at /usr/src/sys/kern/kern_sysctl.c:1549
#12 0xa09ccc22 in syscall (frame=<value optimized out>) at subr_syscall.c:135
#13 0xa09b5ca1 in Xint0x80_syscall () at exception.s:267
#14 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal

Henri
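A minimal sketch for narrowing this down further, assuming a FreeBSD host: it enumerates the OIDs below a node such as hw.nvidia using the internal {0, 2} "next OID" query that sysctl(8) uses to walk the tree (handled in the kernel by sysctl_sysctl_next(), which appears in Brandon's trace), and it never reads a value. The helper names oid_has_prefix() and walk_sysctl_subtree() are hypothetical, not part of FreeBSD, and the code has not been tried on the 9.1 kernels discussed here. Comparing it with a plain "sysctl hw.nvidia" may help tell whether the fault needs the value-reading path (the OID's own handler, called from sysctl_root()) or is hit by enumeration alone; the thread does not establish which.

```c
#include <stddef.h>
#include <string.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#include <errno.h>
#include <stdio.h>
#endif

/* Hypothetical helper: return 1 if oid[0..len) lies under prefix[0..plen), else 0. */
int oid_has_prefix(const int *oid, size_t len, const int *prefix, size_t plen)
{
	if (len < plen)
		return 0;
	return memcmp(oid, prefix, plen * sizeof(int)) == 0;
}

#ifdef __FreeBSD__
/*
 * Hypothetical helper: print the numeric OID of every leaf below `name`
 * (e.g. "hw.nvidia") without reading any values.  Each step asks the
 * kernel for the OID that follows `cur` via the {0, 2, <oid...>} query,
 * the same loop sysctl(8) uses to enumerate the tree.
 * Returns the number of OIDs found, or -1 on error.
 */
int walk_sysctl_subtree(const char *name)
{
	int prefix[CTL_MAXNAME], cur[CTL_MAXNAME], query[CTL_MAXNAME + 2];
	size_t plen = CTL_MAXNAME, clen;
	int count = 0;

	/* Translate the dotted name into a numeric OID (plen counts ints). */
	if (sysctlnametomib(name, prefix, &plen) == -1)
		return -1;

	/* Start from the subtree root; the kernel returns the next leaf. */
	memcpy(cur, prefix, plen * sizeof(int));
	clen = plen;

	for (;;) {
		size_t outlen = sizeof(cur);	/* in bytes */

		query[0] = 0;	/* internal "sysctl" node */
		query[1] = 2;	/* "next OID" query */
		memcpy(query + 2, cur, clen * sizeof(int));
		if (sysctl(query, (u_int)(clen + 2), cur, &outlen, NULL, 0) == -1)
			return errno == ENOENT ? count : -1;	/* ENOENT: end of tree */
		clen = outlen / sizeof(int);
		if (!oid_has_prefix(cur, clen, prefix, plen))
			return count;	/* walked past the requested subtree */
		for (size_t i = 0; i < clen; i++)
			printf("%s%d", i ? "." : "", cur[i]);
		putchar('\n');
		count++;
	}
}
#endif
```

Called from a small main(), walk_sysctl_subtree("hw.nvidia") should print one dotted numeric OID per line and return the count. On a machine that shows this panic it may take the system down, so a test box is the place to run it.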