From owner-freebsd-current@FreeBSD.ORG  Tue Feb 12 11:22:24 2013
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 23B72F49
 for <freebsd-current@freebsd.org>; Tue, 12 Feb 2013 11:22:24 +0000 (UTC)
 (envelope-from hlh@restart.be)
Received: from tignes.restart.be (tignes.restart.be
 [IPv6:2001:41d0:8:bdbe:0:1::])
 by mx1.freebsd.org (Postfix) with ESMTP id A0DCF804
 for <freebsd-current@freebsd.org>; Tue, 12 Feb 2013 11:22:23 +0000 (UTC)
Received: from restart.be (avoriaz.tunnel.bel [IPv6:2001:41d0:8:bdbe:1:ffff::])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "smtp.restart.be", Issuer "CA master" (verified OK))
 by tignes.restart.be (Postfix) with ESMTPS id 3Z51g20pmxzHYl;
 Tue, 12 Feb 2013 12:22:22 +0100 (CET)
Received: from morzine.restart.bel (morzine.restart.be
 [IPv6:2001:41d0:8:bdbe:1:2::]) (authenticated bits=0)
 by restart.be (8.14.6/8.14.5) with ESMTP id r1CBMKxZ027708;
 Tue, 12 Feb 2013 12:22:20 +0100 (CET) (envelope-from hlh@restart.be)
Message-ID: <511A25EC.8070000@restart.be>
Date: Tue, 12 Feb 2013 12:22:20 +0100
From: Henri Hennebert <hlh@restart.be>
Organization: RestartSoft
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
 rv:17.0) Gecko/20130207 Thunderbird/17.0.2
MIME-Version: 1.0
To: Brandon Gooch <jamesbrandongooch@gmail.com>
Subject: Re: sysctl -a causes kernel trap 12
References: <50EB602F.9050300@delphij.net>
 <20130108000233.GZ82219@kib.kiev.ua> <50EB63A9.50903@delphij.net>
 <CALBk6yK_+pcSA_Rgioe-2ed8KujpDK79GMG8jX3GMeqGV8ifrA@mail.gmail.com>
 <50EB870D.3020306@delphij.net> <50EF3FEC.60605@delphij.net>
 <CALBk6y+gYuTt4tqUUzn=8HMijtEbeSohrVScxHQ0Tq5AhUQQHA@mail.gmail.com>
 <50F9B70A.5040305@delphij.net>
 <CALBk6yLZ7m=5-RAypz3C3DE2hjw8E8iTdXyOosfP8zMh+mqubw@mail.gmail.com>
In-Reply-To: <CALBk6yLZ7m=5-RAypz3C3DE2hjw8E8iTdXyOosfP8zMh+mqubw@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-current@freebsd.org,
 d@delphij.net
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Feb 2013 11:22:24 -0000

On 01/19/2013 06:58, Brandon Gooch wrote:
> On Fri, Jan 18, 2013 at 2:56 PM, Xin Li <delphij@delphij.net> wrote:
> 
>> On 01/18/13 12:50, Brandon Gooch wrote:
>>> On Thu, Jan 10, 2013 at 4:25 PM, Xin Li <delphij@delphij.net> wrote:
>>>
>>>> To all: this has become harder and harder to reproduce lately.
>>>> I've tried these options, and the most important progress is that
>>>> it's possible to get a crash dump with debug.debugger_on_panic=0.
>>>> I managed to get a backtrace indicating that the panic occurs in
>>>> mtx_lock(&Giant) -> __mtx_lock_sleep -> turnstile_wait ->
>>>> propagate_priority, but after I added some instrumentation to the
>>>> surrounding code and enabled INVARIANTS and/or WITNESS, it
>>>> mysteriously went away.
>>>>
>>>> Reverting my instrumentation and updating to the latest svn made
>>>> the issue disappear for a day.  I hit it again today, but
>>>> unfortunately didn't get a successful dump, and after rebooting I
>>>> can't reproduce it again :(
>>>>
>>>> Still trying...
>>>
>>>
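
For anyone trying to capture a dump the way Xin describes, here is a minimal sketch of the usual FreeBSD setup. It assumes a swap partition large enough to hold the dump and the stock /var/crash directory; the vmcore number is only an example.

```sh
# Panic straight into a crash dump instead of stopping in the debugger
sysctl debug.debugger_on_panic=0
# Keep that setting across reboots
echo 'debug.debugger_on_panic=0' >> /etc/sysctl.conf
# Have rc(8) pick a dump device (normally swap) at boot, and enable it now;
# savecore(8) copies the dump to /var/crash on the next boot
echo 'dumpdev="AUTO"' >> /etc/rc.conf
service dumpon start
# After the panic and reboot, open the dump with the kernel that produced it
# (vmcore.0 is an example; the number increments with each saved dump)
kgdb /boot/kernel/kernel /var/crash/vmcore.0
```
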
>>> Any updates, Xin?
>>
>> No, it has mysteriously disappeared for now.  From my reading of
>> recent svn commits, nobody has committed anything that would fix it,
>> but I can no longer panic my system, with or without debugging
>> code :(
>>
>>> I was actually hitting what I believe to be exactly the same issue
>>> as you on one of my systems, and, as you've seen, adding any extra
>>> debugging or diagnostics seemed to eliminate the issue.
>>>
>>> I was able to generate quite a few vmcores and still have these
>>> sitting around in my filesystem (along with the kernels that helped
>>> produce them).
>>>
>>> I can recreate this crash on my system by compiling the NVIDIA
>>> driver with clang at -O1 and above. Although this issue has also
>>> been seen without the NVIDIA driver in the mix, whatever is
>>> happening in the kernel to cause the panic is somehow triggered by
>>> it, at least on my system.
>>
>> I'm not sure if this is the same problem.  Could you please try using
>> gcc to compile the NVIDIA driver and see if that "fixes" the problem?
>>
>> Cheers,
>> --
>> Xin LI <delphij@delphij.net>    https://www.delphij.net/
>> FreeBSD - The Power to Serve!           Live free or die
>>
> 
> Indeed, a gcc-compiled NVIDIA module eliminates the issue; sorry if I
> hadn't mentioned this earlier.
> 
> At first, my system would just hang while booting. I was able to figure
> out that it was hanging in /etc/rc.d/initrandom. I actually got to the
> point of removing the call to sysctl -a from 'better_than_nothing()' in
> /etc/rc.d/initrandom just to get a booting system. I finally got to a
> point where I could get a panic by adding SW_WATCHDOG to my kernel and
> running watchdogd(8).
> 
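
The watchdog setup Brandon mentions can be sketched as follows. MYKERNEL is a hypothetical name for an existing custom kernel config, and the paths assume amd64 with sources in /usr/src. With SW_WATCHDOG compiled in, watchdogd(8) keeps resetting the software watchdog; if the system wedges long enough for the timer to expire, the kernel panics (or enters the debugger, depending on the KDB options) instead of hanging silently.

```sh
# Add the software watchdog to a custom kernel config (MYKERNEL is hypothetical)
echo 'options SW_WATCHDOG' >> /usr/src/sys/amd64/conf/MYKERNEL
cd /usr/src && make buildkernel installkernel KERNCONF=MYKERNEL
# Start watchdogd(8) at every boot, then reboot into the new kernel
echo 'watchdogd_enable="YES"' >> /etc/rc.conf
shutdown -r now
```
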
> For me, this panic would also come and go seemingly at random, and I
> couldn't fumble my way around in the debugger to learn much of anything
> when I first started seeing it. So I started modularizing everything I
> could in my kernel config, then loading modules one by one and
> rebooting over and over until I finally found what appeared to be the
> problem: the NVIDIA module compiled with clang.
> 
> Oh, another thing: at times it seemed to depend on the number of modules
> loaded, as I could get the hang with 41 modules loaded, but not with 40
> or 42?! Admittedly, when I was seeing that behavior I hadn't yet
> eliminated the NVIDIA driver from my loaded modules. I need to revisit
> the panic to confirm this particular strangeness.
> 
> Here's the last panic I had:
> 
> Unread portion of the kernel message buffer:
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1175 (sysctl)
> 
> (kgdb) bt
> #0  doadump (textdump=1694704112) at pcpu.h:229
> #1  0xffffffff802fab82 in db_fncall (dummy1=<value optimized out>,
>     dummy2=<value optimized out>, dummy3=<value optimized out>,
>     dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:578
> #2  0xffffffff802fa85a in db_command (last_cmdp=<value optimized out>,
>     cmd_table=<value optimized out>, dopager=1)
>     at /usr/src/sys/ddb/db_command.c:449
> #3  0xffffffff802fa612 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:502
> #4  0xffffffff802fcf60 in db_trap (type=<value optimized out>, code=0)
>     at /usr/src/sys/ddb/db_main.c:231
> #5  0xffffffff804a7b93 in kdb_trap (type=12, code=0,
>     tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
> #6  0xffffffff807157c5 in trap_fatal (frame=0xffffff8865032670,
>     eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:867
> #7  0xffffffff80715adb in trap_pfault (frame=0x0, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:698
> #8  0xffffffff8071529b in trap (frame=0xffffff8865032670)
>     at /usr/src/sys/amd64/amd64/trap.c:463
> #9  0xffffffff806ff382 in calltrap () at exception.S:228
> #10 0xffffffff8047bd50 in sysctl_sysctl_next_ls (lsp=<value optimized out>,
>     name=0xffffff8865032a80, namelen=<value optimized out>,
>     next=0xffffff8865032898, len=0xffffff8865032904, level=3)
>     at /usr/src/sys/kern/kern_sysctl.c:759
> #11 0xffffffff8047be5e in sysctl_sysctl_next_ls (lsp=0xfffffe000d3f0080,
>     name=0xffffff8865032a7c, namelen=<value optimized out>,
>     next=0xffffff8865032894, len=0xffffff8865032904, level=2)
>     at /usr/src/sys/kern/kern_sysctl.c:786
> #12 0xffffffff8047be5e in sysctl_sysctl_next_ls (lsp=0xfffffe000d3f0080,
>     name=0xffffff8865032a78, namelen=<value optimized out>,
>     next=0xffffff8865032890, len=0xffffff8865032904, level=1)
>     at /usr/src/sys/kern/kern_sysctl.c:786
> #13 0xffffffff8047bca3 in sysctl_sysctl_next (oidp=<value optimized out>,
>     arg1=0xffffff8865032a78, arg2=4, req=0xffffff88650329a8)
>     at /usr/src/sys/kern/kern_sysctl.c:808
> #14 0xffffffff8047b03f in sysctl_root (arg1=<value optimized out>,
>     arg2=<value optimized out>) at /usr/src/sys/kern/kern_sysctl.c:1513
> #15 0xffffffff8047b5d8 in userland_sysctl (td=<value optimized out>,
>     name=0xffffff8865032a70, namelen=<value optimized out>,
>     old=<value optimized out>, oldlenp=<value optimized out>,
>     inkernel=<value optimized out>, new=<value optimized out>,
>     newlen=<value optimized out>, retval=<value optimized out>,
>     flags=1694706064) at /usr/src/sys/kern/kern_sysctl.c:1623
> #16 0xffffffff8047b3c4 in sys___sysctl (td=0xfffffe001e2d4900,
>     uap=0xffffff8865032b80) at /usr/src/sys/kern/kern_sysctl.c:1549
> #17 0xffffffff807160f7 in amd64_syscall (td=0xfffffe001e2d4900, traced=0)
>     at subr_syscall.c:135
> #18 0xffffffff806ff66b in Xfast_syscall () at exception.S:387
> #19 0x000000080093697a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> 
> Any ideas on where to look through this vmcore?
> 
> -Brandon

FWIW

Just upgrading from 9.1-STABLE r245423M to 9.1-STABLE #0 r246457M was
enough to trigger this problem.

I dropped sysctl -a from /etc/rc.d/initrandom and everything is back to normal.

My nvidia-driver-304.64 is compiled with gcc, as are all my ports.
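
For anyone wanting to repeat the gcc vs. clang comparison discussed above, one way to rebuild just this port with the 9.x base gcc is sketched below. It assumes the port lives at x11/nvidia-driver, that gcc, g++ and cpp are still installed in base, and that variables given on the make(1) command line reach the kernel-module build (make normally passes them down through MAKEFLAGS). It is not necessarily how Henri configured his ports.

```sh
# Rebuild the NVIDIA port with base gcc instead of clang
cd /usr/ports/x11/nvidia-driver
make CC=gcc CXX=g++ CPP=cpp clean
make CC=gcc CXX=g++ CPP=cpp reinstall clean
# The module is loaded at boot, so reboot to pick up the rebuilt nvidia.ko
shutdown -r now
```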

Henri