From: Pete Wright
Date: Sat, 7 Jul 2018 19:54:59 -0700
Subject: Re: atomic changes break drm-next-kmod?
To: Hans Petter Selasky, Johannes Lundberg, Konstantin Belousov
Cc: Niclas Zeising, Warner Losh, jhb@freebsd.org, ohartmann@walstatt.org, freebsd-current
In-Reply-To: <1cd8c3c6-5d85-fbf3-cc06-1df8282216a1@selasky.org>
References: <4c5411dd-9f6b-7245-6ade-e11040f74687@FreeBSD.org>
 <24f5d737-a205-6fcc-0a33-a84601d2ff7a@nomadlogic.org>
 <29ce4eab-6667-d2ca-b5d8-3deeef28f142@selasky.org>
 <20180705193646.GM5562@kib.kiev.ua>
 <5dc2a315-4b71-9ff0-0a37-576649e9144b@FreeBSD.org>
 <4797c607-c261-77f7-eccf-45056bf56694@daemonic.se>
 <20180706084729.GN5562@kib.kiev.ua>
 <1cd8c3c6-5d85-fbf3-cc06-1df8282216a1@selasky.org>
List-Id: Discussions about the use of FreeBSD-current

On 07/06/2018 03:15, Hans Petter Selasky wrote:
> On 07/06/18 11:14, Johannes Lundberg wrote:
>> On Fri, Jul 6, 2018 at 9:49 AM Konstantin Belousov wrote:
>>
>>> On Fri, Jul 06, 2018 at 09:52:24AM +0200, Niclas Zeising wrote:
>>>> On 07/06/18 00:02, Warner Losh wrote:
>>>>>
>>>>>
>>>>> On Thu, Jul 5, 2018 at 1:44 PM, John Baldwin wrote:
>>>>>
>>>>>      On 7/5/18 12:36 PM, Konstantin Belousov wrote:
>>>>>       > On Thu, Jul 05, 2018 at 09:12:24PM +0200, Hans Petter Selasky wrote:
>>>>>       >> On 07/05/18 20:59, Hans Petter Selasky wrote:
>>>>>       >>> On 07/05/18 19:48, Pete Wright wrote:
>>>>>       >>>>
>>>>>       >>>>
>>>>>       >>>> On 07/05/2018 10:10, John Baldwin wrote:
>>>>>       >>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
>>>>>       >>>>>>
>>>>>       >>>>>> On 07/03/2018 15:56, John Baldwin wrote:
>>>>>       >>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
>>>>>       >>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
>>>>>       >>>>>>>>> That seems like kgdb is looking at the wrong CPU.  Can you use
>>>>>       >>>>>>>>> 'info threads' and look for threads not stopped in 'sched_switch'
>>>>>       >>>>>>>>> and get their backtraces?  You could also just do 'thread apply
>>>>>       >>>>>>>>> all bt' and put that file at a URL if that is easiest.
>>>>>       >>>>>>>>>
>>>>>       >>>>>>>> sure thing John - here's a gist of "thread apply all bt"
>>>>>       >>>>>>>>
>>>>>       >>>>>>>> https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
>>>>>       >>>>>>> That doesn't look right at all.  Are you sure the kernel matches the
>>>>>       >>>>>>> vmcore?  Also, which kgdb version are you using?
>>>>>       >>>>>>>
>>>>>       >>>>>> yea i agree that doesn't look right at all.
>>>>>       >>>>>> here is my setup:
>>>>>       >>>>>>
>>>>>       >>>>>> $ which kgdb
>>>>>       >>>>>> /usr/bin/kgdb
>>>>>       >>>>>> $ kgdb
>>>>>       >>>>>> GNU gdb 6.1.1 [FreeBSD]
>>>>>       >>>>>> $ ls -lh /var/crash/vmcore.1
>>>>>       >>>>>> -rw-------  1 root  wheel 1.6G Jul  3 15:03 /var/crash/vmcore.1
>>>>>       >>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
>>>>>       >>>>>> -r-xr-xr-x  1 root  wheel 87840496 Jul  3 13:54
>>>>>       >>>>>> /usr/lib/debug/boot/kernel/kernel.debug
>>>>>       >>>>>>
>>>>>       >>>>>> and i invoke kgdb like so:
>>>>>       >>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.1
>>>>>       >>>>>>
>>>>>       >>>>>> here's a gist of my full gdb session:
>>>>>       >>>>>> http://termbin.com/krsn
>>>>>       >>>>>>
>>>>>       >>>>>> dunno - maybe i have a bad core dump?  regardless, more than happy to
>>>>>       >>>>>> help so let me know if i should try anything else or patches etc..
>>>>>       >>>>> Can you try installing gdb from ports and using /usr/local/bin/kgdb?
>>>>>       >>>>>
>>>>>       >>>>
>>>>>       >>>> that seems to have done the trick, at least the output looks more
>>>>>       >>>> encouraging.
>>>>>       >>>>
>>>>>       >>>>   --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>>>>       >>>> KDB: enter: panic
>>>>>       >>>>
>>>>>       >>>> __curthread () at ./machine/pcpu.h:231
>>>>>       >>>> 231        __asm("movq %%gs:%1,%0" : "=r" (td)
>>>>>       >>>>
>>>>>       >>>>
>>>>>       >>>> here's my full kgdb session:
>>>>>       >>>> http://termbin.com/qa4f
>>>>>       >>>>
>>>>>       >>>> i don't see any threads not in "sched_switch" though :(
>>>>>       >>>
>>>>>       >>> Hi,
>>>>>       >>>
>>>>>       >>> The problem may be that the patch to enable atomic inlining of all
>>>>>       >>> macros forgot to set the SMP keyword which means SMP is not defined at
>>>>>       >>> all for KLD's so all non-kernel atomic usage is with MPLOCKED empty!
>>>>>       > Problem is that out-of-tree modules build does not have opt*.h files
>>>>>       > from the kernel.  UP config is a valid one, flipping some option's
>>>>>       > default value does not solve the problem.
>>>>>
>>>>>      Yes, but using the lock prefix in a generic module is ok (it will still
>>>>>      work, just not quite as fast) whereas the lack of lock is fatal on
>>>>>      SMP.  I would amend Hans' patch slightly to honor the opt_* setting
>>>>>      for KLD_TIED (but that is only true if KLD_TIED means "built as part of
>>>>>      a kernel build, so has valid opt_foo.h headers" and not
>>>>>      'a standalone module where someone put MODULES_TIED=1 on the command
>>>>>      line to make').
>>>>>
>>>>>
>>>>> I agree with this default. It's sensible to default to (a) the most
>>>>> popular thing and (b) thing that always works, especially when (a) and
>>>>> (b) are identical.
>>>>>
>>>>> Don't make me start the "Do we really need an SMP option, why not make
>>>>> it always on" thread :) The number of relevant uniprocessor x86 boxes
>>>>> that benefit from omitting SMP is so small as to be irrelevant, IMHO. A
>>>>> MP kernel runs just fine on them...
>>>>>
>>>>> Warner
>>>>
>>>> Where are we on this?
>>>> It is important to get it fixed, it's already been 4 days, which means 4
>>>> days of all modern FreeBSD desktop systems being broken, and possibly
>>>> other systems with kernel modules from ports as well.
>>>>
>>>>
>>>> Another question, how hard would it be to expose how the kernel was
>>>> built to modules built from ports, so that they can figure out stuff
>>>> like SMP and others, that might affect the module build?
>>> Point the KERNBUILDDIR variable to the directory of the kernel build.
>>> This is the directory where *.o and opt*.h are located.  Then everything
>>> would just work.
>>>
>>
>> Is the solution that we require everyone to build a kernel before they can
>> build the standalone modules or am I missing something here?
>>
>
> Hi,
>
> Here is a temporary fix:
> https://svnweb.freebsd.org/changeset/base/336025
>
> Like Konstantin says this issue needs to be revisited.
>

this patch has been stable for me for a couple days now after rebuilding
drm-next under the new kernel containing this update.  we may want to
kick-off an update of the drm-next pkg if that hasn't happened already.
the old package caused periodic kernel-panics on my end.

cheers,
-pete

--
Pete Wright
pete@nomadlogic.org
@nomadlogicLA
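
For readers following the thread, here is a rough sketch of the default John
Baldwin argues for above: when a module cannot see the kernel's opt*.h files,
fall back to the lock prefix rather than to an empty MPLOCKED. The names
DEMO_SMP, DEMO_KLD_MODULE and DEMO_MPLOCKED are hypothetical stand-ins, not
the identifiers used in the real atomic.h header, and this is not a claim
about what the changeset linked above actually does. It only builds the
instruction template string, so it is architecture-neutral.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-ins: DEMO_SMP plays the role of the kernel's SMP
 * option, DEMO_KLD_MODULE the "we are building a module" define.
 * The real selection logic lives in the machine atomic.h header and
 * may differ in detail.
 */
#define DEMO_KLD_MODULE 1	/* pretend this is an out-of-tree module */
/* DEMO_SMP is deliberately left undefined: no opt*.h is available. */

#if defined(DEMO_SMP) || defined(DEMO_KLD_MODULE)
#define DEMO_MPLOCKED "lock ; "	/* works on SMP and on UP, slightly slower */
#else
#define DEMO_MPLOCKED ""	/* only safe on a uniprocessor kernel */
#endif

/* Instruction template an inlined atomic add would be assembled from. */
static const char *
demo_atomic_add_template(void)
{
	return (DEMO_MPLOCKED "addl %1,%0");
}

/* Returns 1 when the template carries the lock prefix, 0 otherwise. */
static int
demo_is_locked(void)
{
	return (strncmp(demo_atomic_add_template(), "lock", 4) == 0);
}
```

With DEMO_KLD_MODULE defined and DEMO_SMP undefined, as above, the template
keeps its lock prefix. Removing both defines yields the bare "addl", which is
the case the thread describes as fatal on an SMP kernel.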