From: Hans Petter Selasky
To: Johannes Lundberg, Konstantin Belousov
Cc: Niclas Zeising, Warner Losh, jhb@freebsd.org, Pete Wright, ohartmann@walstatt.org, freebsd-current
Subject: Re: atomic changes break drm-next-kmod?
Date: Fri, 6 Jul 2018 12:15:55 +0200
Message-ID: <1cd8c3c6-5d85-fbf3-cc06-1df8282216a1@selasky.org>

On 07/06/18 11:14, Johannes Lundberg wrote:
> On Fri, Jul 6, 2018 at 9:49 AM Konstantin Belousov wrote:
>
>> On Fri, Jul 06, 2018 at 09:52:24AM +0200, Niclas Zeising wrote:
>>> On 07/06/18 00:02, Warner Losh wrote:
>>>>
>>>>
>>>> On Thu, Jul 5, 2018 at 1:44 PM, John Baldwin wrote:
>>>>
>>>> On 7/5/18 12:36 PM, Konstantin Belousov wrote:
>>>> > On Thu, Jul 05, 2018 at 09:12:24PM +0200, Hans Petter Selasky wrote:
>>>> >> On 07/05/18 20:59, Hans Petter Selasky wrote:
>>>> >>> On 07/05/18 19:48, Pete Wright wrote:
>>>> >>>>
>>>> >>>>
>>>> >>>> On 07/05/2018 10:10, John Baldwin wrote:
>>>> >>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
>>>> >>>>>>
>>>> >>>>>> On 07/03/2018 15:56, John Baldwin wrote:
>>>> >>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
>>>> >>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
>>>> >>>>>>>>> That seems like kgdb is looking at the wrong CPU. Can you use
>>>> >>>>>>>>> 'info threads' and look for threads not stopped in 'sched_switch'
>>>> >>>>>>>>> and get their backtraces? You could also just do 'thread apply
>>>> >>>>>>>>> all bt' and put that file at a URL if that is easiest.
>>>> >>>>>>>>>
>>>> >>>>>>>> sure thing John - here's a gist of "thread apply all bt"
>>>> >>>>>>>>
>>>> >>>>>>>> https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
>>>> >>>>>>>
>>>> >>>>>>> That doesn't look right at all. Are you sure the kernel matches the
>>>> >>>>>>> vmcore? Also, which kgdb version are you using?
>>>> >>>>>>>
>>>> >>>>>> yea i agree that doesn't look right at all. here is my setup:
>>>> >>>>>>
>>>> >>>>>> $ which kgdb
>>>> >>>>>> /usr/bin/kgdb
>>>> >>>>>> $ kgdb
>>>> >>>>>> GNU gdb 6.1.1 [FreeBSD]
>>>> >>>>>> $ ls -lh /var/crash/vmcore.1
>>>> >>>>>> -rw------- 1 root wheel 1.6G Jul 3 15:03 /var/crash/vmcore.1
>>>> >>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
>>>> >>>>>> -r-xr-xr-x 1 root wheel 87840496 Jul 3 13:54
>>>> >>>>>> /usr/lib/debug/boot/kernel/kernel.debug
>>>> >>>>>>
>>>> >>>>>> and i invoke kgdb like so:
>>>> >>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.1
>>>> >>>>>>
>>>> >>>>>> here's a gist of my full gdb session:
>>>> >>>>>> http://termbin.com/krsn
>>>> >>>>>>
>>>> >>>>>> dunno - maybe i have a bad core dump? regardless, more than happy to
>>>> >>>>>> help so let me know if i should try anything else or patches etc..
>>>> >>>>> Can you try installing gdb from ports and using /usr/local/bin/kgdb?
>>>> >>>>>
>>>> >>>>
>>>> >>>> that seems to have done the trick, at least the output looks more
>>>> >>>> encouraging.
>>>> >>>>
>>>> >>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>>> >>>> KDB: enter: panic
>>>> >>>>
>>>> >>>> __curthread () at ./machine/pcpu.h:231
>>>> >>>> 231 __asm("movq %%gs:%1,%0" : "=r" (td)
>>>> >>>>
>>>> >>>>
>>>> >>>> here's my full kgdb session:
>>>> >>>> http://termbin.com/qa4f
>>>> >>>>
>>>> >>>> i don't see any threads not in "sched_switch" though :(
>>>> >>>
>>>> >>> Hi,
>>>> >>>
>>>> >>> The problem may be that the patch to enable atomic inlining of all
>>>> >>> macros forgot to set the SMP keyword which means SMP is not defined at
>>>> >>> all for KLD's so all non-kernel atomic usage is with MPLOCKED empty!
>>>> > Problem is that out-of-tree modules build does not have opt*.h files
>>>> > from the kernel. UP config is a valid one, flipping some option's
>>>> > default value does not solve the problem.
>>>>
>>>> Yes, but using the lock prefix in a generic module is ok (it will still
>>>> work, just not quite as fast) whereas the lack of lock is fatal on
>>>> SMP. I would amend Hans' patch slightly to honor the opt_* setting
>>>> for KLD_TIED (but that is only true if KLD_TIED means "built as part of
>>>> a kernel build, so has valid opt_foo.h headers" and not
>>>> 'a standalone module where someone put MODULES_TIED=1 on the command line
>>>> to make').
>>>>
>>>>
>>>> I agree with this default. It's sensible to default to (a) the most
>>>> popular thing and (b) thing that always works, especially when (a) and
>>>> (b) are identical.
>>>>
>>>> Don't make me start the "Do we really need an SMP option, why not make
>>>> it always on" thread :) The number of relevant uniprocessor x86 boxes
>>>> that benefit from omitting SMP is so small as to be irrelevant, IMHO. A
>>>> MP kernel runs just fine on them...
>>>>
>>>> Warner
>>>
>>> Where are we on this?
>>> It is important to get it fixed, it's already been 4 days, which means 4
>>> days of all modern FreeBSD desktop systems being broken, and possibly
>>> other systems with kernel modules from ports as well.
>>>
>>>
>>> Another question, how hard would it be to expose how the kernel was
>>> built to modules built from ports, so that they can figure out stuff
>>> like SMP and others, that might affect the module build?
>> Point the KERNBUILDDIR variable to the directory of the kernel build.
>> This is the directory where *.o and opt*.h are located. Then everything
>> would just work.
>>
>
> Is the solution that we require everyone to build a kernel before they can
> build the standalone modules or am I missing something here?
>

Hi,

Here is a temporary fix:
https://svnweb.freebsd.org/changeset/base/336025

Like Konstantin says, this issue needs to be revisited.

--HPS
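The failure mode discussed in the thread (an out-of-tree module never sees the kernel's opt*.h files, so SMP is undefined and MPLOCKED expands to nothing) can be illustrated with a small C sketch. It assumes the x86 pattern where MPLOCKED is either the "lock ; " prefix or empty; the SKETCH_* macros are hypothetical stand-ins for _KERNEL, KLD_MODULE, KLD_TIED and SMP so the file compiles as an ordinary program. Both guards are illustrations in the spirit of the thread, not the exact conditions in <machine/atomic.h> or in r336025.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the real build macros.  This simulates an
 * out-of-tree module build: the module makefiles define the kernel and
 * module macros, but SMP normally comes from the kernel's opt_global.h,
 * which such a build never sees.
 */
#define SKETCH_KERNEL 1
#define SKETCH_KLD_MODULE 1
/* SKETCH_SMP and SKETCH_KLD_TIED are deliberately left undefined. */

/* Guard keyed only on SMP: a module built this way loses the lock prefix. */
#if defined(SKETCH_SMP) || !defined(SKETCH_KERNEL)
#define MPLOCKED_SMP_ONLY "lock ; "
#else
#define MPLOCKED_SMP_ONLY ""
#endif

/*
 * Amended guard along the lines John Baldwin suggests: modules not tied
 * to a particular kernel build always get the lock prefix (merely slower
 * on a UP machine), while tied modules honor the kernel's SMP option.
 */
#if defined(SKETCH_SMP) || !defined(SKETCH_KERNEL) || \
    (defined(SKETCH_KLD_MODULE) && !defined(SKETCH_KLD_TIED))
#define MPLOCKED "lock ; "
#else
#define MPLOCKED ""
#endif

/* Instruction templates an atomic add would pass to the assembler. */
static const char smp_only_template[] = MPLOCKED_SMP_ONLY "addl %1,%0";
static const char amended_template[] = MPLOCKED "addl %1,%0";

#if defined(__x86_64__) || defined(__i386__)
/* x86 read-modify-write of *p; atomic across CPUs only if MPLOCKED is "lock ; ". */
static inline void
atomic_add_int(volatile unsigned int *p, unsigned int v)
{
	__asm__ __volatile__(MPLOCKED "addl %1,%0"
	    : "+m" (*p)
	    : "ir" (v)
	    : "cc");
}
#else
/* Other architectures: the compiler builtin is always atomic, so the guard is moot. */
static inline void
atomic_add_int(volatile unsigned int *p, unsigned int v)
{
	(void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
#endif
```

Compiled with no extra flags, smp_only_template comes out as "addl %1,%0" with no lock prefix, which is the unsafe case on an SMP machine, while amended_template keeps "lock ; addl %1,%0". As John Baldwin points out, paying for the prefix on a uniprocessor box only costs some speed, whereas omitting it on SMP breaks atomicity.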