From owner-freebsd-current@freebsd.org Fri Jul 6 01:32:35 2018
Return-Path:
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2F9D9102616A
    for ; Fri, 6 Jul 2018 01:32:35 +0000 (UTC)
    (envelope-from pete@nomadlogic.org)
Received: from vps-mail.nomadlogic.org (mail.nomadlogic.org [IPv6:2607:f2f8:a098::2])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (Client did not present a certificate)
    by mx1.freebsd.org (Postfix) with ESMTPS id 9FA91817F7;
    Fri, 6 Jul 2018 01:32:34 +0000 (UTC)
    (envelope-from pete@nomadlogic.org)
Received: from [192.168.1.106] (cpe-23-243-162-239.socal.res.rr.com [23.243.162.239])
    by vps-mail.nomadlogic.org (OpenSMTPD) with ESMTPSA id 44fcffee
    TLS version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO;
    Thu, 5 Jul 2018 18:32:32 -0700 (PDT)
Subject: Re: atomic changes break drm-next-kmod?
To: Hans Petter Selasky , John Baldwin , Niclas Zeising , "O.
    Hartmann" , FreeBSD Current
References: <20180703170223.266dbf5b@thor.intern.walstatt.dynvpn.de>
    <845aca10-8c01-fa3b-087f-f957df4e7531@nomadlogic.org>
    <063ae5c3-0584-1284-dd9d-ab8b5790baf1@FreeBSD.org>
    <0bf8e57b-fdb4-4c1a-3d0d-a734f8187ca8@nomadlogic.org>
    <4c5411dd-9f6b-7245-6ade-e11040f74687@FreeBSD.org>
    <24f5d737-a205-6fcc-0a33-a84601d2ff7a@nomadlogic.org>
    <29ce4eab-6667-d2ca-b5d8-3deeef28f142@selasky.org>
From: Pete Wright
Message-ID: <4aee6f32-a3b9-7e8e-8741-2309639ecce0@nomadlogic.org>
Date: Thu, 5 Jul 2018 18:32:27 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 06 Jul 2018 01:32:35 -0000

On 07/05/2018 12:12, Hans Petter Selasky wrote:
> On 07/05/18 20:59, Hans Petter Selasky wrote:
>> On 07/05/18 19:48, Pete Wright wrote:
>>>
>>>
>>> On 07/05/2018 10:10, John Baldwin wrote:
>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
>>>>>
>>>>> On 07/03/2018 15:56, John Baldwin wrote:
>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
>>>>>>>> That seems like kgdb is looking at the wrong CPU.  Can you use
>>>>>>>> 'info threads' and look for threads not stopped in 'sched_switch'
>>>>>>>> and get their backtraces?  You could also just do 'thread apply
>>>>>>>> all bt' and put that file at a URL if that is easiest.
>>>>>>>>
>>>>>>> sure thing John - here's a gist of "thread apply all bt"
>>>>>>>
>>>>>>> https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
>>>>>> That doesn't look right at all.  Are you sure the kernel matches the
>>>>>> vmcore?  Also, which kgdb version are you using?
>>>>>>
>>>>> yea i agree that doesn't look right at all.  here is my setup:
>>>>>
>>>>> $ which kgdb
>>>>> /usr/bin/kgdb
>>>>> $ kgdb
>>>>> GNU gdb 6.1.1 [FreeBSD]
>>>>> $ ls -lh /var/crash/vmcore.1
>>>>> -rw-------  1 root  wheel   1.6G Jul  3 15:03 /var/crash/vmcore.1
>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
>>>>> -r-xr-xr-x  1 root  wheel  87840496 Jul  3 13:54
>>>>> /usr/lib/debug/boot/kernel/kernel.debug
>>>>>
>>>>> and i invoke kgdb like so:
>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug
>>>>> /var/crash/vmcore.1
>>>>>
>>>>> here's a gist of my full gdb session:
>>>>> http://termbin.com/krsn
>>>>>
>>>>> dunno - maybe i have a bad core dump?  regardless, more than happy to
>>>>> help so let me know if i should try anything else or patches etc..
>>>> Can you try installing gdb from ports and using /usr/local/bin/kgdb?
>>>>
>>>
>>> that seems to have done the trick, at least the output looks more
>>> encouraging.
>>>
>>>   --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>> KDB: enter: panic
>>>
>>> __curthread () at ./machine/pcpu.h:231
>>> 231        __asm("movq %%gs:%1,%0" : "=r" (td)
>>>
>>>
>>> here's my full kgdb session:
>>> http://termbin.com/qa4f
>>>
>>> i don't see any threads not in "sched_switch" though :(
>>
>> Hi,
>>
>> The problem may be that the patch to enable atomic inlining of all
>> macros forgot to set the SMP keyword which means SMP is not defined
>> at all for KLD's so all non-kernel atomic usage is with MPLOCKED empty!
>>
>> /*
>>   * For userland, always use lock prefixes so that the binaries will run
>>   * on both SMP and !SMP systems.
>>   */
>> #if defined(SMP) || !defined(_KERNEL)
>> #define MPLOCKED        "lock ; "
>> #else
>> #define MPLOCKED
>> #endif
>>
>> Can you try to recompile the LinuxKPI /sys/modules/linuxkpi with
>> DEBUG_FLAGS="-DSMP" ?
>>
>> and similarly the drm-next package?
>>
>
> Also please find attached a patch for amd64.

i have been running this patch for about 4 hours.
previous uptime before this patch was under 1hr.  i've attached and
detached HDMI displays and gone through several suspend/resume cycles as
well without any issues.

to be clear - since i'm not sure this was your intent - i applied the
patch, rebuilt/installed a new kernel.  i did *not* use the "-DSMP" flags
for linuxkpi or the drm-next module.

cheers,
-pete

-- 
Pete Wright
pete@nomadlogic.org
@nomadlogicLA
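
For readers following the MPLOCKED discussion quoted above, here is a minimal sketch of how that #if selects the lock prefix under different -D flags. The helper names mplocked_this_build(), mplocked_for() and check_mplocked_cases() are hypothetical and exist only for this illustration; they are not part of any FreeBSD header. The sketch models the hypothesis from the quoted message only and says nothing about what the attached amd64 patch changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same condition as the header excerpt quoted in the thread. */
#if defined(SMP) || !defined(_KERNEL)
#define MPLOCKED "lock ; "
#else
#define MPLOCKED
#endif

/*
 * Hypothetical helper: the prefix chosen for this translation unit.
 * Adjacent string literals are concatenated, so an empty MPLOCKED
 * leaves just "".
 */
const char *mplocked_this_build(void)
{
    return "" MPLOCKED;
}

/*
 * Hypothetical helper: models the #if above at run time so the
 * different build configurations can be compared side by side.
 */
const char *mplocked_for(bool smp_defined, bool kernel_defined)
{
    return (smp_defined || !kernel_defined) ? "lock ; " : "";
}

/* Walks the three configurations discussed in the thread. */
void check_mplocked_cases(void)
{
    /* kernel proper, built with SMP defined */
    assert(strcmp(mplocked_for(true, true), "lock ; ") == 0);
    /* userland: _KERNEL is not defined, so the prefix is always used */
    assert(strcmp(mplocked_for(false, false), "lock ; ") == 0);
    /* module built with _KERNEL but without -DSMP: prefix is empty */
    assert(strcmp(mplocked_for(false, true), "") == 0);
}
```

Compiling a file containing this with -D_KERNEL and without -DSMP should make mplocked_this_build() return the empty string, which is the module case the quoted message is concerned about.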