Date: Fri, 6 Jul 2018 11:47:30 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Niclas Zeising <zeising+freebsd@daemonic.se>
Cc: Warner Losh <imp@bsdimp.com>, John Baldwin <jhb@freebsd.org>, Hans Petter Selasky <hps@selasky.org>, Pete Wright <pete@nomadlogic.org>, "O. Hartmann" <ohartmann@walstatt.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: atomic changes break drm-next-kmod?
Message-ID: <20180706084729.GN5562@kib.kiev.ua>
In-Reply-To: <4797c607-c261-77f7-eccf-45056bf56694@daemonic.se>
References: <4c5411dd-9f6b-7245-6ade-e11040f74687@FreeBSD.org> <24f5d737-a205-6fcc-0a33-a84601d2ff7a@nomadlogic.org> <c459a76c-21a2-2510-54b1-d7edee6eaa1e@FreeBSD.org> <eb84c2ed-1cd8-794f-9d5e-9454edeba4e4@nomadlogic.org> <29ce4eab-6667-d2ca-b5d8-3deeef28f142@selasky.org> <df73594c-785a-663d-6c76-bf95466a7aa3@selasky.org> <20180705193646.GM5562@kib.kiev.ua> <5dc2a315-4b71-9ff0-0a37-576649e9144b@FreeBSD.org> <CANCZdfqGyANQ5uUz_Ebc3i5HDLvkWocDs=J2p5xuj=1OttGWYQ@mail.gmail.com> <4797c607-c261-77f7-eccf-45056bf56694@daemonic.se>

On Fri, Jul 06, 2018 at 09:52:24AM +0200, Niclas Zeising wrote:
> On 07/06/18 00:02, Warner Losh wrote:
> >
> >
> > On Thu, Jul 5, 2018 at 1:44 PM, John Baldwin <jhb@freebsd.org
> > <mailto:jhb@freebsd.org>> wrote:
> >
> > On 7/5/18 12:36 PM, Konstantin Belousov wrote:
> > > On Thu, Jul 05, 2018 at 09:12:24PM +0200, Hans Petter Selasky wrote:
> > >> On 07/05/18 20:59, Hans Petter Selasky wrote:
> > >>> On 07/05/18 19:48, Pete Wright wrote:
> > >>>>
> > >>>>
> > >>>> On 07/05/2018 10:10, John Baldwin wrote:
> > >>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
> > >>>>>>
> > >>>>>> On 07/03/2018 15:56, John Baldwin wrote:
> > >>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
> > >>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
> > >>>>>>>>> That seems like kgdb is looking at the wrong CPU.  Can
> > you use
> > >>>>>>>>> 'info threads' and look for threads not stopped in
> > 'sched_switch'
> > >>>>>>>>> and get their backtraces?  You could also just do 'thread
> > apply
> > >>>>>>>>> all bt' and put that file at a URL if that is easiest.
> > >>>>>>>>>
> > >>>>>>>> sure thing John - here's a gist of "thread apply all bt"
> > >>>>>>>>
> > >>>>>>>>
> > https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
> > <https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed>
> > >>>>>>> That doesn't look right at all.  Are you sure the kernel
> > matches the
> > >>>>>>> vmcore?  Also, which kgdb version are you using?
> > >>>>>>>
> > >>>>>> yea i agree that doesn't look right at all.  here is my setup:
> > >>>>>>
> > >>>>>> $ which kgdb
> > >>>>>> /usr/bin/kgdb
> > >>>>>> $ kgdb
> > >>>>>> GNU gdb 6.1.1 [FreeBSD]
> > >>>>>> $ ls -lh /var/crash/vmcore.1
> > >>>>>> -rw-------  1 root  wheel   1.6G Jul  3 15:03
> > /var/crash/vmcore.1
> > >>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
> > >>>>>> -r-xr-xr-x  1 root  wheel  87840496 Jul  3 13:54
> > >>>>>> /usr/lib/debug/boot/kernel/kernel.debug
> > >>>>>>
> > >>>>>> and i invoke kgdb like so:
> > >>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug
> > /var/crash/vmcore.1
> > >>>>>>
> > >>>>>> here's a gist of my full gdb session:
> > >>>>>> http://termbin.com/krsn
> > >>>>>>
> > >>>>>> dunno - maybe i have a bad core dump?  regardless, more than
> > happy to
> > >>>>>> help so let me know if i should try anything else or patches
> > etc..
> > >>>>> Can you try installing gdb from ports and using
> > /usr/local/bin/kgdb?
> > >>>>>
> > >>>>
> > >>>> that seems to have done the trick, at least the output looks more
> > >>>> encouraging.
> > >>>>
> > >>>>   --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > >>>> KDB: enter: panic
> > >>>>
> > >>>> __curthread () at ./machine/pcpu.h:231
> > >>>> 231         __asm("movq %%gs:%1,%0" : "=r" (td)
> > >>>>
> > >>>>
> > >>>> here's my full kgdb session:
> > >>>> http://termbin.com/qa4f
> > >>>>
> > >>>> i don't see any threads not in "sched_switch" though :(
> > >>>
> > >>> Hi,
> > >>>
> > >>> The problem may be that the patch to enable atomic inlining of all
> > >>> macros forgot to set the SMP keyword which means SMP is not
> > defined at
> > >>> all for KLD's so all non-kernel atomic usage is with MPLOCKED
> > empty!
> > > Problem is that out-of-tree modules build does not have opt*.h files
> > > from the kernel.  UP config is a valid one, flipping some option's
> > > default value does not solve the problem.
> >
> > Yes, but using the lock prefix in a generic module is ok (it will still
> > work, just not quite as fast) whereas the lack of lock is fatal on
> > SMP.  I would amend Hans' patch slightly to honor the opt_* setting
> > for KLD_TIED (but that is only true if KLD_TIED means "built as part of
> > a kernel build, so has valid opt_foo.h headers" and not
> > 'a standalone module where someone put MODULES_TIED=1 on the command
> > line
> > to make').
> >
> >
> > I agree with this default. It's sensible to default to (a) the most
> > popular thing and (b) thing that always works, especially when (a) and
> > (b) are identical.
> >
> > Don't make me start the "Do we really need an SMP option, why not make
> > it always on" thread :) The number of relevant uniprocessor x86 boxes
> > that benefit from omitting SMP is so small as to be irrelevant, IMHO. A
> > MP kernel runs just fine on them...
> >
> > Warner
>
> Where are we on this?
> It is important to get it fixed, it's already been 4 days, which means 4
> days of all modern FreeBSD desktop systems being broken, and possibly
> other systems with kernel modules from ports as well.
>
>
> Another question, how hard would it be to expose how the kernel was
> built to modules built from ports, so that they can figure out stuff
> like SMP and others, that might affect the module build?

Point the KERNBUILDDIR variable to the directory of the kernel build.
This is the directory where *.o and opt*.h are located. Then everything
would just work.
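
For illustration only, the KERNBUILDDIR suggestion might be sketched as the small shell helper below. It checks that a directory actually contains opt*.h headers and then prints (it does not run) a make command line that passes KERNBUILDDIR to a module build. The function names has_opt_headers and module_build_cmd are hypothetical and not part of the FreeBSD build system, and any directory handed to them is an assumption about where a given kernel was built.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not FreeBSD tools.

# Return 0 if the directory looks like a kernel build directory,
# i.e. it exists and contains at least one opt*.h header.
has_opt_headers() {
    dir=$1
    [ -d "$dir" ] || return 1
    for f in "$dir"/opt*.h; do
        # If the glob matched nothing, $f is the literal pattern
        # and the -e test fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Print (do not execute) a make command that hands KERNBUILDDIR
# to an out-of-tree module build in directory $2.
module_build_cmd() {
    kernbuilddir=$1
    moddir=$2
    if has_opt_headers "$kernbuilddir"; then
        printf 'make -C %s KERNBUILDDIR=%s\n' "$moddir" "$kernbuilddir"
    else
        printf 'no opt*.h headers in %s\n' "$kernbuilddir" >&2
        return 1
    fi
}
```

A call such as `module_build_cmd /usr/obj/usr/src/amd64.amd64/sys/GENERIC /path/to/module` would print the command to run. Both paths are assumptions; the object directory depends on how and where the kernel was built.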